Adding a "0/1" factor based on another variable -- Doing a proper reprex

danbard · July 8, 2022, 6:09pm

Dear all,

The screenshot displays a spreadsheet with three variables.

I wish to add a fourth variable called "presence" filled with observations related to the column "segment".

For each of the observations in "segment" that start with the letter "c", the character "1" would be written in "presence".

On the other hand, "0" would be added for each observation in "segment" that starts with the letter "e".

As I am new to RStudio, I would appreciate your help to understand how to approach this task.

Thank you,

Best,
Dan

dvetsch75 · July 8, 2022, 6:34pm

dplyr::case_when would be a good choice for this. The way to use this is like so:

data %>% 
    mutate(
        newvar = case_when(
            x == y ~ 'result1',
            x != y ~ 'result2'
        )
    )

If you are able to provide a reprex of your data, we can help you dig into extracting the e and c characters.
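For anyone who wants to run the template above as-is, here is a minimal sketch with made-up data. The names `toy`, `x`, `y` and the result labels are hypothetical placeholders, not part of the original question. They only illustrate that `case_when()` checks each `condition ~ value` pair in order and uses the first one that matches.

```r
library(dplyr)

# Hypothetical toy data, not the poster's data set
toy <- data.frame(x = c(1, 2, 3), y = c(1, 5, 3))

result <- toy %>%
    mutate(
        newvar = case_when(
            x == y ~ "result1", # rows where the two columns match
            x != y ~ "result2"  # rows where they differ
        )
    )

result
```

Rows 1 and 3 have equal values and get "result1"; row 2 gets "result2".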

danbard · July 9, 2022, 10:39am

Dear Derek,

Thank you very much.

I would really appreciate receiving more help.

This is the first time I tried to do a reprex.

Is the reprex below right?

I hope I got it right. If not, I am sorry. I am learning about this new thing.

Best,
Dan

setwd("~/exps_analyses/2022_exp_analyses/2022_7_durations_js_analyses/js_fr")
data <- read.csv("duration_js_tier_4.csv", header = T)
head(data)
#>               ï..filename  segment duration
#> 1 stim_mic-1561130818_277 cd1bsf12   24.565
#> 2 stim_mic-1561130818_277 cd2bsf12   16.880
#> 3 stim_mic-1561130818_277 cp1bsf12   51.714
#> 4 stim_mic-1561130818_277 ck2bsf12   77.509
#> 5 stim_mic-1561130818_277  cabsf12  356.750
#> 6 stim_mic-1561130827_267 cd1blf12  298.663

Created on 2022-07-09 by the reprex package (v2.0.1)

Dissipation123 · July 9, 2022, 11:55am

Hello, this is not quite a reprex: we can't reproduce your dataset from code that reads a csv file we don't have access to.

You can run dput(df) on your data frame to get code to reproduce your dataset.
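As a rough illustration of that step, the sketch below builds a small stand-in data frame and prints the code that recreates its first rows. The data frame and its values are hypothetical; with real data you would call `dput()` on the object you read from your file, usually wrapped in `head()` so the output stays short.

```r
# Hypothetical stand-in for a data frame read from a file we cannot share
df <- data.frame(
    segment  = c("cd1bsf12", "ea2bsf12", "cp1bsf12", "ek2bsf12"),
    duration = c(24.565, 16.88, 51.714, 77.509)
)

# Prints R code that rebuilds the first three rows;
# that printed code is what you paste into the forum post
dput(head(df, 3))
```

The printed text starts with `structure(list(...` and can be assigned to a name by anyone who copies it, so they get the same rows without needing the csv file.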

As for extracting the c and e letters, substr() along with case_when can help you do that:


library(dplyr)

df <- structure(
    list(segment = c("event1a", "cevent2a", "cevent1a", "event1b")),
    row.names = c(NA, 4L),
    class = "data.frame"
)

df %>%
    mutate(
        newvar = case_when(
            substr(segment, 1, 1) == "e" ~ 0, # first character is "e"
            substr(segment, 1, 1) == "c" ~ 1, # first character is "c"
            TRUE ~ -1                         # anything else
        )
    )

danbard · July 9, 2022, 5:59pm

Dear Darshan,

Thank you so much for taking the time to reply to my post.

I am sorry that my reprex was not correct.

I watched a video about dput(), I hope this time I got it right. Here is the result:

structure(list(ï..filename = c("stim_mic-1561130818_277", "stim_mic-1561130818_277",
"stim_mic-1561130818_277", "stim_mic-1561130818_277", "stim_mic-1561130818_277"
), segment = c("cd1bsf12", "cd2bsf12", "cp1bsf12", "ck2bsf12",
"cabsf12"), duration = c(24.565, 16.88, 51.714, 77.509, 356.75
)), row.names = c(NA, 5L), class = "data.frame")

In the meantime, I am practicing the solution that you suggested.

Thank you.
Best,
Dan

andresrcs · July 9, 2022, 6:59pm

I want to round up the solution in the form of a proper reproducible example, using the sample data you have provided (in a proper format) and adding an alternative way to detect the starting letter. Please pay close attention to the way I'm posting the example; this would be a proper reprex for your question.

# Library calls for the used packages
library(dplyr)
library(stringr)

# Sample data in a copy/paste friendly format
sample_data <- structure(list(
    ï..filename = c(
    "stim_mic-1561130818_277",
    "stim_mic-1561130818_277",
    "stim_mic-1561130818_277",
    "stim_mic-1561130818_277",
    "stim_mic-1561130818_277"),
    segment = c("cd1bsf12", "cd2bsf12", "cp1bsf12", "ck2bsf12",
                "cabsf12"),
    duration = c(24.565, 16.88, 51.714, 77.509, 356.75)
), row.names = c(NA, 5L), class = "data.frame")

# Relevant code
sample_data %>% 
    mutate(presence = case_when(
        str_detect(segment, "^c") ~ 1,
        str_detect(segment, "^e") ~ 0
    ))
#>               ï..filename  segment duration presence
#> 1 stim_mic-1561130818_277 cd1bsf12   24.565        1
#> 2 stim_mic-1561130818_277 cd2bsf12   16.880        1
#> 3 stim_mic-1561130818_277 cp1bsf12   51.714        1
#> 4 stim_mic-1561130818_277 ck2bsf12   77.509        1
#> 5 stim_mic-1561130818_277  cabsf12  356.750        1

Created on 2022-07-09 by the reprex package (v2.0.1)
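Since the five sample rows all start with "c", every row above gets 1. The sketch below is a variation on the same approach with hypothetical segment names (two starting with "e" and one starting with neither letter), to show the 0 branch and what happens to unmatched rows. The `presence_factor` column is an optional extra, assuming you want the 0/1 codes stored as a factor as the thread title suggests.

```r
library(dplyr)
library(stringr)

# Hypothetical segments: two start with "c", two with "e", one with neither
mixed_data <- data.frame(
    segment = c("cd1bsf12", "ed1bsf12", "cp1bsf12", "ep1bsf12", "xd1bsf12")
)

mixed_result <- mixed_data %>%
    mutate(
        presence = case_when(
            str_detect(segment, "^c") ~ 1, # "^c" matches a leading "c"
            str_detect(segment, "^e") ~ 0  # "^e" matches a leading "e"
        ),
        # Optional: keep the same codes as a factor with levels "0" and "1"
        presence_factor = factor(presence, levels = c(0, 1))
    )

mixed_result
```

Rows that match neither pattern are left as `NA`, which makes unexpected segment names easy to spot.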

danbard · July 10, 2022, 10:18am

Dear all,

Thank you for your generous help. I am learning a lot here.
As an exercise, I am reproducing the solution.

I accepted Andrés' reply as a solution, but this was indeed a collective effort and I am grateful to Derek and Darshan for their contribution.

# Reproducing Andrés' solution

# Libraries

library(dplyr)
library(stringr)

# Sample data

sample_data <- structure(list(
  ï..filename = c(
    "stim_mic-1561130818_277", 
    "stim_mic-1561130818_277",
    "stim_mic-1561130818_277", 
    "stim_mic-1561130818_277", 
    "stim_mic-1561130818_277"), 
    segment = c("cd1bsf12", "cd2bsf12", "cp1bsf12", "ck2bsf12",
               "cabsf12"), 
    duration = c(24.565, 16.88, 51.714, 77.509, 356.75)
), row.names = c(NA, 5L), class = "data.frame")

# Relevant code

sample_data %>% 
  mutate(presence = case_when(
    str_detect(segment, "^c") ~ 1,
    str_detect(segment, "^e") ~ 0
  ))
#>               ï..filename  segment duration presence
#> 1 stim_mic-1561130818_277 cd1bsf12   24.565        1
#> 2 stim_mic-1561130818_277 cd2bsf12   16.880        1
#> 3 stim_mic-1561130818_277 cp1bsf12   51.714        1
#> 4 stim_mic-1561130818_277 ck2bsf12   77.509        1
#> 5 stim_mic-1561130818_277  cabsf12  356.750        1
