I'm fairly new to RStudio. Can someone help?
I have a data frame named "Data2" in which I would like to do a few find-and-replace operations and also split one column into two.
The current structure and the expected new structure are shown below.
Current Structure
ID   Group
1     XXXXXX_MaleFR_YY
2     XXXXXX_FemaleFR_YY
3     XXXXXX_FemaleFR_YY
4     XXXXXX_UnknownNL_YY
...     ...
500  XXXXXX_MaleNL_YY
Expected Structure
ID   Gender         Language
1     Male             FR
2     Female         FR
3     Female         FR
4     Unknown      NL
...     ...
500  Male            NL
What I have in mind:
- Find & Replace "XXXXXX_" to ""
- Find & Replace "_YY" to ""
- Find & Replace "Male" to "Male,"
- Find & Replace "Female" to "Female,"
- Find & Replace "Unknown" to "NA,"
- Then split the Group column into 2 columns (Gender, Language), using "," as the delimiter.
Hope someone can help.
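
For reference, the steps listed above could be sketched in base R roughly as follows. This is only a sketch under stated assumptions: the sample rows are copied from the table in the question, the names `x`, `parts` and `result` are hypothetical, and "Unknown" is kept as a label (as in the expected structure) rather than being replaced with NA.

```r
# Hypothetical sample mirroring the structure shown in the question
Data2 <- data.frame(
  ID = c(1, 2, 3, 4),
  Group = c("XXXXXX_MaleFR_YY", "XXXXXX_FemaleFR_YY",
            "XXXXXX_FemaleFR_YY", "XXXXXX_UnknownNL_YY"),
  stringsAsFactors = FALSE
)

# Strip the prefix and suffix; fixed = TRUE treats them as literal text
x <- gsub("XXXXXX_", "", Data2$Group, fixed = TRUE)
x <- gsub("_YY", "", x, fixed = TRUE)

# Put a comma after the gender label at the start of each value
x <- sub("^(Male|Female|Unknown)", "\\1,", x)

# Split on the comma and bind the pieces into a two-column matrix
parts <- do.call(rbind, strsplit(x, ",", fixed = TRUE))

result <- data.frame(
  ID = Data2$ID,
  Gender = parts[, 1],
  Language = parts[, 2],
  stringsAsFactors = FALSE
)

# To follow the "Unknown" -> NA idea instead, one option would be:
# result$Gender[result$Gender == "Unknown"] <- NA
```

Anchoring the pattern with `^` means only the label at the start of the cleaned value gets the comma, so the three separate replacements collapse into one call.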
             
                valeri
              This might be one way to go:
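library(tidyverse)

# Small example with the same "prefix_GenderLanguage_suffix" pattern as the question
data <- tibble(
  ID = c(1, 2),
  Group = c("XXX_MaleAB_YY", "XXX_FemaleCD_YY")
)

data2 <- data %>%
  # Split Group at each underscore; only the middle piece is needed
  separate(Group, c("remove_this1", "keep_this_and_split", "remove_this2"), sep = "_") %>%
  select(-contains("remove")) %>%
  mutate(
    # Everything except the last two characters is the gender
    Gender = substr(keep_this_and_split, 1, nchar(keep_this_and_split) - 2),
    # The last two characters are the language code
    Language = substr(keep_this_and_split, nchar(keep_this_and_split) - 1, nchar(keep_this_and_split))
  ) %>%
  select(-keep_this_and_split)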
             
This would be another way to do it:
library(tidyverse)
df <- data.frame(stringsAsFactors=FALSE,
                 ID = c(1, 2, 3, 4, 500),
                 Group = c("XXXXXX_MaleFR_YY", "XXXXXX_FemaleFR_YY",
                           "XXXXXX_FemaleFR_YY", "XXXXXX_UnknownNL_YY",
                           "XXXXXX_MaleNL_YY")
)
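df %>% 
    transmute(ID = ID,
              # Text after the first "_" up to the two capitals that precede the next "_"
              Gender = str_extract(Group, "(?<=_).+(?=[:upper:]{2}_)"),
              # Two capitals that are followed by "_" and two more capitals
              Language = str_extract(Group, "[:upper:]{2}(?=_[:upper:]{2})"))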
#>    ID  Gender Language
#> 1   1    Male       FR
#> 2   2  Female       FR
#> 3   3  Female       FR
#> 4   4 Unknown       NL
#> 5 500    Male       NL
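
To see what the lookaround patterns above are doing, the same idea can be tried in base R without stringr. This is a rough sketch, not the answer's own code: it assumes `perl = TRUE` regular expressions, swaps `[:upper:]` for the plain `[A-Z]` class, and uses a hypothetical two-element vector `group`. Like the original patterns, it relies on the suffix after the last underscore being two capital letters.

```r
# Hypothetical sample values taken from the question
group <- c("XXXXXX_MaleFR_YY", "XXXXXX_UnknownNL_YY")

# (?<=_)         the match must start right after an underscore
# .+             the gender label itself
# (?=[A-Z]{2}_)  stop just before two capitals followed by an underscore
gender <- regmatches(
  group,
  regexpr("(?<=_).+(?=[A-Z]{2}_)", group, perl = TRUE)
)

# [A-Z]{2}         the two-letter language code
# (?=_[A-Z]{2})    only if "_" and two more capitals come next
language <- regmatches(
  group,
  regexpr("[A-Z]{2}(?=_[A-Z]{2})", group, perl = TRUE)
)
```

Lookarounds only check the surrounding text and are not part of the match, which is why the underscores never end up in `gender` or `language`.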