I'm fairly new to RStudio. Can someone help?
I have a data frame named "Data2" in which I would like to do a few find & replace operations and also split one column into 2 columns.
The current structure and the expected new structure are shown below.
Current Structure
ID Group
1 XXXXXX_MaleFR_YY
2 XXXXXX_FemaleFR_YY
3 XXXXXX_FemaleFR_YY
4 XXXXXX_UnknownNL_YY
... ...
500 XXXXXX_MaleNL_YY
Expected Structure
ID Gender Language
1 Male FR
2 Female FR
3 Female FR
4 Unknown NL
... ... ...
500 Male NL
What I have in mind:
- Find & Replace "XXXXXX_" to ""
- Find & Replace "_YY" to ""
- Find & Replace "Male" to "Male,"
- Find & Replace "Female" to "Female,"
- Find & Replace "Unknown" to "NA,"
- Then split the Group column into 2 columns (Gender, Language), using "," as the delimiter.
Hope someone can help.
valeri
This might be one way to go:
library(tidyverse)
data <- tibble(
  ID = c(1, 2),
  Group = c("XXX_MaleAB_YY", "XXX_FemaleCD_YY")
)
data2 <- data %>%
  # Split on "_" into prefix, middle part, and suffix
  separate(Group, c("remove_this1", "keep_this_and_split", "remove_this2"), sep = "_") %>%
  # Drop the prefix and suffix columns
  select(-contains("remove")) %>%
  mutate(
    # Everything except the last two characters is the gender
    Gender = substr(keep_this_and_split, 1, nchar(keep_this_and_split) - 2),
    # The last two characters are the language code
    Language = substr(keep_this_and_split, nchar(keep_this_and_split) - 1, nchar(keep_this_and_split))
  ) %>%
  select(-keep_this_and_split)
This would be another way to do it:
library(tidyverse)
df <- data.frame(stringsAsFactors = FALSE,
                 ID = c(1, 2, 3, 4, 500),
                 Group = c("XXXXXX_MaleFR_YY", "XXXXXX_FemaleFR_YY",
                           "XXXXXX_FemaleFR_YY", "XXXXXX_UnknownNL_YY",
                           "XXXXXX_MaleNL_YY")
)
df %>%
  transmute(ID = ID,
            # Text after the first "_" up to the two uppercase letters that precede the next "_"
            Gender = str_extract(Group, "(?<=_).+(?=[:upper:]{2}_)"),
            # Two uppercase letters that are followed by "_" and two more uppercase letters
            Language = str_extract(Group, "[:upper:]{2}(?=_[:upper:]{2})"))
#> ID Gender Language
#> 1 1 Male FR
#> 2 2 Female FR
#> 3 3 Female FR
#> 4 4 Unknown NL
#> 5 500 Male NL
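The find & replace plan from the question can also be followed fairly literally in base R, without extra packages. The following is only a sketch under some assumptions: the prefix and suffix are taken to be the literal strings "XXXXXX_" and "_YY" from the example, the gender is assumed to be one of Male, Female or Unknown, and the names `core`, `parts` and `result` are made up for illustration. It keeps "Unknown" as text to match the expected structure; the commented-out line near the end would turn it into NA instead.

```r
# Example data mirroring the structure in the question
Data2 <- data.frame(
  ID = c(1, 2, 3, 4, 500),
  Group = c("XXXXXX_MaleFR_YY", "XXXXXX_FemaleFR_YY",
            "XXXXXX_FemaleFR_YY", "XXXXXX_UnknownNL_YY",
            "XXXXXX_MaleNL_YY"),
  stringsAsFactors = FALSE
)

# Find & replace: remove the fixed prefix and suffix (assumed literal here)
core <- sub("^XXXXXX_", "", Data2$Group)
core <- sub("_YY$", "", core)

# Find & replace: put a comma right after the gender word
core <- sub("^(Male|Female|Unknown)", "\\1,", core)

# Split on the comma; each element yields two pieces, so rbind gives a 2-column matrix
parts <- do.call(rbind, strsplit(core, ",", fixed = TRUE))

result <- data.frame(
  ID = Data2$ID,
  Gender = parts[, 1],
  Language = parts[, 2],
  stringsAsFactors = FALSE
)

# Optional: store unknown genders as missing values instead of text
# result$Gender[result$Gender == "Unknown"] <- NA

print(result)
```

Anchoring the patterns with `^` and `$` keeps the replacements from touching matching text elsewhere in the string, which a plain find & replace would not guarantee.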