March 23, 2022, 2:13pm
I need to create a third variable by deleting the strings that the first two variables have in common.
Could you please help with an easy function?
a<- data.frame(V1= c("carlos rodrigo", "sarah", "patricia raquel", "leonardo"), V2= c("rodrigo", "patri", "raquel", "oscar leonardo"),
Result = c( "carlos", "patri sarah", "patricia", "oscar"))
Created on 2022-03-23 by the reprex package (v2.0.1)
There are probably many ways to do it. Here is one.
a<- data.frame(V1= c("carlos rodrigo", "sarah", "patricia raquel", "leonardo"), V2= c("rodrigo", "patri", "raquel", "oscar leonardo"),
Result = c( "carlos", "patri sarah", "patricia", "oscar"))
library(dplyr)
library(stringr)
a %>% rowwise() %>%
  mutate(
    # words present in both V1 and V2
    common_content = list(intersect(x = str_split(V1, " ", simplify = TRUE),
                                    y = str_split(V2, " ", simplify = TRUE))),
    V1_unique = list(setdiff(str_split(V1, " ", simplify = TRUE), common_content)),
    V2_unique = list(setdiff(str_split(V2, " ", simplify = TRUE), common_content)),
    result_by_code = trimws(paste(V2_unique, V1_unique, collapse = "")))
And here's another way:
a<- data.frame(V1= c("carlos rodrigo", "sarah", "patricia raquel", "leonardo"), V2= c("rodrigo", "patri", "raquel", "oscar leonardo"),
Result = c( "carlos", "patri sarah", "patricia", "oscar"))
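library(dplyr)
library(stringr)
# symmetric difference: elements that appear in only one of x and y
symdiff <- function(x, y) { setdiff(union(x, y), intersect(x, y)) }
a %>%
  rowwise() %>%
  mutate(
    V1sp = str_split(V1, "\\s+"),
    V2sp = str_split(V2, "\\s+"),
    Result2 = str_c(symdiff(V1sp, V2sp), collapse = " ")
  )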
#> # A tibble: 4 × 6
#> # Rowwise:
#>   V1              V2             Result      V1sp      V2sp      Result2
#>   <chr>           <chr>          <chr>       <list>    <list>    <chr>
#> 1 carlos rodrigo  rodrigo        carlos      <chr [2]> <chr [1]> carlos
#> 2 sarah           patri          patri sarah <chr [1]> <chr [1]> sarah patri
#> 3 patricia raquel raquel         patricia    <chr [2]> <chr [1]> patricia
#> 4 leonardo        oscar leonardo oscar       <chr [1]> <chr [2]> oscar
Created on 2022-03-23 by the reprex package (v2.0.1)
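To see what symdiff does in isolation, here is a minimal base R sketch on two hand-written vectors. The vectors `x` and `y` are made up for illustration and are not part of the original data.

```r
# Symmetric difference: elements found in exactly one of the two vectors.
symdiff <- function(x, y) {
  setdiff(union(x, y), intersect(x, y))
}

# Hypothetical inputs, chosen only to illustrate the behaviour.
x <- c("oscar", "leonardo")
y <- c("leonardo")

only_in_one <- symdiff(x, y)
print(only_in_one)
```

Because union() keeps the elements of its first argument first, words from V1 come before words from V2 in the output. That is why row 2 of Result2 reads "sarah patri" while the Result column has "patri sarah".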
March 30, 2022, 9:38am
Thank you very much to everyone who sent a solution. Do you know where I can start learning about UDFs (user-defined functions) and how to apply them? Many thanks, it is a magnificent solution.
I also think it would be possible to do it the way below:
a<- data.frame(V1= c("carlos rodrigo", "sarah", "patricia raquel", "leonardo"), V2= c("rodrigo", "patri", "raquel", "oscar leonardo"),
Result = c( "carlos", "patri sarah", "patricia", "oscar"))
symdiff <- function( x, y) { setdiff( union(x, y), intersect(x, y))}
a %>%
  rowwise() %>%
  mutate(
    V1sp=strsplit((tolower(V1), " "),
    V2sp=strsplit((tolower(V2), " "),
    Result2=str_c(symdiff(V1sp, V2sp), collapse = " ")
  )
The chapter on functions in R4DS is here: 19 Functions | R for Data Science (had.co.nz)
Also, I remember that the swirl package of interactive R lessons had good coverage of functions: swirl | Students (swirlstats.com)
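In case a concrete example helps with the UDF question, here is a rough base R sketch of writing a function and applying it to each row. The function name `drop_common` and the column name `Result3` are hypothetical, chosen only for this example, and putting the V2-only words first is an assumption made to match the order in the Result column.

```r
# Hypothetical helper: words that appear in only one of two strings.
drop_common <- function(s1, s2) {
  w1 <- strsplit(trimws(s1), "\\s+")[[1]]
  w2 <- strsplit(trimws(s2), "\\s+")[[1]]
  # V2-only words first, then V1-only words (assumed order, see above)
  paste(c(setdiff(w2, w1), setdiff(w1, w2)), collapse = " ")
}

a <- data.frame(
  V1 = c("carlos rodrigo", "sarah", "patricia raquel", "leonardo"),
  V2 = c("rodrigo", "patri", "raquel", "oscar leonardo"),
  stringsAsFactors = FALSE
)

# mapply() calls drop_common() once for each pair of elements of V1 and V2.
a$Result3 <- unname(mapply(drop_common, a$V1, a$V2))
print(a)
```

The function works on one pair of strings at a time, and mapply() takes care of walking down the two columns together, so no rowwise() is needed in this version.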
You have some extra parentheses. There is also one difference between splitting on " " and splitting on "\\s+": look at row 3 of the results. "patricia" has a trailing space because "patricia  raquel" has two spaces in the middle, so splitting on a single space leaves an empty string between the two words.
library(dplyr)
library(stringr)
a<- data.frame(V1= c("carlos rodrigo", "sarah", "patricia  raquel", "leonardo"), V2= c("rodrigo", "patri", "raquel", "oscar leonardo"),
Result = c( "carlos", "patri sarah", "patricia", "oscar"))
symdiff <- function( x, y) { setdiff( union(x, y), intersect(x, y))}
a %>%
  rowwise() %>%
  mutate(
    V1sp=strsplit(tolower(V1), " "),
    V2sp=strsplit(tolower(V2), " "),
    Result2=str_c(symdiff(V1sp, V2sp), collapse = " ")
  )
#> # A tibble: 4 x 6
#> # Rowwise:
#>   V1               V2             Result      V1sp      V2sp      Result2
#>   <chr>            <chr>          <chr>       <list>    <list>    <chr>
#> 1 carlos rodrigo   rodrigo        carlos      <chr [2]> <chr [1]> "carlos"
#> 2 sarah            patri          patri sarah <chr [1]> <chr [1]> "sarah patri"
#> 3 patricia  raquel raquel         patricia    <chr [3]> <chr [1]> "patricia "
#> 4 leonardo         oscar leonardo oscar       <chr [1]> <chr [2]> "oscar"
Created on 2022-03-30 by the reprex package (v2.0.1)
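The splitting difference can be checked on its own with a small base R sketch. The variable names are made up for illustration, and the string is built with strrep() so that the two spaces are explicit.

```r
# "patricia" and "raquel" separated by exactly two spaces
s <- paste0("patricia", strrep(" ", 2), "raquel")

# Splitting on a single space leaves an empty string between the two spaces.
by_space <- strsplit(s, " ")[[1]]

# Splitting on "\\s+" treats the whole run of whitespace as one separator.
by_ws_run <- strsplit(s, "\\s+")[[1]]

print(length(by_space))
print(length(by_ws_run))
```

The first split has three elements (the middle one is ""), which matches the <chr [3]> shown in row 3, while the second has two.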