I'm trying to create a running total of only the unique values of concatenated strings. I can't figure out how to do it using tidyverse functions and principles.
Given the data frame below, which initially contains columns col1 and col2, how do I apply functions to col2 that will return the desired output shown in col3?
In pseudo-code, the general idea of what I want is this:
paste0(col3[i-1], ',', col3[i][which(!(col3[i] %in% col3[i-1]))])
but written with tidyverse functions, and in a form that actually works.
df1 <- data.frame(
  col1 = c(1, 2, 3, 4, 5),
  col2 = c('apple,carrot', 'banana,carrot', 'grape,blueberry,mango', 'coconut', 'grape,apple'),
  col3 = c('apple,carrot', 'apple,carrot,banana', 'apple,carrot,banana,grape,blueberry,mango', 'apple,carrot,banana,grape,blueberry,mango,coconut', 'apple,carrot,banana,grape,blueberry,mango,coconut')
)
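
For reference, here is a rough sketch of the kind of thing I'm after. It is not necessarily the best approach. It assumes dplyr, purrr and stringr are installed, and it relies on purrr::accumulate() with base union() to carry the set of values seen so far from one row to the next. The name result is only for illustration, and the input is rebuilt without col3 so the block runs on its own.

```r
library(dplyr)
library(purrr)
library(stringr)

# Input only: col3 is what we want to compute
df1 <- data.frame(
  col1 = c(1, 2, 3, 4, 5),
  col2 = c('apple,carrot', 'banana,carrot', 'grape,blueberry,mango', 'coconut', 'grape,apple'),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  mutate(
    col3 = str_split(col2, ",") %>%       # list of character vectors, one per row
      accumulate(union) %>%               # running union, keeps first-seen order
      map_chr(paste, collapse = ",")      # collapse each set back into one string
  )
```

Since union() keeps the elements of its first argument in place and appends only the new ones from the second, the order of first appearance should be preserved. This sketch assumes there are no stray spaces around the commas.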