Hi Team,
I can use the strsplit function to split a string on multiple delimiters (see the sample code below). The problem is that the results do not include the delimiters, but I need to know which delimiter each resulting string follows. So I want to keep the delimiter in each string.
I have uploaded a PDF file: the black text is the current result, and the red text is what I want to get.
Below is one approach to achieving your desired output. All keywords are first extracted into one column, and then the string is separated into multiple columns (split by each keyword). Finally, the keywords are pasted back into each appropriate column.
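The three steps described above can be sketched roughly as follows. This is not the code from the original reply, only an illustration of the idea; the sample string, the `pattern` variable and the keyword names are assumptions.

```r
library(stringr)

# Hypothetical input; the thread's real data is not shown
x <- "keyword1 alpha beta keyword3 gamma keyword2 delta"
pattern <- "keyword1|keyword2|keyword3|keyword4"

# Step 1: extract the keywords in order of appearance
keys <- str_extract_all(x, pattern)[[1]]

# Step 2: split the string at each keyword (the keywords are dropped here)
parts <- str_split(x, pattern)[[1]]

# Step 3: paste each keyword back in front of the text that followed it.
# parts[1] is whatever came before the first keyword (empty in this example).
pieces <- trimws(paste0(keys, parts[-1]))
pieces
```

In a data frame, the same steps would be applied row by row, with each element of `pieces` ending up in its own column.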
Hi scottyd22,
Thank you for your help. I tried your sample code, but the columns X1, ..., X5 do not seem to be kept in the results. I cannot find any problem in your sample code, so I am confused.
Kai
I can see the result now, and it is what I wanted. After running this code, I tried to put the result into a data frame by adding "try2 <- as.data.frame(try)" at the end of the code, but it still keeps the original values.
Could you please tell me how to convert it into a data frame?
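A likely cause, assuming the pipeline's output was only printed and never assigned: dplyr and tidyr verbs return a new object and do not modify `try` in place, so `as.data.frame(try)` converts the untouched original. A minimal sketch with made-up data (the `toupper()` step is just a stand-in for the real transformation):

```r
library(dplyr)

# Hypothetical stand-in for the thread's `try` data frame
try <- data.frame(testing_str = "keyword1 a keyword2 b")

# Running the pipeline without assignment only prints; `try` is unchanged
try %>% mutate(testing_str = toupper(testing_str))

# Assign the result of the pipeline, converting it at the end
try2 <- try %>%
  mutate(testing_str = toupper(testing_str)) %>%
  as.data.frame()

try2$testing_str  # transformed values
try$testing_str   # still the original values
```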
Instead of splitting by or extracting keywords, you can add an anchor before each of them and separate by that anchor:
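library(tidyverse)

try %>%
  as_tibble() %>%
  # insert the anchor "-_-" in front of every keyword (zero-width lookahead)
  mutate(testing_str = str_replace_all(
    testing_str,
    "(?=keyword1|keyword2|keyword3|keyword4)",
    "-_-"
  )) %>%
  # split on the anchor; the keywords stay attached to their text
  separate(testing_str, sep = "-_-", into = paste0("X", 1:5)) %>%
  mutate(across(starts_with("X"), trimws)) # to remove leading and trailing whitespace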
If you don't know the number of new columns to create in advance, I'd do as follows:
library(tidyverse)

# insert the anchor "-_-" in front of every keyword, then split on it
k <- str_replace_all(try$testing_str, '(?=keyword1|keyword2|keyword3|keyword4)', "-_-")
s <- strsplit(k, '-_-')
tibble(data = s) %>%
  unnest_wider(col = data, names_sep = "") %>%
  # the last 2 lines are optional
  # you can use bind_cols() to add the columns from
  # the original dataset
  mutate(across(starts_with("data"), trimws)) %>%
  rowid_to_column("id")
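The bind_cols() idea mentioned in the comments could look roughly like this. It is a sketch under assumptions: the two sample rows are invented, and the names `anchored`, `pieces`, `wide` and `result` are only for illustration. Empty pieces are dropped because a string that starts with a keyword yields an empty first element after splitting.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

# Hypothetical stand-in for `try`, with a different number of keywords per row
try <- tibble(testing_str = c(
  "keyword1 a keyword2 b",
  "keyword3 c keyword1 d keyword4 e"
))

# Put the anchor in front of every keyword, then split on the anchor
anchored <- str_replace_all(
  try$testing_str, "(?=keyword1|keyword2|keyword3|keyword4)", "-_-"
)
pieces <- strsplit(anchored, "-_-", fixed = TRUE)

# Drop the empty piece produced when a string starts with a keyword
pieces <- lapply(pieces, function(p) p[nzchar(p)])

wide <- tibble(data = pieces) %>%
  unnest_wider(col = data, names_sep = "") %>%
  mutate(across(starts_with("data"), trimws))

# Put the original column next to the new ones; rows stay in the same order
result <- bind_cols(try, wide)
result
```

Rows with fewer keywords are padded with NA in the trailing columns.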
Hi arangaca,
This is a better solution because it needs less code. But in the end, I still want to keep the original testing_str for double-checking and delete it later.
How can I do this in your sample code?
Thank you,
Kai
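One possible way to keep testing_str, sketched with invented data: write the anchored string into a helper column (here called `anchored`, a made-up name) and separate that column instead, so the original is never overwritten. Note that `separate()` also has a `remove = FALSE` argument, but in the code above testing_str has already been overwritten by the anchored version at that point, so it would keep the string with "-_-" in it rather than the original.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

# Hypothetical stand-in for `try`
try <- tibble(testing_str = c(
  "id1 keyword1 a keyword2 b",
  "id2 keyword3 c keyword4 d"
))

result <- try %>%
  # keep testing_str as is; the anchored copy goes into a helper column
  mutate(anchored = str_replace_all(
    testing_str,
    "(?=keyword1|keyword2|keyword3|keyword4)",
    "-_-"
  )) %>%
  # separate() drops the helper column and leaves testing_str in place
  separate(anchored, sep = "-_-", into = paste0("X", 1:3)) %>%
  mutate(across(starts_with("X"), trimws))

result

# Once the check is done, the original column can be dropped
final <- select(result, -testing_str)
```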