I'm back seeking your help again. I have messy data that I have cleaned and put into a structured format as far as I could, and I need help from this point onward.
A solution based on the tidyverse that doesn't require writing a new function would be great, but any solution is much appreciated.
library(tidyverse)
#sample dataset used for all examples
sample <- tibble::tribble(
~Config, ~Input,
"name", "BMS",
"id", "1234567",
"postSetupProcessing", "SplitbyLine",
"lines", "2",
"DrugGroup", "123,235,531,987",
"report", "Market",
"report", "Waterfall",
"report", "KM",
"name", "Intel",
"id", "434976",
"DrugGroup", "123,498,412,999",
"report", "Market",
"report", "Waterfall",
"report", "KM",
"report", "sankey diagram",
"name", "J and J",
"id", "18745",
"new", "Warfin",
"old", "Warfin2",
"report", "Market",
"report", "Waterfall"
)
- I want to add an index column based on the text in another column. The Config column contains the text "name" at irregular row intervals. Starting from 1, every row from a "name" row up to (but not including) the next "name" row should get the same index, and the index should increase by 1 at each new "name" row.
expected_output_1 <- tibble::tribble(
~Index, ~Config, ~Input,
1L, "name", "BMS",
1L, "id", "1234567",
1L, "postSetupProcessing", "SplitbyLine",
1L, "lines", "2",
1L, "DrugGroup", "123,235,531,987",
1L, "report", "Market",
1L, "report", "Waterfall",
1L, "report", "KM",
2L, "name", "Intel",
2L, "id", "434976",
2L, "DrugGroup", "123,498,412,999",
2L, "report", "Market",
2L, "report", "Waterfall",
2L, "report", "KM",
2L, "report", "sankey diagram",
3L, "name", "J and J",
3L, "id", "18745",
3L, "new", "Warfin",
3L, "old", "Warfin2",
3L, "report", "Market",
3L, "report", "Waterfall"
)
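To make the first step concrete, here is a minimal sketch of one way it might be done, assuming every block starts with a "name" row. It relies on `cumsum()` over a logical vector, which increases by 1 at each "name" row and stays flat in between. The `toy` tibble and the `indexed` name are hypothetical, a trimmed-down copy of the sample data used only for illustration.

```r
library(dplyr)

# Hypothetical trimmed-down copy of the sample data (two blocks only)
toy <- tibble::tribble(
  ~Config,  ~Input,
  "name",   "BMS",
  "id",     "1234567",
  "report", "Market",
  "report", "Waterfall",
  "name",   "Intel",
  "id",     "434976",
  "report", "Market"
)

# Config == "name" is TRUE on the first row of each block, so the running
# sum of that logical vector gives 1 for the first block, 2 for the second...
indexed <- toy %>%
  mutate(Index = cumsum(Config == "name"), .before = Config)

print(indexed)
```

One caveat with this sketch: any rows that appear before the first "name" row would get index 0, so it only matches the expected output when the data starts with a "name" row, as the sample does.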
- Using the expected_output_1 result above, I'd like to collapse repeated values in the Config column within each index into a single row, with their Input values combined into one comma-separated cell. For example, for index 1, "report" is repeated 3 times, so it should be collapsed into one row whose Input is "Market,Waterfall,KM".
expected_output_2 <- tibble::tribble(
~Index, ~Config, ~Input,
1L, "name", "BMS",
1L, "id", "1234567",
1L, "postSetupProcessing", "SplitbyLine",
1L, "lines", "2",
1L, "DrugGroup", "123,235,531,987",
1L, "report", "Market,Waterfall,KM",
2L, "name", "Intel",
2L, "id", "434976",
2L, "DrugGroup", "123,498,412,999",
2L, "report", "Market,Waterfall,KM,sankey diagram",
3L, "name", "J and J",
3L, "id", "18745",
3L, "new", "Warfin",
3L, "old", "Warfin2",
3L, "report", "Market,Waterfall"
)
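For the second step, a possible sketch is shown below. It assumes the Index column already exists and that repeated Config values should be joined with a comma and no space, as in expected_output_2. The `toy_indexed` tibble and the `collapsed` name are hypothetical, a shortened stand-in for expected_output_1. It uses a grouped `mutate()` followed by `distinct()` rather than `summarise()`, because `summarise()` would sort the Config values alphabetically within each index, while the expected output keeps the original row order.

```r
library(dplyr)

# Hypothetical shortened stand-in for expected_output_1
toy_indexed <- tibble::tribble(
  ~Index, ~Config,  ~Input,
  1L,     "name",   "BMS",
  1L,     "id",     "1234567",
  1L,     "report", "Market",
  1L,     "report", "Waterfall",
  1L,     "report", "KM",
  2L,     "name",   "Intel",
  2L,     "report", "Market",
  2L,     "report", "Waterfall"
)

collapsed <- toy_indexed %>%
  group_by(Index, Config) %>%
  # Within each Index/Config pair, replace Input with the joined string
  mutate(Input = paste(Input, collapse = ",")) %>%
  ungroup() %>%
  # Repeated rows are now identical; keep the first of each, in original order
  distinct()

print(collapsed)
```

If the repeated Config values were not adjacent within an index, this sketch would still merge them into the row where the value first appears.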
Any help will be greatly appreciated.