Split contents wrapped in individual values

vinayprakash808 · September 6, 2020, 6:39pm

Hi all

Is there a way to convert/split the the contents wrapped in a column into individual values as shown in expected output

df
A	  B
da	  list(off= c("Ove","fdsdf"), rou = "Int", br = "MS", act = list(list(name = "Pen", dost = "Unk", dor = "1", do1r = "1", dor_unit = "Unknown")), usel = "fse", d = "Soon", code = "QJ009", mn = "MSK", mnr = "5010")
fsd	  list(rou = "Int1", br = "MS1", act = list(list(name = "Pe1n", dost = "U1nk", dor = "11", do1r = "11", dor_unit = "Un1known")), usel = "1fse", d = "1Soon", code = "Q1J009", mn = "MK")

Expected Output

A	  B-off	  B-rou	 B- br   B-act-name	    B-act-dost	   B-act-dor	B-act-do1r	B-act-dor_unit	     B-usel	      B-d	    B-code	        B-mn	  B-mnr
da	  Ove	  Int	  MS	    Pen         	Unk	           1	      1	         Unknown]	         fse	     Soon	    QJ009	         MSK	  5010
da	  fdsdf	  Int	  MS	    Pen         	Unk	           1	      1	         Unknown]	         fse	     Soon	    QJ009	         MSK	  5010
fsd	  NA	  Int1	  MS1       Pe1n	        U1nk	       11	      11	    Un1Known	         1fse	     1Soon	    Q1J009	         MK	        NA

Actually there are multiple columns similar to Column B. The sample here is just for illustration. Appreciate if anyone could help

Here class of
class(df$B) is Character

woodward · September 6, 2020, 8:55pm

I think it's awkward to work with this list column inside the dataframe, and maybe easier to extract it first. Then you can bind what you want line by line. There might be a more elegant way though. Also, this will fail if the new column names are not unique.

library(dplyr)
library(tibble)
library(stringr)

df <- data.frame(
  A = c("da", "fsd"),
  B = I(list(
    list(off= c("Ove","fdsdf"), rou = "Int", br = "MS", act = list(list(name = "Pen", dost = "Unk", dor = "1", do1r = "1", dor_unit = "Unknown")), usel = "fse", d = "Soon", code = "QJ009", mn = "MSK", mnr = "5010"),
    list(rou = "Int1", br = "MS1", act = list(list(name = "Pe1n", dost = "U1nk", dor = "11", do1r = "11", dor_unit = "Un1known")), usel = "1fse", d = "1Soon", code = "Q1J009", mn = "MK")
    ))
  )

B <-  df$B # remove list column
df <- df %>% select(-"B")

result <- vector("list", nrow(df))
for (i in 1:nrow(df)){
  temp <- as.data.frame(B[[i]])
  names(temp) <- str_replace_all(paste("B", names(temp)), "[\\s\\.]", "-")
  result[[i]] <- bind_cols(slice(df, i), temp)
}
result <- bind_rows(result)
result
#>     A B-off B-rou B-br B-act-name B-act-dost B-act-dor B-act-do1r
#> 1  da   Ove   Int   MS        Pen        Unk         1          1
#> 2  da fdsdf   Int   MS        Pen        Unk         1          1
#> 3 fsd  <NA>  Int1  MS1       Pe1n       U1nk        11         11
#>   B-act-dor_unit B-usel   B-d B-code B-mn B-mnr
#> 1        Unknown    fse  Soon  QJ009  MSK  5010
#> 2        Unknown    fse  Soon  QJ009  MSK  5010
#> 3       Un1known   1fse 1Soon Q1J009   MK  <NA>

^{Created on 2020-09-07 by the reprex package (v0.3.0)}

technocrat · September 6, 2020, 9:08pm

Never let data reach this state.

suppressPackageStartupMessages({library(dplyr)
                                library(readr)})

input <- list(off= c("Ove","fdsdf"), rou = "Int", br = "MS", act = list(list(name = "Pen", dost = "Unk", dor = "1", do1r = "1", dor_unit = "Unknown")), usel = "fse", d = "Soon", code = "QJ009", mn = "MSK", mnr = "5010")
input
#> $off
#> [1] "Ove"   "fdsdf"
#> 
#> $rou
#> [1] "Int"
#> 
#> $br
#> [1] "MS"
#> 
#> $act
#> $act[[1]]
#> $act[[1]]$name
#> [1] "Pen"
#> 
#> $act[[1]]$dost
#> [1] "Unk"
#> 
#> $act[[1]]$dor
#> [1] "1"
#> 
#> $act[[1]]$do1r
#> [1] "1"
#> 
#> $act[[1]]$dor_unit
#> [1] "Unknown"
#> 
#> 
#> 
#> $usel
#> [1] "fse"
#> 
#> $d
#> [1] "Soon"
#> 
#> $code
#> [1] "QJ009"
#> 
#> $mn
#> [1] "MSK"
#> 
#> $mnr
#> [1] "5010"
repaired <- c(input[1][[1]][2],input[2][[1]],input[3][[1]],input[4][[1]][[1]][[1]],input[5][[1]],input[6][[1]],input[7][[1]],input[8][[1]],input[9][[1]],input[10][[1]],input[11][[1]],input[12][[1]])
repaired
#> [1] "fdsdf" "Int"   "MS"    "Pen"   "fse"   "Soon"  "QJ009" "MSK"   "5010"

^{Created on 2020-09-06 by the reprex package (v0.3.0)}

elmstedt · September 6, 2020, 9:54pm

Here's a Base R solution for this particular case, but to reiterate what others have said, data should never look like this to begin with.

df <- data.frame(
  A = c("da", "fsd"),
  B = I(list(list(off = c("Ove","fdsdf"),
                  rou = "Int",
                  br = "MS",
                  act = list(list(name = "Pen",
                                  dost = "Unk",
                                  dor = "1",
                                  do1r = "1",
                                  dor_unit = "Unknown")),
                  usel = "fse",
                  d = "Soon",
                  code = "QJ009",
                  mn = "MSK",
                  mnr = "5010"),
             list(rou = "Int1",
                  br = "MS1",
                  act = list(list(name = "Pe1n",
                                  dost = "U1nk",
                                  dor = "11",
                                  do1r = "11",
                                  dor_unit = "Un1known")),
                  usel = "1fse",
                  d = "1Soon",
                  code = "Q1J009",
                  mn = "MK"))))

dfs <- mapply(merge, df$A, df$B)

combine_dfs <- function(dfs) {
  cols <- Reduce(union, sapply(dfs, names))
  dfs <- lapply(dfs, function(x) {
    x[setdiff(cols, names(x))] <- NA
    x
  })
  Reduce(rbind, dfs)
}
df <- combine_dfs(dfs)

^{Created on 2020-09-06 by the reprex package (v0.3.0)}

vinayprakash808 · September 7, 2020, 5:40am

OH thanks a lot. The solutions here are cool But the problem is . I have this datasets in a excel. When I import it, Column B is actually a List/Character/Factor data types here.

In your case, when I create a dataframe, Column B datatype is Asis and hence it is working.

In my dataset, the solution is not working. Since it "asis"

Here is the dput. (Sorry I should have put earlier only)

structure(list(A = c("da", "fsd"), B = c("list(off= c(\"Ove\",\"fdsdf\"), rou = \"Int\", br = \"MS\", act = list(list(name = \"Pen\", dost = \"Unk\", dor = \"1\", do1r = \"1\", dor_unit = \"Unknown\")), usel = \"fse\", d = \"Soon\", code = \"QJ009\", mn = \"MSK\", mnr = \"5010\")", 
"list(rou = \"Int1\", br = \"MS1\", act = list(list(name = \"Pe1n\", dost = \"U1nk\", dor = \"11\", do1r = \"11\", dor_unit = \"Un1known\")), usel = \"1fse\", d = \"1Soon\", code = \"Q1J009\", mn = \"MK\")"
)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
))

nirgrahamuk · September 7, 2020, 10:12am

you can preprocess with

library(purrr)
library(rlang)
df$C <- purrr::map(df$B,
           ~eval(parse_expr(.)))

and then use elmstedt's solution with C rather than B

vinayprakash808 · September 7, 2020, 12:00pm

Awesome Nir and everyone here. This is bind blowing. Purrr package is very useful. Appreciate if you share any material that gives good info about purrr. Google has lots and its confusing

technocrat · September 7, 2020, 7:48pm

Tutorial
Video
Documentation with cheatsheet

system · September 28, 2020, 7:48pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.