# How to convert a heterogeneous list of vectors to a 2 column data frame?

I have a heterogeneous list of vectors that I want to convert to a two-column data frame. I have found a few solutions, but they are all quite complex, and I’m having trouble searching for ideas because I only find solutions for lists of vectors of the same length. Is there a simpler way to accomplish this task?

Here is a minimal example of the starting list. The vectors vary in length and can also be empty vectors or even `NULL`.

``````
input <- list(A = letters[1:3], B = letters[3:4], C = NULL, D = character(0))
input

## $A
## [1] "a" "b" "c"
##
## $B
## [1] "c" "d"
##
## $C
## NULL
##
## $D
## character(0)
``````

And here is my desired output data frame. Each row corresponds to one of the elements of the vectors in the list of vectors, i.e. the first column is the name of the list element and the second column is the element of the vector. List elements with no data (e.g. `NULL` or `character(0)`) are omitted:

``````
output <- data.frame(name = c(rep("A", length(input$A)), rep("B", length(input$B))),
                     item = c(input$A, input$B), stringsAsFactors = FALSE)
output

##   name item
## 1    A    a
## 2    A    b
## 3    A    c
## 4    B    c
## 5    B    d
``````

I tried `unlist()`, which properly omits the empty list elements. But unfortunately it appends numbers to the names, which would require writing a fragile regex to remove them (e.g. what if the names of the list elements ended in numbers?).

``````
list2df_unlist <- function(x) {
  tmp <- unlist(x)
  data.frame(name = names(tmp), item = tmp, stringsAsFactors = FALSE)
}
list2df_unlist(input)

##    name item
## A1   A1    a
## A2   A2    b
## A3   A3    c
## B1   B1    c
## B2   B2    d
``````
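One way to sidestep the name mangling without a regex, as a minimal base R sketch: drop the names inside `unlist()` and rebuild the `name` column by repeating each list name once per element. The function name `list2df_rep` is made up for illustration, and the sketch assumes the list has names and at least one non-empty element (if every element is empty, `unlist()` returns `NULL` and the `item` column is lost).

```r
input <- list(A = letters[1:3], B = letters[3:4], C = NULL, D = character(0))

# Hypothetical helper: repeat each list name by the length of its vector.
# lengths() is 0 for NULL and character(0), so empty elements contribute no rows.
list2df_rep <- function(x) {
  data.frame(
    name = rep(names(x), lengths(x)),
    item = unlist(x, use.names = FALSE),  # use.names = FALSE avoids "A1", "A2", ...
    stringsAsFactors = FALSE
  )
}

result <- list2df_rep(input)
```

Because the names never pass through `unlist()`, list names that already end in digits are left untouched.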

My solution using base R used `mapply()` + `do.call()` and also required a separate helper function to properly filter the empty list elements.

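``````
list2df_mapply <- function(x) {
  # Helper: build one data frame per list element, skipping empty ones
  list_to_df <- function(name, vec) {
    if (is.null(vec) || length(vec) == 0) return(NULL)

    data.frame(name = name, item = vec, stringsAsFactors = FALSE)
  }

  # SIMPLIFY = FALSE keeps the result a plain list, even when no element is empty
  tmp <- mapply(list_to_df, as.list(names(x)), x, SIMPLIFY = FALSE)
  do.call(rbind, tmp)
}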
list2df_mapply(input)

##   name item
## 1    A    a
## 2    A    b
## 3    A    c
## 4    B    c
## 5    B    d
``````

My solution with purrr is simpler because it replaces `mapply()` + `do.call()` with a single call to `map2_dfr()`, but it still requires the helper function.

``````
list2df_purrr <- function(x) {
  list_to_df <- function(name, vec) {
    if (is.null(vec) || length(vec) == 0) return(NULL)

    data.frame(name = name, item = vec, stringsAsFactors = FALSE)
  }
  # Use the argument `x`, not the global `input`
  purrr::map2_dfr(names(x), x, list_to_df)
}
list2df_purrr(input)

##   name item
## 1    A    a
## 2    A    b
## 3    A    c
## 4    B    c
## 5    B    d
``````

I also explored `purrr::imap_dfr()`, but couldn’t get it to work. Any ideas on how to make this transformation code more readable? Thanks!
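For reference, a hedged sketch of how `purrr::imap_dfr()` could be made to work here. The likely stumbling block is that `data.frame(name = name, item = vec)` errors when `vec` is empty, because the two columns have different lengths. Repeating the name `length(vec)` times makes both columns zero-length for empty elements, so no filtering helper is needed. The function name `list2df_imap` is made up for illustration, and the sketch assumes purrr and dplyr are installed and that the vectors are character.

```r
input <- list(A = letters[1:3], B = letters[3:4], C = NULL, D = character(0))

# Hypothetical helper: imap_dfr() passes each element and its name to the
# function, then row-binds the resulting data frames.
list2df_imap <- function(x) {
  purrr::imap_dfr(x, function(vec, name) {
    data.frame(
      name = rep(name, length(vec)),  # zero-length for NULL / character(0)
      item = as.character(vec),       # as.character(NULL) is character(0)
      stringsAsFactors = FALSE
    )
  })
}

result <- list2df_imap(input)
```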

Almost there. Just the name of column 2 is ugly.

``````
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
input <- list(A = letters[1:3], B = letters[3:4], C = NULL, D = character(0))

input2 <- lapply(input, as.data.frame, stringsAsFactors = FALSE)
DF <- bind_rows(input2, .id = "Name")
DF
#>   Name X[[i]]
#> 1    A      a
#> 2    A      b
#> 3    A      c
#> 4    B      c
#> 5    B      d
``````

Created on 2019-09-12 by the reprex package (v0.2.1)


Using `as_tibble` avoids this problem, as shown below. Just changing the definition of `input2` should be enough.

Modification of code by @FJCC
``````
library(magrittr)

input <- list(A = letters[1:3],
              B = letters[3:4],
              C = NULL,
              D = character(0))

input %>%
  purrr::map(.f = tibble::as_tibble) %>%
  dplyr::bind_rows(.id = "name")
#> # A tibble: 5 x 2
#>   name  value
#>   <chr> <chr>
#> 1 A     a
#> 2 A     b
#> 3 A     c
#> 4 B     c
#> 5 B     d
``````

Alternative solution:

``````
library(purrr)
library(tibble)

input <- list(A = letters[1:3],
              B = letters[3:4],
              C = NULL,
              D = character(0))

map_dfr(.x = input,
        .f = ~ enframe(x = .x,
                       name = NULL,
                       value = "Value does matter"),
        .id = "What's in a name")
#> # A tibble: 5 x 2
#>   `What's in a name` `Value does matter`
#>   <chr>              <chr>
#> 1 A                  a
#> 2 A                  b
#> 3 A                  c
#> 4 B                  c
#> 5 B                  d
``````

@FJCC @Yarnabrina Thanks to both of you for your help! Below I've converted your suggestions into the function format I was using:

Solution from @FJCC:

``````
list2df_dplyr <- function(x) {
  tmp <- lapply(x, as.data.frame, stringsAsFactors = FALSE)
  tmp <- dplyr::bind_rows(tmp, .id = "name")
  colnames(tmp)[2] <- "item"
  tmp
}
list2df_dplyr(input)

##   name item
## 1    A    a
## 2    A    b
## 3    A    c
## 4    B    c
## 5    B    d
``````

Solutions from @Yarnabrina:

``````
list2df_tibble <- function(x) {
  tmp <- purrr::map(x, tibble::as_tibble)
  dplyr::bind_rows(tmp, .id = "name")
}
list2df_tibble(input)

## # A tibble: 5 x 2
##   name  value
##   <chr> <chr>
## 1 A     a
## 2 A     b
## 3 A     c
## 4 B     c
## 5 B     d

list2df_enframe <- function(x) {
  purrr::map_dfr(x, ~ tibble::enframe(x = .x, name = NULL, value = "item"),
                 .id = "name")
}
list2df_enframe(input)

## # A tibble: 5 x 2
##   name  item
##   <chr> <chr>
## 1 A     a
## 2 A     b
## 3 A     c
## 4 B     c
## 5 B     d
``````

I like the succinctness of this final approach. The main confusion I see with it (e.g. when returning to the code months later) is that you have to set `name = NULL` in the call to `enframe()` because the `name` column is instead added by `map_dfr()`.
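To make the role of `name = NULL` concrete, here is a small sketch comparing the two behaviours of `enframe()` on an unnamed vector (the variable names are illustrative, and tibble is assumed to be installed). By default `enframe()` adds its own `name` column holding the element positions, which would clash with the ID column that `map_dfr()` adds via `.id = "name"`.

```r
x <- c("a", "b")

# Default: two columns, `name` (positions 1 and 2) and `value`
with_index <- tibble::enframe(x)

# name = NULL: a single column, here renamed to `item`
values_only <- tibble::enframe(x, name = NULL, value = "item")

names(with_index)   # "name" "value"
names(values_only)  # "item"
```

A short comment next to `name = NULL` in `list2df_enframe()` saying that the list names come from `.id` may be enough to avoid the confusion later.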

And here is a solution using data.table. It is analogous to the dplyr solution, replacing `bind_rows()` with `rbindlist()`.

``````
list2df_dt <- function(x) {
  tmp <- lapply(x, as.data.frame, stringsAsFactors = FALSE)
  tmp <- data.table::rbindlist(tmp, idcol = "name")
  colnames(tmp)[2] <- "item"
  tmp
}
list2df_dt(input)

##    name item
## 1:    A    a
## 2:    A    b
## 3:    A    c
## 4:    B    c
## 5:    B    d
``````

It seems that the reason that this is so much more cumbersome using only base R is that the `do.call(rbind, list)` paradigm doesn't provide a mechanism for adding an ID column.
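As a rough illustration of that gap, the ID column can be emulated in base R by counting the rows of each piece before binding and repeating the list names accordingly. This is only a sketch: `rbind_with_id` is a made-up helper, not an existing base R function, and it assumes a named list of data frames that share the same columns.

```r
input <- list(A = letters[1:3], B = letters[3:4], C = NULL, D = character(0))

# Hypothetical helper that mimics the `.id` / `idcol` argument
rbind_with_id <- function(pieces, id = "name") {
  n_rows <- vapply(pieces, nrow, integer(1))  # rows contributed by each piece
  out <- do.call(rbind, unname(pieces))       # unname() keeps row names as 1, 2, ...
  id_col <- data.frame(rep(names(pieces), n_rows), stringsAsFactors = FALSE)
  cbind(setNames(id_col, id), out)
}

# One single-column data frame per list element; empty elements give zero rows
pieces <- lapply(input, function(vec) {
  data.frame(item = as.character(vec), stringsAsFactors = FALSE)
})

result <- rbind_with_id(pieces)
```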
