Understanding unnest and its expectations

I've run into a problem with my understanding of unnest on a particular dataset, and after numerous attempts to resolve it via Stack Overflow, blogs and tons of Googling, I stepped away from the problem. I'm certain it's my lack of understanding, but I'm hoping that asking here helps others.

I have a dataframe -- simplified here as I haven't figured out a simple way to provide code to create it. It has a column of names and then two columns of lists representing input and output observations with their timestamps (seconds since the epoch) and values:

> str(unnestquestion)
'data.frame':	3 obs. of  3 variables:
 $ Input:List of 3
  ..$ : num [1:3, 1:2] 1.51e+09 1.51e+09 1.51e+09 5.00e+05 NA ...
  ..$ : num [1:3, 1:2] 1.51e+09 1.51e+09 1.51e+09 3.39e+05 NA ...
  ..$ : num [1:3, 1:2] 1.51e+09 1.51e+09 1.51e+09 4.60e+06 NA ...
 $ Ouput:List of 3
  ..$ : num [1:3, 1:2] 1.51e+09 1.51e+09 1.51e+09 4.22e+06 NA ...
  ..$ : num [1:3, 1:2] 1.51e+09 1.51e+09 1.51e+09 7.46e+06 NA ...
  ..$ : num [1:3, 1:2] 1.51e+09 1.51e+09 1.51e+09 2.39e+07 NA ...
 $ name : chr  "CIR0019209" "CIR0019431" "CIR0006077"

I've dreamt I resolved this in the past, but my RStudio had 50 open tabs and I overaggressively cleaned it up recently.

Right now you have list columns of matrices, which don't unnest well. You can use purrr::map to iterate over each list column and coerce each matrix to a data.frame, which can be unnested properly:

library(tidyverse)

df_of_matrices <- data_frame(name = c('a', 'b', 'c'), 
                             input = list(matrix(1:6, 3)),          # recycles 3x
                             output = list(matrix(rnorm(6), 3)))    # recycles 3x

df_of_matrices
#> # A tibble: 3 x 3
#>    name         input        output
#>   <chr>        <list>        <list>
#> 1     a <int [3 x 2]> <dbl [3 x 2]>
#> 2     b <int [3 x 2]> <dbl [3 x 2]>
#> 3     c <int [3 x 2]> <dbl [3 x 2]>

df_of_matrices %>% 
    mutate_if(is.list, map, as_data_frame) %>% 
    unnest()
#> # A tibble: 9 x 5
#>    name    V1    V2       V11        V21
#>   <chr> <int> <int>     <dbl>      <dbl>
#> 1     a     1     4 0.5319480  1.1047462
#> 2     a     2     5 1.9041804 -0.6874434
#> 3     a     3     6 0.5646727  0.2721582
#> 4     b     1     4 0.5319480  1.1047462
#> 5     b     2     5 1.9041804 -0.6874434
#> 6     b     3     6 0.5646727  0.2721582
#> 7     c     1     4 0.5319480  1.1047462
#> 8     c     2     5 1.9041804 -0.6874434
#> 9     c     3     6 0.5646727  0.2721582

The limitation, obviously, is that by default it will make a mess of names, but they can be set to something more useful in the same fashion or afterwards.
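As a rough sketch of the renaming idea, here is how the same pipeline might look with current tidyverse releases, where data_frame() and as_data_frame() are deprecated in favour of tibble() and as_tibble(), and unnest() expects the columns to unnest to be named. The helper matrix_to_tibble() and the column names timestamp and value are assumptions made for illustration (based on the description of the data in the question), not part of any package:

```r
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)

# Same toy data as above; the length-1 lists are recycled to 3 rows
df_of_matrices <- tibble(
  name   = c("a", "b", "c"),
  input  = list(matrix(1:6, 3)),
  output = list(matrix(rnorm(6), 3))
)

# Hypothetical helper: name the matrix columns, then convert to a tibble
matrix_to_tibble <- function(m, cols = c("timestamp", "value")) {
  colnames(m) <- cols
  as_tibble(m)
}

tidy <- df_of_matrices %>%
  # convert every matrix in every list column to a tibble
  mutate(across(where(is.list), ~ map(.x, matrix_to_tibble))) %>%
  # unnest both list columns in parallel; names_sep prefixes the inner
  # column names with the outer column name to avoid clashes
  unnest(c(input, output), names_sep = "_")

names(tidy)
```

With names_sep = "_", the inner names are prefixed by the list column they came from (for example input_timestamp), which avoids the V1/V11 style names shown above.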


This response is fantastic; I learned so much from a small number of lines. Thanks for helping me understand how to recreate the example data, and for a great use of mutate_if and purrr::map! I'll be spending more time with purrr.

Does anyone know why I get the following error when executing the code?

df_of_matrices %>% 
    mutate_if(is.list, map, as_data_frame)
# Additional arguments should be named

Can you tell us which versions of dplyr and purrr you are using?
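If it helps, the installed versions can be printed with packageVersion(), which ships with base R's utils package. A minimal sketch, assuming both packages are installed:

```r
# Print the installed versions of the two packages involved
packageVersion("dplyr")
packageVersion("purrr")
```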


Good point, that solved it. I forgot that I had not updated my laptop.