How to use tidy eval to create a tibble from a list of lists?

Hlynur · March 27, 2018, 12:39pm

So I've found myself wanting to create a tibble from a list of lists. After using the inconceivably lovely transpose() function from the purrr package I've ended up with something structured like this, albeit much, much longer.

listi <- list("id"       = list(list(708107),           list(780583)),
              "name"     = list(list("Nessprettur"),    list("Brekka")),
              "distance" = list(list(444.46),           list(902.376)))

Looking at this, I could of course create the tibble by hand, which wouldn't be a big deal for a very small list. Let's say my list is in fact only three variables and two observations.

It would be straightforward to do something like this:

library(tidyverse)
tidy_frame <- data_frame("id" =       unlist(listi$id),
                         "name" =     unlist(listi$name),
                         "distance" = unlist(listi$distance))

tidy_frame
# A tibble: 2 x 3
#id      name        distance
#<dbl>   <chr>          <dbl>
#708107. Nessprettur     444.
#780583. Brekka          902.

But of course, that is very problematic for a list that's, say, ten times longer than that. Perhaps I should do what I want to achieve through do.call() and cbind()?

list_column_frame <- data.frame(do.call(cbind, listi)) %>% 
  as_data_frame()

list_column_frame
# A tibble: 2 x 3
#id         name       distance  
#<list>     <list>     <list>    
#<list [1]> <list [1]> <list [1]>
#<list [1]> <list [1]> <list [1]>

list_column_frame %>% map_df(~unlist(.x))
# A tibble: 2 x 3
#id      name        distance
#<dbl>   <chr>          <dbl>
#708107. Nessprettur     444.
#780583. Brekka          902.

This certainly gets the job done and perhaps I should just be happy with getting the result I wanted. However, having watched Hadley's 5 minute intro to tidy eval, and Lionel's webinar on the subject, curiosity has gotten the better of me and I really want to know how I would write a function using tidy eval so that I could perhaps finally get tidy eval, even if just a tiny, tiny bit.

I've tried countless variations on something like this (and none of them work - which makes sense as I well and truly don't know what I'm doing, despite everything in the aforementioned videos making sense to me when I see, for instance, Lionel doing it).

library(rlang)
library(glue)

listi_create <- function(x){
  y <- sym(x)
  eval_tidy(`$`(listi, y))
}

listi_create("id")
#NULL

names(listi) %>% 
  map_df(~data_frame(!!.x := listi_create(.x)))
#Error: Column `id` must be a 1d atomic vector or a list

What would be the correct approach to something like this? Any and all help appreciated.

Edwin · March 27, 2018, 1:24pm

Hi Hlynur,

I think you won't have much luck with tidy evaluation in this setting, because you don't use, and have no need to use, functions that quote their input, as the main verbs from dplyr do. If you want to get your feet wet with tidy evaluation, I would suggest writing wrappers around functions that do quote their input, such as those in dplyr.

Regards,
Edwin
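
As a rough sketch of the kind of wrapper suggested above: the function below forwards bare column names to dplyr verbs using enquo() and !!. The name group_mean() and the use of the built-in mtcars data are assumptions for illustration only; neither comes from this thread.

```r
library(dplyr)

# Hypothetical wrapper (not part of dplyr): mean of one column per group.
group_mean <- function(df, group_var, value_var) {
  # enquo() captures the bare column names the caller typed
  group_var <- enquo(group_var)
  value_var <- enquo(value_var)

  df %>%
    group_by(!!group_var) %>%                   # !! unquotes them inside the verbs
    summarise(mean_value = mean(!!value_var))
}

# Columns are passed unquoted, just as with dplyr's own verbs
res <- group_mean(mtcars, cyl, mpg)
```

This works because group_by() and summarise() quote their arguments, which is the situation tidy eval is designed for.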

nwerth · March 27, 2018, 1:45pm

Easy way to do what you want with an arbitrary number of columns:

library(dplyr)

tidy_frame <- listi %>%
  lapply(FUN = unlist) %>%
  as_data_frame()

It's not powered by tidy eval, but as @Edwin mentions, that's fine. If you're just using this to practice tidy eval and will never use this code for actual analysis, then I wish you good luck. But I worry too many people see tidy eval (and non-standard evaluation in general) as being a universal tool (like a power drill) instead of a tool for very specific cases (like an impact driver).

Hlynur · March 27, 2018, 3:27pm

Thanks so much for your replies. I have been experimenting with creating simple mutate() wrapper functions. While I'm nowhere near comfortable enough with that to make it a consistent part of my workflow, I figured I'd expand the scope of what I'm fiddling around with while trying to understand tidy eval. So I started wondering if there is a kosher way of doing evaluation like this:


listi_create <- function(x){
  y <- glue("listi${x}")
  unlist(eval(parse(text = y)))
}

names(listi) %>% 
  map(~data_frame(!!.x := listi_create(.x))) %>% 
  bind_cols()
#> # A tibble: 2 x 3
#> id        name       distance
#> <dbl>     <chr>         <dbl>
#> 708107.   Nessprettur     444.
#> 780583.   Brekka          902.

But rather than using the base::eval() function and parse(), I could use the tidy eval syntax / methodology. And that was the reason for this question. Do I understand you correctly, that this kind of thing is outside the scope of tidy eval?

Edwin · March 27, 2018, 5:05pm

Of course you can always make a detour to fit in the use of tidyeval. If you want to apply tidyeval for the sake of applying it, I am sure you can eventually find a way. But since you want to convert a multi-layered list to a data frame, purrr is your friend here (as you have found out yourself). Since the purrr::map* functions don't take quoted arguments, using tidyeval here is very unnatural. Applying tidyeval here won't enhance your understanding of it, imho; I would rather find a real application.
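
For what it's worth, such a detour might look like the sketch below. It assumes a shortened copy of the listi object from the first post, and the function names listi_call() and listi_mask() are made up for illustration. It also shows why the attempt with `$`(listi, y) returned NULL.

```r
library(rlang)

listi <- list("id"   = list(list(708107),        list(780583)),
              "name" = list(list("Nessprettur"), list("Brekka")))

# `$` never evaluates its second argument, so `$`(listi, y) looks up an
# element literally named "y" and returns NULL. Unquoting the symbol while
# the call is being built avoids that.
listi_call <- function(x) {
  call <- expr(`$`(listi, !!sym(x)))  # for x = "id" this is the call listi$id
  unlist(eval_tidy(call))
}

# Alternative: use the list itself as a data mask and evaluate the symbol in it
listi_mask <- function(x) {
  unlist(eval_tidy(sym(x), data = listi))
}

ids   <- listi_call("id")
names <- listi_mask("name")
```

Both versions end up doing the same thing as unlist(listi[[x]]), which is part of why tidy eval adds little here.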

Hlynur · March 27, 2018, 6:56pm

Makes sense. Thanks.

alistaire · March 27, 2018, 6:57pm

Variations on a theme:

library(tidyverse)

listi <- list("id"       = list(list(708107),           list(780583)),
              "name"     = list(list("Nessprettur"),    list("Brekka")),
              "distance" = list(list(444.46),           list(902.376)))

map_dfc(listi, ~simplify(flatten(.x)))
#> # A tibble: 2 x 3
#>        id name        distance
#>     <dbl> <chr>          <dbl>
#> 1 708107. Nessprettur     444.
#> 2 780583. Brekka          902.

map_dfc(listi, unlist)
#> # A tibble: 2 x 3
#>        id name        distance
#>     <dbl> <chr>          <dbl>
#> 1 708107. Nessprettur     444.
#> 2 780583. Brekka          902.

listi %>% as_data_frame() %>% unnest() %>% unnest()
#> # A tibble: 2 x 3
#>        id name        distance
#>     <dbl> <chr>          <dbl>
#> 1 708107. Nessprettur     444.
#> 2 780583. Brekka          902.

Strategy-wise, it's mostly a question of whether to convert to a data frame at the beginning or the end. For this data it doesn't matter, but sometimes it's better to convert to a data frame early, because less-deeply nested variables will be handled well without special treatment.

One last note: data like this usually originates in JSON, and there's a good chance it can be read in a cleaner format to start with, e.g. by setting jsonlite::fromJSON's simplify* parameters.
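
A small sketch of that last point, assuming the data arrived as a JSON array of records (the JSON string below is invented from the values in this thread, not the original source):

```r
library(jsonlite)

# Assumed shape of the raw data: one JSON object per observation
json <- '[{"id": 708107, "name": "Nessprettur", "distance": 444.46},
          {"id": 780583, "name": "Brekka",      "distance": 902.376}]'

# With the default simplification, an array of records becomes a data frame
flat <- fromJSON(json)

# Turning simplification off keeps the nested list-of-lists structure
nested <- fromJSON(json, simplifyVector = FALSE)
```

If the simplified result is already rectangular, the flattening steps shown earlier in the thread may not be needed at all.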