Extract Data from Lists of Lists

I am trying to move away from subsetting R objects with [] syntax and instead use tidyverse functions in support of tidymodels. Here are two examples of objects containing lists and lists within lists. I can see what I want to extract but can't seem to get unnest(), pluck(), pull() or map() to output a tibble or data frame.

Example 1 - I would like to return a tibble of the parameter ranges for a list of parameters:

p <- list(min_n(), trees(), learn_rate())
p1 <- dials::parameters(p)

The result I'd like is x <- as_tibble(p1[[6]][[1]][[2]]). Is there a way to express this in tidyverse syntax?

Example 2 - I would like to return the row-bound output of resampling df as a tibble showing the split for each fold.
Target <- as.factor(sample(c("A", "B"), 100, replace = TRUE))
Other <- as.factor(sample(c("AA", "BB", "CCC", "DDD"), 100, replace = TRUE))
Numb1 <- sample(1:100, 100, replace = TRUE)
Numb2 <- sample(1:100, 100, replace = TRUE)
df <- data.frame(Target, Other, Numb1, Numb2)

res <- vfold_cv(df, v = 5, repeats = 1, strata = Target)

the result I'd like is a tibble with all fold data stacked, y <- res[[1]][[1]][["data"]] for each of the 5 folds.

I see the extractor help functions but would much rather understand and use base tidyverse functions if possible.

Suggestions are appreciated.

I don't think you can avoid using [] when dealing with complex objects containing nested lists. There are ways of iterating through such objects within tidyverse, but it very much depends on the object in question.

As for example 1, could you post reproducible code here, so we can have a look at the data?

Here is how I would approach example 2 using nested data.

library(tidyverse)
library(rsample)

# Create your data
Target <- as.factor(sample(c("A", "B"), 100, replace = TRUE))
Other <- as.factor(sample(c("AA", "BB", "CCC", "DDD"), 100, replace = TRUE))
Numb1 <- sample(1:100, 100, replace = TRUE)
Numb2 <- sample(1:100, 100, replace = TRUE)
df <- data.frame(Target, Other, Numb1, Numb2)

res <- vfold_cv(df, v = 5, repeats = 1, strata = Target)

# Extract "data" from every element of "splits" into a list-column of data frames
# (each rsplit stores the full original data, so every fold shows 100 rows)
tb <- res |>
  mutate(nested = map(splits, \(s) s$data)) |>
  select(-splits)
tb
#> # A tibble: 5 × 2
#>   id    nested        
#>   <chr> <list>        
#> 1 Fold1 <df [100 × 4]>
#> 2 Fold2 <df [100 × 4]>
#> 3 Fold3 <df [100 × 4]>
#> 4 Fold4 <df [100 × 4]>
#> 5 Fold5 <df [100 × 4]>

tb contains a column of nested data frames. Each cell in this column contains a data frame with 100 rows and 4 columns. We can unnest all of them with one command:

tb |>
  unnest(nested)
#> # A tibble: 500 × 5
#>    id    Target Other Numb1 Numb2
#>    <chr> <fct>  <fct> <int> <int>
#>  1 Fold1 A      DDD      10    90
#>  2 Fold1 B      AA       43    59
#>  3 Fold1 B      BB       49    11
#>  4 Fold1 A      CCC       1    88
#>  5 Fold1 A      DDD      51    14
#>  6 Fold1 B      DDD       3    52
#>  7 Fold1 A      BB       60    11
#>  8 Fold1 B      DDD      79   100
#>  9 Fold1 B      AA        8    47
#> 10 Fold1 A      DDD      12    93
#> # ℹ 490 more rows

You can read more about nested data here and here.
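
A note on the result above: every rsplit object stores the full original data, so stacking the "data" element gives five copies of the same 100 rows. If the goal is to see which rows land in each fold, one possible sketch is to stack the held-out rows with rsample's assessment() instead. The seed, the reduced two-column df, and the names fold_rows and fold are my own choices for illustration, not part of the original post.

```r
library(tidymodels)

set.seed(123)  # arbitrary seed, only to make the sketch repeatable

# Smaller stand-in for the df in the question
df <- data.frame(
  Target = as.factor(sample(c("A", "B"), 100, replace = TRUE)),
  Numb1  = sample(1:100, 100, replace = TRUE)
)

res <- vfold_cv(df, v = 5, repeats = 1, strata = Target)

# Pull the list of splits, name each element by its fold id,
# extract the held-out (assessment) rows, and stack them with a fold label
fold_rows <- res |>
  pull(splits) |>
  set_names(pull(res, id)) |>
  map(assessment) |>
  bind_rows(.id = "fold")

fold_rows
```

Swapping assessment for analysis in the map() call would stack the training portion of each fold instead, in which case most rows appear in several folds.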

you might try


# Split the rset by fold, pull the data out of each split, then stack the results
expanded_res <- split(res, ~id) |>
  map(\(x) {
    pull(x, splits) |>
      pluck(1) |>
      pluck("data")
  }) |>
  bind_rows(.id = "fold_id")

perhaps

pluck(p1,6) |> pluck(1) |> pluck(2)  |> unlist() |> enframe()

fyi, for example 2 I found collect_predictions() from tune and this meets my requirements

for example 1, I failed to mention that I would like to get all of the range values for all parameters into a tibble. The code you provided is helpful and gets me one set of ranges without labels. How can I get all ranges with their labels into a data frame?

thanks

btw example 1 requires the tidymodels library; it is a reprex otherwise
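
For the follow-up about collecting every range with its label, here is a minimal sketch rather than a definitive answer. It assumes that each element of the object column keeps its range as a list named lower and upper (which is what p1[[6]][[1]][[2]] appears to return) and that the id column holds the parameter names. The names ranges and parameter are my own. Values are on whatever scale the parameter object stores them.

```r
library(tidymodels)

p <- list(min_n(), trees(), learn_rate())
p1 <- parameters(p)

# Take the list of parameter objects, label each with its id,
# turn every stored range into a one-row tibble, then stack with the label
ranges <- p1 |>
  pull(object) |>
  set_names(pull(p1, id)) |>
  map(\(x) as_tibble(x$range)) |>
  bind_rows(.id = "parameter")

ranges
```

dials also provides range_get(), which may be the safer accessor if the internal layout of the parameter objects ever changes.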

ok, since my main focus is tidymodels and the list constructs it uses to manage data, I found that chapters 12-14 of Tidy Modeling with R answer all my questions here. The examples are explanatory, and there are functions to extract the data an ML practitioner needs.
