A list in a tibble is restored to a different data structure after a nest()/unnest() operation

When I create a list column in a tibble using summarise(zzz = list(yyy)), then nest() that tibble, and later unnest() it, the structure of that list has changed. Is there some way to preserve the original structure?

Example:

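  library(dplyr)
  library(tidyr)

  TestData = 
    tibble::tribble(
      ~letter, ~value,
      "a",   1,
      "a",   3,
      "b",   10,
      "b",   30
    )

  # One row per letter: the group mean plus the raw values as a list column
  stats =
    TestData %>% 
    group_by( letter  ) %>% 
    summarise( .groups = "keep",
               mean = mean( value ),
               rawGroupData = list( value )
               ) 
  
  # Nest the non-grouping columns, then widen them back out
  stats2 = nest(stats, .key="statsdata")
  stats3 = unnest_wider(stats2, col = statsdata)

  str(stats)
  str(stats3)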

>   str(stats)
gropd_df [2 × 3] (S3: grouped_df/tbl_df/tbl/data.frame)
 $ letter      : chr [1:2] "a" "b"
 $ mean        : num [1:2] 2 20
 $ rawGroupData:List of 2
  ..$ : num [1:2] 1 3
  ..$ : num [1:2] 10 30
 - attr(*, "groups")= tibble [2 × 2] (S3: tbl_df/tbl/data.frame)
  ..$ letter: chr [1:2] "a" "b"
  ..$ .rows : list<int> [1:2] 
  .. ..$ : int 1
  .. ..$ : int 2
  .. ..@ ptype: int(0) 
  ..- attr(*, ".drop")= logi TRUE


>   str(stats3)
gropd_df [2 × 3] (S3: grouped_df/tbl_df/tbl/data.frame)
 $ letter      : chr [1:2] "a" "b"
 $ mean        : num [1:2] 2 20
 $ rawGroupData: list<list> [1:2] 
  ..$ :List of 1
  .. ..$ : num [1:2] 1 3
  ..$ :List of 1
  .. ..$ : num [1:2] 10 30
  ..@ ptype: list()
 - attr(*, "groups")= tibble [2 × 2] (S3: tbl_df/tbl/data.frame)
  ..$ letter: chr [1:2] "a" "b"
  ..$ .rows : list<int> [1:2] 
  .. ..$ : int 1
  .. ..$ : int 2
  .. ..@ ptype: int(0) 
  ..- attr(*, ".drop")= logi TRUE

If you want to recover the original structure, why are you using unnest_wider rather than unnest?

IIRC, it's to ensure that multiple returned values generate additional columns rather than duplicate rows.

For example, the full code has:

      tidyr::nest() %>% 
      mutate( corrections = purrr::map( data, function(x) CorrectHinges(statsdata) ) ) %>% 
      tidyr::unnest_wider( corrections ) %>% 
      tidyr::unnest_wider( statsdata ) %>%

where CorrectHinges returns:

return( list(lowHinge=CorrectedLowHinge, highHinge=CorrectedHighHinge) )
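As a minimal sketch of why unnest_wider() suits this case: when each element of a list column is a named list, unnest_wider() turns the names into columns and keeps one row per input row. The tibble below is a hypothetical stand-in for the output of CorrectHinges(); the values are made up for illustration.

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for the `corrections` column: one named list per row,
# shaped like the list(lowHinge = ..., highHinge = ...) that CorrectHinges returns.
df <- tibble(
  letter = c("a", "b"),
  corrections = list(
    list(lowHinge = 0.5, highHinge = 3.5),
    list(lowHinge = 5, highHinge = 35)
  )
)

# Each name in the inner list becomes its own column; the row count is unchanged.
wide <- unnest_wider(df, corrections)

names(wide)
nrow(wide)
```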

unnest(corrections) results in duplicated rows, with one returned value per row, which is why I used unnest_wider() there. I assumed the same was prudent for unnesting statsdata, and didn't notice that it had wrapped each element of the list column in an extra list. unnest() seems to preserve the structure, though at the risk of silently duplicating rows, which could be a hard-to-catch error.
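One way to use unnest() for statsdata while guarding against the duplicated-row risk is to assert that the row count is unchanged. This is only a sketch: it rebuilds a tibble with the same shape as `stats` directly (ungrouped, for brevity) and names the nested columns explicitly instead of using `.key`; the `restored` name is introduced here for illustration.

```r
library(tidyr)
library(tibble)

# Same columns as `stats` in the question, built directly and left ungrouped.
stats <- tibble(
  letter = c("a", "b"),
  mean = c(2, 20),
  rawGroupData = list(c(1, 3), c(10, 30))
)

# Nest the per-letter statistics into a list column of one-row tibbles.
stats2 <- nest(stats, statsdata = c(mean, rawGroupData))

# unnest() row-binds the nested tibbles, so rawGroupData comes back as a
# list column of numeric vectors rather than a list of lists.
restored <- unnest(stats2, statsdata)

# Fail loudly if any nested tibble expanded into more than one row.
stopifnot(nrow(restored) == nrow(stats2))

str(restored$rawGroupData)
```

If a nested tibble ever held more than one row, the stopifnot() call would stop the pipeline instead of letting duplicated rows pass through unnoticed.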
