unnest_longer drops lists/rows with character(0)

Hello,

I just realized that when unnesting a list column in a data frame with unnest_longer, rows that contain character(0) in the list column are dropped. The reprex below should make this behavior clear.

I assume this is intended behavior, but I was wondering how others deal with it. In practice, it means that whenever you use unnest_longer, you first have to check whether the list column contains an "empty cell", i.e. character(0). I personally find this behavior quite counter-intuitive. I would have expected, or at least hoped, that the row with character(0) would be kept.

How are others dealing with this behavior? Many thanks.

library(tidyverse)

my_df <- tibble(
  txt=c("chestnut, pear, kiwi, peanut",
        "grapes, banana"))

#Extract all nuts
my_df <- my_df %>% 
  mutate(nuts=str_extract_all(txt, regex("\\w*nut\\w*"))) %>% 
  mutate(index=row_number(), .before=1)

#Row index 2 has nuts <chr [0]>
my_df
#> # A tibble: 2 x 3
#>   index txt                          nuts     
#>   <int> <chr>                        <list>   
#> 1     1 chestnut, pear, kiwi, peanut <chr [2]>
#> 2     2 grapes, banana               <chr [0]>

#unnest
my_df_long <- my_df %>% 
  unnest_longer(nuts,
                values_to = "nuts_long")

#Row index 2 is now missing
my_df_long
#> # A tibble: 2 x 3
#>   index txt                          nuts_long
#>   <int> <chr>                        <chr>    
#> 1     1 chestnut, pear, kiwi, peanut chestnut 
#> 2     1 chestnut, pear, kiwi, peanut peanut

#Possible solution; are there other, more convenient approaches?
my_df_comb <- my_df %>% 
  left_join(., my_df_long)
#> Joining, by = c("index", "txt")
my_df_comb
#> # A tibble: 3 x 4
#>   index txt                          nuts      nuts_long
#>   <int> <chr>                        <list>    <chr>    
#> 1     1 chestnut, pear, kiwi, peanut <chr [2]> chestnut 
#> 2     1 chestnut, pear, kiwi, peanut <chr [2]> peanut   
#> 3     2 grapes, banana               <chr [0]> <NA>

Created on 2022-03-26 by the reprex package (v2.0.1)
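
One alternative to the join, sketched here under the assumption that an NA placeholder is acceptable for rows without a match: replace every zero-length element with NA_character_ before unnesting, so each row has at least one value to keep. The name my_df_padded is made up for this illustration, and the data frame is rebuilt so the snippet runs on its own.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tidyr)

# Rebuild the example data from the reprex above
my_df <- tibble(
  index = 1:2,
  txt = c("chestnut, pear, kiwi, peanut", "grapes, banana")
) %>%
  mutate(nuts = str_extract_all(txt, "\\w*nut\\w*"))

# Pad empty list elements with a single NA so unnest_longer() keeps the row
# (my_df_padded is a hypothetical name used only in this sketch)
my_df_padded <- my_df %>%
  mutate(nuts = map(nuts, ~ if (length(.x) == 0) NA_character_ else .x)) %>%
  unnest_longer(nuts, values_to = "nuts_long")

my_df_padded
```

This should give three rows, with index 2 kept and nuts_long set to NA, without needing a second data frame for the join.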

Not sure about unnest_longer, but unnest has a keep_empty argument.

Compare the results of the following two:

> my_df %>% unnest(nuts, keep_empty = FALSE)  # `FALSE` is the default
# A tibble: 2 x 3
  index txt                          nuts    
  <int> <chr>                        <chr>   
1     1 chestnut, pear, kiwi, peanut chestnut
2     1 chestnut, pear, kiwi, peanut peanut 

vs

> my_df %>% unnest(nuts, keep_empty = TRUE)
# A tibble: 3 x 3
  index txt                          nuts    
  <int> <chr>                        <chr>   
1     1 chestnut, pear, kiwi, peanut chestnut
2     1 chestnut, pear, kiwi, peanut peanut  
3     2 grapes, banana               <NA>  

Hope this helps.
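
For later readers: as far as I know, newer tidyr releases (1.3.0 and later, which postdate this thread) give unnest_longer its own keep_empty argument, mirroring unnest. The sketch below assumes such a version is installed; on older versions the call would fail with an unused-argument error. The data frame is rebuilt so the snippet is self-contained.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Rebuild the example data from the original post
my_df <- tibble(
  index = 1:2,
  txt = c("chestnut, pear, kiwi, peanut", "grapes, banana")
) %>%
  mutate(nuts = str_extract_all(txt, "\\w*nut\\w*"))

# Assumption: tidyr >= 1.3.0, where unnest_longer() accepts keep_empty
my_df_long <- my_df %>%
  unnest_longer(nuts, values_to = "nuts_long", keep_empty = TRUE)

my_df_long
```

With keep_empty = TRUE the row for index 2 should be retained with NA in nuts_long, the same shape as the unnest result above.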

Excellent. Many thanks! Just saw that tidyr's unchop function also has a keep_empty argument.
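
A minimal sketch of the unchop variant, using the same example data (rebuilt here so it runs on its own). The name my_df_unchopped is just for illustration. Note that unchop keeps the original column name rather than taking a values_to argument.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Rebuild the example data from the original post
my_df <- tibble(
  index = 1:2,
  txt = c("chestnut, pear, kiwi, peanut", "grapes, banana")
) %>%
  mutate(nuts = str_extract_all(txt, "\\w*nut\\w*"))

# keep_empty = TRUE turns zero-length elements into a single NA row
# (my_df_unchopped is a hypothetical name used only in this sketch)
my_df_unchopped <- my_df %>%
  unchop(nuts, keep_empty = TRUE)

my_df_unchopped
```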
