Hello,
I just realized that when unnesting a list column in a data frame with unnest_longer(), rows which contain character(0) in the list column are dropped. Below is a reprex which hopefully makes this behavior clear.
I assume that this is intended behavior, but I was wondering how others are dealing with it. In practice it means that whenever you use unnest_longer(), you first have to check whether the list column contains an "empty cell", i.e. character(0). I personally find this behavior quite counter-intuitive. I would have expected (or hoped) that the row with character(0) would remain in the output.
How are others dealing with this behavior? Many thanks.
library(tidyverse)

my_df <- tibble(
  txt = c("chestnut, pear, kiwi, peanut",
          "grapes, banana"))

# Extract all nuts
my_df <- my_df %>%
  mutate(nuts = str_extract_all(txt, regex("\\w*nut\\w*"))) %>%
  mutate(index = row_number(), .before = 1)

# Row index 2 has nuts <chr [0]>
my_df
#> # A tibble: 2 x 3
#>   index txt                          nuts
#>   <int> <chr>                        <list>
#> 1     1 chestnut, pear, kiwi, peanut <chr [2]>
#> 2     2 grapes, banana               <chr [0]>

# Unnest
my_df_long <- my_df %>%
  unnest_longer(nuts,
                values_to = "nuts_long")

# Row index 2 is now missing
my_df_long
#> # A tibble: 2 x 3
#>   index txt                          nuts_long
#>   <int> <chr>                        <chr>
#> 1     1 chestnut, pear, kiwi, peanut chestnut
#> 2     1 chestnut, pear, kiwi, peanut peanut

# Possible solution; are there other, more convenient approaches?
my_df_comb <- my_df %>%
  left_join(., my_df_long)
#> Joining, by = c("index", "txt")

my_df_comb
#> # A tibble: 3 x 4
#>   index txt                          nuts      nuts_long
#>   <int> <chr>                        <list>    <chr>
#> 1     1 chestnut, pear, kiwi, peanut <chr [2]> chestnut
#> 2     1 chestnut, pear, kiwi, peanut <chr [2]> peanut
#> 3     2 grapes, banana               <chr [0]> <NA>

Created on 2022-03-26 by the reprex package (v2.0.1)
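
One alternative to the join that may be worth trying: as far as I can tell, tidyr's unnest() accepts a keep_empty argument (assuming tidyr 1.0.0 or later), which turns size-0 elements into a single row of missing values instead of dropping the row. The sketch below rebuilds the example data by hand so it runs on its own; the name my_df_keep is only for illustration. I believe newer tidyr releases give unnest_longer() the same argument, but I have not checked that here.

```r
library(tibble)
library(tidyr)

# Same shape as my_df above, built by hand so the sketch is self-contained
my_df <- tibble(
  index = 1:2,
  txt   = c("chestnut, pear, kiwi, peanut", "grapes, banana"),
  nuts  = list(c("chestnut", "peanut"), character(0))
)

# keep_empty = TRUE keeps the row whose list element is character(0)
# and fills the unnested value with NA
# (assumption: tidyr >= 1.0.0, where unnest() has this argument)
my_df_keep <- my_df %>%
  unnest(nuts, keep_empty = TRUE)

my_df_keep
```

Compared with the left_join() workaround, this avoids carrying the original list column along and does not depend on the join keys being unique per row.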