The older version of tidyr::unnest(), now called tidyr::unnest_legacy(), handles nested data frames whose columns differ in type by coercing them to a common type.
The new version of tidyr::unnest() returns an error if columns with the same name have different types.
I'm guessing the new version is safer, but I never had any issues with the old version.
How should I unnest a list of dataframes with different column types going forward?
For now, I can use unnest_legacy(). If I want to use the new unnest(), should I first run something like mutate() and map() to make the column types identical, or is there an easier way around this issue?
Reproducible example:
library(tidyr)
a <- tibble(
  value = rnorm(2),
  char_vec = c(NA, "A")) # character vector
b <- tibble(
  value = rnorm(2),
  char_vec = c(NA, NA)) # logical
df <- tibble(
  file = list(a, b))
# New tidyr::unnest()
unnest(df, cols = c(file))
#> No common type for `..1$file$char_vec` <character> and `..2$file$char_vec`
#> <logical>.
# Old tidyr::unnest()
unnest_legacy(df, file)
#> # A tibble: 4 x 2
#> value char_vec
#> <dbl> <chr>
#> 1 0.295 <NA>
#> 2 -0.389 A
#> 3 0.0308 <NA>
#> 4 -1.31 <NA>
I thought there might be a way to override this check for a common type, but I didn't see one. For now, one fix for your example is to coerce the column by hand. Clearly this is neither ideal nor automatic, since it doesn't check whether the column types actually match.
library(tidyr)
a <- tibble(
  value = rnorm(2),
  char_vec = c(NA, "A")) # character vector
b <- tibble(
  value = rnorm(2),
  char_vec = as.character(c(NA, NA))) # was logical, now character
df <- tibble(
  file = list(a, b))
# New tidyr::unnest()
unnest(df, cols = c(file))
#> # A tibble: 4 x 2
#> value char_vec
#> <dbl> <chr>
#> 1 -0.346 <NA>
#> 2 -0.960 A
#> 3 1.04 <NA>
#> 4 0.293 <NA>
That solution works as well, but I find it somewhat complicated, at least compared to unnest_legacy(), which handled this particular case of unnesting without issues.
I am mainly raising the issue because I would like some comments on what the intended or best-practice workflow should be in a case like this.
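For reference, the mutate() and map() route mentioned in the question could look roughly like the sketch below. It is not an official tidyr recommendation, just one way to apply the manual coercion from the fix above to every nested tibble at once. It assumes dplyr and purrr are installed; the names col_classes, df_fixed, and result are made up for illustration, and the column to coerce (char_vec) is still hard-coded.

```r
library(tidyr)
library(dplyr)
library(purrr)

set.seed(1)  # only so the sketch is repeatable

a <- tibble(value = rnorm(2), char_vec = c(NA, "A"))  # character vector
b <- tibble(value = rnorm(2), char_vec = c(NA, NA))   # logical
df <- tibble(file = list(a, b))

# Inspect the (first) class of every column in every nested tibble,
# to see where the types disagree before unnesting.
col_classes <- map(df$file, function(tbl) {
  map_chr(tbl, function(col) class(col)[1])
})

# Coerce char_vec to character inside each nested tibble, then unnest.
# The column name is an assumption taken from the example above.
df_fixed <- df %>%
  mutate(file = map(file, function(tbl) {
    mutate(tbl, char_vec = as.character(char_vec))
  }))

result <- unnest(df_fixed, cols = c(file))
```

Printing col_classes shows which nested tibble carries the odd type, which is a partial answer to the concern that the manual fix does not check the column types itself.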