I strongly prefer working with the tidyverse packages, so I tried to do the same with dtplyr. However, that did not work out as I had hoped:
Error in UseMethod("fill_"): no applicable method for 'fill_' applied to an object of class "c('dtplyr_step_first', 'dtplyr_step')"
While wrangling a dataset of 2M+ records, I need to use tidyr::fill(). The workaround I found is to temporarily switch from lazy_dt() to as_tibble() and then back to lazy_dt(). See the reprex below.
Are there any plans to integrate dtplyr with the other tidyverse packages?
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
## |1| variable hair_color contains NA values
starwars
#> # A tibble: 87 x 13
#> name height mass hair_color skin_color eye_color birth_year gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Luke~ 172 77 blond fair blue 19 male
#> 2 C-3PO 167 75 <NA> gold yellow 112 <NA>
#> 3 R2-D2 96 32 <NA> white, bl~ red 33 <NA>
#> 4 Dart~ 202 136 none white yellow 41.9 male
#> 5 Leia~ 150 49 brown light brown 19 female
#> 6 Owen~ 178 120 brown, gr~ light blue 52 male
#> 7 Beru~ 165 75 brown light blue 47 female
#> 8 R5-D4 97 32 <NA> white, red red NA <NA>
#> 9 Bigg~ 183 84 black light brown 24 male
#> 10 Obi-~ 182 77 auburn, w~ fair blue-gray 57 male
#> # ... with 77 more rows, and 5 more variables: homeworld <chr>,
#> # species <chr>, films <list>, vehicles <list>, starships <list>
## |2| fill() replaces missing values with the previous non-missing value
starwars %>% fill(hair_color)
#> # A tibble: 87 x 13
#> name height mass hair_color skin_color eye_color birth_year gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Luke~ 172 77 blond fair blue 19 male
#> 2 C-3PO 167 75 blond gold yellow 112 <NA>
#> 3 R2-D2 96 32 blond white, bl~ red 33 <NA>
#> 4 Dart~ 202 136 none white yellow 41.9 male
#> 5 Leia~ 150 49 brown light brown 19 female
#> 6 Owen~ 178 120 brown, gr~ light blue 52 male
#> 7 Beru~ 165 75 brown light blue 47 female
#> 8 R5-D4 97 32 brown white, red red NA <NA>
#> 9 Bigg~ 183 84 black light brown 24 male
#> 10 Obi-~ 182 77 auburn, w~ fair blue-gray 57 male
#> # ... with 77 more rows, and 5 more variables: homeworld <chr>,
#> # species <chr>, films <list>, vehicles <list>, starships <list>
## |3| try fill() again with dataset converted to lazy data.table format
starwars %>% lazy_dt() %>% fill(hair_color)
#> Error in UseMethod("fill_"): no applicable method for 'fill_' applied to an object of class "c('dtplyr_step_first', 'dtplyr_step')"
## |4| workaround during data wrangling, before starting data analysis
starwars %>% lazy_dt() %>% as_tibble() %>% fill(hair_color) %>% lazy_dt()
#> Source: local data table [87 x 13]
#> Call: `_DT3`
#>
#> name height mass hair_color skin_color eye_color birth_year gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Luke~ 172 77 blond fair blue 19 male
#> 2 C-3PO 167 75 blond gold yellow 112 <NA>
#> 3 R2-D2 96 32 blond white, bl~ red 33 <NA>
#> 4 Dart~ 202 136 none white yellow 41.9 male
#> 5 Leia~ 150 49 brown light brown 19 female
#> 6 Owen~ 178 120 brown, gr~ light blue 52 male
#> # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> # vehicles <list>, starships <list>
#>
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
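As a possible alternative to the round trip through as_tibble(), a small last-observation-carried-forward helper written in base R can do the same job as fill() for a single column. This is only a sketch: locf() is a hypothetical helper name, not a function from dtplyr, tidyr, or data.table, and it is shown here on a plain character vector.

```r
# Hypothetical helper (not part of dtplyr or tidyr): fill each NA with the
# most recent non-missing value, like tidyr::fill(.direction = "down").
locf <- function(x) {
  # Position of the latest non-missing element seen so far (0 if none yet).
  idx <- cummax(seq_along(x) * !is.na(x))
  # Leading NAs have no previous value, so they must stay NA.
  idx[idx == 0] <- NA
  x[idx]
}

hair <- c(NA, "blond", NA, NA, "none", NA)
# The leading NA stays NA; later gaps take the previous non-missing value.
print(locf(hair))
```

Assuming dtplyr passes functions it does not recognise through to data.table unchanged, the helper could be called from mutate() on a lazy_dt(), for example starwars %>% lazy_dt() %>% mutate(hair_color = locf(hair_color)), which would keep the pipeline lazy instead of materialising a tibble in the middle.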