BLUF:
How would the following be emulated using dplyr?
I'm using the ChickWeight dataset to practice, and I primarily use data.table.
Line plots of the data suggest that some chicks are 'missing' observations.
I wanted to create NA values in the dataset so the missing chick/time combinations are easy to pull.
library(data.table)
dt <- data.table(ChickWeight)
# Personal preference and not at all necessary; format variable names to lowercase
dt <- dt[, .(weight, time = Time, chick = Chick, diet = Diet)]
# Reshaping data to include NAs
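dt_1 <- melt(
  dcast(dt,
        chick ~ time,
        value.var = c("weight", "diet")),
  id.vars = "chick",
  measure.vars = patterns("^weight", "^diet"),
  variable.name = "time",
  value.name = c("weight", "diet")
)
# With several value columns, melt() stores `time` as an index (1, 2, ...);
# map it back to the original time values
dt_1[, time := sort(unique(dt$time))[as.integer(time)]]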
# NA results
dt_1[is.na(weight)][order(chick, time)]
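For comparison, here is a minimal sketch of a way to get the same NA rows without reshaping, by joining the data onto a grid of every chick/time pair built with CJ(). The names grid and dt_full are just for illustration, and this assumes a reasonably recent data.table in which CJ() keeps the factor levels of chick.

```r
library(data.table)

dt <- data.table(ChickWeight)
setnames(dt, tolower(names(dt)))

# Every chick/time pair that could exist (illustrative name: grid)
grid <- CJ(chick = unique(dt$chick), time = unique(dt$time))

# Join the data onto the grid; pairs with no observation get NA weight and diet
dt_full <- dt[grid, on = .(chick, time)]

# NA results
dt_full[is.na(weight)][order(chick, time)]
```

Because nothing is cast to wide format, time keeps its original values here.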
This was a fun little project and I'd love to learn more.
I'm not super familiar with data.table - am I right in saying you're inserting the missing combinations of chick and time, creating NA values for weight and diet? If so, in the tidyverse you don't need to reshape - there's a tidyr function called complete() that does that for you:
library(tidyverse, quietly = TRUE)
tibble(ChickWeight) %>%
  janitor::clean_names() %>%
  complete(chick, time) %>%
  filter(is.na(weight) & is.na(diet))
#> # A tibble: 22 × 4
#>    chick  time weight diet
#>    <ord> <dbl>  <dbl> <fct>
#>  1 18        4     NA <NA>
#>  2 18        6     NA <NA>
#>  3 18        8     NA <NA>
#>  4 18       10     NA <NA>
#>  5 18       12     NA <NA>
#>  6 18       14     NA <NA>
#>  7 18       16     NA <NA>
#>  8 18       18     NA <NA>
#>  9 18       20     NA <NA>
#> 10 18       21     NA <NA>
#> # … with 12 more rows
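If you only want the list of missing chick/time pairs and don't need the NA columns, a rough alternative is to build every possible pair and anti_join() the observed rows away. This is only a sketch: chicks, all_pairs and missing_pairs are names I made up, and I've used rename_with(tolower) instead of janitor to keep the dependencies down.

```r
library(dplyr)
library(tidyr)

chicks <- tibble(ChickWeight) %>%
  rename_with(tolower)

# Every chick/time pair that could exist
all_pairs <- expand_grid(
  chick = unique(chicks$chick),
  time  = unique(chicks$time)
)

# Keep only the pairs that have no matching row in the data
missing_pairs <- all_pairs %>%
  anti_join(chicks, by = c("chick", "time")) %>%
  arrange(chick, time)

missing_pairs
```

It should return the same 22 combinations as the complete() version, just without the weight and diet columns.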