I've tried lots of different things, but keep feeling like I must be missing the obvious solution.
What is the tidyverse way to do safe indexing by group? For example, say I want to calculate the change in population from 2000 to 2012 using the population dataset (provided in tidyr).
The basic approach with [[ or [ errors if some countries don't have a value for one of the years (e.g., Montenegro):
suppressPackageStartupMessages(library("tidyverse"))
pop_chg <- population %>%
  group_by(country) %>%
  summarize(chg_2000_to_2012 = population[year == 2012] / population[year == 2000])
What I want is for these situations to return NA_real_.
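Base R's match() looks like it already has this behaviour. It returns the position of the first match, or NA_integer_ when there is none, and indexing a numeric vector with NA_integer_ gives a single NA. Below is a minimal sketch. It assumes dplyr is installed and uses a small made-up tibble (toy, with hypothetical values) in place of tidyr::population:

```r
suppressPackageStartupMessages(library("dplyr"))

# match() gives the position of the first hit, or NA_integer_ if there is none;
# indexing with NA_integer_ returns a single NA_real_ instead of numeric(0)
years <- c(2005, 2010, 2012)
pops  <- c(600, 620, 630)
pops[match(2012, years)]  # 630
pops[match(2000, years)]  # NA

# Hypothetical toy data standing in for tidyr::population
toy <- tibble(
  country    = c("A", "A", "B", "B"),
  year       = c(2000, 2012, 2006, 2012),
  population = c(100, 150, 600, 630)
)

# Country B has no 2000 row, so its ratio becomes NA rather than an error
toy_chg <- toy %>%
  group_by(country) %>%
  summarize(chg_2000_to_2012 =
              population[match(2012, year)] / population[match(2000, year)])
```

Because match() only returns the first hit, this assumes there is at most one row per country and year. That holds for the population dataset.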
I played around with %||%, but [ returns numeric(0), which is different from NULL, so it doesn't help:
pop_chg <- population %>%
  group_by(country) %>%
  summarize(chg_2000_to_2012 =
    (population[year == 2012] %||% NA_real_) / (population[year == 2000] %||% NA_real_))
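The small check below shows why %||% has no effect here. It uses rlang's %||% and made-up vectors. A logical index with no TRUE values yields a zero-length vector rather than NULL. %||% only substitutes for NULL, so the zero-length result passes straight through into the division:

```r
suppressPackageStartupMessages(library("rlang"))

# Hypothetical values for a country that has 2012 but not 2000
years <- c(2006, 2012)
pops  <- c(600, 630)

# No element matches, so the subset is a zero-length numeric vector
missing_year <- pops[years == 2000]
length(missing_year)   # 0
is.null(missing_year)  # FALSE: numeric(0) is not NULL

# %||% only replaces NULL, so numeric(0) is returned unchanged
missing_year %||% NA_real_

# Arithmetic with a zero-length operand is also zero-length,
# which is not a single summary value
pops[years == 2012] / missing_year
```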
I have been able to get a combination of dplyr::nth and purrr::detect_index to work safely:
pop_chg <- population %>%
  group_by(country) %>%
  summarize(chg_2000_to_2012 =
    nth(population, detect_index(year, ~ .x == 2012)) /
    nth(population, detect_index(year, ~ .x == 2000)))
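This combination is safe because of two behaviours, illustrated below on made-up vectors. It assumes dplyr and purrr are installed. detect_index() returns 0 when no element satisfies the predicate. nth() treats position 0 as out of range and returns its default, which is a missing value of the input's type:

```r
suppressPackageStartupMessages({
  library("dplyr")
  library("purrr")
})

# Hypothetical values for a country that has 2012 but not 2000
years <- c(2006, 2012)
pops  <- c(600, 630)

# Position of the first element satisfying the predicate, or 0 if none does
detect_index(years, ~ .x == 2012)  # 2
detect_index(years, ~ .x == 2000)  # 0

# nth() with an out-of-range position falls back to a missing value
nth(pops, detect_index(years, ~ .x == 2012))  # 630
nth(pops, detect_index(years, ~ .x == 2000))  # NA
```

detect_index() calls the lambda once per element at the R level, while year == 2012 is a single vectorised comparison. That is probably part of why this version is slower than plain [ on larger data.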
But this is very verbose for something relatively simple; there must be a better way, right? Also, when I used it on a large-ish dataset, it was a lot slower than plain [.
A second approach that works is to define a variation on %||% that uses rlang::is_empty instead of is.null:
suppressPackageStartupMessages(library("tidyverse"))

# Like %||%, but falls back to y for any zero-length x (NULL, numeric(0), ...)
`%|||%` <- function(x, y) {
  if (rlang::is_empty(x)) {
    y
  } else {
    x
  }
}

population %>%
  group_by(country) %>%
  summarize(chg =
    (population[year == 2012] %|||% NA_real_) /
    (population[year == 2000] %|||% NA_real_))
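The checks below restate the operator so they run on their own. rlang::is_empty(x) is a length-zero test, so a base-R variant without the rlang dependency should behave the same way. That variant is named %if_empty% here, which is a hypothetical name and not from any package:

```r
# The operator from the question, written as a one-liner
`%|||%` <- function(x, y) if (rlang::is_empty(x)) y else x

# Hypothetical base-R equivalent: same length-zero check, no rlang needed
`%if_empty%` <- function(x, y) if (length(x) == 0L) y else x

numeric(0) %|||% NA_real_  # NA
630        %|||% NA_real_  # 630
NULL       %|||% NA_real_  # NA, since NULL also has length 0

numeric(0) %if_empty% NA_real_  # NA
```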
This is what I'm using at the moment, and it works, but it feels strange: this seems like such a common operation that I was surprised I needed to define something new, so I assume I'm missing the idiomatic approach.
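A reshaping approach avoids by-group indexing entirely. This is a sketch, not a claim about the idiomatic answer. Keep only the two years, spread them into columns with tidyr::pivot_wider (tidyr 1.0.0 or later), and divide. A country missing one of the years gets NA in that column, so its ratio is NA. The sketch uses the same made-up toy tibble with hypothetical values rather than tidyr::population:

```r
suppressPackageStartupMessages({
  library("dplyr")
  library("tidyr")
})

# Hypothetical toy data standing in for tidyr::population
toy <- tibble(
  country    = c("A", "A", "B", "B"),
  year       = c(2000, 2012, 2006, 2012),
  population = c(100, 150, 600, 630)
)

# One row per country, one column per kept year; absent years become NA
toy_chg <- toy %>%
  filter(year %in% c(2000, 2012)) %>%
  pivot_wider(names_from = year, values_from = population,
              names_prefix = "pop_") %>%
  mutate(chg_2000_to_2012 = pop_2012 / pop_2000)
```

This approach has some caveats. A country with neither year is dropped by the filter() step. A year column is only created if at least one country has that year.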