I'm getting a little confused as I work more and more with nested lists inside tibble columns.
First, a simple example with no lists. Here, I know mutate will operate row-by-row without my having to specify group_by(), and it will produce values in the new RR column that are the same as those in the data_var column:
library(tidyverse)
df <- tribble(
  ~name,    ~data_var,
  "Jose",   5L,
  "Beth",   7L,
  "George", 10L
)
# RR column is correct
df %>%
  mutate(RR = data_var)
#> # A tibble: 3 x 3
#>   name   data_var    RR
#>   <chr>     <int> <int>
#> 1 Jose          5     5
#> 2 Beth          7     7
#> 3 George       10    10
Created on 2019-04-23 by the reprex package (v0.2.1)
Now for a really simple nested list example (the actual data I'm working on is more involved). Here, the first attempt returns 5 for all rows, which is not what I expected. The second attempt does what I expect, but only because I call group_by() first.
My question is: why?
library(tidyverse)
df2 <- tribble(
  ~name,    ~data_var,
  "Jose",   list('RR' = 5L),
  "Beth",   list('RR' = 7L),
  "George", list('RR' = 10L)
)
# RR column is incorrect, shouldn't be all 5. This is unexpected.
df2 %>%
  mutate(RR = pluck(data_var, 1, 1))
#> # A tibble: 3 x 3
#>   name   data_var      RR
#>   <chr>  <list>     <int>
#> 1 Jose   <list [1]>     5
#> 2 Beth   <list [1]>     5
#> 3 George <list [1]>     5
# RR column is now correct after using group_by(). But why?
df2 %>%
  group_by(name) %>%
  mutate(RR = pluck(data_var, 1, 1))
#> # A tibble: 3 x 3
#> # Groups:   name [3]
#>   name   data_var      RR
#>   <chr>  <list>     <int>
#> 1 Jose   <list [1]>     5
#> 2 Beth   <list [1]>     7
#> 3 George <list [1]>    10
Created on 2019-04-23 by the reprex package (v0.2.1)
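For anyone trying to reproduce this, here is a minimal sketch of what seems to be going on. It assumes that mutate() hands pluck() the whole data_var column at once rather than one row at a time. The names first_only and per_row are mine, just for illustration. Calling pluck() on the column outside mutate() gives a single value, and mapping over the column with map_int() is one possible way to get a per-row result without group_by().

```r
library(dplyr)
library(purrr)
library(tibble)

df2 <- tribble(
  ~name,    ~data_var,
  "Jose",   list(RR = 5L),
  "Beth",   list(RR = 7L),
  "George", list(RR = 10L)
)

# pluck() on the whole column is equivalent to df2$data_var[[1]][[1]]:
# it returns one value (the first row's RR), which mutate() would then
# recycle across every row.
first_only <- pluck(df2$data_var, 1, 1)

# map_int() applies the extraction to each element of the list column,
# so each row gets its own RR value and no group_by() is needed.
per_row <- df2 %>%
  mutate(RR = map_int(data_var, "RR"))
```

If this reading is right, group_by(name) only works in my example because every group has exactly one row, so the "first element" of the column inside each group happens to be that row's own list.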