I'm pretty good with normal rectangular data, but I think I want to restructure my data to be nested.
My experiments so far are proving useless. To keep things fairly simple: if I wanted to filter for something inside the nest, is there a tidy way to do it?
Simple dataset:
library(tidyverse)

starwars |>
  group_by(homeworld, species) |>
  nest()
If I want to search inside the "data" column this creates to find any tibble with gender == "masculine" and return it with the homeworld and species - Can I?
And what if I want to search inside films for "The Force Awakens" (so a list within a tibble within a tibble...)?
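For what it's worth, here is a minimal sketch of one way both searches might be done, assuming purrr's map_lgl() is acceptable and R >= 4.1 (for |> and the \(x) lambda). The object names nested, masculine_nests and force_awakens_nests are made up for illustration. The idea is to compute one TRUE/FALSE per nest and pass that to filter(), so homeworld and species come along for free.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Nest as in the question; ungroup so later verbs see plain rows
nested <- starwars |>
  group_by(homeworld, species) |>
  nest() |>
  ungroup()

# Keep nests where at least one character has gender == "masculine".
# na.rm = TRUE makes missing genders count as non-matches.
masculine_nests <- nested |>
  filter(map_lgl(data, \(d) any(d$gender == "masculine", na.rm = TRUE)))

# films is itself a list column inside each nested tibble, so map twice:
# the inner map_lgl() tests each character, the outer one tests each nest.
force_awakens_nests <- nested |>
  filter(map_lgl(
    data,
    \(d) any(map_lgl(d$films, \(f) "The Force Awakens" %in% f))
  ))
```

Both results still carry homeworld, species and the full data tibble for each matching nest; piping either one into unnest(data) would flatten it back to rectangular form.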
Yeah, my cat sort of squeals! Using that tutorial I have:
library(tidyverse)
library(gapminder)

gapminder %>%
  group_by(continent) %>%
  nest() %>% # Nest the data by continent
  # Next line is from the tutorial and calculates life expectancy by continent
  mutate(avg_lifeExp = map_dbl(data, ~ mean(.x$lifeExp))) %>%
  # This is my dreadfully crude filtering method!
  # Testing country == "Ireland" gives a TRUE/FALSE vector per nest; sum() counts the TRUEs.
  # map() returns those counts as a list, so unlist() turns them into a numeric column:
  # the number of rows in each continent where the country is Ireland.
  mutate(filt = unlist(map(data, ~ sum(.x$country == "Ireland")))) %>%
  # I can then filter for anything > 0
  filter(filt > 0) %>%
  # And then drop the helper column
  select(-filt)
Results in:
> # A tibble: 1 x 3
> # Groups:   continent [1]
>   continent data               avg_lifeExp
>   <fct>     <list>                   <dbl>
> 1 Europe    <tibble [360 x 5]>        71.9
BUT - it is utterly dreadful code...! It certainly doesn't feel like tidyverse readability. I feel there should be a command like
There might be two use cases for filter - one to find a nest that contains the item of interest (like I have done crudely) and one that filters within the nest and returns only Ireland in the nest.
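Those two use cases could be sketched roughly as follows. This is only one possible approach, assuming purrr's map helpers rather than any dedicated nested-filter function; the names by_continent, has_ireland and only_ireland are invented for the example.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(gapminder)

by_continent <- gapminder %>%
  group_by(continent) %>%
  nest() %>%
  ungroup()

# Use case 1: keep every nest that contains Ireland, with all of its rows.
# map_lgl() returns one TRUE/FALSE per nest, which filter() can use directly.
has_ireland <- by_continent %>%
  filter(map_lgl(data, ~ any(.x$country == "Ireland")))

# Use case 2: filter inside each nest so only the Ireland rows remain,
# then drop the nests that end up empty.
only_ireland <- by_continent %>%
  mutate(data = map(data, ~ filter(.x, country == "Ireland"))) %>%
  filter(map_int(data, nrow) > 0)
```

The first version replaces the sum/unlist/select dance with a single logical test; the second rewrites the data column itself, so the nest shrinks to just the matching rows.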
There is a keep function. Maybe it can do this, but I haven't got it to work at all!
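As far as I understand it, purrr's keep() works on the elements of a list rather than on the rows of a data frame, which may be why it is awkward to use on a nested tibble directly. A small sketch, assuming a plain named list of per-continent tibbles built with base split() instead of nest() (the names continents and with_ireland are just for illustration):

```r
library(purrr)
library(gapminder)

# A named list with one tibble per continent
continents <- split(gapminder, gapminder$continent)

# keep() retains the list elements for which the predicate is TRUE
with_ireland <- keep(continents, ~ any(.x$country == "Ireland"))

names(with_ireland)
```

The trade-off is that the result is a list, not a tibble, so extra columns such as avg_lifeExp would not travel with it the way they do with filter() on a nested tibble.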
(Even the syntax is similar to what I suggested!) This code will actually select JUST the Ireland data from the nests (i.e. 12 rows in Europe), whereas my horrible code will select the whole nest (i.e. 360 rows in Europe) that contains Ireland. Which we can achieve with: