Hi folks!
I've got an ML project where each observation has a whole host of classes (~200) nested within it. Is there a way to tune how many of these nested classes get passed to the model? I can filter manually to just the top n, but a more systematic approach would be ideal!
```r
library(tidyverse)

# Toy data: 5 observations, each with 50 randomly drawn classes (1 to 200)
train_nested <-
  tibble(observation = rep(seq(1, 5), 50),
         classes = round(runif(250, 1, 200))) %>%
  mutate(classes = paste("class", classes)) %>%
  nest(data = classes) %>%
  mutate(other_pred = rnorm(5))

train_nested
#> # A tibble: 5 x 3
#>   observation data              other_pred
#>         <int> <list>                 <dbl>
#> 1           1 <tibble [50 x 1]>    -0.0702
#> 2           2 <tibble [50 x 1]>     0.273
#> 3           3 <tibble [50 x 1]>    -1.47
#> 4           4 <tibble [50 x 1]>    -1.99
#> 5           5 <tibble [50 x 1]>     1.12

# Plot the 15 most frequent classes across all observations
train_nested %>%
  unnest(data) %>%
  count(classes) %>%
  arrange(desc(n)) %>%
  slice_head(n = 15) %>%
  mutate(classes = fct_reorder(classes, n)) %>%
  ggplot(aes(x = classes,
             y = n)) +
  geom_col() +
  coord_flip()
```
Created on 2022-02-25 by the reprex package (v2.0.1)
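One possible approach, sketched under assumptions: wrap the top-n filtering in a function so that n becomes an ordinary argument you can vary like any other hyperparameter. `top_n_class_features()` below is a hypothetical helper (not from any package). It counts classes across all observations, keeps the `n_top` most frequent, pools the rest into an `other` column, and spreads the counts into one column per kept class. The `toy` data is made up purely for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical helper (not from any package): keep the `n_top` most frequent
# classes, counted across all observations, pool the rest into "other", and
# return one row per observation with a count column per kept class.
top_n_class_features <- function(nested_df, n_top) {
  long <- unnest(nested_df, data)

  # The n_top most frequent classes (ties at the cut-off are not handled specially)
  top_classes <- long %>%
    count(classes, sort = TRUE) %>%
    slice_head(n = n_top) %>%
    pull(classes)

  long %>%
    mutate(classes = if_else(classes %in% top_classes, classes, "other")) %>%
    count(observation, classes) %>%
    pivot_wider(names_from = classes, values_from = n, values_fill = 0L) %>%
    # Re-attach the non-nested predictors (e.g. other_pred)
    left_join(select(nested_df, -data), by = "observation")
}

# Made-up toy data with the same shape as train_nested
toy <- tibble(
  observation = c(1L, 1L, 1L, 2L, 2L, 3L),
  classes = c("class 1", "class 1", "class 2", "class 1", "class 3", "class 2")
) %>%
  nest(data = classes) %>%
  mutate(other_pred = c(0.5, -1, 2))

# Keep the 2 most frequent classes; "class 3" is pooled into "other"
print(top_n_class_features(toy, n_top = 2))
```

To tune n, you could call the helper for each candidate value (say `c(5, 15, 50)`) and compare resampled model performance. Ideally the top classes would be chosen from the analysis set of each resample only, to avoid leaking information; this sketch does not handle that. If you are using tidymodels, another route worth checking is to treat each observation's classes as tokens: `step_tokenfilter()` in textrecipes has a `max_tokens` argument that can be set to `tune()`.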