Hi, I'd like to write a function that returns the top and bottom 1% of each column in a data frame (tibble). Can anyone help me?
Hi, my data is representative of a density plot, and I'd like to find the top and bottom 1% of the distribution, not of the rows in the column. I hope that's clear, thanks!
library(tidyverse)

# 1st and 99th percentile cut-offs for every numeric column,
# stored as one list-column per variable
(result_df <- summarise_if(
  iris,
  is.numeric,
  ~ list(quantile(x = ., probs = c(0.01, 0.99)))
))

# for display purposes: the top and bottom 1% cut-offs per variable
result_df %>% unnest(cols = everything())

# now get the actual values found
# first low, then high
low_1 <- map2(
  .x = names(result_df),
  .y = result_df,
  .f = ~ filter(iris, !!sym(.x) <= (.y %>% unlist() %>% .[[1]])) %>%
    pull(.x)
)
names(low_1) <- names(result_df)
low_1

high_99 <- map2(
  .x = names(result_df),
  .y = result_df,
  .f = ~ filter(iris, !!sym(.x) >= (.y %>% unlist() %>% .[[2]])) %>%
    pull(.x)
)
names(high_99) <- names(result_df)
high_99
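Since the original question asked for a function, the steps above could be wrapped up roughly as follows. This is only a sketch in base R, not the code from the answer: the name `tail_values` and its `lower`/`upper` arguments are made up for illustration, and it assumes that "top and bottom 1%" means values at or beyond the 1st and 99th percentile cut-offs.

```r
# Hypothetical helper (not from the thread): for every numeric column,
# return the cut-offs plus the values at or below the lower cut-off
# and at or above the upper cut-off.
tail_values <- function(df, lower = 0.01, upper = 0.99) {
  # keep only the numeric columns
  numeric_cols <- df[vapply(df, is.numeric, logical(1))]

  lapply(numeric_cols, function(col) {
    cuts <- quantile(col, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
    list(
      cutoffs = cuts,
      low     = col[!is.na(col) & col <= cuts[1]],
      high    = col[!is.na(col) & col >= cuts[2]]
    )
  })
}

res <- tail_values(iris)
res$Sepal.Length$cutoffs
res$Sepal.Length$low
res$Sepal.Length$high
```

The result is a named list with one entry per numeric column, so `res$Petal.Width$high` and so on work the same way.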
map2(
  .x = names(result_df),
  .y = result_df,
  .f = ~ filter(iris, !!sym(.x) >= (.y %>% unlist() %>% .[[2]])) %>%
    pull(.x)
)
map2() iterates over two inputs in parallel; here the .x and .y arguments specify them.
Inside the .f argument, which holds the function applied to each pair of elements, you can refer to the current elements as .x and .y.
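A stripped-down illustration of that pairing may help. The vectors `col_names` and `cutoffs` below are made-up inputs, not data from the thread:

```r
library(purrr)

# Hypothetical inputs: three names and three matching cut-off values
col_names <- c("a", "b", "c")
cutoffs   <- c(10, 20, 30)

# map2() calls .f once per pair: ("a", 10), then ("b", 20), then ("c", 30).
# Inside the formula, .x is the current name and .y the current cut-off.
labels <- map2(
  .x = col_names,
  .y = cutoffs,
  .f = ~ paste0(.x, " <= ", .y)
)
labels
```

The result is a list with one element per pair, in the same order as the inputs.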
thanks for your help!
Thanks, I'll give this a go!