Finding top and bottom 1% of multiple columns in a dataframe

EGibbs · July 3, 2020, 2:58pm

Hi, I'd like to write a function that returns the top and bottom 1% of each column in a data frame (tibble). Can anyone help me?

EGibbs · July 3, 2020, 3:05pm

Hi, my data represents a density plot, and I'd like to find the top and bottom 1% of the distribution, not the top and bottom 1% of rows in each column. I hope that's clear, thanks!
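To make "top and bottom 1% of the distribution" concrete: the cut-offs are the 1st and 99th percentiles, which base R's quantile() returns, and the tails are the values at or beyond those cut-offs. A minimal sketch on a made-up toy vector `x` (chosen only so the percentiles are easy to read):

```r
# Toy data: the integers 0 to 100 (101 values)
x <- 0:100

# 1st and 99th percentiles; quantile() defaults to type = 7 (linear interpolation)
cuts <- quantile(x, probs = c(0.01, 0.99))
cuts

# Values in the bottom and top 1% tails, inclusive of the cut-offs
bottom_1 <- x[x <= cuts[[1]]]
top_1 <- x[x >= cuts[[2]]]
bottom_1
top_1
```

Here the cut-offs come out as 1 and 99, so the bottom tail is 0 and 1 and the top tail is 99 and 100. With real, continuous data the cut-offs are usually interpolated values that need not appear in the data.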

nirgrahamuk · July 3, 2020, 3:29pm

library(tidyverse)

# one list-column per numeric variable, each holding its 1st and 99th percentiles
(result_df <- summarise_if(iris,
                           is.numeric,
                           ~ list(quantile(x = .,
                                           probs = c(0.01, 0.99)))))

# for display purposes: the bottom and top 1% cut-offs per variable
result_df %>% unnest(cols = everything())


# now get the actual values beyond the cut-offs:
# first the low tail (at or below the 1st percentile), then the high tail
low_1 <- map2(
  .x = names(result_df),
  .y = result_df,
  # keep rows where this column is at or below its 1% cut-off, then extract that column
  .f = ~ filter(iris, !!sym(.x) <= (.y %>% unlist() %>% .[[1]])) %>% pull(.x)
)

names(low_1) <- names(result_df)
low_1

high_99 <- map2(
  .x = names(result_df),
  .y = result_df,
  # keep rows where this column is at or above its 99% cut-off, then extract that column
  .f = ~ filter(iris, !!sym(.x) >= (.y %>% unlist() %>% .[[2]])) %>% pull(.x)
)

names(high_99) <- names(result_df)
high_99
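The original question asked for a function. One way to package these steps is a small helper that returns, for every numeric column, its cut-offs plus the values in each tail. This is a hypothetical base R sketch (the name `tail_values()` and its `p` argument are made up here), not part of the answer above; it uses `Filter()` and `lapply()` instead of dplyr/purrr:

```r
# Hypothetical helper: for each numeric column of `df`, return the cut-offs
# and the values in the bottom and top `p` tails of that column's distribution.
tail_values <- function(df, p = 0.01) {
  stopifnot(is.data.frame(df), p > 0, p < 0.5)
  num_cols <- Filter(is.numeric, df)  # drop non-numeric columns such as factors
  lapply(num_cols, function(col) {
    cuts <- quantile(col, probs = c(p, 1 - p), na.rm = TRUE)
    list(
      cutoffs = cuts,
      low  = col[!is.na(col) & col <= cuts[[1]]],  # bottom tail
      high = col[!is.na(col) & col >= cuts[[2]]]   # top tail
    )
  })
}

res <- tail_values(iris)
res$Sepal.Length$cutoffs
res$Sepal.Length$low
res$Sepal.Length$high
```

Because a tibble is also a data frame, the same call should work on a tibble; passing `p = 0.05` would give the 5% tails instead.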

nirgrahamuk · July 4, 2020, 11:02am

map2(
  .x = names(result_df),
  .y = result_df,
  .f = ~ filter(iris, !!sym(.x) >= (.y %>% unlist() %>% .[[2]])) %>% pull(.x)
)

map2() iterates over two inputs in parallel; here the .x and .y arguments specify them (the column names and the matching cut-off columns of result_df).
Inside .f, the function applied at each step, you refer to the current element of each input as .x and .y.
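For comparison, base R's Map() pairs up two inputs element by element in the same way map2() does. A minimal sketch with made-up toy vectors, with the purrr form shown in a comment:

```r
# Two inputs of the same length, iterated in parallel
col_names <- c("a", "b", "c")
cutoffs <- c(10, 20, 30)

# Map() calls the function once per pair: ("a", 10), ("b", 20), ("c", 30).
# purrr equivalent: purrr::map2(col_names, cutoffs, ~ paste(.x, .y))
pairs <- Map(function(name, cut) paste(name, cut), col_names, cutoffs)
pairs
```

Since the first input is a character vector, Map() also uses its values as the names of the returned list.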

EGibbs · July 6, 2020, 12:37pm

Thanks for your help!

EGibbs · July 6, 2020, 12:38pm

Thanks, I'll give this a go!
