Finding 3 most common items in a column

Hi, I'm looking for some pointers. I have a grouped data set with 7 columns. One of the columns holds product numbers, and I need to find the 5 most common product numbers in that column. I then need to subset the data to just the rows containing those 5 product numbers, keeping all seven columns; rows with any other product number should be dropped.

I then need to get the data organized so that just observations from that set are used.
Thanks in advance.

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

In case you prefer to learn how to do this sort of thing by yourself, here is a valuable resource.

Does this post help? mode-most-common-value-function-ignoring-na-and-returning-largest-value-in-the-event-of-a-tie/91410

From the replies I got there, I was able to improve my code for finding the single most common value (MostCommon). It should be possible for you to adapt it to return the top N.

MostCommon <- function(x) {
  ux <- unique(x)
  uxnotna <- ux[!is.na(ux)]               # distinct non-NA values
  if (length(uxnotna) > 0) {
    tab <- tabulate(match(x, uxnotna))    # count of each distinct value
    candidates <- uxnotna[tab == max(tab)] # all values tied for most common
    if (is.logical(x)) {
      any(candidates) # return TRUE if any true. max returns an integer
    } else {
      max(candidates) # return highest (ie max) value
    }
  } else {
    ux   # this returns the NA with the right class. ie that of x
  }
}
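
For the top N case, here is a minimal base R sketch along the same lines. The name top_n_common and its arguments are made up for illustration, not taken from the linked post, and it assumes that ties at the cutoff can simply be resolved by first appearance in x.

```r
# Hypothetical helper (not from the thread): return the n most frequent
# non-NA values of x, most frequent first.
top_n_common <- function(x, n = 5) {
  x <- x[!is.na(x)]                 # ignore missing values
  if (length(x) == 0) return(x)     # nothing to count
  ux  <- unique(x)                  # distinct values, in order of first appearance
  tab <- tabulate(match(x, ux))     # frequency of each distinct value
  ord <- order(-tab)                # most frequent first; ties keep first-appearance order
  ux[head(ord, n)]
}

top_n_common(c("a", "b", "b", "c", "c", "c", NA), n = 2)
# "c" "b"
```

To keep only the matching rows of a data frame df with a prods column (both names assumed here), something like df[df$prods %in% top_n_common(df$prods, 5), ] should work.
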
library(tidyverse)

# make up data
set.seed(42)
(exdf <- tibble(
  prods=c(sample(letters[1:26],size=100,replace = TRUE),
          sample(letters[c(4,5,12,25,26)],size=100,replace = TRUE)),
  s1 = sample.int(100,size=200,replace=TRUE),
  s2 = sample.int(100,size=200,replace=TRUE),
  s3 = sample.int(100,size=200,replace=TRUE),
  s4 = sample.int(100,size=200,replace=TRUE)
))

# detect the 5 most common products
# (head() keeps exactly 5 rows, so products tied for 5th place are not all retained)
(top_5_prods_df <- count(exdf, prods) %>% arrange(desc(n)) %>% head(n = 5))

# go back to original data and only keep the rows for these 5
(result_df <- filter(exdf, prods %in% top_5_prods_df$prods))

#check
table(result_df$prods)
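
Since the question mentions a grouped data set, and since several products can share the same count at the cutoff, here is a hedged variation on the steps above. It is a sketch on a small made-up table (toy is hypothetical), assuming you want the counts taken over the whole table and want to keep every product tied at the cutoff, which can return more than the requested number of products.

```r
library(dplyr)

# Small made-up data set; `prods` mirrors the column name used above.
toy <- tibble(
  prods = c("a", "a", "a", "b", "b", "c", "c", "d"),
  s1    = 1:8
)

# Drop any grouping first so the counts are taken over the whole table,
# then keep the top 2 products, including all products tied at the cutoff.
top_prods <- toy %>%
  ungroup() %>%
  count(prods) %>%
  slice_max(n, n = 2, with_ties = TRUE)

# semi_join() keeps the rows of `toy` whose prods appear in `top_prods`,
# with all original columns intact.
result <- semi_join(ungroup(toy), top_prods, by = "prods")

table(result$prods)
```

Here b and c are tied for second place, so both are kept alongside a, and only the row for d is dropped. Setting with_ties = FALSE would return exactly 2 products instead.
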
