Finding 3 most common items in a column

PR11 · February 21, 2021, 2:28am

Hi, I thought I would ask for some pointers. I have a grouped data set with 7 columns. One of the columns holds product numbers, and I need to find the 5 most common product numbers in that column. I then need to subset the data to just those 5 product numbers: all seven columns should be kept, but rows containing any other product number should be dropped.

I then need to get the data organized so that just observations from that set are used.
Thanks in advance.

andresrcs · February 21, 2021, 3:01am

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

FAQ: How to do a minimal reproducible example (reprex) for beginners

A minimal reproducible example consists of the following items: a minimal dataset, necessary to reproduce the issue; and the minimal runnable code necessary to reproduce the issue, which can be run on the given dataset and includes the necessary information on the packages used. Let's quickly go over each one of these with examples.

Minimal Dataset (Sample Data): You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame:

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.…

In case you prefer to learn how to do this sort of thing by yourself, here is a valuable resource.

mikecro · February 22, 2021, 8:52pm

Does this post help? mode-most-common-value-function-ignoring-na-and-returning-largest-value-in-the-event-of-a-tie/91410

From the replies I got, I was able to improve my code for the top 1 value (MostCommon). It should be possible for you to adapt it to the top N.

# Return the most common non-NA value of x; ties go to the largest value
MostCommon <- function(x) {
  ux <- unique(x)
  uxnotna <- ux[!is.na(ux)]
  if (length(uxnotna) > 0) {
    # count how often each distinct non-NA value occurs (NAs in x are ignored)
    tab <- tabulate(match(x, uxnotna))
    candidates <- uxnotna[tab == max(tab)]
    if (is.logical(x)) {
      any(candidates) # return TRUE if any true. max returns an integer
    } else {
      max(candidates) # return highest (ie max) value
    }
  } else {
    ux   # this returns the NA with the right class. ie that of x
  }
}
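
As a rough sketch of the "top N" change suggested above, the function below uses base R's table() and sort() instead of tabulate(). The name MostCommonN and the toy vector are made up for illustration and are not from the linked post. Unlike MostCommon, it returns the values as character strings and does not apply the "largest value wins" tie rule.

```r
# Hypothetical helper (not from the linked post): the n most common
# non-NA values of x, most frequent first.
MostCommonN <- function(x, n = 5) {
  counts <- table(x, useNA = "no")           # frequency of each distinct non-NA value
  counts <- sort(counts, decreasing = TRUE)  # most frequent first
  names(head(counts, n))                     # values come back as character
}

# toy data, made up for illustration
prods <- c("a", "b", "a", "c", "a", "b", "d", NA)
MostCommonN(prods, n = 2)
#> [1] "a" "b"
```

The result could then be used to subset a data frame, for example df[df$prods %in% MostCommonN(df$prods, 5), ], assuming df is your data and prods is the product column.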

nirgrahamuk · February 22, 2021, 9:05pm

library(tidyverse)



#make up data
set.seed(42)
(exdf <- tibble(
  prods=c(sample(letters[1:26],size=100,replace = TRUE),
          sample(letters[c(4,5,12,25,26)],size=100,replace = TRUE)),
  s1 = sample.int(100,size=200,replace=TRUE),
  s2 = sample.int(100,size=200,replace=TRUE),
  s3 = sample.int(100,size=200,replace=TRUE),
  s4 = sample.int(100,size=200,replace=TRUE)
))

# detect the 5 most common 
(top_5_prods_df <- count(exdf,prods) %>% arrange(desc(n)) %>% head(n=5))

# go back to original data and only keep the rows for these 5
(result_df <- filter(exdf,
       prods %in% top_5_prods_df$prods))

#check
table(result_df$prods)
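
One thing to be aware of: with arrange() followed by head(), products tied at the cut-off are kept or dropped purely by their position in the sorted table. If that matters for your data, a possible variant is dplyr's slice_max() (available in dplyr 1.0.0 and later), which makes the tie handling explicit. The toy data below is made up for illustration and uses the top 3 instead of the top 5 to keep it small.

```r
library(dplyr)

# toy data, made up for illustration: "a" x4, "b" x3, "c" x2, "d" x2, "e" x1
toy <- tibble(
  prods = c(rep("a", 4), rep("b", 3), rep("c", 2), rep("d", 2), "e"),
  s1    = 1:12
)

# with_ties = TRUE is the default: "c" and "d" tie for third place,
# so both are kept and 4 products come back instead of 3
top_with_ties <- toy %>%
  count(prods) %>%
  slice_max(n, n = 3)

# with_ties = FALSE returns exactly 3 products
top_no_ties <- toy %>%
  count(prods) %>%
  slice_max(n, n = 3, with_ties = FALSE)

# keep only the rows for the selected products, as in the code above
result <- filter(toy, prods %in% top_with_ties$prods)
```

Which behaviour is right depends on whether "the 5 most common" should mean exactly five products or every product whose count reaches the fifth-highest value.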

system · March 15, 2021, 9:05pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
