 # Finding 3 most common items in a column

Hi, thought I would ask for some pointers. I have a question, I have a grouped data set with 7 columns, and in one column of data there is a variable that returns differing values, and I need to find the 5 most common product numbers in that column. I need to split of all seven columns just with those 5 most common product numbers, but the other columns would be in there, just the rows containing non-most common would not be in there.

I then need to get the data organized so that just observations from that set are used.

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

In case you prefer to learn how to do this sort of thing by yourself, here is a valuable resource.

1 Like

From the replies I got I was able to improve my code for top 1 MostCommon. Should be possible for you to change to top N

``````MostCommon <- function(x) {
ux <- unique(x)
uxnotna <- ux[which(!is.na(ux))]
if(length(uxnotna) > 0) {
tab <- tabulate(match(x, uxnotna))
candidates = uxnotna[tab == max(tab)]
if (class(x)  == "logical") {
any(candidates) # return TRUE if any true. max returns an integer
} else {
max(candidates) # return highest (ie max) value
}
} else {
ux   # this returns the NA with the right class. ie that of x
}
}
``````
``````library(tidyverse)

#make up data
set.seed(42)
(exdf <- tibble(
prods=c(sample(letters[1:26],size=100,replace = TRUE),
sample(letters[c(4,5,12,25,26)],size=100,replace = TRUE)),
s1 = sample.int(100,size=200,replace=TRUE),
s2 = sample.int(100,size=200,replace=TRUE),
s3 = sample.int(100,size=200,replace=TRUE),
s4 = sample.int(100,size=200,replace=TRUE)
))

# detect the 5 most common
(top_5_prods_df <- count(exdf,prods) %>% arrange(desc(n)) %>% head(n=5))

# go back to original data and only keep the rows for these 5
(result_df <- filter(exdf,
prods %in% top_5_prods_df\$prods))

#check
table(result_df\$prods)``````

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.