 # Finding the most common occurring character vector (Finding the mode for a non numerical variable)

Can anyone help me with finding a way how to calculate the most common occurring value from this example data frame i have created with dput().

structure(list(X = c("Camden", "Bromley", "Croydon", "Bromley",
"Croydon", "Camden"), Y = c("Theft", "Theft", "Theft", "Burglary",
"Theft", "Robbery")), class = "data.frame", row.names = c(NA,
-6L))

In this data frame X represents The location, and Y represents the type of offence corresponding to that location.

My aim is to find out the most common offence for a particular type of location and then present the answer of that in a summary table. How can i go about doing this when I have thousands of data points?

For this example, of course the most common offence in croydon is theft. But how can I use code to find this answer.

p.s. I am a new user to R and have only had experience with tidyverse, dply and ggplot2.

I used df as name of the dataset.

``````df=structure(list(X = c("Camden", "Bromley", "Croydon", "Bromley",
"Croydon", "Camden"), Y = c("Theft", "Theft", "Theft", "Burglary",
"Theft", "Robbery")), class = "data.frame", row.names = c(NA,
-6L))
df_table=table(df)

df1=cbind(rownames(df_table),colnames(df_table)[apply(df_table,1,which.max)])
colnames(df1)=c("x","y")
``````

Here, df1 is the final datasaet.

Here is a way to do it using the tidyverse. No guarantees that this is the best/most efficient way, but it works!

As above, I'm calling your sample data frame `df` (thanks for providing reproducible sample data, by the way!

``````# For clarity, rename X and Y to more descriptive names
df <- structure(list(X = c("Camden", "Bromley", "Croydon", "Bromley",
"Croydon", "Camden"), Y = c("Theft", "Theft", "Theft", "Burglary",
"Theft", "Robbery")), class = "data.frame", row.names = c(NA,
-6L)) %>% rename("location" = X, "offense" = Y)

# Calculate the number of times each offense occurred in each location
tallied <- df %>% group_by(location, offense) %>%
summarize(frequency = n()) %>%
ungroup()

# Filter down to the most common offense for each location
summ <- tallied %>%
group_by(location) %>%
filter(frequency == max(frequency)) %>%
ungroup()
``````

You could definitely combine the steps above into a single pipeline if you prefer; I just left them separate so it's a little easier to read.

Note also that my solution above preserves both answers if there are ties. If you want to select just a single offense in the case of a tie, you might try using `slice()` to take e.g. only the first or only the last row after grouping by `location`.

I hope that helps!

1 Like