calculating categorical variable based on values in two columns

Hi there,

I am trying to convert an existing category into a new category based on their Electoral.Rating (existing category) and population count. For example, I have managed to recode two of the easiest categories using this code:

df <- df %>%
  mutate(Location_Category = recode(Electorate.Rating,
                                    "Inner Metropolitan" = "Brisbane CBD",
                                    "Outer Metropolitan" = "Greater Brisbane"
  ))
#> Error in df %>% mutate(Location_Category = recode(Electorate.Rating, `Inner Metropolitan` = "Brisbane CBD", : could not find function "%>%"

Created on 2022-11-16 with reprex v2.0.2

However, things get trickier for the business with Electorate.Ratings 'Provincial' or 'Rural'. I need to categorise these based on the population. Where the population is 23,000 or more, then the Location_Category = Regional City. Where the population is below 23,000 then the Location_Category = Regional Town.

I've tried this....

df %>%
  select(Electorate.Rating)
mutate_if(is.character, vars(Electorate.Rating=="Provincial")
)

but I don't know how to 'call-up' the population value and put the conditional statement regarding the size of the population.

Here is a sample set of data

data.frame(
   stringsAsFactors = FALSE,
               Name = c("Business A", "Business B", "Business C", "Business D"),
           Category = c("IL", "LQ", "IL", "IL"),
            Address = c("1 Street St","6 Street St",
                        "10 Street St","12 Street St"),
              State = c("QLD", "QLD", "QLD", "QLD"),
             Suburb = c("BRISBANE", "SANDGATE", "SOUTHPORT", "ST GEORGE"),
           Postcode = c(4000L, 4017L, 4215L, 4487L),
         Population = c(93931L, 81535L, 134750L, 10362L),
           SA2.name = c("Brisbane City, Spring Hill",
                        "Bracken Ridge, Brighton (Qld), Deagon, Sandgate - Shorncliffe",
                        "Labrador, Southport - North, Southport - South","Balonne, Roma Surrounds"),
  Electorate.Rating = c("Inner Metropolitan",
                        "Outer Metropolitan","Provincial","Rural"),
  Location_Category = c("Brisbane CBD",
                        "Greater Brisbane","Provincial","Rural")
)

Thank you, in advance, for any help or advice.

the short answer is to use case_when from dplyr/tidyverse

Here is a worked out example; note Regional City and Regional Town were not included in your example; so I substituted State and Suburb instead; if you in fact; do have city and town you can adapt the code.

library(tidyverse)

df_1 <- data.frame(
  stringsAsFactors = FALSE,
  Name = c("Business A", "Business B", "Business C", "Business D"),
  Category = c("IL", "LQ", "IL", "IL"),
  Address = c(
    "1 Street St", "6 Street St",
    "10 Street St", "12 Street St"
  ),
  State = c("QLD", "QLD", "QLD", "QLD"),
  Suburb = c("BRISBANE", "SANDGATE", "SOUTHPORT", "ST GEORGE"),
  Postcode = c(4000L, 4017L, 4215L, 4487L),
  Population = c(93931L, 81535L, 134750L, 10362L),
  SA2.name = c(
    "Brisbane City, Spring Hill",
    "Bracken Ridge, Brighton (Qld), Deagon, Sandgate - Shorncliffe",
    "Labrador, Southport - North, Southport - South", "Balonne, Roma Surrounds"
  ),
  Electorate.Rating = c(
    "Inner Metropolitan",
    "Outer Metropolitan", "Provincial", "Rural"
  ),
  Location_Category = c(
    "Brisbane CBD",
    "Greater Brisbane", "Provincial", "Rural"
  )
)

(df_2 <- df_1 |> mutate(Location_Category = case_when(
  Electorate.Rating == "Inner Metropolitan" ~ "Brisbane CBD",
  Electorate.Rating == "Outer Metropolitan" ~ "Greater Brisbane",
  Electorate.Rating %in% c("Provincial", "Rural") & Population >= 23000 ~ State,
  Electorate.Rating %in% c("Provincial", "Rural") & Population < 23000 ~ Suburb,
  TRUE ~ "Unknown"
))
)
1 Like

Thank you very @nirgrahamuk! That worked terrific. The final code, tweaked to include Regional City and Regional Town, was:

(df_2 <- df_1  |> mutate(Location_Category = case_when(
  Electorate.Rating == "Inner Metropolitan" ~ "Brisbane CBD",
  Electorate.Rating == "Outer Metropolitan" ~ "Greater Brisbane",
  Electorate.Rating %in% c("Provincial", "Rural") & Population >= 23000 ~ "Regional City",
  Electorate.Rating %in% c("Provincial", "Rural") & Population < 23000 ~ "Regional Town",
  TRUE ~ "Interstate"
))
)

Could I ask a couple of follow up questions:

  1. Why is the whole code in brackets?
  2. Why |> instead of %>%

Thank you again for your help!

It will display your results immediately in console, so you don't have to use View(df2).

This is |> native inside R pipe sign, %>% is coming from magrittr package.
I hope that helps.

1 Like

Thank you @Andrzej - good to know!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.