Leading zero causing problems when sorting codes into groups

Hi guys, I have a question: I have codes for supermarket products and have to assign them to their sectors. All the product codes have 5 digits. The problem is that codes starting with 0 lose that leading 0 and become 4-digit codes, so the product ends up in Clothes. FJCC helped me with this before, but now I don't understand what I am doing wrong.

  • What I want is, for example, that every product whose code starts with 01 or 02 goes to the food sector, products whose codes start with 10 or 20 go to clothes, and so on.

library(dplyr)

DF <- data.frame(Code = c(10105, 01112, 25441, 02422, 02552, 01010, 34552, 21120, 45210))

DF <- DF %>% mutate(Category = case_when(
substr(Code, 1, 2) %in% c(01, 02) ~ "Food",
substr(Code, 1, 2) %in% c(10, 20) ~ "Clothes",
substr(Code, 1, 2) %in% c(34, 45 ) ~ "Toiletries",
TRUE ~ "Unknown"
))
DF

Hi @Shin1,

You should treat the Code variable as character rather than integer/numeric. R drops leading zeros from numeric values, so 01112 is stored as 1112. If you enclose the codes in quotes and treat them as strings, you won't have this issue.
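To see why the original code sends these products to Clothes, here is a minimal sketch using one of the codes from the question. It only uses base R, and the variable names `code` and `prefix` are just for illustration.

```r
# A numeric literal loses its leading zero as soon as R parses it
code <- 01010
print(code)                     # [1] 1010

# substr() converts the number to the string "1010" before slicing
prefix <- substr(code, 1, 2)
print(prefix)                   # [1] "10"

# c(01, 02) is just the numbers 1 and 2, so the Food test fails...
print(prefix %in% c(01, 02))    # [1] FALSE

# ...while "10" matches the Clothes test instead
print(prefix %in% c(10, 20))    # [1] TRUE
```

So the comparison itself works; the zero is already gone before `substr()` ever sees the value.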

For example:

library(dplyr)
library(stringr)

# Codes are stored as strings so the leading zeros are preserved
DF <- data.frame(Code = c('10105', '01112', '25441', '02422', 
                          '02552', '01010', '34552', '21120', '45210'))

DF <- DF %>% 
  mutate(
    Category = case_when(
      # '^' anchors each pattern to the start of the code
      str_detect(Code, '^(01|02)') ~ "Food",
      str_detect(Code, '^(10|20)') ~ "Clothes",
      str_detect(Code, '^(34|45)') ~ "Toiletries",
      TRUE ~ "Unknown"
    )
  )

DF
#>    Code   Category
#> 1 10105    Clothes
#> 2 01112       Food
#> 3 25441    Unknown
#> 4 02422       Food
#> 5 02552       Food
#> 6 01010       Food
#> 7 34552 Toiletries
#> 8 21120    Unknown
#> 9 45210 Toiletries
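If the codes have already been stored as numbers (for instance after importing a file), one possible way to recover the 5-digit form is to pad them with zeros before classifying. This is only a sketch in base R: `codes_num` is hypothetical input, and it assumes every code is a whole number with at most 5 digits.

```r
# Hypothetical input: codes whose leading zeros were already lost
codes_num <- c(10105, 1112, 2422, 34552)

# "%05d" pads with zeros on the left up to a width of 5
codes_chr <- sprintf("%05d", as.integer(codes_num))
print(codes_chr)    # [1] "10105" "01112" "02422" "34552"

# Compare the two-character prefix against quoted strings, not numbers
prefix <- substr(codes_chr, 1, 2)
category <- ifelse(prefix %in% c("01", "02"), "Food",
            ifelse(prefix %in% c("10", "20"), "Clothes",
            ifelse(prefix %in% c("34", "45"), "Toiletries", "Unknown")))
print(category)     # [1] "Clothes"    "Food"       "Food"       "Toiletries"
```

When the codes come from a file, reading that column as text in the first place (for example with `colClasses = "character"` in `read.csv()`) avoids the padding step altogether.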
