recoding large number of values

I am trying to recode a large number of values in a set of columns in a data frame (df). Columns abc1:abc4 has values from c(1:175, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510). So I want to recode the values as
1:175 = 501, 501:508 = 91, 509 = 92 and 510 - 93 across the four columns (abc1:abc4) at once. Example

df <- data.frame(
  id_number = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  age = c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19),
  abc1 = c(14, 158, 170, 504, 505, 506, 507, 508, 509, 510),
  abc2 = c(501, 502, 501, 501, 45, 501, 502, 59, 501, 100),
  abc3 = c(89, 506, 12, 501, 510, 13, 510, 501, 11, 501),
  abc4 = c(32, 505, 35, 501, 501, 56, 501, 12, 501, 501)
)

df

What would be the best way to do it (and across all the columns at once)? I can do the recoding one by one but how do you handle large number of values at once? Thanks

Welcome to the forum and thanks for such a nice clean example.

I think that this will do what you want.

I renamed your dataframe . df is a function in base R.


library(dplyr)

dat1 <- data.frame(
  id_number = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  age = c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19),
  abc1 = c(14, 158, 170, 504, 505, 506, 507, 508, 509, 510),
  abc2 = c(501, 502, 501, 501, 45, 501, 502, 59, 501, 100),
  abc3 = c(89, 506, 12, 501, 510, 13, 510, 501, 11, 501),
  abc4 = c(32, 505, 35, 501, 501, 56, 501, 12, 501, 501)
)


# create recoding function using "case_when" from dplyr

rc <-   function(xx){
  case_when(xx >= 1 & xx <= 175 ~ 501,
            xx >= 501 & xx <= 508 ~ 91,
            xx == 509 ~ 91,
            xx == 510 ~ 93)
}

# Quick chec
rc(dat1$abc4) # function seems to work


apply(dat1[, 3:6], 2, rc)
1 Like

I am practicing mutate(across(...)) so this was a good quick exercise for me.

library(tidyverse)

dat1 <- data.frame(
  id_number = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  age = c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19),
  abc1 = c(14, 158, 170, 504, 505, 506, 507, 508, 509, 510),
  abc2 = c(501, 502, 501, 501, 45, 501, 502, 59, 501, 100),
  abc3 = c(89, 506, 12, 501, 510, 13, 510, 501, 11, 501),
  abc4 = c(32, 505, 35, 501, 501, 56, 501, 12, 501, 501)
)

dat1 %>% mutate(across(abc1:abc4, ~ case_when(. %in% 1:175 ~ 501,
                                             . %in% 501:508 ~ 91, 
                                             . == 509 ~ 92, 
                                             . == 510 ~ 93)))
#>    id_number age abc1 abc2 abc3 abc4
#> 1          1  10  501   91  501  501
#> 2          2  11  501   91   91   91
#> 3          3  12  501   91  501  501
#> 4          4  13   91   91   91   91
#> 5          5  14   91  501   93   91
#> 6          6  15   91   91  501  501
#> 7          7  16   91   91   93   91
#> 8          8  17   91  501   91  501
#> 9          9  18   92   91  501   91
#> 10        10  19   93  501   91   91

Created on 2021-08-29 by the reprex package (v2.0.1)

1 Like

Thank you @jrkrideau ! This is very helpful.

This is exactly what I wanted @EconProf . changing the values across different columns picking their filename. I tried it and worked perfectly

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.