Replace numerical codes with strings from codebook

Hello all,

I've recently begun analyzing a data set and need to tabulate how many occurences of prison misconducts there are in it, by administration, by individual, and so on. The misconducts are coded by number (e.g. 1= Assault; 5= robbery; 34= possession of contraband; etc.) in the data set, and I have a code book in excel that lists out all 51 types of misconduct by their numerical code and their text-based description. I'm hoping there is a way to import this code book into RStudio and have it automatically match the numbers up and put label the rows in my table with them.

Basically, I have written the code below to give me a frequency count table of the number of misconducts, but I also want to have their text string labels in there as another column too. Only, I don't want to have to re-write them all myself. Anyone have any suggestions?

XXXX_MisconductTable=MisconductDataset%>%
  filter(offeringadmin==-1)%>%
  count(Misconducts)

Misconducts n

1 0 183
2 1 127
3 3 1
4 4 2
5 8 1
6 9 5

There are several ways to do what is, in effect, a join (you want to match up the misconduct code numbers with their equivalents in the codebook, and get the string value).

I'm just showing two below (one base, and one with dplyr), but I'll post a link to a StackOverflow thread with lots of options at the bottom.

To can replace the original variable, you could assigning over the misconduct variable (or whatever the equivalent is in your dataset), but I've turned it into a new variable for illustrative purposes in the reprex below.

incidents <- tibble::tribble(
   ~person, ~misconduct,
  "Howard",          0L,
   "Robin",          0L,
    "Fred",          2L,
    "Gary",          2L,
    "John",          3L,
      "JD",          4L,
   "Benjy",          6L
  )

codebook <- tibble::tribble(
              ~code_num, ~code_string,
                     0L,      "pride",
                     1L,       "envy",
                     2L,   "gluttony",
                     3L,       "lust",
                     4L,      "anger",
                     5L,      "greed",
                     6L,      "sloth"
              )

# using base R `match()` function
incidents$code_string <- codebook[match(incidents$misconduct, codebook$code_num), ]$code_string

incidents
#> # A tibble: 7 x 3
#>   person misconduct code_string
#>   <chr>       <int> <chr>      
#> 1 Howard          0 pride      
#> 2 Robin           0 pride      
#> 3 Fred            2 gluttony   
#> 4 Gary            2 gluttony   
#> 5 John            3 lust       
#> 6 JD              4 anger      
#> 7 Benjy           6 sloth

# remove the added variable so I can show the second method
incidents$code_string <- NULL

# with dplyr
suppressPackageStartupMessages(library(dplyr))
incidents %>%
  left_join(codebook, by = c("misconduct" = "code_num"))
#> # A tibble: 7 x 3
#>   person misconduct code_string
#>   <chr>       <int> <chr>      
#> 1 Howard          0 pride      
#> 2 Robin           0 pride      
#> 3 Fred            2 gluttony   
#> 4 Gary            2 gluttony   
#> 5 John            3 lust       
#> 6 JD              4 anger      
#> 7 Benjy           6 sloth

Created on 2019-12-05 by the reprex package (v0.3.0.9001)

As for reading in the data, if it's in an Excel spreadsheet, you might want to look at the readxl package.

2 Likes

Both worked perfectly! And here I thought I needed a significantly more complicated solution...thanks so much!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.