Variable recoding with if applied

Hi,
I'm trying to convert my SPSS statistics recoding syntax files to R and I have done following is SPSS:

Firs I extract first two characters from variable called VIN using this code:

COMPUTE VINChar = SUBSTR(VIN,4,2) .

...then, having VINChar (just created) and ModelLong (already in my data) variables:

DO IF (ModelLong = 'aaa'.
IF VINChar='D1' ModelLong='aaa (2012-2015)'.
IF VINChar='D2' ModelLong='aaa (2012-2015)'.
IF VINChar='D3' ModelLong='aaa (2012-2015)'.
IF VINChar='H2' ModelLong='aaa (2016 ~ )'.
IF VINChar='H3' ModelLong='aaa (2016 ~ )'.
IF VINChar='d1' ModelLong='aaa (2012-2015)'.
IF VINChar='d2' ModelLong='aaa (2012-2015)'.
IF VINChar='h2' ModelLong='aaa (2016 ~ )'.
IF VINChar='h3' ModelLong='aaa (2016 ~ )'.
END IF.

Please note that characters above are case sensitive so I have to repeat the same operation twice.

After that I create new integer variable ModelCat by following recoding:

if ModelLong='ccc (2014 ~ )' ModelCat=1.
if ModelLong='ccc (2013~ )' ModelCat=1.
if ModelLong='aaa (2016 ~ )' ModelCat=5.
if ModelLong='aaa (2012-2015)' ModelCat=6.
if ModelLong='bbb Hybrid' ModelCat=21.
if ModelLong='bbb Electric' ModelCat=22.
if ModelLong='bbb Plug-in Hybrid' ModelCat=23.
recode ModelCat (SYSMIS=99).

then I add labels to each value:

value labels ModelCat
1 'ccc'
5 'aaa'
6 'aaa 2016'
21 'bbb Hyb'
22 'bbb Elec'
23 'bbb Plug-in'
99 'Other'.

I realise that variables with labels is not a standard thing in R so I could accept creating new character variable like that:

if ModelLong='ccc (2014 ~ )' ModelCat='ccc'.
if ModelLong='ccc (2013~ )' ModelCat='ccc'.
if ModelLong='aaa (2016 ~ )' ModelCat='aaa'.
if ModelLong='aaa (2012-2015)' ModelCat='aaa 2016'.
if ModelLong='bbb Hybrid' ModelCat='bbb Hyb'.
if ModelLong='bbb Electric' ModelCat='bbb Elec'.
if ModelLong='bbb Plug-in Hybrid' ModelCat='bbb Plug-in'.
recode ModelCat (SYSMIS='Other').

What would be R recoding equivalent assuming that the data frame is simply df1 so variables above are df1$VIN (string), df1$VINChar (string), df1$ModelLong (string) and df1$ModelCat (string or integer)?

Thank you.

I'm not familiar enough with spss to immediately recognize the data structures you're working with. I'd encourage you to offer a reprex of the data you're working with. That will make offering advice much easier.

By the looks of it, I suspect you can handle a question like this pretty easily with dplyr's mutate and case_when functions.

For example


library(dplyr)
df <- tibble(
  ModelLong = c(1,1,2,3,5)
)

df <- df %>% 
  mutate(
    ModelCat = case_when(
      ModelLong == 1 ~ "a",
      ModelLong == 2 ~ "b",
      ModelLong > 2 & ModelLong < 5 ~ "c",
      TRUE ~ "other"
    )
  )
df
#> # A tibble: 5 x 2
#>   ModelLong ModelCat
#>       <dbl> <chr>   
#> 1         1 a       
#> 2         1 a       
#> 3         2 b       
#> 4         3 c       
#> 5         5 other

Created on 2019-02-19 by the reprex package (v0.2.1)

Details on dplyr https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
Details on case_when https://dplyr.tidyverse.org/reference/case_when.html


And I think you'll want to work with R's factor variables. The tidyverse's forcats package is a pretty handy tool for dealing with factors.

In which case,fct_recode is a handy tool for that.

library(dplyr)

library(forcats)
df <- tibble(
  ModelLong = c(1,1,2,3,5)
)


df <- df %>% 
  mutate(
    ModelCat = as.factor(ModelLong)
  )

df %>% 
  mutate(
    ModelCat_fct = fct_recode(ModelCat,
      "a" = "1",
      "b" = "2",
      "d" = "3"
    )
  )
#> # A tibble: 5 x 3
#>   ModelLong ModelCat ModelCat_fct
#>       <dbl> <fct>    <fct>       
#> 1         1 1        a           
#> 2         1 1        a           
#> 3         2 2        b           
#> 4         3 3        d           
#> 5         5 5        5

Created on 2019-02-19 by the reprex package (v0.2.1)

Thank you for your help. Unfortunately a couple of initial, important steps are still missing:
I need first two characters of VIN to be saved as separate variable used later (VINChar).
Then I need to use this variable for recoding but "IF" statement is important as I do recoding only if ModelLong = 'aaa'.
I can see that my ModelLong should be then changed into factor (it is currently a string variable) to be able to recode values from 2 to 5 in your example into "other". Correct?

1 Like

That is my data:

data.frame(stringsAsFactors=FALSE,
             URN = c("9x013122478", "9x012544645", "9x013122457",
                     "9x015094333", "9x014145611", "9x014963276",
                     "9x015104994", "9x014635481", "9x012544643", "9x013918584",
                     "9x013122284", "9x012544539", "9x013680544", "9x013918572",
                     "9x013918545", "9x014032615", "9x014635460", "9x015094372",
                     "9x013680557", "9x012654879", "9x013703410", "9x013122485",
                     "9x014853449", "9x013122486", "9x014032625", "9x013122697",
                     "9x013122342", "9x013122510", "9x013233865", "9x015094394",
                     "9x013918536", "9x013122546", "9x013122646", "9x013359154",
                     "9x013122574", "9x013122548", "9x015104981", "9x013918605",
                     "9x013122618", "9x013918581", "9x013122341", "9x013122547",
                     "9x013122698", "9x012545899", "9x014145624", "9x013233864",
                     "9x013703438", "9x013918532", "9x014145627", "9x015094359"),
             VIN = c("D1H3511AJJ022400", "D2H3511AJJ032049",
                     "D3H3511AJJ044023", "H3511AJJ056083", "H3511AJJ068381",
                     "H2511AJJ068650", "d43511AJJ068849", "d53813AJJ031649",
                     "h3813GJJ033097", "h2A751AAJZ416909", "xHA751AAJZ436008",
                     "xHA751AAJZ442808", "xHA751AAJZ454024", "xHA751AAJZ454025",
                     "xHA751AAJZ454130", "xHA751AAJZ454189", "xHA751AAJZ454540",
                     "xHA751AAJZ454684", "xH3511AJJ005753", "xH3813AJJ002980",
                     "xH3813AJJ003133", "d53813AJJ003225", "h23813AJJ003485",
                     "H3813AJJ003623", "xH3813AJJ007170", "xH3813AJJ010053",
                     "xH3813AJJ018997", "xD351CAHJ345582", "xH3511AJJ068899", "xH3511AJJ068959",
                     "xH3511AJJ069029", "xH3513AJJ066203", "D1H3513AJJ066537",
                     "xH3513AJJ068491", "xH3513GJJ071991", "xH3517AJJ050025",
                     "xH3811AJJ015160", "xH3811AJJ021897", "xH3811AJJ022000",
                     "xH3811AJJ029163", "xH3811AJJ030519", "xH3811AJJ030967",
                     "xH3811AJJ031162", "xH3813AJJ028961", "xH3813AJJ028962",
                     "xH3813AJJ029534", "xH3813AJJ031259", "xH3813AJJ031606", "xH3511AJJ021465",
                     "xH3511AJJ022010"),
   InterviewDate = c("2018-06-04 21:25:00", "2017-03-20 20:54:00",
                     "2018-06-05 10:23:00", "2018-12-23 12:01:17",
                     "2018-10-15 19:02:13", "2018-12-10 09:02:05", "2018-12-22 11:01:14",
                     "2018-11-16 20:01:54", "2017-02-02 10:51:00",
                     "2018-09-25 12:02:14", "2017-01-15 14:32:00", "2017-01-28 00:11:00",
                     "2018-05-21 09:50:00", "2018-09-26 11:02:27", "2018-05-17 19:40:00",
                     "2018-10-02 11:02:39", "2018-11-24 13:01:10",
                     "2018-12-19 15:01:21", "2018-05-21 08:30:00", "2017-01-09 11:02:00",
                     "2018-05-20 09:19:00", "2018-06-04 19:45:00",
                     "2018-12-01 12:01:20", "2018-06-04 19:14:00", "2018-10-08 08:02:09",
                     "2018-05-28 14:14:00", "2017-01-12 09:18:00", "2018-06-04 16:42:00",
                     "2018-05-26 15:10:00", "2018-12-20 22:01:27",
                     "2018-05-18 08:59:00", "2018-06-04 14:57:00", "2018-05-31 21:02:00",
                     "2018-05-25 16:05:00", "2018-06-04 12:27:00",
                     "2018-06-04 14:28:00", "2018-12-29 21:01:06", "2018-09-25 14:01:32",
                     "2018-06-03 22:41:00", "2018-10-01 09:01:57", "2017-01-18 08:24:00",
                     "2018-06-04 14:49:00", "2018-05-28 13:46:00",
                     "2017-01-01 10:33:00", "2018-10-15 13:04:32", "2018-05-26 16:21:00",
                     "2018-05-18 19:16:00", "2018-05-18 10:54:00",
                     "2018-10-15 13:04:32", "2018-12-17 11:03:42"),
              A1 = c(80, 50, 90, 80, 70, 90, 90, 90, 100, 90, 90, 90, 100, 100,
                     100, 80, 100, 80, 100, 80, 100, 100, 80, 90, 80, 90, 80,
                     100, 90, 100, 100, 100, 90, 90, 100, 100, 100, 100, 100, 100,
                     100, 80, 90, 80, 80, 100, 90, 100, 90, 100),
              A2 = c(80, 40, 100, 80, 70, 90, 90, 90, 100, 90, 90, 90, 100,
                     100, 100, 80, 100, 80, 100, 80, 100, 100, 80, 90, 80, 90,
                     90, 100, 90, 100, 100, 100, 90, 90, 100, 100, 100, 100, 100,
                     100, 100, 80, 90, 90, 80, 100, 90, 100, 90, 100),
              B1 = c(90, 60, 90, 80, 80, 90, 100, 80, 80, 90, 80, 90, 100, 100,
                     100, 90, 80, 80, 100, 80, 100, 100, 80, 80, 80, 80, 80,
                     100, 90, 90, 100, 100, 90, 90, 90, 100, 100, 100, 100, 100,
                     100, 80, 80, 90, 100, 90, 80, 100, 90, 100),
              B2 = c(NA, 60, NA, 80, 80, NA, NA, 80, 100, NA, 80, NA, NA, NA,
                     NA, NA, 80, 80, NA, 90, NA, NA, 80, 80, 80, 80, 90, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 80, 70,
                     NA, NA, NA, 80, NA, NA, NA),
              B3 = c(NA, 60, NA, 80, 80, NA, NA, 80, 100, NA, 80, NA, NA, NA,
                     NA, NA, 90, 80, NA, 80, NA, NA, 80, 80, 80, 80, 90, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 80, 80,
                     NA, NA, NA, 70, NA, NA, NA),
              B4 = c(NA, 50, NA, 80, 80, NA, NA, 80, 100, NA, 80, NA, NA, NA,
                     NA, NA, 80, 80, NA, 80, NA, NA, 80, 90, 80, 80, 90, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 80, 80,
                     NA, NA, NA, 70, NA, NA, NA),
              B5 = c(NA, 50, NA, NA, 80, NA, NA, 90, 100, NA, 80, NA, NA, NA,
                     NA, NA, 90, 80, NA, 80, NA, NA, 80, 90, 80, 90, 80, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 80, 70,
                     NA, NA, NA, 70, NA, NA, NA),
              B6 = c(NA, NA, NA, 80, 80, NA, NA, 100, 100, NA, 80, NA, NA, NA,
                     NA, NA, 80, 80, NA, 90, NA, NA, 80, 80, 80, 60, 70, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 80, 10,
                     NA, NA, NA, 70, NA, NA, NA),
            OS21 = c(NA, 50, NA, 80, 80, NA, NA, 90, 100, NA, 80, NA, NA, NA,
                     NA, NA, 100, 80, NA, 90, NA, NA, 80, 90, 80, 80, 80, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 80, 80,
                     NA, NA, NA, 80, NA, NA, NA),
              C1 = c(80, 50, 90, 90, 80, 100, 90, 90, 100, 90, 90, 90, 100,
                     100, 100, 90, 90, 90, 100, 90, 100, 100, 80, 90, 80, 90,
                     90, 100, 90, 90, 100, 100, 100, 90, 100, 100, 100, 100, 100,
                     100, 100, 80, 100, 80, 80, 100, 90, 100, 100, 100),
              C2 = c(90, 50, 100, 90, 70, 100, 100, 100, 100, 90, 90, 60, 100,
                     100, 100, 100, 100, 80, 100, 80, 100, 100, 80, 90, 80, 90,
                     90, 100, 80, 100, 100, 100, 100, 100, 100, 100, 100, 100,
                     100, 100, 100, 80, 100, 100, 80, 100, 100, 100, 100, 100),
              C3 = c(90, 60, 100, 90, 80, 100, 90, 90, 100, 90, 90, 80, 100,
                     100, 100, 90, 100, 80, 100, 80, 100, 100, 80, 90, 80, 90,
                     90, 100, 90, 90, 100, 100, 100, 90, 100, 100, 100, 100, 100,
                     100, 100, 80, 90, 100, 80, 100, 80, 100, 100, 100),
              C4 = c(90, 60, NA, NA, 90, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, 80, NA, 80, NA, NA, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 80, NA,
                     100, 90, NA, NA, NA, NA, NA),
              C5 = c(90, 60, NA, NA, 90, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, 80, NA, 80, NA, NA, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 80, NA,
                     100, 80, NA, NA, NA, NA, NA),
              C6 = c(90, 60, NA, NA, 80, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, 80, NA, 80, NA, NA, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 80, NA,
                     100, 90, NA, NA, NA, NA, NA),
              C7 = c(90, 60, NA, NA, 90, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, 80, NA, 80, NA, NA, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 80, NA,
                     100, 100, NA, NA, NA, NA, NA),
              C8 = c(90, 60, NA, NA, 90, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, 80, NA, 80, NA, NA, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 80, NA,
                     100, 90, NA, NA, NA, NA, NA),
              D1 = c(80, 990, 90, -10, 100, -10, -10, -10, 70, -10, -10, 990,
                     100, 100, 100, 80, -10, 80, 100, 80, 100, 100, 990, 90,
                     60, 90, 100, 100, 100, -10, 100, 100, 90, 80, 90, 100, 100,
                     100, 100, -10, 100, 990, 100, 80, 90, 100, 90, 100, 90, 100),
              E1 = c(90, 70, 90, 90, 80, 100, 100, 90, 100, 100, 90, 90, 100,
                     100, 100, 80, 90, 80, 100, 80, 100, 100, 80, 90, 80, 90,
                     90, 100, 80, 100, 100, 100, 90, 90, 100, 100, 100, 100, 100,
                     100, 100, 80, 80, 90, 80, 100, 80, 100, 100, 100),
              E2 = c(90, 80, 50, 90, 90, 100, 100, 90, 100, 90, 80, 70, 100,
                     100, 100, 80, 100, 80, 100, 80, 100, 100, 80, 90, 80, 90,
                     100, 100, 90, 100, 100, 100, 90, 90, 100, 100, 80, 100, 100,
                     100, 100, 80, 100, 100, 90, 100, 90, 100, 100, 100),
              E3 = c(NA, 70, NA, NA, 80, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                     NA, 80, NA, 80, NA, 80, NA, NA, 70, NA, 80, NA, NA, NA,
                     90, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 80, 100,
                     NA, 90, NA, 90, NA, NA, NA),
            OS17 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                     NA, NA, NA, NA, NA, NA, NA),
              F1 = c(90, 60, 90, 90, 80, 100, 100, 90, 100, 90, 90, 90, 100,
                     100, 100, 80, 100, 80, 100, 70, 100, 100, 80, 100, 80, 90,
                     80, 100, 90, 100, 100, 100, 90, 90, 90, 100, 100, 100, 100,
                     100, 100, 80, 100, 100, 90, 100, 90, 100, 100, 100),
              F2 = c(NA, 80, NA, NA, 100, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                     NA, 90, NA, 80, NA, 80, NA, NA, 80, NA, 80, NA, 100, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 80, NA,
                     NA, NA, NA, NA, NA, NA, NA),
              F3 = c(NA, 80, NA, NA, 80, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                     NA, 90, NA, 80, NA, 80, NA, NA, 80, NA, 80, NA, 100, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 80, NA,
                     NA, NA, NA, NA, NA, NA, NA),
              F4 = c(NA, 60, NA, NA, 100, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                     NA, 80, NA, 80, NA, 80, NA, NA, 80, NA, 80, NA, 100, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 80, NA,
                     NA, NA, NA, NA, NA, NA, NA),
              F5 = c(NA, 60, NA, NA, 60, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                     NA, 80, NA, 70, NA, 80, NA, NA, 80, NA, 80, NA, 90, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 80, NA,
                     NA, NA, NA, NA, NA, NA, NA),
              F6 = c(NA, 70, NA, NA, 60, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                     NA, 70, NA, 70, NA, 10, NA, NA, 50, NA, 80, NA, 90, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 80, NA,
                     NA, NA, NA, NA, NA, NA, NA),
              F7 = c(NA, 70, NA, NA, 60, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                     NA, 60, NA, 60, NA, 60, NA, NA, 80, NA, 80, NA, 90, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 80, NA,
                     NA, NA, NA, NA, NA, NA, NA),
              G1 = c(0, 60, 90, 0, 90, 100, 100, 90, 100, 0, 0, 90, 100, 100,
                     100, 0, 0, 0, 100, 0, 100, 0, 80, 0, 80, 90, 80, 100, 60,
                     0, 100, 100, 90, 70, 100, 100, 100, 100, 0, 100, 100, 0,
                     100, 0, 0, 100, 90, 100, 0, 100),
       ModelLong = c("aaa", "aaa", "aaa", "aaa", "aaa", "aaa", "aaa", "aaa",
                     "aaa", "aaa", "aaa", "aaa", "aaa", "aaa", "aaa (2016 ~ )",
                     "aaa (2016 ~ )", "aaa (2014 ~ )", "aaa (2014 ~ )",
                     "aaa (2012-2015)", "bbb", "bbb", "bbb", "bbb", "bbb", "bbb", "bbb",
                     "bbb", "ccc", "ccc (2014 ~ )", "ccc (2014 ~ )",
                     "ccc (2013~ )", "bbb Hybrid", "bbb Hybrid", "bbb Electric",
                     "bbb Electric", "bbb Electric", "bbb Plug-in Hybrid",
                     "bbb Plug-in Hybrid", "ccc (2016 ~ )", "ccc (2016 ~ )", "ccc (2016 ~ )",
                     "ccc (2016 ~ )", "ccc (2016 ~ )", "ccc (2016 ~ )",
                     "ccc (2016 ~ )", "ccc (2016 ~ )", "ccc (2016 ~ )", "ccc (2016 ~ )", "eee",
                     "eee"),
            A2TB = c(0, 0, 100, 0, 0, 100, 100, 100, 100, 100, 100, 100, 100,
                     100, 100, 0, 100, 0, 100, 0, 100, 100, 0, 100, 0, 100,
                     100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
                     100, 100, 100, 0, 100, 100, 0, 100, 100, 100, 100, 100)
)

I think that with this example you can figure out the rest

library(dplyr)
library(stringr)
df1 %>% 
    mutate(VINChar = toupper(str_trunc(VIN, 2, side = "right", ellipsis = "")),
           ModelLong = case_when(
               ModelLong =='aaa' & VINChar == 'D1' ~ "aaa (2012-2015)",
               ModelLong =='aaa' & VINChar == 'D2' ~ "aaa (2012-2015)",
               ModelLong =='aaa' & VINChar == 'D3' ~ "aaa (2012-2015)",
               ModelLong =='aaa' & VINChar == 'H2' ~ "aaa (2016 ~ )",
               ModelLong =='aaa' & VINChar == 'H3' ~ "aaa (2016 ~ )",
               TRUE ~ "other"
           )
    ) %>% 
    select(ModelLong, VINChar) %>% 
    head()
#>         ModelLong VINChar
#> 1 aaa (2012-2015)      D1
#> 2 aaa (2012-2015)      D2
#> 3 aaa (2012-2015)      D3
#> 4   aaa (2016 ~ )      H3
#> 5   aaa (2016 ~ )      H3
#> 6   aaa (2016 ~ )      H2

Created on 2019-02-20 by the reprex package (v0.2.1)

2 Likes

Just to support Andres answer, I'd check out the stringr package (and a vingette on regular expressions). It's got a bunch of helper functions to help you detect the presence of character strings, sub-strings and prefixes, which you can use inside your mutate (create new columns) function.

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.