Dichotomising an Ordinal Categorical Variable Into 1-2 (not 0-1)

Dear All,

I wondered if someone could help regarding a set of commands for re-coding an ordinal categorical variable into a dichotomous variable, where the binary values are '1' and '2' rather than '0' and '1'.

Suppose I had a data set called 'trial' and within the data set there were five categorical variables. Call them: 'L', 'M', 'N', 'O', 'P'.

Each of these five variables has six categories {0,1,2,3,4,5} representing health states (zero meaning no health issues, five meaning poor health).

I'd like to take variables (L, M, N, O, P) within the data set 'trial', and dichotomise them so that 1 = {0,1,2}, and 2 = {3,4,5}.

Any help would be much appreciated. The 'sjmisc' package has several dichotomising functions for splitting into groups, but again these only produce zeros and ones.

https://cran.r-project.org/web/packages/sjmisc/vignettes/recodingvariables.html

All guidance would be appreciated,

Best,

Andrew

The easiest way to recode is to add a new column to your data frame with mutate() and use case_when() to say which values of x should become which values of y. Let me know if this solves your problem.

library(tidyverse)

df <- data.frame(x = c(0, 1, 2, 3, 4, 5))


df %>%
  mutate(y = case_when(
    x <= 2 ~ 1,
    x >= 3 ~ 2))
#>   x y
#> 1 0 1
#> 2 1 1
#> 3 2 1
#> 4 3 2
#> 5 4 2
#> 6 5 2

Created on 2021-05-12 by the reprex package (v2.0.0)
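
If the same split has to be applied to several columns at once, the approach above can probably be extended with across(). This is only a minimal sketch: it assumes dplyr 1.0.0 or later, and the data frame `trial` below is made-up example data with columns L to P coded 0 to 5.

```r
library(dplyr)

# Hypothetical example data: five health-state columns coded 0-5
trial <- data.frame(
  L = c(0, 1, 2, 3, 4, 5),
  M = c(5, 4, 3, 2, 1, 0),
  N = c(0, 0, 3, 3, 5, 5),
  O = c(2, 3, 2, 3, 2, 3),
  P = c(1, 1, 1, 4, 4, 4)
)

# Recode every column from L to P: 0-2 becomes 1, 3-5 becomes 2
recoded <- trial %>%
  mutate(across(L:P, ~ case_when(
    .x <= 2 ~ 1,
    .x >= 3 ~ 2
  )))

recoded
```

Any value that matches neither condition, including NA, comes back as NA.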

Hi GreyMerchant,

I tried your commands. However, I think I actually mis-expressed the question beforehand. Apologies.

I have re-expressed the question. Would a similar approach work in that case?

Best,

Andrew

Here are a couple (olde school) ways, so pick your favorite.

set.seed(22)
L <- sample(0:5, 5)
# [1] 5 0 3 1 2

#### factor() lets you map multiple values to the same category
factor(
  L,
  levels = c(0, 1, 2, 3, 4, 5), # Use `0:5`, this is just for explanation
  labels = c(1, 1, 1, 2, 2, 2)
)
# [1] 2 1 2 1 1
# Levels: 1 2

#### ifelse() is fine for splitting values into 2 groups
ifelse(L < 3, 1, 2)
# [1] 2 1 2 1 1

#### Using the cut() function gives you a factor
#### But it's a little overkill for making just 2 groups
cut(L, breaks = c(-Inf, 2, 5), labels = 1:2)
# [1] 2 1 2 1 1
# Levels: 1 2

#### Use lapply() or your preferred dplyr function to replace multiple columns
trial <- data.frame(
  L = L,
  M = sample(0:5, 5),
  N = sample(0:5, 5),
  O = sample(0:5, 5),
  P = sample(0:5, 5)
)
trial
#   L M N O P
# 1 5 3 2 5 1
# 2 0 5 5 3 0
# 3 3 2 3 0 5
# 4 1 0 1 4 4
# 5 2 4 4 1 3

trial[] <- lapply(trial, factor, levels = 0:5, labels = c(1, 1, 1, 2, 2, 2))
trial
#   L M N O P
# 1 2 2 1 2 1
# 2 1 2 2 2 1
# 3 2 1 2 1 2
# 4 1 1 1 2 2
# 5 1 2 2 1 2

As a note, a factor vector is an integer vector with fancy labels and no order. An ordered factor is a factor with an order, which means it's still a fancy-looking integer. I like using them for all categorical data to:

  1. Give them nice word labels to improve the code's readability (status == "healthy" is more intuitive than status == 1).
  2. Remind myself to never use them in numeric operations (R raises a warning if I try).

Choosing appropriate classes for variables, especially ones that restrict what you can do, can be a very useful thing. If you stick to factors, you're guaranteed to never silently end up with -1 for health status.
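
To illustrate both points, here is a small sketch. The vector `codes` and the labels "healthy" and "poor" are invented for the example, and merging duplicated labels like this assumes R 3.4.0 or later.

```r
# Hypothetical health-state codes on the 0-5 scale
codes <- c(0, 3, 5, 2)

# Map codes 0-2 to "healthy" and 3-5 to "poor" (duplicated labels are merged)
status <- factor(
  codes,
  levels = 0:5,
  labels = c("healthy", "healthy", "healthy", "poor", "poor", "poor")
)

# Word labels make comparisons easy to read
status == "healthy"

# Arithmetic on a factor is not meaningful: R warns and returns NA.
# suppressWarnings() only hides that warning so the result can be stored.
arith <- suppressWarnings(status + 1)
arith
```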


Thank you Nathan, really appreciate your help. 'lapply' turned out to do the trick.

What if I had a more tricky question where I wanted different labels attached to different variables?

For example, suppose L = levels 0:5, labels c(1,1,1,2,2,2), yet M = levels 0:5, labels c(1,1,2,2,2,2). Is that a possibility?
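
One possible approach, sketched here rather than taken from the replies above, is to keep one label vector per column in a named list and pair each column with its own labels using Map(). The name `label_map` and the example data are made up for illustration.

```r
# Hypothetical example data with two health-state columns coded 0-5
trial <- data.frame(
  L = c(0, 1, 2, 3, 4, 5),
  M = c(0, 1, 2, 3, 4, 5)
)

# One label vector per column, in the order of levels 0:5
label_map <- list(
  L = c(1, 1, 1, 2, 2, 2),
  M = c(1, 1, 2, 2, 2, 2)
)

# Recode each named column with its own labels
cols <- names(label_map)
trial[cols] <- Map(
  function(column, labels) factor(column, levels = 0:5, labels = labels),
  trial[cols],
  label_map
)

trial
```

Columns that are not named in `label_map` are left untouched, so the list only needs entries for the variables being recoded.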
