recoding multiple Likert variables to numeric values at once without mutate_at

orange20485 · July 14, 2020, 3:00am

I'm working on data from a survey with 50 questions and I need to make the responses numeric. So far I've been trying to use mutate_at but after about 6 hours of wrestling with it this is the only thing I've produced that won't make an error message:

data %>% 
mutate_at(vars(2:12),
function(x) recode(x,
"None'"=0, 
"A little of the time"=1,
"Sometime"=2,
"Most of the time"=3,
"All the time"=4))

The results of this are useless for my purposes because all of the "None" responses are displayed as NA and apparently none of the responses are actually being recoded as numbers, just as strings named "1" "2" etc. If I use summary() on any given variable it appears that none of these changes actually stick, since it produces something like

A little of the time 70
All the time 7
etc.

I am able to correctly recode a variable using this code:

data$variable <- as.numeric(recode(
  data$variable,
  "None'"=0, 
  "A little of the time"=1,
  "Sometime"=2,
  "Most of the time"=3,
  "All the time"=4
))

I have the patience to copy-paste/edit this 50 times but something tells me that's not how you're supposed to do this. I literally started using R yesterday so if I broke a rule or something I'm sorry, I'll delete this post.

EDIT: reproducible example

library("dplyr")
rawdata <- read.csv('filenamet.csv', header = TRUE, sep = ",", na.strings="")
data <- rawdata[-262,-c(1:62,75:83)]

#rename columns
names(data) <- gsub("\\.", "", names(data))
names(data)
data <- dplyr::rename(data,
    frustration = aFrustration,
    sad = bSad,
    guilt = cGuiltyselfblame,
    worry = dWorried,
    irritable = eIrritable,
    fear = fFear,
    angry = gAngry,
    lonely = hLonely,
    helpless = iHelpless,
    hopeless = jHopeless,
    anxious = kAnxious,
    depressed = lDepressed
  )

#recode responses 

data %>% 
mutate_at(vars(2:12),
  ~as.numeric(recode(.,
    "None"=0, 
    "A little of the time"=1,
    "Sometime"=2,
    "Most of the time"=3,
    "All the time"=4)))

#get sum of each row

data$totalscore <- rowSums(data[,c(2:12)],na.rm = TRUE)

joels · July 14, 2020, 3:03am

Three things for starters:

(1) There's a typo in "None'". Note the extra single quote. It should be "None".
(2) After recoding, run as.numeric to convert the digits from strings to numeric values.
(3) You can use ~ to run the function directly, rather than wrap it inside function().

So the code would be:

data %>% 
mutate_at(vars(2:12), 
          ~as.numeric(recode(.,
                             "None"=0, 
                             "A little of the time"=1,
                             "Sometime"=2,
                             "Most of the time"=3,
                             "All the time"=4)))

orange20485 · July 14, 2020, 3:07am

I ran this and got

Error: Evaluation error: object 'x' not found.

joels · July 14, 2020, 3:09am

See my updated code. I forgot to change the x to a dot (.). For future reference, it's helpful if you provide a reproducible example so that we can run your code and test out solutions.

A couple of additional notes: (1) While vars(2:12) works in this case, it's brittle, because it will pick up the wrong columns if the order of your columns changes. If there's some regularity to the column names, you can use other column selection methods that will be more robust. (2) mutate_at will continue to work, but dplyr 1.0.0 added across(), which lets you use mutate whether you're working on one column or multiple columns. See the vignette for details.
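
To make note (2) concrete, here is a minimal sketch of the same recoding written with mutate and across, selecting columns by name rather than by position. The survey data frame is a made-up stand-in (only the column names frustration and sad are borrowed from the question), so treat it as an illustration rather than a drop-in fix:

```r
library(dplyr)

# Hypothetical stand-in for the survey data; not the poster's real file
survey <- tibble(
  id          = 1:3,
  frustration = c("None", "Sometime", "All the time"),
  sad         = c("A little of the time", "Most of the time", NA)
)

# frustration:sad selects a range of columns by name, so the id column
# is left alone even if it moves to a different position
recoded <- survey %>%
  mutate(across(
    frustration:sad,
    ~ recode(.x,
             "None" = 0,
             "A little of the time" = 1,
             "Sometime" = 2,
             "Most of the time" = 3,
             "All the time" = 4)
  ))

recoded
```

Missing answers stay NA after recoding, which is why rowSums needs na.rm = TRUE later on.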

orange20485 · July 14, 2020, 3:34am

I have updated my original question with the example. The code works fine now but when I use str() it shows that all of the variables are still character variables and not numeric.

joels · July 14, 2020, 3:47am

I'm not sure why that's not working with your data. Your example isn't reproducible as we don't have access to your data, so I'm not able to do any testing. For now, here's a simple example showing that you don't actually need the as.numeric to get numeric columns from recode:

library(tidyverse)

d = tibble(x1=rep(LETTERS[1:3], 2),
           x2=rep(LETTERS[1:3], 2))
d[3, 2] = NA

d
#> # A tibble: 6 x 2
#>   x1    x2   
#>   <chr> <chr>
#> 1 A     A    
#> 2 B     B    
#> 3 C     <NA> 
#> 4 A     A    
#> 5 B     B    
#> 6 C     C

d %>% 
  mutate_at(vars(starts_with("x")), ~recode(., A=1, B=2, C=3)) %>% 
  mutate(total = rowSums(., na.rm=TRUE))
#> # A tibble: 6 x 3
#>      x1    x2 total
#>   <dbl> <dbl> <dbl>
#> 1     1     1     2
#> 2     2     2     4
#> 3     3    NA     3
#> 4     1     1     2
#> 5     2     2     4
#> 6     3     3     6

Created on 2020-07-13 by the reprex package (v0.3.0)
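
One possible explanation for the str() output, judging only from the example posted in the question: the pipeline there is never assigned to anything, and dplyr verbs return a new data frame rather than changing data in place. A small sketch with a made-up data frame (toy and q1 are hypothetical names):

```r
library(dplyr)

toy <- tibble(q1 = c("None", "Sometime", "All the time"))

# Without assignment the recoded result is only printed; toy itself is unchanged
toy %>% mutate(q1 = recode(q1, "None" = 0, "Sometime" = 2, "All the time" = 4))
class_before <- class(toy$q1)   # "character"

# Assigning the result back is what makes the change stick
toy <- toy %>%
  mutate(q1 = recode(q1, "None" = 0, "Sometime" = 2, "All the time" = 4))
class_after <- class(toy$q1)    # "numeric"
```

If that is the cause, writing data <- data %>% mutate_at(...) in the original script should be enough.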

orange20485 · July 14, 2020, 3:53am

Sorry about wasting your time. I won't do this again.

joels · July 14, 2020, 4:26am

You didn't waste my time. The purpose of this site is to help people learn how to use R. You're doing pretty well for someone who started learning R two days ago! When you ask questions in the future, all we ask is that you help us help you by providing a reproducible example. Happy coding!

jkdby · November 10, 2020, 4:20pm

Hi there, I thought I might add to this thread, because my problem is similar.

I was successful at recoding my Likert variable from character to numeric, using a similar method as written above.

After that, I tried to use sapply to change all my variables without copy-pasting my command. I did this by creating a function, myrecode (see below). It works! But I'm having a problem: how can I keep the Day and Study Arm columns, which are not part of the recoding, alongside my newly recoded variables?

#first I created the recode function I want to apply to my dataset

myrecode <- function(x){
recode(x, "Not at all"=1, "A little"=2, "Moderately"=3, "Quite a bit" =4, "Extremely" = 5)
}

#then I created an index for the 22 columns I will want to convert

ind <- my_data[,4:26]

#then I applied it to the variable columns

my_data_2 <-
 sapply(ind, myrecode)

My problem now is that I don't have the recoded values alongside my Study Arm and Day variables. Hope this makes sense. Thanks so much in advance.

Here's a portion of my data frame, which is called my_data, to make this reproducible:

structure(list(Day = c("0", "1", "2", "3", "4", "5", "6", "7", 
"9", "10"), `Study Arm` = c("B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B"), `Low energy` = c("Not at all", "Not at all", 
"Not at all", "Not at all", "Not at all", "Not at all", "Not at all", 
"Not at all", "Not at all", "Not at all"), Yawning = c("Not at all", 
"Not at all", "Not at all", "Not at all", "Not at all", "Not at all", 
"A little", "Not at all", "Not at all", "Not at all"), Alert = c("Extremely", 
"Extremely", "Extremely", "Quite a bit", "Extremely", "Extremely", 
"Quite a bit", "Extremely", "Extremely", "Extremely"), Tired = c("Not at all", 
"Not at all", "Not at all", "Not at all", "Not at all", "A little", 
"A little", "Not at all", "Not at all", "Not at all")), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

joels · November 10, 2020, 5:57pm

You would need to index the columns on both sides of the assignment. For example:

my_data_2 = my_data
my_data_2[ , 3:6] <- sapply(3:6, function(i) myrecode(my_data_2[[i]]))
my_data_2

   Day   `Study Arm` `Low energy` Yawning Alert Tired
   <chr> <chr>              <dbl>   <dbl> <dbl> <dbl>
 1 0     B                      1       1     5     1
 2 1     B                      1       1     5     1
 3 2     B                      1       1     5     1
 4 3     B                      1       1     4     1
 5 4     B                      1       1     5     1
 6 5     B                      1       1     5     2
 7 6     B                      1       2     4     2
 8 7     B                      1       1     5     1
 9 9     B                      1       1     5     1
10 10    B                      1       1     5     1
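
For context on why the original attempt lost Day and Study Arm: sapply on its own returns a matrix that holds only the recoded columns. The sketch below shows that, plus an lapply variant of the same assignment idea. The three-row toy data frame is a hypothetical stand-in for my_data:

```r
library(dplyr)

myrecode <- function(x) {
  recode(x, "Not at all" = 1, "A little" = 2, "Moderately" = 3,
         "Quite a bit" = 4, "Extremely" = 5)
}

# Hypothetical three-row stand-in for my_data
toy <- tibble(
  Day         = c("0", "1", "2"),
  `Study Arm` = c("B", "B", "B"),
  Yawning     = c("Not at all", "A little", "Not at all"),
  Alert       = c("Extremely", "Quite a bit", "Extremely")
)

# sapply alone: a numeric matrix with just the two recoded columns
recoded_only <- sapply(toy[3:4], myrecode)
is.matrix(recoded_only)

# Assigning into the same columns keeps Day and Study Arm next to the new values
toy_2 <- toy
toy_2[3:4] <- lapply(toy[3:4], myrecode)
toy_2
```

lapply returns a list with one element per column, which slots back into the data frame without going through a matrix.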

You can do this in the tidyverse with:

library(tidyverse)

my_data_2 = my_data %>% mutate(across(3:6,  myrecode))

Selecting columns by index can be risky if the columns can move around. Another option is to select columns based on their names or characteristics. For example, here we make a slight variation of myrecode in order to keep the recoding vector handy. We then select the columns to mutate: those that contain at least one valid answer listed in recode_vec.

library(tidyverse)

recode_vec=c("Not at all"=1, "A little"=2, "Moderately"=3, 
             "Quite a bit" =4, "Extremely" = 5)

myrecode <- function(x){
  recode(x, !!!recode_vec)
}

# Select columns to mutate based on containing unrecoded versions 
# of the answers
my_data_2 = my_data %>% 
  mutate(across(where(~any(. %in% names(recode_vec))), myrecode))

jkdby · November 11, 2020, 3:41pm

Hi Joel, thanks so much for suggesting the tidyverse alternative, that's a great idea.

A few questions/issues though:

(1) when I run your code I get the following error:

Error in across(where(~any(. %in% names(recode_vec))), myrecode_r) : 
  could not find function "across"

(2) There were many layers in the mutate command. I've used mutate in tidyverse, but never with that many other layers such as across, where, any, or %in%. If you could explain this line of code that would be awesome.

(3) Can you comment on the following command? What's the purpose of the three exclamation marks?

recode(x, !!!recode_vec)

Thank you!

nirgrahamuk · November 11, 2020, 4:46pm

across() was introduced in dplyr 1.0.0; you may need to upgrade.

joels · November 11, 2020, 9:12pm

As Nir said, you'll need to update to dplyr 1.0.0 or later for across.
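
A quick way to check which version you have installed (has_across is just an illustrative variable name):

```r
# Report the installed dplyr version and whether it is new enough for across()
installed <- packageVersion("dplyr")
has_across <- installed >= "1.0.0"

installed
has_across
# If has_across is FALSE, install.packages("dplyr") fetches the current release
```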

The across(where(...)) construct is a way to refer to multiple columns within the mutate function, and I agree it can result in annoyingly deep nesting of functions. On the other hand, any(. %in% names(recode_vec)) is a fairly standard R way to check whether at least one element of a vector matches a value in another vector (in this case we only want to mutate columns that contain answers to the survey questions). The only tidyverse-specific element there is the use of . as a "pronoun" to refer back to each of the columns being mutated.
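
It may help to pull the layers apart and run them one at a time. This is only a sketch on a made-up two-column data frame (toy is hypothetical), using select instead of mutate so you can see which columns the predicate picks:

```r
library(dplyr)

recode_vec <- c("Not at all" = 1, "A little" = 2, "Moderately" = 3,
                "Quite a bit" = 4, "Extremely" = 5)

# Hypothetical two-column stand-in for my_data
toy <- tibble(
  Day     = c("0", "1"),
  Yawning = c("Not at all", "A little")
)

# The predicate on its own, applied to one column at a time
any(toy$Day %in% names(recode_vec))      # FALSE: no survey answers in Day
any(toy$Yawning %in% names(recode_vec))  # TRUE: this column holds survey answers

# where() runs the same predicate on every column and keeps those returning TRUE
selected <- toy %>% select(where(~ any(. %in% names(recode_vec))))
names(selected)
```

Inside mutate(across(...)), the same where() call decides which columns get passed to myrecode.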

The recode function has the ... argument, which allows us to provide any number of value-replacement pairs as arguments to the function. But recode_vec is a single vector that contains a bunch of value-replacement pairs. The !!! ("bang-bang-bang") takes that single vector and "splices" each element of that vector into recode as a separate argument. The Advanced R book has some discussion of this operation, which is called "unquote-splice" (although, confusingly, in this particular case we didn't need to quote recode_vec, so we're really just splicing rather than unquote-splicing).
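
A small side-by-side sketch may make the splicing easier to see. The answers vector is made up for illustration, and recode_vec is shortened to three entries:

```r
library(dplyr)

recode_vec <- c("Not at all" = 1, "A little" = 2, "Extremely" = 5)
answers <- c("A little", "Extremely", "Not at all")

# Value-replacement pairs written out one by one
by_hand <- recode(answers, "Not at all" = 1, "A little" = 2, "Extremely" = 5)

# The same pairs spliced in from the named vector
spliced <- recode(answers, !!!recode_vec)

# Both calls produce the same numbers
all(by_hand == spliced)
```

Keeping the pairs in one named vector means the mapping is defined in a single place and can be reused, as in the where() predicate above.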