for loop, join variables names, and case_when

Greetings,

I am trying to use a for loop to create a rescored version of several variables using case_when. However, after several hours of trying and searching, I can't figure out what I'm doing wrong. The code runs, but doesn’t create the 8 new variables I want.
Any help would be greatly appreciated.

Data

dput(head(psqi.Q5, 100))
structure(list(ID = c("IP001", "IP002", "IP003", "IP004", "IP005", 
"IP007", "IP008", "IP009", "IP010", "IP011", "IP012", "IP013", 
"IP015", "IP016", "IP017", "IP019", "IP020", "IP021", "IP023", 
"IP031", "IP032", "IP034", "IP036", "IP037", "IP038", "IP039", 
"IP040", "IP041", "IP042", "IP043", "IP044", "IP045", "IP047", 
"IP048", "IP049", "IP050", "IP051", "IP052", "IP053", "IP054", 
"IP055", "IP056", "IP058", "IP068", "IP070", "IP078", "IP104", 
"IP132", "IP133", "IP135", "IP139", "IP141", "IP143", "IP145", 
"IP146", "IP147", "IP153", "IP155", "IP156", "IP157", "IP158", 
"IP160", "IP161", "IP162", "IP163", "IP164", "IP165", "IP166", 
"IP168", "IP169", "IP170", "IP171", "IP172", "IP173", "IP175", 
"IP176", "IP177", "IP178", "IP179", "IP180", "IP182", "IP183", 
"IP184", "IP187", "IP188", "IP191", "IP192", "IP193", "IP194", 
"IP196", "IP197", "IP198", "IP199", "IP200", "IP201", "IP202", 
"IP203", "IP204", "IP205", "IP206"), psqi_5b = c(1, 2, 2, 2, 
2, 2, 3, 2, 1, 4, 1, 2, 2, 4, 2, 2, 3, 3, 3, 3, 3, 2, 4, 2, 4, 
3, 2, 4, 3, 4, 2, 3, 2, 2, 4, 2, 4, 3, 2, 4, 1, 1, 3, 4, 3, 3, 
2, 4, 1, 1, 4, 3, 4, 4, 4, 2, 4, 3, 2, 4, 4, NA, 2, 4, 3, 3, 
4, 4, 2, 4, 4, 1, 2, 4, 4, 2, 4, 4, 4, 4, 3, 4, 2, 4, 3, 4, 3, 
3, 2, 4, 3, 3, 4, 2, 4, 4, 3, 3, 4, 2), psqi_5c = c(1, 2, 1, 
1, 2, 1, 1, 2, 1, 3, 1, 1, 4, 3, 2, 1, 2, 2, 2, 2, 3, 2, 3, 2, 
1, 2, 2, 2, 4, 4, 2, 2, 2, 4, 4, 2, 2, 3, 1, 3, 1, 3, 3, 3, 3, 
1, 2, 4, 1, 1, 4, 1, 4, 4, 4, 1, 4, 1, 3, 4, 2, 2, 2, 4, 2, 3, 
3, 4, 4, 3, 2, 1, 1, 4, 3, 3, 4, 1, 4, 2, 2, 4, 1, 4, 4, 3, 3, 
2, 2, 4, 2, 1, 4, 4, 1, 4, 3, 2, 4, 3), psqi_5d = c(1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 4, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 
1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 2, 1, 1, 1, 1, 1, 
1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 1, 3, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 1), psqi_5e = c(1, 1, 1, 
1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3, 1, 2, 1, 
3, 1, 1, 2, 1, 1, 1, 2, 1, 3, 1, 4, 2, 2, 1, 1, 1, 1, 3, 1, 1, 
1, 2, 1, 2, 1, 4, 1, 2, 2, 1, 4, 1, 2, 2, 1, 1, 1, 3, 3, 1, 1, 
1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 4, 3, 1), psqi_5f = c(2, 1, 1, 
2, 1, 1, 2, 3, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1, 1, 
1, 1, 1, 2, 2, 1, 1, 3, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 
2, 1, 3, 3, 1, 3, 2, 1, 4, NA, 1, 2, 2, 2, 1, 3, 1, 2, 2, 2, 
1, 2, 4, 1, 1, 1, 1, 1, 2, 1, 1, 2, 3, 1, 3, 1, 2, 3, 2, 1, 1, 
3, 2, 1, 1, 1, 1, 1, 1, 2, 4, 3, 1, 4, 1), psqi_5g = c(1, 1, 
1, 2, 2, 2, 1, 3, 1, 2, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2, 3, 1, 2, 
2, 1, 1, 2, 3, 2, 4, 1, 3, 1, 4, 2, 1, 2, 3, 1, 3, 3, 2, 1, 4, 
2, 2, 1, 4, 3, 2, 3, 3, 1, 4, 1, 1, 4, 2, 3, 1, 3, 2, 2, 2, 2, 
4, 4, 4, 2, 2, 3, 1, 1, 3, 2, 1, 2, 3, 2, 3, 1, 4, 1, 3, 2, 3, 
3, 1, 3, 2, 2, 4, 3, 2, 2, 4, 3, 1, 4, 2), psqi_5h = c(2, 2, 
1, 2, 2, 2, 1, 3, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 
1, 1, 2, 1, 3, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 2, 
1, 1, 1, 2, 2, 1, 3, 2, 2, 4, 3, 2, 2, 1, 1, 1, 2, 2, 1, 1, 2, 
1, 4, 2, 3, 1, 1, 2, 1, 1, 3, 2, 3, 3, 3, 1, 3, 1, 1, 4, 1, 1, 
1, 2, 1, 2, 2, 1, 3, 2, 4, 4, 1, 2, 1, 2), psqi_5i = c(1, 1, 
1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 4, 2, 
1, 1, 1, 1, 2, 1, 1, 1, 3, 1, 1, 3, 1, 2, 1, 1, 4, 2, 3, 1, 4, 
1, 3, 2, 4, 3, 1, 2, 4, 4, 4, 4, 3, 4, 1, 2, 2, 4, 3, 2, 2, 2, 
3, 3, 4, 4, 2, 4, 3, 2, 4, 4, 3, 4, 3, 4, 4, NA, 4, 2, 4, 1, 
1, 3, 3, 1, 3, 2, 3, 2, 4, 4, 4, 1, 1, 4, 1), psqi_5othera = structure(c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 
0L, 2L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 2L, 0L, 
0L, 0L, 0L, 3L, 3L, 0L, 1L, 3L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 3L, 
2L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 3L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L), label = "How often during the past month have you had trouble sleeping because of this?", class = c("labelled", 
"integer"))), row.names = c(NA, -100L), class = c("tbl_df", "tbl", 
"data.frame"))

Code

psqi_Q5s <- c("b","c","d","e","f","g","h","i")
j <- "psqi_5"
k <- "psQ5"
for (i in psqi_Q5s){
	q <- paste0(j, i) # original variable
	y <- paste0(k,i) # new scored variable
	# for(z in 1:8){ 
		psqi.Q5  <- psqi.Q5 %>%
			mutate(
				y = case_when(
					q == 1 ~ 0,
					q == 2 ~ 1,
					q == 3 ~ 2,
					q == 4 ~ 3 #,
					# TRUE ~ NA_integer_
					#TRUE ~ as.numeric(q)
				)
			)
		# }
}

Jason the #rstatsnewbie

You don't need to use a loop to create a new rescored column.
Try data.table packagem like this

library(data.table)

Step 1: convert your data frame to data.table object

your_data = as.data.table(your_data)

Step 2: Use great data.table syntax

your_data[ psqi_Q5s == 1, psqi_Q5s := 0]
your_data[ psqi_Q5s == 2, psqi_Q5s := 1]
your_data[ psqi_Q5s == 3, psqi_Q5s := 2]
your_data[ psqi_Q5s == 4, psqi_Q5s := 3]

If you are in a numeric column you can't add character replacement. If that is the case, first convert column to character. Then, do character replacements.

1 Like

You're trying to overthink the problem. Remember that dplyr functions are vectorized so you'll very rarely need to write for loops yourself.

Here's how to do this with case_when(). Use the _if, _at and _all variants of mutate() when you want to operate on multiple columns.

psqi.Q5 %>% 
  mutate_at(vars(matches("psqi_5[b-i]")), 
            ~ case_when(. == 1 ~ 0, 
                        . == 2 ~ 1, 
                        . == 3 ~ 2, 
                        . == 4 ~ 3)
  )
# A tibble: 100 x 10
   ID    psqi_5b psqi_5c psqi_5d psqi_5e psqi_5f psqi_5g psqi_5h psqi_5i psqi_5othera
   <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <labelled>  
 1 IP001       0       0       0       0       1       0       1       0 0           
 2 IP002       1       1       0       0       0       0       1       0 0           
 3 IP003       1       0       0       0       0       0       0       0 0           
 4 IP004       1       0       0       0       1       1       1       0 0           
 5 IP005       1       1       0       1       0       1       1       0 0           
 6 IP007       1       0       0       0       0       1       1       0 0           
 7 IP008       2       0       0       0       1       0       0       2 0           
 8 IP009       1       1       0       0       2       2       2       0 0           
 9 IP010       0       0       0       1       0       0       0       0 0           
10 IP011       3       2       0       0       1       1       0       0 0           
# ... with 90 more rows

By the way, if your rescored values are just one less than the old values, you could do away with case_when() too.

psqi.Q5 %>% 
  mutate_at(vars(matches("psqi_5[b-i]")), ~ . - 1)
# A tibble: 100 x 10
   ID    psqi_5b psqi_5c psqi_5d psqi_5e psqi_5f psqi_5g psqi_5h psqi_5i psqi_5othera
   <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <labelled>  
 1 IP001       0       0       0       0       1       0       1       0 0           
 2 IP002       1       1       0       0       0       0       1       0 0           
 3 IP003       1       0       0       0       0       0       0       0 0           
 4 IP004       1       0       0       0       1       1       1       0 0           
 5 IP005       1       1       0       1       0       1       1       0 0           
 6 IP007       1       0       0       0       0       1       1       0 0           
 7 IP008       2       0       0       0       1       0       0       2 0           
 8 IP009       1       1       0       0       2       2       2       0 0           
 9 IP010       0       0       0       1       0       0       0       0 0           
10 IP011       3       2       0       0       1       1       0       0 0           
# ... with 90 more rows
6 Likes

These are both great solutions.
I can now do this with a single column.

Just for future reference, how would I rescore/recode several variables into new variables, some sort of loop?
My ultimate goal is to perform the same operation on several original data columns and create corresponding new "scored" columns.

Jason the #rstatsnewbie

UPDATE 1

I’ve encountered a new snag. In the larger dataset, there are variables with similar names (e.g., psqi_5b and psqi_5b.factor) and the code from @siddharthprabhu throws an error (feel below). Any suggestions?

dat.psqi <- dat.psqi %>% 
+ 	mutate_at(vars(matches("psqi_5[b-i]")), ~ . - 1)
Error: `mutate()` argument `psqi_5b.factor` errored.
ℹ `psqi_5b.factor` is `(structure(function (..., .x = ..1, .y = ..2, . = ..1) ...`.
x non-numeric argument to binary operator
Run `rlang::last_error()` to see where the error occurred.

UPDATE 2

I’ve found something that works, I added a space after the right bracket ( ] ). However, I am not sure about the quality of the solution.

We just need a minor tweak to the column matching pattern; instead of matches("psqi_5[b-i]"), use matches("psqi_5[b-i]$"). Essentially, we'll only match names where [b-i] is at the end.

EDIT: My mistake. The error seems to be something different than the link I posted earlier.

1 Like

Thanks!
I didn’t realize these were bugs. Good to know.

What would I have to add to create "scored variables" and leave the original variables intact?

Name the function being passed to mutate_at(). So, instead of ~ . - 1, use something like list(rescored = . - 1). The name will be appended as a suffix to each modified variable.

By the way, I would strongly recommend reading the documentation (type ?mutate_at) as these are basic operations and you can find a lot of good information and examples there.

@siddharthprabhu Thank you for the response.
I really appreciate the help.

This should be the last set of questions for this issue. I'm trying to break down exactly what is happening. :smile::+1:

From the man page:

mutate_at(.tbl, .vars, .funs, ..., .cols = NULL)

Your suggestion:

psqi.Q5 %>%
mutate_at(vars(matches("psqi_5[b-i]")), ~ . - 1)

So in my case:

  • psqi.Q5 (i.e., .tbl) is passed to #1 (i.e., vars)
  • matches (part of the tidyverse) identifies the correct variables
  • ~ : does what?
  • . : does what?
  • -1 : is subtracted from the values of the current variable
  • list("var_name" = . -1) : becomes the new .funs?

Thanks for your patience and help.

Before posting here I will usually spend a lot of time reading the documentation and doing various searches, with the presumption that someone else has already had this question, and it was answered. Perhaps I missed your suggestions in the documentation. I will go back and reread the docs to see what else I can learn.

Many thanks.

Cheers,
Jason the #rstatsnewbie

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.