Creating a new variable based on a range of scores in the dataset

I feel this is an easy solution for someone with more experience in R then myself, but I am having trouble completing the correct syntax for the following calculation.

As part of a 9 item questionnaire we use, people can respond between the ranges of 1 - 11.
For questions Q1, Q4 and Q7 if you answered between 8-11 for at least 2 of them AND for question Q3, Q6, and Q9 you answered between 8-11 for at least 2 of them you are classified into a certain category.

My question is how do you properly perform this, specifically the AND/OR option since 2 out of 3 and 3 out of 3 are both valid.

I import the csv file with 10 columns (ID, Q1, Q2....Q9)
Making a new variable at the end (i.e., categoryX) is simply. But the calculation/operator is the part I'm having an issue with.

For example I've been playing around with:
database$categoryX[(dataset$Q1 & dataset$Q4 | dataset$Q7 > 7) & (dataset$Q3 & dataset$Q6 | dataset$Q9 > 7)] <- 1

But as some of you probably already know, this doesn't work fully. I've tried || and playing around but I can't get it. Does anyone know how to handle this?

Greatly appreciate it any advice !

dat = tibble::tribble(
   ID,  Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8, Q9
   001,  9,  3,  9,  8,  3,  3,  1,  3, 10   
   002,  1, 11,  7,  1,  7,  7,  1,  7,  8   
   003,  1, 11, 11,  1, 11, 11,  7, 11, 11 
   004,  4, 10,  9,  9,  9,  6,  6,  8,  6
  )``````

Could you post some of the data and an example of what you are trying to achieve? It sounds like you should be able to do this using mutate and case_when. Just make sure you read the help file for case_when, especially this part:

# Like an if statement, the arguments are evaluated in order, so you must
# proceed from the most specific to the most general. 

I second clausp request for a REPRoducible EXample (reprex) illustrating your issue.

dat = tibble::tribble(
   ID,  Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8, Q9
   001,  9,  3,  9,  8,  3,  3,  1,  3, 10   
   002,  1, 11,  7,  1,  7,  7,  1,  7,  8   
   003,  1, 11, 11,  1, 11, 11,  7, 11, 11 
   004,  4, 10,  9,  9,  9,  6,  6,  8,  6
  )

My apologiez. Here is an example of the data

Is this what you mean?

library(dplyr)

dat <- data.frame(
  stringsAsFactors = FALSE,
                ID = c("001", "002", "003", "004"),
                Q1 = c(9, 1, 1, 4),
                Q2 = c(3, 11, 11, 10),
                Q3 = c(9, 7, 11, 9),
                Q4 = c(8, 1, 1, 9),
                Q5 = c(3, 7, 11, 9),
                Q6 = c(3, 7, 11, 6),
                Q7 = c(1, 1, 7, 6),
                Q8 = c(3, 7, 11, 8),
                Q9 = c(10, 8, 11, 6)
)

dat %>%
    rowwise() %>% 
    mutate(flag = if_else(sum(Q1 > 7, Q4 > 7, Q7 > 7) >= 2 & sum(Q3 > 7, Q6 > 7, Q9 > 7) >= 2,
                          "Category x",
                          "Other Category"))
#> Source: local data frame [4 x 11]
#> Groups: <by row>
#> 
#> # A tibble: 4 x 11
#>   ID       Q1    Q2    Q3    Q4    Q5    Q6    Q7    Q8    Q9 flag          
#>   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>         
#> 1 001       9     3     9     8     3     3     1     3    10 Category x    
#> 2 002       1    11     7     1     7     7     1     7     8 Other Category
#> 3 003       1    11    11     1    11    11     7    11    11 Other Category
#> 4 004       4    10     9     9     9     6     6     8     6 Other Category

Created on 2020-03-16 by the reprex package (v0.3.0.9001)

1 Like

@andresrcs that is EXACTLY what I mean yes! I can't tell you my appreciation, Thank you. I was not aware you can combined the "if_else(sum(Q1 > 7, Q4 > 7, Q7 > 7) >= 2 ... option. That was a huge savior.

Thank you very much sir!! The lab and I are extremely grateful

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.