plyr/dplyr -- recode values based on multi-column conditional logic

gxm204 · April 1, 2019, 4:08pm

Hi, I am having a hard time re-coding values in a data frame based on conditional logic using more than one column. Here is what I mean from an example. Here we have two columns about a schedule.

The schedule has to be both requested and approved, such that we have schedules that are both requested and approved and schedules that were requested but not approved. I want to capture whether schedules were either not requested, requested and approved, or requested but not approved. How could I capture this conditional logic into one column?

Here is my solution that I bet someone here could improve on Basically I concatenated the two columns together and then re-coded them to indicate the possibilities.

schedule <- data.frame(requested = c("Yes","No","Yes","Yes","Yes"),
                       approved = c("Yes",NA,"No","Yes","Yes"))


schedule

  requested approved
1       Yes      Yes
2        No     <NA>
3       Yes       No
4       Yes      Yes
5       Yes      Yes

schedule$requestapprove <- paste0(schedule$requested, schedule$approved)

library(plyr)

schedule$requestapprove.r <- revalue(schedule$requestapprove,
                                   c("YesYes" = "Approved",
                                     "NoNA" = "Not requested",
                                     "YesNo" = "Requested but not approved"))

schedule

  requested approved requestapprove           requestapprove.r
1       Yes      Yes         YesYes                   Approved
2        No     <NA>           NoNA              Not requested
3       Yes       No          YesNo Requested but not approved
4       Yes      Yes         YesYes                   Approved
5       Yes      Yes         YesYes                   Approved

andresrcs · April 1, 2019, 4:31pm

You can use case_when()

schedule <- data.frame(requested = c("Yes","No","Yes","Yes","Yes"),
                       approved = c("Yes",NA,"No","Yes","Yes"))
library(dplyr)

schedule %>% 
    mutate(category = case_when(requested == "Yes" & approved == "Yes" ~ "Approved",
                                requested == "No" & is.na(approved) ~ "Not requested",
                                requested == "Yes" & approved == "No" ~ "Requested but not approved"
    ))
#>   requested approved                   category
#> 1       Yes      Yes                   Approved
#> 2        No     <NA>              Not requested
#> 3       Yes       No Requested but not approved
#> 4       Yes      Yes                   Approved
#> 5       Yes      Yes                   Approved

^{Created on 2019-04-01 by the reprex package (v0.2.1.9000)}

Yarnabrina · April 1, 2019, 4:35pm

Andres beat me in providing the first answer, but a little bit shortened answer:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

schedule <- data.frame(requested = c("Yes", "No", "Yes", "Yes", "Yes"),
                       approved = c("Yes", NA, "No", "Yes", "Yes"),
                       stringsAsFactors = FALSE)

schedule %>%
  mutate(category = case_when((requested == "No") ~ "not requested",
                              (approved == "Yes") ~ "approved",
                              TRUE ~ "requested, but not approved"))
#>   requested approved                    category
#> 1       Yes      Yes                    approved
#> 2        No     <NA>               not requested
#> 3       Yes       No requested, but not approved
#> 4       Yes      Yes                    approved
#> 5       Yes      Yes                    approved

^{Created on 2019-04-01 by the reprex package (v0.2.1)}

gxm204 · April 1, 2019, 6:54pm

I like that a lot & it looks pretty familiar from SQL! Thanks, Andres!

andresrcs · April 1, 2019, 7:28pm

Yes it's pretty similar, you can even use dbplyr and pull data directly from the sql server but using dplyr commands, which get translated into sql code under the hood.

system · April 8, 2019, 7:28pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.