Populating a factor column based on conditions of two variables

Good evening everyone, I'm stuck on a problem with data transformation that I'd like to share with you.

In tidying the data I am working with, I've successfully refined two data frames with the desirable variables.

I now want to populate a column I have named 'VISIT_DAY', and made into a factor ("YES", "NO). I want to populate it based on the conditions of "PATIENT_ID" and "DATE", depending on them matching in both data frames (i.e. if 'df1' and 'df2' both have an observation with PATIENT_ID 123 and DATE 08-20-2018).

Here is the attempt I have made thus far.

Home.raw4$VISIT_DAY <- factor(Home.raw4$VISIT_DAY, levels = c("YES", "NO"))
Home.raw4$VISIT_DAY <- ifelse((Home.raw4$PATIENT_ID == "-2147483646" & Home.raw4$DATE == "06.08.2018"), "YES", "NO")

When I execute the second line of code, it populates the column with only "NO", even in the observation where conditions are met.

I was wondering if anyone could help me resolve this? It seems like the ifelse() code is somewhat correct, but I feel that I am missing something for it to state "YES" when conditions are met.

I'm quite new to R Studio, so if it is blatantly obvious what I should do, my apologies.

Thanks for reading.

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

Hi, here is the example. For some reason it says the code does not work when i render it in reprex, but it does in the console.

#Data in question (extracted from large dataframe I am working with)
datapasta::df_paste(example)
#> Could not format input_table as table. Unexpected class.
data.frame(
  PATIENT_ID = c(-2147483646,-2147483646,-2147483646,
                 -2147483646,-2147483646,-2147483646,-2147483646,-2147483646,
                 -2147483646,-2147483646,-2147483645),
        FEV1 = c(4.046,4.053,4.024,4.094,4.005,3.869,
                 4.089,3.879,3.859,3.974,1.577),
         FVC = c(5.286,5.043,5.25,5.235,5.265,5.036,
                 5.154,4.998,5.092,5.198,4.769),
         FEF = c(199.3,226.9,196.8,223.4,199.6,212.7,
                 219.8,205.1,182.9,199.2,59.2),
        DATE = as.factor(c("20.08.2018 08:37:23 +02:00","20.08.2018 08:36:55 +02:00",
                           "20.08.2018 08:36:29 +02:00","15.08.2018 21:38:40 +02:00",
                           "15.08.2018 21:38:11 +02:00","15.08.2018 21:37:20 +02:00",
                           "07.08.2018 10:24:26 +02:00",
                           "07.08.2018 10:23:01 +02:00","07.08.2018 10:22:36 +02:00",
                           "06.08.2018 13:30:53 +02:00","22.03.2019 19:17:50 +01:00")),
    LOCATION = as.factor(c("HOME","HOME","HOME",
                           "HOME","HOME","HOME","HOME","HOME","HOME",
                           "HOME","HOME")),
   VISIT_DAY = as.factor(c("NO","NO","NO","NO",
                           "NO","NO","NO","NO","NO","NO","NO"))
)
#>     PATIENT_ID  FEV1   FVC   FEF                       DATE LOCATION VISIT_DAY
#> 1  -2147483646 4.046 5.286 199.3 20.08.2018 08:37:23 +02:00     HOME        NO
#> 2  -2147483646 4.053 5.043 226.9 20.08.2018 08:36:55 +02:00     HOME        NO
#> 3  -2147483646 4.024 5.250 196.8 20.08.2018 08:36:29 +02:00     HOME        NO
#> 4  -2147483646 4.094 5.235 223.4 15.08.2018 21:38:40 +02:00     HOME        NO
#> 5  -2147483646 4.005 5.265 199.6 15.08.2018 21:38:11 +02:00     HOME        NO
#> 6  -2147483646 3.869 5.036 212.7 15.08.2018 21:37:20 +02:00     HOME        NO
#> 7  -2147483646 4.089 5.154 219.8 07.08.2018 10:24:26 +02:00     HOME        NO
#> 8  -2147483646 3.879 4.998 205.1 07.08.2018 10:23:01 +02:00     HOME        NO
#> 9  -2147483646 3.859 5.092 182.9 07.08.2018 10:22:36 +02:00     HOME        NO
#> 10 -2147483646 3.974 5.198 199.2 06.08.2018 13:30:53 +02:00     HOME        NO
#> 11 -2147483645 1.577 4.769  59.2 22.03.2019 19:17:50 +01:00     HOME        NO

#Code attempting to populate column based on conditions
example$VISIT_DAY <- factor(example$VISIT_DAY, levels = c("YES", "NO"))
#> Error in example$VISIT_DAY: object of type 'closure' is not subsettable
example$VISIT_DAY <- ifelse((example$PATIENT_ID == "-2147483646" & example$DATE == "06.08.2018"), "YES", "NO")
#> Error in example$PATIENT_ID: object of type 'closure' is not subsettable

This is because reprex() runs your code on a clean R session where the environment is empty so you have to define everything within the code itself, and since you haven't assigned the sample data to any object name, it doesn't exist on that environment.

About your issue, the problem is that your Date column is not an actual "date" variable it has been read as a factor variable so you need to convert it before been able to make comparisons on that way, see this example.

example <- data.frame(
    PATIENT_ID = c(-2147483646,-2147483646,-2147483646,
                   -2147483646,-2147483646,-2147483646,-2147483646,-2147483646,
                   -2147483646,-2147483646,-2147483645),
    FEV1 = c(4.046,4.053,4.024,4.094,4.005,3.869,
             4.089,3.879,3.859,3.974,1.577),
    FVC = c(5.286,5.043,5.25,5.235,5.265,5.036,
            5.154,4.998,5.092,5.198,4.769),
    FEF = c(199.3,226.9,196.8,223.4,199.6,212.7,
            219.8,205.1,182.9,199.2,59.2),
    DATE = as.factor(c("20.08.2018 08:37:23 +02:00","20.08.2018 08:36:55 +02:00",
                       "20.08.2018 08:36:29 +02:00","15.08.2018 21:38:40 +02:00",
                       "15.08.2018 21:38:11 +02:00","15.08.2018 21:37:20 +02:00",
                       "07.08.2018 10:24:26 +02:00",
                       "07.08.2018 10:23:01 +02:00","07.08.2018 10:22:36 +02:00",
                       "06.08.2018 13:30:53 +02:00","22.03.2019 19:17:50 +01:00")),
    LOCATION = as.factor(c("HOME","HOME","HOME",
                           "HOME","HOME","HOME","HOME","HOME","HOME",
                           "HOME","HOME")),
    VISIT_DAY = as.factor(c("NO","NO","NO","NO",
                            "NO","NO","NO","NO","NO","NO","NO"))
)

library(dplyr)
library(lubridate)

example %>% 
    mutate(DATE = as.Date(dmy_hms(DATE) + hours(2)),
           VISIT_DAY = ifelse((PATIENT_ID == "-2147483646" & DATE == dmy("06.08.2018")), "YES", "NO"))
#>     PATIENT_ID  FEV1   FVC   FEF       DATE LOCATION VISIT_DAY
#> 1  -2147483646 4.046 5.286 199.3 2018-08-20     HOME        NO
#> 2  -2147483646 4.053 5.043 226.9 2018-08-20     HOME        NO
#> 3  -2147483646 4.024 5.250 196.8 2018-08-20     HOME        NO
#> 4  -2147483646 4.094 5.235 223.4 2018-08-15     HOME        NO
#> 5  -2147483646 4.005 5.265 199.6 2018-08-15     HOME        NO
#> 6  -2147483646 3.869 5.036 212.7 2018-08-15     HOME        NO
#> 7  -2147483646 4.089 5.154 219.8 2018-08-07     HOME        NO
#> 8  -2147483646 3.879 4.998 205.1 2018-08-07     HOME        NO
#> 9  -2147483646 3.859 5.092 182.9 2018-08-07     HOME        NO
#> 10 -2147483646 3.974 5.198 199.2 2018-08-06     HOME       YES
#> 11 -2147483645 1.577 4.769  59.2 2019-03-22     HOME        NO
3 Likes

Thank you so much for this, it worked like a charm for assigning VISIT_DAY. Apologies for my poor reprex, it was my first time using the tool.

If I were to assign multiple dates in the mutate function, would I just use DATE == dmy(c("Date1", "Date2"))?

I figured it out! Simply by doing DATE == dmy("06.08.2018") | DATE == dmy("20.08.2018")), it selected the dates I wished for.

Thanks for all the help!

Using the %in% operator would save you a little bit of typing

library(dplyr)
library(lubridate)

example %>% 
    mutate(DATE = as.Date(dmy_hms(DATE) + hours(2)),
           VISIT_DAY = ifelse((PATIENT_ID == "-2147483646" & DATE %in% dmy(c("06.08.2018", "20.08.2018"))),
                              "YES",
                              "NO"))
#>     PATIENT_ID  FEV1   FVC   FEF       DATE LOCATION VISIT_DAY
#> 1  -2147483646 4.046 5.286 199.3 2018-08-20     HOME       YES
#> 2  -2147483646 4.053 5.043 226.9 2018-08-20     HOME       YES
#> 3  -2147483646 4.024 5.250 196.8 2018-08-20     HOME       YES
#> 4  -2147483646 4.094 5.235 223.4 2018-08-15     HOME        NO
#> 5  -2147483646 4.005 5.265 199.6 2018-08-15     HOME        NO
#> 6  -2147483646 3.869 5.036 212.7 2018-08-15     HOME        NO
#> 7  -2147483646 4.089 5.154 219.8 2018-08-07     HOME        NO
#> 8  -2147483646 3.879 4.998 205.1 2018-08-07     HOME        NO
#> 9  -2147483646 3.859 5.092 182.9 2018-08-07     HOME        NO
#> 10 -2147483646 3.974 5.198 199.2 2018-08-06     HOME       YES
#> 11 -2147483645 1.577 4.769  59.2 2019-03-22     HOME        NO
2 Likes

It amazes me how much simpler you can make a line of code through certain features. Thank you very much you've saved me a few hours of confusion!

I have another question (thank you so much for the help so far). I have multiple patients in this data frame that I wish to carry out this code on. Would it be right to add another ifelse() within the mutate tool, but just change the patient ID?

Yes.

If you want to learn how to use most of the tidyverse tools, I recommend you to read this free ebook.

1 Like

Thanks very much, I had been working through the dplyr function prior in this but hadn't gotten to tidyr yet.

Much appreciated, all the best man.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.