Filtering multiple condition within a column

Rscotty · May 18, 2018, 12:17pm

I want to list all Patient_code who have taken Botox and Non-Botox.

Below is my Primary table

library(tidyverse)
primary_table <- tibble::tribble(
                   ~Patinet_code, ~Brand_Name,
                     "Patient-1",     "Botox",
                     "Patient-2",     "Botox",
                     "Patient-2",     "Botox",
                     "Patient-2",     "Botox",
                     "Patient-1", "Non-Botox",
                     "Patient-2", "Non-Botox",
                     "Patient-3", "Non-Botox",
                     "Patient-3", "Non-Botox",
                     "Patient-3", "Non-Botox"
                   )

My end result should look like this:

result <- structure(list(Patinet_code = c("Patient-1", "Patient-2")), .Names = "Patinet_code", row.names = c(NA, 
-2L), class = c("tbl_df", "tbl", "data.frame"))

I can easily eyeball and I know Patient-1 has taken Botox and Non_Botox. Also, Patient-2 has taken Botox and Non_Botox. Therefore, only those Patinets should be on my list. However, I have list of 1000+ patients. How can I do it using tidyverse tools.

I don't have my codes to show you, it's really embarrassing but I really don't know where to start. Can you please help me. I have used Dplyr common verbs but never solved anything like this before.

Thank you

jim89 · May 18, 2018, 12:55pm

I think this will work:

my_result <- primary_table %>% 
    group_by(Patinet_code) %>% 
    filter(
        max(Brand_Name == "Botox") == 1,
        max(Brand_Name == "Non-Botox") == 1
    ) %>% 
    distinct(Patinet_code)

identical(my_result, result)
#> TRUE

We find patients who have ever had botox and non-botox by finding those who have ever had botox, those who have ever had non botox, and then filter to those who have had both.

This might be easier to see by creating the "ever had botox" and "ever had non-botox" fields first:

primary_table %>% 
    group_by(Patinet_code) %>% 
    mutate(
        ever_botox = max(Brand_Name == "Botox"),
        ever_non_botox = max(Brand_Name == "Non-Botox")
    ) %>% 
    filter(
        ever_botox == 1,
        ever_non_botox == 1
    )

(But it's not strictly necessary to get to your desired result).

nwerth · May 18, 2018, 5:52pm

May I suggest a minor improvement for human reading?

primary_table %>%
    group_by(Patinet_code) %>%
    filter(
      any(Brand_Name == "Botox"),
      any(Brand_Name == "Non-Botox")
    ) %>%
    distinct(Patinet_code)

Rscotty · May 18, 2018, 10:00pm

@jim89 Thank you for your solution.

I knew max function but didn't know you could use that way. I preferred your second solution since if gives me visual understanding of my problem.

Rscotty · May 18, 2018, 10:12pm

Thank you for you advise on improvement.

In your code how would you filter only "Non-Botox" patients using any function? [In real data sets I will have many different combinations of Brand name to filter]

Below is my attempt which doesn't give correct answer

primary_table %>%
    group_by(Patinet_code) %>%
    filter(
      any(Brand_Name != "Botox"),
      any(Brand_Name == "Non-Botox")
    ) %>%
    distinct(Patinet_code)

My attempt using @jim89 solution gives correct answer

primary_table %>% 
  group_by(Patinet_code) %>% 
  filter(
    max(Brand_Name == "Botox") == 0,
    max(Brand_Name == "Non-Botox") == 1
  )

jonspring · May 18, 2018, 10:20pm

I think this would do it:

primary_table %>%
  group_by(Patinet_code) %>%
  filter(
    !any(Brand_Name == "Botox"),
    any(Brand_Name == "Non-Botox")
  ) %>%
  distinct(Patinet_code)

Rscotty · May 18, 2018, 10:24pm

ah! ok Thank you it worked.

jim89 · May 21, 2018, 7:48pm

I like this better than mine - showing my SQL roots with max() - any() is much nicer!