Filtering multiple condition within a column

dplyr

#1

I want to list all Patient_code who have taken Botox and Non-Botox.

Below is my Primary table

library(tidyverse)
primary_table <- tibble::tribble(
                   ~Patinet_code, ~Brand_Name,
                     "Patient-1",     "Botox",
                     "Patient-2",     "Botox",
                     "Patient-2",     "Botox",
                     "Patient-2",     "Botox",
                     "Patient-1", "Non-Botox",
                     "Patient-2", "Non-Botox",
                     "Patient-3", "Non-Botox",
                     "Patient-3", "Non-Botox",
                     "Patient-3", "Non-Botox"
                   )

My end result should look like this:

result <- structure(list(Patinet_code = c("Patient-1", "Patient-2")), .Names = "Patinet_code", row.names = c(NA, 
-2L), class = c("tbl_df", "tbl", "data.frame"))

I can easily eyeball and I know Patient-1 has taken Botox and Non_Botox. Also, Patient-2 has taken Botox and Non_Botox. Therefore, only those Patinets should be on my list. However, I have list of 1000+ patients. How can I do it using tidyverse tools.

I don't have my codes to show you, it's really embarrassing but I really don't know where to start. Can you please help me. I have used Dplyr common verbs but never solved anything like this before.

Thank you


Demeaning / Mean-Centering of certain values only
#2

I think this will work:

my_result <- primary_table %>% 
    group_by(Patinet_code) %>% 
    filter(
        max(Brand_Name == "Botox") == 1,
        max(Brand_Name == "Non-Botox") == 1
    ) %>% 
    distinct(Patinet_code)

identical(my_result, result)
#> TRUE

We find patients who have ever had botox and non-botox by finding those who have ever had botox, those who have ever had non botox, and then filter to those who have had both.

This might be easier to see by creating the "ever had botox" and "ever had non-botox" fields first:

primary_table %>% 
    group_by(Patinet_code) %>% 
    mutate(
        ever_botox = max(Brand_Name == "Botox"),
        ever_non_botox = max(Brand_Name == "Non-Botox")
    ) %>% 
    filter(
        ever_botox == 1,
        ever_non_botox == 1
    )

(But it's not strictly necessary to get to your desired result).


#3

May I suggest a minor improvement for human reading?

primary_table %>%
    group_by(Patinet_code) %>%
    filter(
      any(Brand_Name == "Botox"),
      any(Brand_Name == "Non-Botox")
    ) %>%
    distinct(Patinet_code)

#4

@jim89 Thank you for your solution.

I knew max function but didn't know you could use that way. I preferred your second solution since if gives me visual understanding of my problem.


#5

Thank you for you advise on improvement.

In your code how would you filter only "Non-Botox" patients using any function? [In real data sets I will have many different combinations of Brand name to filter]

Below is my attempt which doesn't give correct answer

primary_table %>%
    group_by(Patinet_code) %>%
    filter(
      any(Brand_Name != "Botox"),
      any(Brand_Name == "Non-Botox")
    ) %>%
    distinct(Patinet_code)

My attempt using @jim89 solution gives correct answer

primary_table %>% 
  group_by(Patinet_code) %>% 
  filter(
    max(Brand_Name == "Botox") == 0,
    max(Brand_Name == "Non-Botox") == 1
  )

#6

I think this would do it:

primary_table %>%
  group_by(Patinet_code) %>%
  filter(
    !any(Brand_Name == "Botox"),
    any(Brand_Name == "Non-Botox")
  ) %>%
  distinct(Patinet_code)

#7

ah! ok Thank you it worked.


#8

I like this better than mine - showing my SQL roots with max() - any() is much nicer!