Help me understand this code??!

Hi everyone, thank you for all of your help in this forum. It has been truly invaluable!

So, to provide background on the code below, I have a dataframe where the columns are a bunch of drug names. Each drug column can contain "C", "P", or no value (NA). I had asked for help earlier in filtering the data such that only certain drugs would equal "C".

In the example below, methamphetamine, cocaine, and fentanyl equal "C", and all the other drugs cannot equal C or P (no value/NA is fine).

MFCALONE <- MEC_2013_up%>%filter(Fentanyl=="C",
                                      Cocaine=="C",
                                      Methamphetamine=="C",
                                      across(c(-Fentanyl,-Cocaine,-Methamphetamine), ~!.x %in% c("C","P")))

My problem is that I am having trouble understanding exactly what the last line is doing.

From my understanding of the "across" function, the first argument is specifying the columns to take from (in this case, every column but fentanyl, cocaine, and methamphetamine). However, I am unclear on what the "~!.x" as well as the %in% is doing. Could someone provide an explanation? I was still unclear after looking up the documentation on the "across" function.

Thank you!

I see that using across() in filter() is deprecated, so I will show some examples using if_any() and if_all(), as recommended.
The ~ is shorthand for writing an anonymous function, in which the current column is referred to as .x.
The %in% operator returns TRUE or FALSE based on if the elements of the vector on its left are in the vector on its right. For example,

X <- c("A","B","D","C","A")
X %in% c("A","C")
[1]  TRUE FALSE FALSE  TRUE  TRUE

Finally, the ! inverts the TRUE/FALSE response and can be thought of as a NOT operator. So,

~!.x %in% c("C","P")

is a function that tests whether the elements of the current column are not in the vector c("C","P").
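As a small base R check of that reading (the vector x and the helper name not_c_or_p are made up for illustration): %in% is evaluated before !, so the expression means "not (in the set)", and NA counts as "not in the set", which is why empty cells pass the filter.

```r
# Hypothetical column values, including a missing one
x <- c("C", "P", "A", NA)

# %in% binds more tightly than !, so these two lines are equivalent
not_in_1 <- !x %in% c("C", "P")
not_in_2 <- !(x %in% c("C", "P"))
identical(not_in_1, not_in_2)
#> [1] TRUE

# The same test written as an ordinary named function
not_c_or_p <- function(col) !(col %in% c("C", "P"))
not_c_or_p(x)
#> [1] FALSE FALSE  TRUE  TRUE

# %in% never returns NA, whereas == does
NA %in% c("C", "P")
#> [1] FALSE
NA == "C"
#> [1] NA
```

That last difference is also why the filter uses %in% rather than != for the "other" columns: a comparison with NA yields NA, and filter() drops rows where the condition is NA.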
Here are some examples.

library(tidyverse)
DF <- data.frame(A = c("D","C","E","P"),
                 B= c("R","T","R","C"),
                 C = 1:4)
DF
#>   A B C
#> 1 D R 1
#> 2 C T 2
#> 3 E R 3
#> 4 P C 4

#The following two do the same thing
DF |> filter(if_any(.cols = c(A, B), 
                    .fns = ~ .x %in% c("C", "P")))
#>   A B C
#> 1 C T 2
#> 2 P C 4
DF |> filter(if_any(.cols = c(A, B), 
                    .fns = function(Col) Col %in% c("C","P")))
#>   A B C
#> 1 C T 2
#> 2 P C 4

#Variations with ! to invert the TRUE/FALSE response and with if_all()
DF |> filter(if_any(.cols = c(A, B), .fns = ~ !.x %in% c("C", "P")))
#>   A B C
#> 1 D R 1
#> 2 C T 2
#> 3 E R 3

DF |> filter(if_all(.cols = c(A, B), .fns = ~ .x %in% c("C", "P")))
#>   A B C
#> 1 P C 4

DF |> filter(if_all(.cols = c(A, B), .fns = ~ !.x %in% c("C", "P")))
#>   A B C
#> 1 D R 1
#> 2 E R 3

Created on 2022-11-28 with reprex v2.0.2
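Applied to the original question, the filter could be sketched with if_all() in place of across(). The data frame drugs and the Heroin and Alcohol columns are invented for this example, since the real MEC_2013_up is not shown in the thread.

```r
library(dplyr)

# Hypothetical stand-in for MEC_2013_up
drugs <- data.frame(
  Fentanyl        = c("C", "C", "C", "P"),
  Cocaine         = c("C", "C", "C", "C"),
  Methamphetamine = c("C", "C", "C", "C"),
  Heroin          = c(NA,  "P", NA,  NA),
  Alcohol         = c(NA,  NA,  "C", NA)
)

MFCALONE <- drugs |>
  filter(
    # the three drugs of interest must all be "C"
    Fentanyl == "C",
    Cocaine == "C",
    Methamphetamine == "C",
    # every other column must be neither "C" nor "P" (NA is allowed)
    if_all(.cols = -c(Fentanyl, Cocaine, Methamphetamine),
           .fns = ~ !.x %in% c("C", "P"))
  )

MFCALONE
#>   Fentanyl Cocaine Methamphetamine Heroin Alcohol
#> 1        C       C               C   <NA>    <NA>
```

Row 2 is dropped because Heroin is "P", row 3 because Alcohol is "C", and row 4 because Fentanyl is not "C".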


Thank you, this is super helpful! A quick clarification question. You said:

The %in% operator returns TRUE or FALSE based on if the elements of the vector on its left are in the vector on its right. (emphasis mine)

Then, you give the example below:

X <- c("A","B","D","C","A")
X %in% c("A","C")
[1]  TRUE FALSE FALSE  TRUE  TRUE

With this example, it seems like the %in% operator is testing whether elements of the vector on its right are in the vector on its left. Would it be possible to clarify?

Thanks!

X %in% Y returns as many TRUE/FALSE values as there are elements in X. That is why I think of it as answering "is X in Y?" In the code above, we get answers for "is A in c(A,C)?", "is B in c(A,C)?", etc. Maybe that is a quirk of my brain.
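A short base R sketch of the same point (the names res, one_by_one, and via_match are only for illustration): the result lines up with the left-hand vector, one answer per element, and swapping the operands changes both the question and the length of the answer.

```r
X <- c("A", "B", "D", "C", "A")
Y <- c("A", "C")

# One answer per element of X
res <- X %in% Y
res
#> [1]  TRUE FALSE FALSE  TRUE  TRUE
length(res) == length(X)
#> [1] TRUE

# Asking "is this element of X in Y?" one element at a time gives the same result
one_by_one <- vapply(X, function(el) el %in% Y, logical(1), USE.NAMES = FALSE)
identical(one_by_one, res)
#> [1] TRUE

# %in% is built on match(): look up each element of X in Y
via_match <- match(X, Y, nomatch = 0L) > 0L
identical(via_match, res)
#> [1] TRUE

# Swapping the operands asks about the elements of Y instead
Y %in% X
#> [1] TRUE TRUE
```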
