Tidy way to select columns of a particular type

I would like to understand the modern tidy way to select columns of a particular type from a tibble.

For example, we can use dplyr::select_if() to select the numeric columns from a tibble as follows:

library("tidyverse")
starwars %>% 
    dplyr::select_if(is.numeric)

This works. However, according to this help page, it has been superseded by the across()-style verbs.

I have the following two questions:

  1. Could anyone please explain the non-superseded way to do this same type of command?
  2. Are the if_any()-type selectors also superseded by across() in dplyr?

Hi @shamindras,

I am not sure if I misread you, but just so we're on the same page, "superseded" means that there is a newer and better way to perform the same task.

  1. The new way of selecting columns based on their class is not with across(), but with where() inside select(). Here is a quick example with some made-up data:
# Load package
library(dplyr)

# Create a data frame (tibble) with various data types
tbl <- tibble(
  col1 = sample(x = c(TRUE, FALSE), size = 10, replace = TRUE),
  col2 = as.numeric(col1),
  col3 = as.factor(col2),
  col4 = tolower(as.character(col1))
)

tbl

# A tibble: 10 × 4
   col1   col2 col3  col4 
   <lgl> <dbl> <fct> <chr>
 1 TRUE      1 1     true 
 2 FALSE     0 0     false
 3 FALSE     0 0     false
 4 TRUE      1 1     true 
 5 FALSE     0 0     false
 6 FALSE     0 0     false
 7 FALSE     0 0     false
 8 FALSE     0 0     false
 9 FALSE     0 0     false
10 TRUE      1 1     true 

# Select numeric column
tbl %>%
  select(where(is.numeric))

    col2
   <dbl>
 1     1
 2     0
 3     0
 4     1
 5     0
 6     0
 7     0
 8     0
 9     0
10     1

# Select factor column
tbl %>%
  select(where(is.factor))

   col3 
   <fct>
 1 1    
 2 0    
 3 0    
 4 1    
 5 0    
 6 0    
 7 0    
 8 0    
 9 0    
10 1 

  2. No, if_any() and if_all() are not superseded; they are currently the recommended functions for what they were designed to do.
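
To connect point 1 back to the original question, here is a minimal sketch comparing the superseded select_if() call with the where() version on the starwars data that ships with dplyr. The object names old_way, new_way and num_or_chr are made up for illustration. The last part assumes you want to combine several type tests, which tidyselect allows with the boolean operators |, & and !.

```r
library(dplyr)

# Superseded: scoped verb taking a predicate function
old_way <- starwars %>% select_if(is.numeric)

# Current: the where() selection helper inside select()
new_way <- starwars %>% select(where(is.numeric))

# Both approaches should keep the same numeric columns
identical(names(old_way), names(new_way))

# where() calls can be combined with tidyselect's boolean operators,
# e.g. keep every column that is numeric OR character (list columns are dropped)
num_or_chr <- starwars %>% select(where(is.numeric) | where(is.character))
names(num_or_chr)
```

The same pattern works for other classes, e.g. where(is.factor) or where(is.logical).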

I hope this helps.


Thanks @gueyenono. Appreciate your helpful response.

I am not sure if I misread you, but just so we're on the same page "superseded" means that there is a new and better way to perform the same task.

We are definitely on the same page here. Thanks for clarifying and confirming this.

  1. The new way of selecting a column based on its class is not with across(), but with where(). Here is a quick example with some made up data:

I was not aware that the where() helper could be used inside dplyr::select(). I now see that this is documented on the help page. Thanks for pointing this out. I will use this approach going forward for selecting columns by class, as you kindly showed.

  2. No, if_any() and if_all() are currently the recommended functions to do what they were designed to do.

I see. My question arose because I was trying to do something like the following:

library("tidyverse")

starwars %>% 
    dplyr::select(
        dplyr::if_any(.cols = dplyr::everything(),
                      .fns = ~is.numeric)
    )

This was an attempt to replicate the superseded dplyr::select_if() functionality. Out of curiosity, would you know how to select columns using the if_any() function instead?

Thanks

Let me construct a useful example to explain how to use if_any() and if_all(). But, just so you know, you cannot use them with select(): they return one TRUE/FALSE value per row, so they were designed to be used with filter() instead. A short tutorial is coming up soon.
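
If the goal of the if_any() attempt was "select the columns where some condition holds", one option, sketched here under that assumption, is to stay inside select() and give where() a lambda that returns a single TRUE/FALSE per column. The scores tibble and the 80-point threshold are hypothetical and only serve the illustration.

```r
library(dplyr)

# Hypothetical scores, made up for this sketch
scores <- tibble(
  student = c("A", "B", "C"),
  maths   = c(55, 62, 71),
  english = c(84, 77, 90)
)

# Keep numeric columns in which at least one value is 80 or above.
# The lambda must return ONE logical per column, hence any();
# is.numeric() comes first so && skips the comparison for character columns.
high_cols <- scores %>%
  select(where(~ is.numeric(.x) && any(.x >= 80)))

high_cols
```

Only english is kept here, because no maths score reaches 80.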

The full code is provided at the bottom.

Let's assume we have a wide dataset of students and their scores in Maths, Biology, English and French.

# Load dplyr ----

library(dplyr)

# Create a sample dataset

set.seed(123)

tbl <- tibble(
  student = LETTERS[1:5],
  maths = sample(50:80, size = 5),
  biology = sample(70:100, size = 5),
  english = sample(70:90, size = 5),
  french = sample(70:100, size = 5)
)

tbl
# A tibble: 5 × 5
  student maths biology english french
  <chr>   <int>   <int>   <int>  <int>
1 A          80      79      89     77
2 B          64      87      83     95
3 C          68      91      74     76
4 D          63      80      78     79
5 E          52      74      72     78

The school is offering some scholarships to students who have had a fairly good performance in various categories.

  1. A language scholarship is awarded to those with an 80 or above in English AND French
  • First, we use the filter() function for the task.
  • Second, we use the if_all() function because we need our condition to be met across all relevant columns, namely English AND French.
  • This is why we specify .cols = c("english", "french") as the first argument of if_all()
  • Finally, we write an anonymous function, .fns = ~ .x >= 80, which expresses the condition for getting the scholarship
tbl %>%
  filter(
    if_all(.cols = c("english", "french"),
           .fns = ~ .x >= 80)
  )
# A tibble: 1 × 5
  student maths biology english french
  <chr>   <int>   <int>   <int>  <int>
1 B          64      87      83     95

As you can see, only Student B is eligible for the language scholarship, as he/she is the only one who obtained a score of 80 or above in both English and French.
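
One way to see what if_all() does is to compare it with the same condition written out by hand with &. In this sketch the scores are typed in from the printed tbl above instead of being regenerated with sample(), so the result does not depend on R's random number generator; via_if_all and via_and are illustrative names.

```r
library(dplyr)

# Scores copied from the printed tbl in this thread (no random sampling)
tbl <- tibble(
  student = LETTERS[1:5],
  maths   = c(80, 64, 68, 63, 52),
  biology = c(79, 87, 91, 80, 74),
  english = c(89, 83, 74, 78, 72),
  french  = c(77, 95, 76, 79, 78)
)

# if_all(): the condition must hold in EVERY selected column
via_if_all <- tbl %>%
  filter(if_all(.cols = c("english", "french"), .fns = ~ .x >= 80))

# Hand-written equivalent: one comparison per column, joined with &
via_and <- tbl %>%
  filter(english >= 80 & french >= 80)

via_if_all$student
via_and$student
```

For two columns, writing the & by hand is just as readable; if_all() mainly pays off when .cols covers many columns, for example via a tidyselect helper.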

  2. A science scholarship is awarded to those with an 80 or above in Maths AND Biology
tbl %>%
  filter(
    if_all(.cols = c("maths", "biology"),
           .fns = ~ .x >= 80)
  )
# A tibble: 0 × 5
# … with 5 variables: student <chr>, maths <int>, biology <int>, english <int>, french <int>

Unfortunately, no student qualifies for the science scholarship.

  3. A lower quality language scholarship is awarded to those with an 80 or above in English OR French

The main difference here is that the condition of scoring 80 or above does not need to be met for both English and French; it only needs to be met for AT LEAST ONE of them. In such a case, the if_any() function will do the job.

tbl %>%
  filter(
    if_any(.cols = c("english", "french"),
           .fns = ~ .x >= 80)
  )
# A tibble: 2 × 5
  student maths biology english french
  <chr>   <int>   <int>   <int>  <int>
1 A          80      79      89     77
2 B          64      87      83     95

Students A and B qualify for the lower quality language scholarship (we already saw that Student B gets the main scholarship anyway).
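
Similarly, if_any() behaves like joining the per-column conditions with | (OR). The sketch below, again using scores copied from the printed tbl, checks this for both the English/French and the Maths/Biology pairs; the object names are illustrative. It also uses bare column names in .cols, which if_any() accepts just like the quoted names used above.

```r
library(dplyr)

# Scores copied from the printed tbl in this thread (no random sampling)
tbl <- tibble(
  student = LETTERS[1:5],
  maths   = c(80, 64, 68, 63, 52),
  biology = c(79, 87, 91, 80, 74),
  english = c(89, 83, 74, 78, 72),
  french  = c(77, 95, 76, 79, 78)
)

# if_any(): the condition must hold in AT LEAST ONE selected column
lang_any <- tbl %>% filter(if_any(c(english, french), ~ .x >= 80))
sci_any  <- tbl %>% filter(if_any(c(maths, biology), ~ .x >= 80))

# Hand-written equivalents joined with |
lang_or <- tbl %>% filter(english >= 80 | french >= 80)
sci_or  <- tbl %>% filter(maths >= 80 | biology >= 80)

lang_any$student
sci_any$student
```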

  4. A lower quality science scholarship is awarded to those with an 80 or above in Maths OR Biology
tbl %>%
  filter(
    if_any(.cols = c("maths", "biology"),
           .fns = ~ .x >= 80)
  )
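# A tibble: 4 × 5
  student maths biology english french
  <chr>   <int>   <int>   <int>  <int>
1 A          80      79      89     77
2 B          64      87      83     95
3 C          68      91      74     76
4 D          63      80      78     79

Students A, B, C and D qualify for the lower quality science scholarship.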

I hope this short tutorial enables you to understand how to use if_any() and if_all().
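
Coming back to your original attempt: .fns = ~is.numeric returns the is.numeric function itself rather than TRUE/FALSE, and the column-type test really belongs in .cols. Since .cols accepts any tidyselect specification, where() can be used there. Here is a minimal sketch with the scores copied from the printed tbl; the thresholds 90 and 60 are arbitrary and chosen only for illustration.

```r
library(dplyr)

# Scores copied from the printed tbl in this thread (no random sampling)
tbl <- tibble(
  student = LETTERS[1:5],
  maths   = c(80, 64, 68, 63, 52),
  biology = c(79, 87, 91, 80, 74),
  english = c(89, 83, 74, 78, 72),
  french  = c(77, 95, 76, 79, 78)
)

# where(is.numeric) picks the columns; the lambda tests each value
# Rows where ANY numeric column is 90 or above
any_90 <- tbl %>%
  filter(if_any(where(is.numeric), ~ .x >= 90))

# Rows where ALL numeric columns are 60 or above
all_60 <- tbl %>%
  filter(if_all(where(is.numeric), ~ .x >= 60))

any_90$student
all_60$student
```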

Cheers :slight_smile:

Here is the full code:

# Load dplyr ----

library(dplyr)

# Create a sample dataset

set.seed(123)

tbl <- tibble(
  student = LETTERS[1:5],
  maths = sample(50:80, size = 5),
  biology = sample(70:100, size = 5),
  english = sample(70:90, size = 5),
  french = sample(70:100, size = 5)
)

tbl

# A language scholarship is awarded to those with an 80 or above in English AND French

tbl %>%
  filter(
    if_all(.cols = c("english", "french"),
           .fns = ~ .x >= 80)
  )

# A lower quality language scholarship is awarded to those with an 80 or above in English OR French

tbl %>%
  filter(
    if_any(.cols = c("english", "french"),
           .fns = ~ .x >= 80)
  )


# A science scholarship is awarded to those with an 80 or above in Maths AND Biology

tbl %>%
  filter(
    if_all(.cols = c("maths", "biology"),
           .fns = ~ .x >= 80)
  )

# A lower quality science scholarship is awarded to those with an 80 or above in Maths OR Biology

tbl %>%
  filter(
    if_any(.cols = c("maths", "biology"),
           .fns = ~ .x >= 80)
  )

Hi @gueyenono - thanks so much for your helpful tutorial. Much appreciated.

I'll mark the original answer as the solution. Thanks!

I'm glad I could help :slight_smile:
