select(_if): all numeric columns and one character column

Is there any simple/elegant (i.e. tidy) way to select all numeric columns and one character column (specified by column name)? I tried to use select_if for this purpose, but I didn't find a solution.

My workaround is to build a vector of column names programmatically and then pass it to select. But I'm interested to see any simpler solutions.

Hi,

I found a workaround in this post that can be adapted to your problem:

library("dplyr")

#Fake data
myData = data.frame(a = 1:10, b = 1:10, c = LETTERS[1:10], d = 1:10, e = letters[1:10], stringsAsFactors = FALSE)

# Keep columns whose values all match those of e, plus all numeric columns
myData %>% select_if(function(col) all(col == .$e) | is.numeric(col))

    a  b  d e
1   1  1  1 a
2   2  2  2 b
3   3  3  3 c
4   4  4  4 d
5   5  5  5 e
6   6  6  6 f
7   7  7  7 g
8   8  8  8 h
9   9  9  9 i
10 10 10 10 j
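Note that this predicate never looks at column names: it keeps every column whose values all equal those of `e`, so an identical copy of `e` would be selected as well. A minimal sketch, assuming a hypothetical duplicate column `e_copy`, and writing the same predicate with `withCopy$e` instead of the pipe's `.`:

```r
library(dplyr)

myData <- data.frame(a = 1:10, b = 1:10, c = LETTERS[1:10], d = 1:10,
                     e = letters[1:10], stringsAsFactors = FALSE)

# Hypothetical extra column that happens to hold exactly the same values as e
withCopy <- myData %>% mutate(e_copy = e)

# Same value-based predicate as above: it compares values, not names
result <- select_if(withCopy, function(col) all(col == withCopy$e) | is.numeric(col))
names(result)   # a, b, d, e and also e_copy are kept
```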

However... I'm not happy with the way that code looks. I tried for a long time to find a more elegant solution, but the tidyverse is not letting me 🙂

Each condition works on its own:

myData %>% select_if(is.numeric)
   a  b  d
1   1  1  1
2   2  2  2
3   3  3  3
4   4  4  4
5   5  5  5
6   6  6  6
7   7  7  7
8   8  8  8
9   9  9  9
10 10 10 10
myData %>% select_if(names(.) == "e")
  e
1  a
2  b
3  c
4  d
5  e
6  f
7  g
8  h
9  i
10 j

But NOT combined ...

myData %>% select_if(names(.) == "c" | is.numeric) # ERROR
myData %>% select_if(names(.) == "c" | is.numeric(.)) # Returns only column c
myData %>% select_if(function(col) names(col) == "e" | is.numeric(col)) # ERROR

So I'm hoping for a tidyverse expert here to explain what I'm doing wrong...

At least you have something that works in the meantime 🙂
PJ
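Checking the pieces in isolation shows what goes wrong in the first two attempts: `is.numeric` on its own is a function object, so `|` cannot combine it with a logical vector; and `is.numeric(.)` tests the whole data frame, which is a list, so it returns a single `FALSE` that is recycled across all columns. A minimal base R check, assuming the same `myData`:

```r
myData <- data.frame(a = 1:10, b = 1:10, c = LETTERS[1:10], d = 1:10,
                     e = letters[1:10], stringsAsFactors = FALSE)

# Attempt 1: `|` needs two vectors, but is.numeric is a function
attempt1 <- tryCatch(names(myData) == "c" | is.numeric,
                     error = function(err) err)
inherits(attempt1, "error")   # TRUE

# Attempt 2: a data frame is a list, so is.numeric() returns one FALSE
is.numeric(myData)            # FALSE
mask <- names(myData) == "c" | is.numeric(myData)
mask                          # only the position of "c" is TRUE
```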

Hi,

Thank you for your suggestions.
Below is my approach, but I would prefer something simpler/tidier.

library(tidyverse)

#Fake data
myData = data.frame(a = 1:10, b = 1:10, c = LETTERS[1:10], d = 1:10, e = letters[1:10], stringsAsFactors = FALSE)

numerical_cols <-
  myData %>%
  select_if(is.numeric) %>%
  colnames()

# all_of() marks numerical_cols as an external character vector (tidyselect >= 1.0.0)
myData %>%
  select(all_of(numerical_cols), e)
#>     a  b  d e
#> 1   1  1  1 a
#> 2   2  2  2 b
#> 3   3  3  3 c
#> 4   4  4  4 d
#> 5   5  5  5 e
#> 6   6  6  6 f
#> 7   7  7  7 g
#> 8   8  8  8 h
#> 9   9  9  9 i
#> 10 10 10 10 j

I don't know if you'll find it tidier, but here is what I came up with:

library(tidyverse)

#Fake data
myData = data.frame(a = 1:10, b = 1:10, c = LETTERS[1:10], d = 1:10, e = letters[1:10], stringsAsFactors = FALSE)

# Using pipe and {}
myData %>% {
  bind_cols(
    select_if(., is.numeric),
    select_at(., "e")
  )
}
#>     a  b  d e
#> 1   1  1  1 a
#> 2   2  2  2 b
#> 3   3  3  3 c
#> 4   4  4  4 d
#> 5   5  5  5 e
#> 6   6  6  6 f
#> 7   7  7  7 g
#> 8   8  8  8 h
#> 9   9  9  9 i
#> 10 10 10 10 j

# not using {}
bind_cols(
  select_if(myData, is.numeric),
  select_at(myData, "e")
)
#>     a  b  d e
#> 1   1  1  1 a
#> 2   2  2  2 b
#> 3   3  3  3 c
#> 4   4  4  4 d
#> 5   5  5  5 e
#> 6   6  6  6 f
#> 7   7  7  7 g
#> 8   8  8  8 h
#> 9   9  9  9 i
#> 10 10 10 10 j

Created on 2019-08-06 by the reprex package (v0.3.0)


Hi,

That's definitely more readable than what I had, though I still don't understand why an OR statement in select_if fails...

Grtz,
PJ

An OR statement works; it's just that inside the select_if predicate you can't access the column's name, only its values.

library(tidyverse)

#Fake data
myData = data.frame(a = 1:10, b = 1:10, c = LETTERS[1:10], d = 1:10, e = letters[1:10], stringsAsFactors = FALSE)

# An OR statement works
myData %>% 
  mutate(c = as.factor(c)) %>%
  select_if(~ is.numeric(.) | is.factor(.))
#>     a  b c  d
#> 1   1  1 A  1
#> 2   2  2 B  2
#> 3   3  3 C  3
#> 4   4  4 D  4
#> 5   5  5 E  5
#> 6   6  6 F  6
#> 7   7  7 G  7
#> 8   8  8 H  8
#> 9   9  9 I  9
#> 10 10 10 J 10

# You can't access the name of the column:
# this does not work
myData %>%
  select_if(~ names(.) == "e")
#> Error in selected[[i]] <- eval_tidy(.p(column, ...)): replacement has length zero
# equivalent to
myData %>%
  select_if(function(x) names(x) == "e")
#> Error in selected[[i]] <- eval_tidy(.p(column, ...)): replacement has length zero

# this works because . is myData here 
myData %>% 
  select_if(names(.) == "e")
#>    e
#> 1  a
#> 2  b
#> 3  c
#> 4  d
#> 5  e
#> 6  f
#> 7  g
#> 8  h
#> 9  i
#> 10 j
# equivalent to
myData %>% 
  select_if(names(myData) == "e")
#>    e
#> 1  a
#> 2  b
#> 3  c
#> 4  d
#> 5  e
#> 6  f
#> 7  g
#> 8  h
#> 9  i
#> 10 j

Created on 2019-08-06 by the reprex package (v0.3.0)
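The length-zero error follows from what the predicate receives: select_if hands it each column as a bare vector, and a data frame column carries no name of its own. `names()` therefore returns `NULL`, and `NULL == "e"` is a zero-length logical, which cannot be stored as that column's TRUE/FALSE result. A minimal base R check, assuming the same `myData`:

```r
myData <- data.frame(a = 1:10, b = 1:10, c = LETTERS[1:10], d = 1:10,
                     e = letters[1:10], stringsAsFactors = FALSE)

# Roughly what the predicate sees for column e: the values, without the name
col <- myData[["e"]]
is.null(names(col))           # TRUE

# NULL == "e" is logical(0); OR-ing it with is.numeric(col) keeps length 0
test <- names(col) == "e" | is.numeric(col)
length(test)                  # 0
```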


Hi,

That already makes a lot of sense.
But... if select_if(names(myData) == "e") works and OR works, why does the following NOT work:

myData %>% select_if(names(myData) == "e" | is.numeric(.))

PJ

You can perform both selections within select_at:

myData %>% 
  select_at(vars(names(.)[map_lgl(., is.numeric)], e))
Output
    a  b  d e
1   1  1  1 a
2   2  2  2 b
3   3  3  3 c
4   4  4  4 d
5   5  5  5 e
6   6  6  6 f
7   7  7  7 g
8   8  8  8 h
9   9  9  9 i
10 10 10 10 j

I've been trying to shorten the selection of the numeric columns, but this is the best I've come up with so far.


Interesting... but as @kbzsl said, given the power of the tidyverse there should be a readable, tidy approach. Your approach, like several others in this thread, does the job, but none of them seems intuitive, which goes against the mantra of the tidyverse, lol 😛

PJ

I agree. I don't know why select_if doesn't take multiple conditions. It seems like the natural approach.

As it turns out, these also work:

myData %>% 
  select_if(names(.)=="e" | sapply(., is.numeric))

myData %>% 
  select_if(names(.)=="e" | map_lgl(., is.numeric))

You can also go with select, but once again, only (apparently) by doing things that seem unnatural within the tidyverse.

myData %>% 
  select(e, which(sapply(., is.numeric)))

myData %>% 
  select(e, which(map_lgl(., is.numeric)))
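For readers on newer versions: assuming dplyr 1.0.0 or later (which re-exports tidyselect's `where()` helper), both conditions can be written directly inside `select()`, without `select_if()` or a precomputed vector of names. A minimal sketch:

```r
library(dplyr)

myData <- data.frame(a = 1:10, b = 1:10, c = LETTERS[1:10], d = 1:10,
                     e = letters[1:10], stringsAsFactors = FALSE)

# where() applies the predicate to each column; e is then added by name
result <- myData %>% select(where(is.numeric), e)

# The same union written with tidyselect's boolean `|` operator
resultOr <- myData %>% select(where(is.numeric) | e)
```

Both calls should keep a, b, d and e; the column order follows the order of the selections.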