Select/order tibble columns using vector of prefix strings

dplyr
tibble

#1

Is there a way to give select() a vector of prefixes and use start_with() or something similar to order the columns in a tibble?

Reprex:

suppressMessages(library(tidyverse))

y <- tibble(id = 1:4, a1 = 1, d6 = 5, d = 4, a2 = 2, b6 = 3, s1 = 10, r2 = 7)

x <- c("a", "d", "b")

y %>% select(id, starts_with(x))
#> Error in starts_with(x): is_string(match) is not TRUE

Desired output:

# A tibble: 4 x 6
     id    a1    a2    d     d6    b
  <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1    1.    2.    4.    5.    3.
2     2    1.    2.    4.    5.    3.
3     3    1.    2.    4.    5.    3.
4     4    1.    2.    4.    5.    3.

#2

From SO:

From the select() help file, I gather that the only special internal function that accepts a regular expression is matches(). You can use the regular expression....

library(dplyr)
y <- tibble(id = 1:4, a1 = 1, d6 = 5, d = 4, a2 = 2, b6 = 3, s1 = 10, r2 = 7)
x <- c("a", "d", "b")
y %>% select(id, matches("^b|a|d"))
#> # A tibble: 4 x 6
#>      id    a1    d6     d    a2    b6
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1  1.00  5.00  4.00  2.00  3.00
#> 2     2  1.00  5.00  4.00  2.00  3.00
#> 3     3  1.00  5.00  4.00  2.00  3.00
#> 4     4  1.00  5.00  4.00  2.00  3.00

Created on 2018-05-24 by the reprex package (v0.2.0).


#3

Thanks Curtis, unfortunately I don't think this method would work for my use case.

x is dynamic from another data source so I cannot manually create the regex.

I would also like to use x to order the columns as well as I will be uniting them into a numeric vector where the order is important. I could do a named vector but still have a similar problem in ordering that vector using the prefix strings.


#4

I'm not sure your problem is well-defined right now, so difficult to give you any practical solution. One example of the approach you can take is below:

suppressMessages(library(tidyverse))

y <- tibble(id = 1:4, a1 = 1, d6 = 5, d = 4, a2 = 2, b6 = 3, s1 = 10, r2 = 7)

x <- c("a", "d", "b")
nms <- names(y)
correct_order <- 
  purrr::map(x, function(prefix) nms[grepl(pattern = paste0("^", prefix), x = nms)]) %>% 
  purrr::flatten_chr()

pattern <- paste("^", x, "|", collapse = "", sep = "") %>%
  stringr::str_replace("\\|$", replacement = "")

res <- y %>% select(id, matches(pattern))

res <- res[correct_order]
res
#> # A tibble: 4 x 5
#>      a1    a2    d6     d    b6
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     2     5     4     3
#> 2     1     2     5     4     3
#> 3     1     2     5     4     3
#> 4     1     2     5     4     3

#5

To get the x matches in sorted order (as indicated in your desired output):

y %>% 
  select(id, map(x, ~ names(y)[grep(paste0("^",.x), names(y))]) %>% unlist %>% sort)
     id    a1    a2    b6     d    d6
  <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     2     3     4     5
2     2     1     2     3     4     5
3     3     1     2     3     4     5
4     4     1     2     3     4     5

If you want to tack on any remaining columns at the end:

y %>% 
  select(id, 
         map(x, ~ names(y)[grep(paste0("^",.x), names(y))]) %>% unlist %>% sort, 
         everything())

#6

joels - I think this is on the right track but it puts the columns in alphabetical order (and I see the mistake I made in my question to lead you on this path). The order of x is a, d, b and the output should also follow this order. I will update my post.


#7

Thank you Misha! I was able to work with this to get a functioning solution. I only needed to concatenate the id to the front of correct_order. I was hoping there was a simpler solution to simply order by prefix but this is effective.


#8

If you remove the sort step at the end of the map chain, then the columns will be in the order of the x prefixes.


#9

This does work, thanks joels!