dplyr - group_by with many variables

Hello,
I recently created many variables (s1,s2,s3,...,s20).
I can run group_by(s1,s2,s3,s4,s5,s6,s7,s8,s9,s10,s11,s12,s13,s14,s15,s16,s17,s18,s19,s20)
But I don't know how to simplify that by declaring only the first and last variable.
Something like group_by(s1:s20)
Is it possible to do that?
Thanks for your time and interest.


Looks like you can do this with across():

library(tidyverse)

tibble(s1 = 1:2, s2 = 1:2, v = 5:6) %>%
  group_by(across(num_range("s", 1:2)))
#> # A tibble: 2 x 3
#> # Groups:   s1, s2 [2]
#>      s1    s2     v
#>   <int> <int> <int>
#> 1     1     1     5
#> 2     2     2     6

Created on 2021-06-10 by the reprex package (v2.0.0)

And {tidyselect} has lots of other selection helpers if your name matching ever gets more complicated! :smiley:


Thanks a lot, EeethB.
num_range() is a great tool.
It worked flawlessly.


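You can also do this with the tidyverse, as below:

library(tidyverse)

# starts_with() ignores case by default, so this groups by
# Sepal.Length, Sepal.Width and Species
iris |>
  group_by(across(starts_with("S")))

The key function is across(), which lets you use the selection helpers usually available in dplyr::select() inside group_by().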

It was an option, but I already have other variables that start with "s".

How did you adapt the solution code to go beyond requiring "s" at the beginning of the variable name? I imagine what you were looking for was a positional range, everything between s1 and s20, which could include, for example, a variable named "m4".


@isadora Just to be clear, I believe num_range() chooses columns s1, s2, s3, ..., s20 regardless of location, as opposed to s1:s20, which selects all columns between. So if there's another variable between s1 and s20, it does not get selected. Hopefully that answers your question.
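To make the difference concrete, here is a minimal sketch. The data frame and its column layout (an `m4` column sitting between `s1` and `s3`) are made up for illustration and are not from the original poster's data.

```r
library(dplyr)

# Hypothetical data: m4 sits between s1 and s3 by position
df <- tibble::tibble(s1 = 1:2, m4 = 3:4, s2 = 5:6, s3 = 7:8, v = 9:10)

# Name-based selection: picks s1, s2, s3 wherever they are, and skips m4
by_name <- df %>% group_by(across(num_range("s", 1:3)))
group_vars(by_name)

# Position-based selection: every column from s1 through s3, so m4 is included
by_position <- df %>% group_by(across(s1:s3))
group_vars(by_position)
```

So `num_range()` is the safer choice when the numbered columns are not contiguous, while `s1:s3` is only equivalent when nothing else sits between them.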

There are all sorts of other select helpers, including one that matches using arbitrary regular expressions! :woman_mage: You can read about all of them here:

You can also use group_by_at() to supply a character vector of column names, or group_by_if() to, for example, group by all character columns.
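As a rough sketch of both, alongside what I believe are the across() equivalents (the scoped verbs like group_by_at() and group_by_if() are marked as superseded in current dplyr, though they still work). The data frame and the `cols` vector here are invented for the example.

```r
library(dplyr)

# Hypothetical data with two character columns and one integer column
df <- tibble::tibble(
  s1 = c("a", "a", "b"),
  s2 = c("x", "y", "y"),
  v  = 1:3
)

# Hypothetical character vector of column names to group by
cols <- c("s1", "s2")

# Scoped verb, and the across() form that selects the same columns
by_at     <- df %>% group_by_at(cols)
by_all_of <- df %>% group_by(across(all_of(cols)))

# Group by every character column: scoped verb, then the across() form
by_if    <- df %>% group_by_if(is.character)
by_where <- df %>% group_by(across(where(is.character)))
```

The all_of() form is handy when the column names come from a variable, for instance one built with paste0("s", 1:20).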

Hi,
I just created the variables using letters and numbers not already taken.
Selecting by position, that is, every variable between s1 and s20, was not possible.
Still, the answer I got worked.
