Re-arranging columns in numerical order

siddharthprabhu · March 3, 2020, 5:37am

I have a data frame with numerical values as column names. These columns have resulted from a pivot_wider() operation and contain both positive and negative values. I would like to get the columns in ascending numerical order without having to hard code the variable names (since I need to repeat this operation on multiple data frames).

A reprex is given below.

library(tidyverse)

# This is the input data frame. Notice that the columns `-1` and `3` are out of place.
df <- tribble(~col1, ~col2, ~`-3`, ~`-2`, ~`0`, ~`1`, ~`2`, ~`-1`, ~`4`, ~`3`,
              "A", "B", NA, 1, 15, 98, 183, 445, 2, 487,
              "C", "D", NA, NA, 6, 26, 146, 571, NA, 82)

print(df)
#> # A tibble: 2 x 10
#>   col1  col2  `-3`   `-2`   `0`   `1`   `2`  `-1`   `4`   `3`
#>   <chr> <chr> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A     B     NA        1    15    98   183   445     2   487
#> 2 C     D     NA       NA     6    26   146   571    NA    82

# This is the expected result.
output <- tribble(~col1, ~col2, ~`-3`, ~`-2`, ~`-1`, ~`0`, ~`1`, ~`2`, ~`3`, ~`4`,
                  "A", "B", NA, 1, 445, 15, 98, 183, 487, 2,
                  "C", "D", NA, NA, 571, 6, 26, 146, 82, NA)

print(output)
#> # A tibble: 2 x 10
#>   col1  col2  `-3`   `-2`  `-1`   `0`   `1`   `2`   `3`   `4`
#>   <chr> <chr> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A     B     NA        1   445    15    98   183   487     2
#> 2 C     D     NA       NA   571     6    26   146    82    NA

Here's what I've tried so far.

# str_sort() doesn't give the desired result.
df %>% 
  select(col1, col2, str_sort(colnames(.), numeric = TRUE))
#> # A tibble: 2 x 10
#>   col1  col2   `-1`  `-2` `-3`    `0`   `1`   `2`   `3`   `4`
#>   <chr> <chr> <dbl> <dbl> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A     B       445     1 NA       15    98   183   487     2
#> 2 C     D       571    NA NA        6    26   146    82    NA

# Setting the decreasing parameter is no good either.
df %>% 
  select(col1, col2, str_sort(colnames(.), decreasing = TRUE, numeric = TRUE))
#> # A tibble: 2 x 10
#>   col1  col2    `4`   `3`   `2`   `1`   `0` `-3`   `-2`  `-1`
#>   <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl> <dbl>
#> 1 A     B         2   487   183    98    15 NA        1   445
#> 2 C     D        NA    82   146    26     6 NA       NA   571

I attempted to break the problem down into smaller pieces by first selecting only the negative values and then only the positive values but the columns still need to be sorted.

df %>% 
  select_at(vars(contains("col"), contains("-"), !contains("-")))
#> # A tibble: 2 x 10
#>   col1  col2  `-3`   `-2`  `-1`   `0`   `1`   `2`   `4`   `3`
#>   <chr> <chr> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A     B     NA        1   445    15    98   183     2   487
#> 2 C     D     NA       NA   571     6    26   146    NA    82

Now if I could just apply str_sort() to each set of selections, that would do the trick but I'm not able to figure out how. Can someone please help?

^{Created on 2020-03-03 by the reprex package (v0.3.0)}

technocrat · March 3, 2020, 6:08am

suppressPackageStartupMessages(library(dplyr)) 
suppressPackageStartupMessages(library(tribble)) 
#> Error in library(tribble): there is no package called 'tribble'

df <- tribble(~col1, ~col2, ~`-3`, ~`-2`, ~`0`, ~`1`, ~`2`, ~`-1`, ~`4`, ~`3`,
              "A", "B", NA, 1, 15, 98, 183, 445, 2, 487,
              "C", "D", NA, NA, 6, 26, 146, 571, NA, 82)

collate <- function(x){
  x %>% select(sort(colnames(df))) 
}

collate(df)
#> # A tibble: 2 x 10
#>    `-1`  `-2` `-3`    `0`   `1`   `2`   `3`   `4` col1  col2 
#>   <dbl> <dbl> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#> 1   445     1 NA       15    98   183   487     2 A     B    
#> 2   571    NA NA        6    26   146    82    NA C     D

^{Created on 2020-03-02 by the reprex package (v0.3.0)}

This gets part way there. I'm not sure whether col1 and col2 are always present. If they are

collate <- function(x){
  x %>% select(sort(colnames(df))) %>% select(col1, col2, everything())
}

siddharthprabhu · March 3, 2020, 6:26am

@technocrat Thanks for your response. Yes, the columns col1 and col2 are always present.

Just to clarify, I need the columns to be arranged from smallest to largest value; so the order should be -3, -2, -1, 0, 1, 2, 3, 4. So we still need to tackle the issue of negative values being arranged as -1, -2, -3 (as seen in your output too).

technocrat · March 3, 2020, 6:28am

Thanks for the clarification; I'll see want I can gin up.

joels · March 3, 2020, 6:53am

I'm not sure what form your data frame is in before pivot_wider, but pivot_wider will respect the original order of the names_from column. So if you sort the column that will be pivoted to become column names, then they will be sorted after pivoting as well.

For example:

# Start with a long data frame
dfl = df %>% 
  pivot_longer(cols=`-3`:`3`) %>% 
  mutate(name=as.numeric(name)) 

dfl
#>    col1  col2   name value
#>    <chr> <chr> <dbl> <dbl>
#>  1 A     B        -3    NA
#>  2 A     B        -2     1
#>  3 A     B         0    15
#>  4 A     B         1    98
#>  5 A     B         2   183
#>  6 A     B        -1   445
#>  7 A     B         4     2
#>  8 A     B         3   487
#>  9 C     D        -3    NA
#> 10 C     D        -2    NA
#> 11 C     D         0     6
#> 12 C     D         1    26
#> 13 C     D         2   146
#> 14 C     D        -1   571
#> 15 C     D         4    NA
#> 16 C     D         3    82

Note the difference in the column order if we sort before pivoting:

dfl %>% 
  pivot_wider(names_from=name, values_from=value)
#> # A tibble: 2 x 10
#>   col1  col2   `-3`  `-2`   `0`   `1`   `2`  `-1`   `4`   `3`
#>   <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A     B        NA     1    15    98   183   445     2   487
#> 2 C     D        NA    NA     6    26   146   571    NA    82

dfl %>% 
  arrange(name) %>% 
  pivot_wider(names_from=name, values_from=value)
#> # A tibble: 2 x 10
#>   col1  col2   `-3`  `-2`  `-1`   `0`   `1`   `2`   `3`   `4`
#>   <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A     B        NA     1   445    15    98   183   487     2
#> 2 C     D        NA    NA   571     6    26   146    82    NA

^{Created on 2020-03-02 by the reprex package (v0.3.0)}

technocrat · March 3, 2020, 7:17am

suppressPackageStartupMessages(library(dplyr)) 
suppressPackageStartupMessages(library(tibble)) 

df <- tribble(~col1, ~col2, ~`-3`, ~`-2`, ~`0`, ~`1`, ~`2`, ~`-1`, ~`4`, ~`3`,
              "A", "B", NA, 1, 15, 98, 183, 445, 2, 487,
              "C", "D", NA, NA, 6, 26, 146, 571, NA, 82)

collate <- function(x){
  sort(as.numeric(colnames(x)[3:length(x)])) %>% 
  as.character() 
}
headers <- collate(df)
df %>% select(col1, col2, all_of(headers))
#> # A tibble: 2 x 10
#>   col1  col2  `-3`   `-2`  `-1`   `0`   `1`   `2`   `3`   `4`
#>   <chr> <chr> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A     B     NA        1   445    15    98   183   487     2
#> 2 C     D     NA       NA   571     6    26   146    82    NA

^{Created on 2020-03-02 by the reprex package (v0.3.0)}

joels · March 3, 2020, 7:25am

Given a wide data frame, I think you could go with match:

df %>% select(col1, col2, match(-3:4, names(df)))

  col1  col2  `-3`   `-2`  `-1`   `0`   `1`   `2`   `3`   `4`
  <chr> <chr> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A     B     NA        1   445    15    98   183   487     2
2 C     D     NA       NA   571     6    26   146    82    NA

Or, if we know we want to sort all columns with numeric names, regardless of their particular values:

num.cols = sort(as.numeric(names(df)[!grepl("col", names(df))]))
df %>% select(col1, col2, match(num.cols, names(df)))

siddharthprabhu · March 3, 2020, 8:02am

@joels Wow, I never knew pivot_wider() had that functionality. Thank you! One simple call to arrange() has solved my problem.

@technocrat Appreciate your efforts mate, but I think the arrange() solution is a lot more elegant.

system · March 10, 2020, 8:02am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.