Unexpected behavior of the order() function inside the tidy framework

Disclaimer: I know the tidy solution to the issue described below and that is the dplyr::min_rank() function.

As my potential helpers may already know, the base R functions order() and sort() are different in that the former outputs a vector of indices and the latter outputs the sorted version of the vector you pass to it. It turns out that the order() function does not work as intended in the tidy framework. Let's consider the following tibble:

library(dplyr)  # attaches mutate() and %>%, and re-exports tibble()

set.seed(123)

dat <- tibble(
  unit = LETTERS[1:5],
  a = rnorm(5),
  b = rnorm(5)
)

dat

# A tibble: 5 x 3
  unit        a      b
  <chr>   <dbl>  <dbl>
1 A     -0.560   1.72 
2 B     -0.230   0.461
3 C      1.56   -1.27 
4 D      0.0705 -0.687
5 E      0.129  -0.446

Now, I would like to create two new columns: rank_a and rank_b, which, as the names imply, contain the rank (or order) of each value in their corresponding columns.

dat <- dat %>%
  mutate(
    rank_a = order(a),
    rank_b = order(b),
  )

dat 

# A tibble: 5 x 5
  unit        a      b rank_a rank_b
  <chr>   <dbl>  <dbl>  <int>  <int>
1 A     -0.560   1.72       1      3
2 B     -0.230   0.461      2      4
3 C      1.56   -1.27       4      5
4 D      0.0705 -0.687      5      2
5 E      0.129  -0.446      3      1

A close look at the tibble above reveals that the order() function did not work as intended. An example is that the table states that the value 0.129 in the a column (i.e. unit E) is the 3rd lowest value in the column. This is not true! The 3rd lowest value is actually 0.0705 (i.e. unit D)! Interestingly enough, the function works as expected outside the tidy framework.

dat$a[order(dat$a)]

[1] -0.56047565 -0.23017749  0.07050839  0.12928774  1.55870831

The rank_b column suffers from the same issue.

I'm sorry, but I think you're mistaken. order performs exactly as it should.

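Here, the 3 in the row for 0.129 doesn't indicate that 0.129 is the 3rd lowest value. It indicates that the 3rd element of the vector is the maximum value: order() returns positions, so the 5th entry of order(a) is the position of the 5th smallest (that is, the largest) value, which is 1.56 in row 3. The same argument holds for all the others.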

Let's illustrate:

> set.seed(seed = 31581)
> 
> u <- sample(x = 1:10)
> u
 [1]  4  5  1  9  2  6 10  7  8  3
> 
> v <- order(u)
> v
 [1]  3  5 10  1  2  6  8  9  4  7
> 
> u[v]
 [1]  1  2  3  4  5  6  7  8  9 10
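
To connect this to the ranks the question was after, here is a minimal base R sketch. It assumes the values of column a as printed in the tibble (so they are rounded), and the names ord and rnk are only illustrative.

```r
# Values of column `a` from the question, rounded as printed in the tibble
a <- c(-0.560, -0.230, 1.56, 0.0705, 0.129)

# order(): positions of the elements, listed from smallest to largest
ord <- order(a)
ord
# [1] 1 2 4 5 3

# rank(): for each element, its place in the sorted vector
rnk <- rank(a)
rnk
# [1] 1 2 5 3 4

# Without ties, applying order() twice inverts the permutation and gives ranks
order(ord)
# [1] 1 2 5 3 4
```

When the vector has ties, rank() averages the tied ranks by default; rank(a, ties.method = "min") is the base R counterpart of the dplyr::min_rank() mentioned in the question.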

Hope this helps.
