 # Unexpected behavior of the order() function inside the tidy framework

Disclaimer: I know the tidy solution to the issue described below and that is the `dplyr::min_rank()` function.

As my potential helpers may already know, the base R functions `order()` and `sort()` are different in that the former outputs a vector of indices and the latter outputs the sorted version of the vector you pass to it. It turns out that the `order()` function does not work as intended in the tidy framework. Let's consider the following tibble:

```r
library(dplyr)  # also provides tibble() and %>%

set.seed(123)

dat <- tibble(
  unit = LETTERS[1:5],
  a = rnorm(5),
  b = rnorm(5)
)

dat

# A tibble: 5 x 3
#   unit        a      b
#   <chr>   <dbl>  <dbl>
# 1 A     -0.560   1.72
# 2 B     -0.230   0.461
# 3 C      1.56   -1.27
# 4 D      0.0705 -0.687
# 5 E      0.129  -0.446
```

Now, I would like to create two new columns: `rank_a` and `rank_b`, which, as the names imply, contain the rank (or order) of each value in their corresponding columns.

```r
dat <- dat %>%
  mutate(
    rank_a = order(a),
    rank_b = order(b)
  )

dat

# A tibble: 5 x 5
#   unit        a      b rank_a rank_b
#   <chr>   <dbl>  <dbl>  <int>  <int>
# 1 A     -0.560   1.72       1      3
# 2 B     -0.230   0.461      2      4
# 3 C      1.56   -1.27       4      5
# 4 D      0.0705 -0.687      5      2
# 5 E      0.129  -0.446      3      1
```

A close look at the tibble above reveals that the `order()` function did not work as intended. An example is that the table states that the value `0.129` in the `a` column (i.e. unit E) is the 3rd lowest value in the column. This is not true! The 3rd lowest value is actually `0.0705` (i.e. unit D)! Interestingly enough, the function works as expected outside the tidy framework.

```r
dat$a[order(dat$a)]

# [1] -0.56047565 -0.23017749  0.07050839  0.12928774  1.55870831
```

The `rank_b` column suffers from the same issue.

I'm sorry, but I think you're mistaken. `order` performs exactly as it should.

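Here, the `3` next to `0.129` doesn't indicate that it is the 3rd lowest value. `order()` returns positions, not ranks: the `3` sits in the 5th row, so it says that the 5th smallest value (the maximum, since there are five values) is the 3rd element of the vector. The same argument holds for all the others.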

Let's illustrate:

```r
> set.seed(seed = 31581)
>
> u <- sample(x = 1:10)
> u
 [1]  4  5  1  9  2  6 10  7  8  3
>
> v <- order(u)
> v
 [1]  3  5 10  1  2  6  8  9  4  7
>
> u[v]
 [1]  1  2  3  4  5  6  7  8  9 10
```
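To tie this back to the question, here is a minimal sketch using only base R. It assumes the five values of column `a` as printed in the question (typed in by hand rather than regenerated with `rnorm()`), and it uses `rank()` as a stand-in for the `dplyr::min_rank()` mentioned in the disclaimer; with no ties, the two should agree.

```r
# Values of column `a` from the question, in row order (units A to E).
# Assumption: typed in from the printed output, not regenerated.
a <- c(-0.56047565, -0.23017749, 1.55870831, 0.07050839, 0.12928774)

# order() answers "which position holds the k-th smallest value?"
idx <- order(a)
idx               # 1 2 4 5 3

# Indexing with it sorts the vector, same result as sort(a)
a[idx]

# rank() answers "what is the rank of the value at each position?"
rnk <- rank(a)
rnk               # 1 2 5 3 4

# With no ties, applying order() twice also yields the ranks
order(order(a))   # 1 2 5 3 4
```

So `idx` is what ended up in `rank_a`, while `rnk` is what the column was meant to contain.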

Hope this helps.
