Disclaimer: I know the tidy solution to the issue described below and that is the
As my potential helpers may already know, the base R functions order() and sort() differ in that the former returns a vector of indices and the latter returns the sorted version of the vector you pass to it. It turns out that the order() function does not work as intended in the tidy framework. Let's consider the following tibble:
    library(dplyr)  # provides tibble(), mutate() and %>%

    set.seed(123)
    dat <- tibble(
      unit = LETTERS[1:5],
      a = rnorm(5),
      b = rnorm(5)
    )
    dat
    # A tibble: 5 x 3
      unit        a      b
      <chr>   <dbl>  <dbl>
    1 A     -0.560   1.72
    2 B     -0.230   0.461
    3 C      1.56   -1.27
    4 D      0.0705 -0.687
    5 E      0.129  -0.446
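To make the order() versus sort() distinction mentioned above concrete, here is a minimal sketch on a made-up vector. The names x, idx and sorted are hypothetical and are not part of dat.

```r
# x is a made-up vector used only for illustration
x <- c(30, 10, 20)

# order() returns the positions that would put x in ascending order
idx <- order(x)

# sort() returns the sorted values themselves
sorted <- sort(x)

# Indexing x by order(x) reproduces sort(x)
identical(x[idx], sorted)
```

Here idx holds positions in x rather than values of x, which is the difference I am referring to.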
Now, I would like to create two new columns, rank_a and rank_b, which, as the names imply, contain the rank (or order) of each value in their corresponding columns.
    dat <- dat %>%
      mutate(
        rank_a = order(a),
        rank_b = order(b),
      )
    dat
    # A tibble: 5 x 5
      unit        a      b rank_a rank_b
      <chr>   <dbl>  <dbl>  <int>  <int>
    1 A     -0.560   1.72       1      3
    2 B     -0.230   0.461      2      4
    3 C      1.56   -1.27       4      5
    4 D      0.0705 -0.687      5      2
    5 E      0.129  -0.446      3      1
A close look at the tibble above reveals that the order() function did not work as intended. For example, the table states that the value 0.129 in column a (i.e. unit E) is the 3rd lowest value in the column. This is not true! The 3rd lowest value is actually 0.0705 (i.e. unit D)! Interestingly enough, the function works as expected outside the tidy framework:
    dat$a[order(dat$a)]
    [1] -0.56047565 -0.23017749  0.07050839  0.12928774  1.55870831
The rank_b column suffers from the same issue.
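One check that might help narrow this down is to compare the raw output of order() on the bare column with what the same call produces inside mutate(). The sketch below rebuilds dat from scratch so that it runs on its own. The names idx_outside and idx_inside are made up for this comparison, and base R's rank() is included only as a reference point, not as a claim about what the right tool is.

```r
library(dplyr)

# Rebuild the example data so this snippet is self-contained
set.seed(123)
dat <- tibble(
  unit = LETTERS[1:5],
  a = rnorm(5),
  b = rnorm(5)
)

# Indices returned by order() on the bare column, outside mutate()
idx_outside <- order(dat$a)

# The same call evaluated inside mutate()
idx_inside <- dat %>%
  mutate(rank_a = order(a)) %>%
  pull(rank_a)

# Do the two calls agree?
identical(idx_outside, idx_inside)

# Base R's rank() on the same column, for reference
rank(dat$a)
```

If the two index vectors turn out to be identical, the difference I am seeing would come from how I am reading the output of order() rather than from mutate() itself.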