dplyr::row_number() vs. row_number()

It seems that using dplyr::row_number() instead of row_number() can sometimes yield an error (see reprex below). This has been brought up before as a GitHub issue, but the explanation was unclear to me. I was wondering if anyone has a different explanation or could break down the one given in the GitHub issue?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

iris <- head(iris)

mutate(iris, id = row_number())
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species id
#> 1          5.1         3.5          1.4         0.2  setosa  1
#> 2          4.9         3.0          1.4         0.2  setosa  2
#> 3          4.7         3.2          1.3         0.2  setosa  3
#> 4          4.6         3.1          1.5         0.2  setosa  4
#> 5          5.0         3.6          1.4         0.2  setosa  5
#> 6          5.4         3.9          1.7         0.4  setosa  6

mutate(iris, id = dplyr::row_number())
#> Error in mutate_impl(.data, dots): Evaluation error: argument "x" is missing, with no default.

It has already been fixed in the latest dev version of dplyr. See the discussion here.

The explanation is given by @krlmlr in this issue

When you call dplyr::row_number(), it is the R version of the function that gets called. This R version is equivalent to

rank(x, ties.method = "first", na.last = "keep")

and needs an x argument.
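To make that equivalence concrete, here is a minimal sketch that compares the two calls on a small vector. The vector x and the names via_rank and via_dplyr are made up for illustration, and I am assuming that supplying x explicitly behaves the same way in your installed dplyr version.

```r
library(dplyr)

# Hypothetical input: one tie (the two 20s) and one missing value
x <- c(10, 30, 20, 20, NA)

# The base R call quoted above: ties are broken by order of appearance,
# and NA stays NA instead of receiving a rank
via_rank <- rank(x, ties.method = "first", na.last = "keep")

# The R-level dplyr function, with the x argument supplied explicitly
via_dplyr <- dplyr::row_number(x)

# TRUE if both approaches produce the same ranks
identical(as.integer(via_rank), as.integer(via_dplyr))
```

With an explicit x the namespaced call has everything it needs, which is why the error in the reprex only shows up when it is called with no argument.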

Without the dplyr:: prefix, it is the internal C++ version that is used, which allows more powerful behaviour, including working with databases.

Is it clearer to you?


Yes, much clearer now. Thank you!
