Error in mutate_impl(.data, dots) :
Evaluation error: argument "x" is missing, with no default.
My question is, how does row_number() work without specifying the x argument? I'd like to be able to add similar functionality to some of my own functions.
I believe this functionality is limited to using row_number() inside single-table dplyr verbs such as mutate() (see the last line of the row_number() documentation).
As far as implementation goes, I don't see anything purely in R that makes this work, so I'm guessing it's implemented in the C++ code, but I haven't dug into it.
row_number() is part of the class of hybrid evaluation functions in dplyr. When possible, these are evaluated in C++ and in the context of the data frame you are mutating / filtering / etc.
The hybrid implementation of row_number() in particular is defined here:
And I think it gets registered here:
In that second link, you can see all the other hybrid functions!
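To make the idea concrete, here is a toy sketch in plain R of what "hybrid" versus "standard" evaluation means. It is not dplyr's code, and toy_eval() is a hypothetical name. It inspects the unevaluated expression first. If the expression is a bare row_number() call, it computes the answer directly from the data, so no x is ever needed. Anything else falls back to ordinary R evaluation with the data's columns visible as variables.

```r
# Toy sketch of the idea behind hybrid evaluation (hypothetical, not dplyr's code).
toy_eval <- function(data, expr) {
  call <- substitute(expr)  # capture the expression without evaluating it
  is_bare_row_number <- is.call(call) &&
    identical(call[[1]], as.name("row_number")) &&
    length(call) == 1       # no arguments at all, so no x
  if (is_bare_row_number) {
    # "Hybrid" shortcut: the data is known here, so row numbers need no x
    return(seq_len(nrow(data)))
  }
  # "Standard" evaluation: columns of `data` are visible as variables
  eval(call, envir = data, enclos = parent.frame())
}

d <- data.frame(a = 1:5)
toy_eval(d, row_number())  # 1 2 3 4 5, without calling any row_number()
toy_eval(d, a * 2)         # 2 4 6 8 10, evaluated as ordinary R code
```

In this toy, an expression like row_number() + 1 is not matched. It falls through to standard evaluation, where an R-level row_number() that requires x would fail, much like the error in the question.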
You can actually check whether an expression will use hybrid evaluation (at least in the development version of dplyr):
suppressPackageStartupMessages(library(dplyr)) # 0.8.0.9000
d <- tibble(a = 1:5)
# A cpp call
hybrid_call(d, row_number())
#> <hybrid evaluation>
#> call : dplyr::row_number()
#> C++ class : dplyr::hybrid::internal::RowNumber0<dplyr::NaturalDataFrame>
# An R call
hybrid_call(d, row_number() + 1)
#> <standard evaluation>
#> call : row_number() + 1
Created on 2019-01-04 by the reprex package (v0.2.1.9000)
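One plausible reading of why the second expression is reported as standard evaluation rests on an assumption: the hybrid matcher recognizes an expression only when its outermost call is one of the registered functions. Base R shows the outermost function of each call. top_fun() is a hypothetical helper used only for illustration.

```r
# Base R only: name of the function at the top of an unevaluated call.
top_fun <- function(call) as.character(call[[1]])

top_fun(quote(row_number()))      # "row_number"
top_fun(quote(row_number() + 1))  # "+": the outer call is `+`, not row_number()
```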
RowNumber0 is defined in that first link to the cpp file where all of the row_number() implementation is.
As @pete mentioned, in the newest dplyr, row_number() also contains some extra code that uses from_context("..group_size") when x is missing. If you try to call that directly, you will be disappointed:
dplyr:::from_context("..group_size")
# Error: NULL should only be called in a data context
But if you use it inside a mutate() call, where the "context" is correct (and you should not do this), you get real results:
suppressPackageStartupMessages(library(dplyr)) # 0.8.0.9000
d <- tibble(a = 1:5)
# using it in the right context
mutate(d, x = dplyr:::from_context("..group_size"))
#> # A tibble: 5 x 2
#> a x
#> <int> <int>
#> 1 1 5
#> 2 2 5
#> 3 3 5
#> 4 4 5
#> 5 5 5
# it just returns the group size
mtcars %>%
group_by(cyl) %>%
mutate(
group_size = dplyr:::from_context("..group_size")
) %>%
select(cyl, group_size)
#> # A tibble: 32 x 2
#> # Groups: cyl [3]
#> cyl group_size
#> <dbl> <int>
#> 1 6 7
#> 2 6 7
#> 3 4 11
#> 4 6 7
#> 5 8 14
#> 6 6 7
#> 7 8 14
#> 8 4 11
#> 9 4 11
#> 10 6 7
#> # … with 22 more rows
Created on 2019-01-04 by the reprex package (v0.2.1.9000)
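For comparison, the same group sizes can be computed in base R without dplyr, along with a position within each group. This sketch assumes, based on the text above, that row_number() without x effectively gives 1..n within each group, where n is the group size. That is an inference, not a reading of dplyr's source, and the variable names are hypothetical.

```r
# Base R only, using the built-in mtcars data set.
cyl <- mtcars$cyl
# Size of each row's cyl group (same values as group_size above)
group_size <- ave(cyl, cyl, FUN = length)
# Position of each row within its cyl group: 1..n per group
row_in_group <- ave(cyl, cyl, FUN = seq_along)

head(data.frame(cyl, group_size, row_in_group), 6)
```

The first six group sizes (7, 7, 11, 7, 14, 7) match the dplyr output above. The largest row_in_group in each group equals that group's size.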
The moral of the story is: just let dplyr use these hybrid evaluation functions. There is currently no way to access enough information at the R level to create custom ones for your own use.
This has changed, possibly because of new behaviour (a breaking change).
Some thoughts about how it works and why you observe this:
Some of dplyr's magic comes from something called hybrid evaluation. You'll find some references to it in the release-candidate blog post.
Basically, dplyr executes some code in C++ rather than R, and tries to identify certain function calls so it can use either the C++ or the R implementation (or at least that is how I think of it). So when it identifies row_number() inside mutate() or summarise(), it does not call the R version (at least in versions <= 0.7.6).
There is some C++ code for the row_number() dispatch that illustrates this.
The new version has a helper, hybrid_call(), that exposes some of that dark magic:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
packageVersion("dplyr")
#> [1] '0.8.0.9000'
mtcars %>%
slice(1:5) %>%
select(1:4) %>%
hybrid_call(mutate(row = row_number()))
#> <standard evaluation>
#> call : mutate(row = row_number())
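It does not back me up in the new >= 0.8 version, though, because it reports standard evaluation. (Note that the expression passed here is the whole mutate() call. Passing row_number() itself to hybrid_call() is what reports hybrid evaluation in dev dplyr.)
But as you found, the code has changed, and the R function row_number() now handles a missing x.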
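The "missing x" half of this is ordinary R. A function can test missing(x) and fall back to something else. Below is a minimal sketch with a hypothetical my_row_number(). Unlike dplyr, whose fallback takes the size from the data context, this sketch falls back to an explicit n argument, because that context was not reachable from user code in these versions.

```r
# Hypothetical helper (not part of dplyr): checks missing(x) and, when x is
# absent, falls back to an explicit n instead of a data context.
my_row_number <- function(x, n = NULL) {
  if (missing(x)) {
    if (is.null(n)) stop("supply either `x` or `n`")
    return(seq_len(n))
  }
  # With x supplied: rank, breaking ties by order of appearance
  rank(x, ties.method = "first", na.last = "keep")
}

my_row_number(c(30, 10, 20))  # 3 1 2
my_row_number(n = 4)          # 1 2 3 4
```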
The NEWS for 0.8.0 says:
Hybrid evaluation has been completely redesigned for better performance and stability.
That may be why the previous behaviour is difficult to explain or illustrate.
I hope this is not too confusing and that it helps in some way.