Function argument naming conventions (`.x` vs `x`)

In many tidyverse packages, function arguments are prefixed with a dot (.), for instance in purrr:

purrr::map
#> function (.x, .f, ...) 
#> {
#>     .f <- as_mapper(.f, ...)
#>     .Call(map_impl, environment(), ".x", ".f", "list")
#> }
#> <environment: namespace:purrr>

As I understand it, this is to minimise ambiguity/clashes with other variable names in the current R session. Is this the preferred style for naming function arguments? Are there times when this isn't the preferred style?

I've had a look at the tidyverse style guide, but it doesn't cover this explicitly (should I raise an issue about this?).

EDIT: Raised as an issue here: Function arguments prefixed with `.` · Issue #75 · tidyverse/style · GitHub

Issue closed by Hadley:

It is not a general principle. It's something that only apply to FP tools like purrr, because you don't want to confuse the arguments to map() with the arguments to .f

This is very well explained in purrr. Example from the map() help file:

• For a single argument function, use .
• For a two argument function, use .x and .y
• For more arguments, use ..1, ..2, ..3 etc

This is not a styling question (and that's why it is not in the style guide). These are package specific conventions.

My understanding is exactly the same as yours -- dots in arguments are designed to reduce the chance of name collisions, since you are probably not going to give your own function a name with a dot in front.

I think it's a good idea to raise an issue to clarify the status of this convention.

I assume you're referring to the purrr::as_mapper help, but as far as I can tell, this is referring to the use of formulas to create anonymous functions:

If a formula, e.g. ~ .x + 2, it is converted to a function. There are three ways to refer to the arguments:

  • For a single argument function, use .
  • For a two argument function, use .x and .y
  • For more arguments, use ..1, ..2, ..3 etc
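
For reference, a minimal sketch of the three shorthand forms quoted above, assuming purrr is installed. The inputs are made up for illustration, and the formulas are purrr's anonymous-function shorthand rather than named arguments of map():

```r
library(purrr)

# Single-argument function: refer to the argument as .
doubled <- map_dbl(1:3, ~ . * 2)

# Two-argument function: refer to the arguments as .x and .y
sums <- map2_dbl(1:3, 4:6, ~ .x + .y)

# More arguments: refer to them by position as ..1, ..2, ..3
totals <- pmap_dbl(list(1:3, 4:6, 7:9), ~ ..1 + ..2 + ..3)

doubled  # 2 4 6
sums     # 5 7 9
totals   # 12 15 18
```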

I'm not sure I quite follow. What is a style guide if not a set of established conventions to make our lives easier?

I can understand it being package specific at the moment, but if it's good enough for the tidyverse, I'd imagine it's good enough for the rest of us. Unless of course, there's a specific reason, which is what I'm trying to understand.

Does this make sense? It's possible I've misunderstood your points

Thanks, I suppose either way we'll get an answer.

I was referring to the purrr::map() help, since you gave map() as an example, but this applies to any purrr formula.

You are right that the section I pasted refers specifically to the purrr formulae, but my point is that this notation is a convention that the package developers established. It is specific to a package or a set of packages. If you write a package, you can use it or make your own convention on how to refer to functions and variables.

I feel that a styling guide is about writing code in a standard and more readable way in a language. But imposing a package-specific convention to the entire language is beyond its role.

But I guess how constraining a style guide is is a question of opinion and this could belong in it depending on where you stand in the trade-off between standardisation vs free expression in different packages and package-universes.

Of note, there are some inconsistencies on this within older tidyverse packages (e.g. in dplyr, the sample_n() and sample_frac() functions refer to tbl rather than .tbl, but this is probably something that should be fixed in future releases). In more recent packages like purrr, this has been pushed further and become much more standard.

Edit: This motivated me to submit a PR to dplyr on this actually :slightly_smiling_face:

Looking at it more closely, dplyr is actually all over the place in this respect. Variables and functions are sometimes referred to with the dot, sometimes without; tibbles are sometimes referred to as .data, sometimes as tbl, sometimes as .tbl... :stuck_out_tongue: The purrr documentation is much more consistent.

Looking at the tidyverse as a whole, it seems that it is slowly becoming a preferred style. For instance:

  • in ggplot2, data is referred to as data

  • in dplyr, it is referred to as .data some of the time, with a lot of inconsistencies (will submit a PR on this). Other arguments sometimes use the dot and sometimes don't. It is a bit messy... but there is a first flavour of this dot idea

  • in purrr, this idea is fully fledged and applied consistently to every argument, function and variable

So, to go back to your first question, I think it is something that has grown over time amongst the developers of the tidyverse, but that is still far from being consistently implemented in it. If new package releases implement it consistently across the entire tidyverse, then the question of whether it should be added to the style guide or not may become relevant. But at this point, it is very purrr specific (with some messy dplyr non-fully fledged applications).

It helps to have functions which take other functions and further arguments to pass on (e.g., map) use dot-prefixed arguments to avoid name collisions with those further arguments.

For example, this would be bad:

non_dot_map <- function(x, f, ...) {
    map(.x = x, .f = f, ...)
}

non_dot_map(c("my", "your"), startsWith, x = "my dog")
# [[1]]
# NULL

What I want is list(startsWith("my dog", "my"), startsWith("my dog", "your")). But the x meant to be passed onto startsWith() will be caught by non_dot_map(). Honestly, I don't know how this returned anything at all, since c("my", "your") should've been passed to the .f argument.
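
For what it's worth, the matching can be checked without purrr. Here is a minimal base R sketch; show_match_plain() and show_match_dotted() are hypothetical helpers (not part of any package) that only report which value landed in which argument:

```r
# Hypothetical helpers: they do no mapping, they just return their arguments
# so the result of R's argument matching can be inspected.
show_match_plain <- function(x, f, ...) {
  list(x = x, f = f, dots = list(...))
}
show_match_dotted <- function(.x, .f, ...) {
  list(.x = .x, .f = .f, dots = list(...))
}

plain <- show_match_plain(c("my", "your"), startsWith, x = "my dog")
plain$x             # "my dog": the named x was captured by the wrapper
plain$f             # "my" "your": the first unnamed value slid into f
length(plain$dots)  # 1: startsWith itself ended up in ...

dotted <- show_match_dotted(c("my", "your"), startsWith, x = "my dog")
dotted$.x           # "my" "your"
names(dotted$dots)  # "x": the named x passes through ... untouched
```

In the plain version the named x is taken by the wrapper, so the character vector slides into f. That would presumably explain the NULL above: as far as I know, purrr's as_mapper() turns a character vector into an extractor function, and extracting "my" from "my dog" finds nothing.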

It has nothing to do with name collisions in parent environments. If the function has an argument named x, then x in that function will always* mean that argument. If a function uses an argument named .x, then it could really mess things up when used with map.

Base R uses a different convention: all-caps. With the *apply() family, they have arguments like X, FUN, and SIMPLIFY. As far as I know, no other functions use those arguments.
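
As a quick illustration of why the capitalised names help, here is a small base R sketch using the same inputs as the example above. Because lapply() names its own arguments X and FUN, a lower-case x can travel through ... to startsWith() without being caught:

```r
# Each call is effectively startsWith(X[[i]], x = "my dog"): the named x is
# matched first, and the list element fills the remaining prefix argument.
res <- lapply(X = c("my", "your"), FUN = startsWith, x = "my dog")

res[[1]]  # TRUE:  "my dog" starts with "my"
res[[2]]  # FALSE: "my dog" does not start with "your"
```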

* I'm sure there are ways to intentionally violate this assumption, but then you're asking for it.

So, going back to dplyr, then maybe it would make sense to use .data for the mutate and summarise families, but not for the others. And do this consistently.

In that case, yes. While it's not passing arguments on to other functions, it does avoid conflicts in expressions such as mtcars %>% mutate(data = "something"). Looking through the dplyr docs and such, it's hard to find a documented function that both:

  • Uses the names of the arguments caught by ..., and
  • Has any named arguments that don't start with a dot (.).

The only ones I've found so far are select_vars() and rename_vars(), and their Details section starts:

For historic reasons, the vars and include arguments are not prefixed with `.`.
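
The mutate(data = "something") conflict mentioned above can be sketched in base R. add_cols_plain() and add_cols_dotted() are hypothetical mutate-like helpers written for this illustration; neither reflects how dplyr is actually implemented:

```r
# Hypothetical helper whose first argument is called data.
add_cols_plain <- function(data, ...) {
  stopifnot(is.data.frame(data))
  cols <- list(...)
  for (nm in names(cols)) data[[nm]] <- cols[[nm]]
  data
}

# Same helper, but with a dot-prefixed first argument.
add_cols_dotted <- function(.data, ...) {
  stopifnot(is.data.frame(.data))
  cols <- list(...)
  for (nm in names(cols)) .data[[nm]] <- cols[[nm]]
  .data
}

df <- data.frame(id = 1:2)

# The new column name "data" is matched to the data argument, so the
# data frame check fails; the error is caught here for display.
plain_result <- tryCatch(
  add_cols_plain(df, data = "something"),
  error = function(e) "error"
)

# With the dot prefix, data = "something" falls into ... as intended.
dotted_result <- add_cols_dotted(df, data = "something")

plain_result          # "error"
names(dotted_result)  # "id" "data"
```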

Code used to winnow down functions to look at
# Find dplyr non-generic functions which take ... and a non-dot-prefixed
# argument. Returns a logical indicating if it meets the criteria, or NA if
# isS3stdGeneric() fails. It seems some functions have crazy bodies.
has_dots_and_non_prefixed <- function(x) {
  if (!is.function(x)) {
    return(FALSE)
  }
  # isS3stdGeneric() can fail on functions with unusual bodies; report NA then.
  is_generic <- tryCatch(isS3stdGeneric(x), error = function(err) NA)
  if (is.na(is_generic)) {
    return(NA)
  }
  if (is_generic) {
    return(FALSE)
  }
  arg_names <- names(formals(x))
  # TRUE when the function takes ... and has at least one argument whose
  # name does not start with a dot.
  "..." %in% arg_names && any(grepl("^[^.]", arg_names))
}


dplyr_ns <- asNamespace("dplyr")
dplyr_varnames <- getNamespaceExports("dplyr")

results <- vapply(
  X = dplyr_varnames,
  FUN = function(vname) {
    f <- get(vname, envir = dplyr_ns)
    has_dots_and_non_prefixed(f)
  },
  FUN.VALUE = logical(1)
)

print(names(results)[which(results)])
#  [1] "group_by_prepare" "rename_vars"      "summarize_each"  
#  [4] "lead"             "with_order"       "compare_tbls"    
#  [7] "lag"              "src_postgres"     "select_vars"     
# [10] "add_count"        "one_of"           "mutate_each"     
# [13] "src"              "all_equal"        "compare_tbls2"   
# [16] "auto_copy"        "n_distinct"       "summarise_each"  
# [19] "src_mysql"        "bench_tbls"       "make_tbl"        
# [22] "count"

Some of the functions listed above aren't listed in the documentation page, but they are exported and do have documentation.

There's a lot of discussion on this here, but the short answer is: the tidyverse generally uses the .x naming scheme for functions that take ... to avoid name collisions, and a plain x for functions that have no ... argument. That is also where the supposed inconsistencies in dplyr come from.

This is basically the guideline I was looking for and I think the one that makes the most sense. Thanks everyone for the help.

Yeah, so I'd frame the "inconsistencies" (which, yes, are real) as sort of an evolution, one that attempts to balance consistency with not busting legacy code. The tidyverse team tends to work on packages in bursts (for sanity's and quality's sake), and so the introduction of "tidy dots" and tidy eval conventions has been/is being rolled out as the packages are updated (e.g. ggplot2 is getting quasiquotation, as described in NEWS.md).

The forcats 0.3.0 release notes (under New features) are a good example, I think:

Consistent with other tidyverse packages, all functions that take ... now use tidy dots. This means that you can use !!! to splice in a list of values.

All other arguments to functions that take ... gain a . prefix in order to avoid unhelpful matching of named arguments.

Ah! Great! That is very useful. Thank you.

There are other inconsistencies in the dplyr documentation though (maybe, here as well, there is an answer I don't know about?)

For instance:

  • arrange first argument:

".data: A tbl."

  • arrange_all first argument:

".tbl: A tbl object."

Are these not the same? And if not, what is the difference?

Totally. I can see that in the development of the tidyverse and it makes a lot of sense that things are evolving fast with so many amazing people pushing a relatively new language in totally new (and so exciting) ways.

I still see some inconsistencies with regard to the dot. For instance, count():

count(x, ..., wt = NULL, sort = FALSE)

Why not count(.x, ..., wt = NULL, sort = FALSE)

But maybe this is a different case and there is again something I am missing?

Note: I guess I sounded super critical in my inconsistency comment, but I didn't mean it as a derogatory comment (I'll blame it on my French clumsiness :wink: ). The way I saw it (and still see it and @mara confirmed this with concrete elements) is that, as this notation and other new concepts evolve, previous packages are slowly being corrected, but there are still some inconsistencies as it all is a work in progress (which is good and the sign of an active universe).

I guess the reason for that is pretty much what @mara said. As I understand it, count() really should have a .x argument. I would expect dplyr to be the tidyverse package with the most such inconsistencies, since it was more or less the start of the whole tidyverse thing.