Meaning of dot (".") in front of certain arguments in the Tidyverse reference

brah · November 4, 2022, 8:48pm

I'm very fussy about ensuring I understand how to read a reference manual correctly.

So, sorry if this is something that's general knowledge, but I'd like clarification on this.

Why is there a dot (".") before certain arguments in the screenshot below (e.g. before "data") but not before others (e.g. "x")?

This screenshot is from here.

Thanks in advance.

technocrat · November 4, 2022, 9:50pm

.data has the leading . to distinguish it from the built-in closure data(). (In R, "closure" is the type name for an ordinary function.) There is no built-in called x, so it needs no leading period.

Both .data and x are parameters: the names the function uses internally to refer to the object it receives. When you call the function, the values you supply are called arguments. (Yeah, confusing.)
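To make the parameter/argument distinction concrete, here is a minimal sketch. The function repeat_text() is hypothetical, made up only for this illustration: x and times are its parameters, and "hi" and 3 are the arguments of one particular call.

```r
# Hypothetical function: `x` and `times` are its parameters,
# i.e. the names used inside the definition
repeat_text <- function(x, times = 2) {
  paste(rep(x, times), collapse = " ")
}

# "hi" and 3 are the arguments: the values supplied in this particular call
repeat_text("hi", times = 3)
#> [1] "hi hi hi"

# Arguments can be matched by position or by name; both calls bind x = "hi"
repeat_text("hi")
#> [1] "hi hi"
repeat_text(times = 1, x = "hi")
#> [1] "hi"
```

The parameter names are fixed when the function is written; the arguments change with every call.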

So, we look at help(group_by) to see the function signatures:

group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))

ungroup(x, ...)

and see below that .data is a data frame (or tibble) and x is a tbl object. A tbl object is what group_by() returns.

The mysterious ... means that other arguments may be supplied (for group_by(), the variables to group by), so long as it's possible to figure out which is which.

.add has a dot for reasons explained in the help: the developers wanted to leave the plain name add free for other purposes, such as a grouping variable called add. It has a default of FALSE that can be explicitly overridden, for reasons explained in the docs. .drop is similar.

In practice, rather than theory, .data is implicit because the function is almost always used in a piped expression.

library(dplyr)
obj <- mtcars %>% group_by(cyl)

Here, .data is whatever data frame is passed in by the pipe %>%, and cyl, one of the mtcars variables, is the argument passed to the dots (...).
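To see that the pipe simply supplies its left-hand side as the first argument (the slot that group_by() calls .data), here is a small base-R sketch. It swaps in the native pipe |> (R 4.1 or later) and head() so that it runs without dplyr; for a simple call like this, magrittr's %>% behaves the same way.

```r
# The pipe inserts its left-hand side as the first argument of the call,
# so these two expressions make the same call
piped  <- mtcars |> head(2)
direct <- head(mtcars, 2)

identical(piped, direct)
#> [1] TRUE
```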

If you create an object with group_by(), then either

ungroup(obj)

or

ungroup(x = obj)

will undo the grouping.

Andrzej · November 5, 2022, 7:31pm

There are useful explanations here as well:
https://forum.posit.co/t/what-is-the-difference-between-and-data/76330

I have got some questions about it:

What does it mean?
Sometimes I get an error:

object of type closure is not subsettable

Why does it pop up sometimes?

Can you please elaborate a bit more clearly on the difference between parameters and arguments? Maybe with some examples.

brah · November 5, 2022, 7:50pm

Thank you for the deeply considerate explanation. It leaves lots of new breadcrumbs for me to explore. I especially appreciate all the technical descriptions and nomenclature. Thank you.

technocrat · November 5, 2022, 9:05pm

People often name variables after built-ins, such as data, D or t, to name a few. You can often get away with it, but sometimes R's name lookup finds the built-in rather than the object you meant, typically because your object was never created.

Here's an example.

> df[1]
Error in df[1] : object of type 'closure' is not subsettable
>

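Because no user-defined df exists in your workspace, R finds the built-in df() function from the stats package (the density of the F distribution), and this is what you get.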

A closure is a function object (everything in R is an object). Typed all by itself, with no (), its name prints the function instead of calling it:


> df
function (x, df1, df2, ncp, log = FALSE) 
{
    if (missing(ncp)) 
        .Call(C_df, x, df1, df2, log)
    else .Call(C_dnf, x, df1, df2, ncp, log)
}
<bytecode: 0x7fc71f1ab548>
<environment: namespace:stats>
> class(df)
[1] "function"
> str(df)
function (x, df1, df2, ncp, log = FALSE)  

Now, let's create our own df and see what we can get away with.

df <- mtcars
df[1]
#>                      mpg
#> Mazda RX4           21.0
#> Mazda RX4 Wag       21.0
#> Datsun 710          22.8
#> Hornet 4 Drive      21.4
#> Hornet Sportabout   18.7
#> Valiant             18.1
#> Duster 360          14.3
#> Merc 240D           24.4
...{snip}

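# df is now a data frame, but df() still calls stats::df():
# when a name is being called, R skips objects that are not functions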
df(df[1])
#> Error in df(df[1]): argument "df1" is missing, with no default
# mtcars and our df are identical
identical(mtcars,df)
#> [1] TRUE
# so df() and mtcars() should operate the same?
mtcars(df[1])
#> Error in mtcars(df[1]): could not find function "mtcars"
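The df() and mtcars() results follow from one lookup rule: when a name is used as a function call, R skips bindings that are not functions and keeps searching. mtcars() fails because no function of that name exists anywhere. A minimal base-R sketch of the same effect, using c as a deliberately bad variable name (purely for illustration):

```r
c <- 10            # a numeric object that shadows the name of the built-in c()
v <- c(1, 2, c)    # in call position, R skips the number and finds base::c()
v
#> [1]  1  2 10
c[1]               # in an ordinary expression, `c` is the numeric object
#> [1] 10
rm(c)              # remove the shadowing object again
```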

It's subtle, and it's not easy to predict when it will be a problem and when not. So I just make sure that I don't reuse a built-in's name. When in doubt, just enter the name in the console.
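Besides typing the name at the console, base R can tell you whether a name is already taken and where it comes from. A short sketch; my_sales_data is a hypothetical name chosen because nothing defines it, and the find() output assumes a fresh session where you have not created your own df:

```r
# Is a name already taken by something on the search path?
exists("df")
#> [1] TRUE
find("df")               # which attached package provides it
#> [1] "package:stats"
exists("my_sales_data")  # hypothetical name that nothing defines
#> [1] FALSE
```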

AlexisW · November 5, 2022, 10:21pm

Another consideration worth mentioning is that these functions use ... so they can accept arbitrarily named arguments. In these cases, the tidyverse tends to give the function's own arguments a leading dot (.argument), to distinguish them from the arguments you want to pass through the dots (...).

As mentioned in the page you linked, it allows you to write:

library(dplyr)
tibble(add = c(1,1,2,2),
       variable = c(1,2,3,4)) |>
  group_by(add, .add = TRUE)

without conflict in the argument names.
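Here is a base-R sketch of why the dot matters, using two hypothetical stand-ins for a group_by()-style signature (not dplyr's actual code). The real group_by() captures bare column names through tidy evaluation; to stay in base R, the sketch passes the column explicitly and only looks at which names reach the dots.

```r
# Hypothetical stand-in: the function's own option carries a leading dot
my_group_by <- function(.data, ..., .add = FALSE) {
  list(groups = names(list(...)), add = .add)  # names that reached the dots
}

# Hypothetical variant without the dot, for contrast
my_group_by_nodot <- function(data, ..., add = FALSE) {
  list(groups = names(list(...)), add = add)
}

df_demo <- data.frame(add = c(1, 1, 2, 2), variable = 1:4)

with_dot <- my_group_by(df_demo, add = df_demo$add, .add = TRUE)
with_dot$groups   # `add` arrived through the dots
#> [1] "add"
with_dot$add
#> [1] TRUE

no_dot <- my_group_by_nodot(df_demo, add = df_demo$add)
no_dot$groups     # nothing reached the dots
#> NULL
no_dot$add        # the column was swallowed by the function's own option
#> [1] 1 1 2 2
```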

It can get even worse for some functions: as described here, when you try to pass an argument through ... that has the same name as an argument of the main function, it will be used as an argument of the main function, and ... will be empty.

As described on that page, this is the same reason why in base R lapply(X, FUN, ...) has uppercase argument names, so you can pass a lowercase x and fun in ... without conflict.
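A minimal sketch of that conflict and of the lapply()-style fix. All three functions are hypothetical: add_offset() happens to have an argument called f, apply_lower() uses lowercase names like it, and apply_upper() mimics lapply()'s uppercase X and FUN.

```r
# Hypothetical helper whose own argument happens to be called f
add_offset <- function(value, f = 0) value + f

# Hypothetical wrappers: lowercase names vs lapply()-style uppercase names
apply_lower <- function(x, f, ...) f(x, ...)
apply_upper <- function(X, FUN, ...) FUN(X, ...)

apply_upper(1, add_offset, f = 100)   # f = 100 passes through ... to add_offset()
#> [1] 101

# Here f = 100 is matched to the wrapper's own `f` and add_offset lands in ...;
# f is now the number 100, so f(x, ...) finds no function called f and fails
tryCatch(apply_lower(1, add_offset, f = 100),
         error = function(e) "error: f was captured by the wrapper")
#> [1] "error: f was captured by the wrapper"
```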

AlexisW · November 5, 2022, 10:27pm

Just adding that "closure" means "function" in R.

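(There is some subtlety here: in computer science, a closure is a function that carries its own memory, namely the environment it was created in. In R, every function written in R is a closure, so the two words are effectively synonyms; a few primitives such as sum() are the exception, with type "builtin". In other programming languages you can have both functions and closures.)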
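The "own memory" part can be seen with a function factory. This is a standard illustration rather than anything from the thread, and make_counter() is a hypothetical name: the inner function keeps access to count, which lives in the environment where it was created.

```r
# Function factory: the returned function remembers `count`
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # update `count` in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()
#> [1] 1
counter()
#> [1] 2

typeof(counter)   # functions written in R are closures
#> [1] "closure"
typeof(sum)       # a few primitives, such as sum(), have a different type
#> [1] "builtin"
```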

So the "object of type 'closure' is not subsettable" is not only for built-ins:

my_function <- function(x){
  1
}

my_function[1:10]
#> Error in my_function[1:10]: object of type 'closure' is not subsettable

Created on 2022-11-05 by the reprex package (v2.0.1)

but it happens a lot with built-ins. You are trying to subset an object that you failed to create, but it so happens that there is already a built-in function with the same name. So R thinks you want to subset that function, and it fails.

Andrzej · November 6, 2022, 8:15am

Thank you both very much indeed for the detailed explanation. This wasn't my question, but it raises ideas I find interesting.

Does it have anything to do with the concept of "passing the dots"? I watched Lionel's video about it, and I'm trying to understand that as well.

So basically, the solution would be just to avoid conflicting (duplicate) names: name function arguments properly and keep them distinct from built-in functions?

AlexisW · November 6, 2022, 4:47pm

Yes, dots are essentially used for two things: first, to pass arguments to other functions, second, to accept arbitrary arguments.

Here is an example of the first:

mean_of_half <- function(x, ...){
  half <- x/2
  mean(half, ...)
}

mean_of_half(c(1, 10, NA), na.rm = TRUE)
#> [1] 2.75

Created on 2022-11-06 by the reprex package (v2.0.1)

In this case, mean_of_half() is a function that just calls the R built-in mean() after dividing its input. What is great about the ... is that since I pass them to mean(), when using mean_of_half() I can give any argument that mean() understands, and I don't need to define them myself when writing mean_of_half(). I don't even need to know they exist.

An example of the second case is dplyr::group_by(). The source code of group_by() makes no assumption about what the possible column names are; it can accept anything. Here you are not "passing the dots" to another function, you are using them directly. A very simple example could be:

print_dots <- function(...){
  list(...)
}

print_dots(a=5, b=7)
#> $a
#> [1] 5
#> 
#> $b
#> [1] 7

Created on 2022-11-06 by the reprex package (v2.0.1)
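group_by() goes one step further than list(...): it accepts bare column names such as cyl that do not exist as objects. A rough base-R sketch of the idea is to capture the dots without evaluating them. capture_names() is hypothetical, and dplyr itself relies on rlang's tidy evaluation rather than substitute().

```r
# Hypothetical sketch: capture the arguments in ... without evaluating them
capture_names <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]  # drop the `list` call itself
  vapply(exprs, deparse, character(1))         # turn each expression into text
}

capture_names(cyl, gear)   # neither cyl nor gear exists as an object here
#> [1] "cyl"  "gear"
```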

Sure, that would be a solution. But how can you ensure it? First, there are two "you" in that sentence: the person writing the function, and the person using it.

Technically, it's on the user to read the manual and ensure that the arguments they use are compatible with the function. But there are several problems. As a typical user, do you fully read the entire manual of each and every function you use? I know I don't: I typically read the parts that I need, and rely on my memory for functions I have used before. And what if new arguments get added? For example, dplyr::mutate() fairly recently gained arguments controlling whether new columns go before or after existing ones. If those had been called before and after, and I had data frames with columns of those names, code that I had written and that used to work would suddenly break after an update. Oh, and what if you, the user, are combining two functions written by other people? If ... is being passed on to an existing function, you don't get to choose how its arguments are named.

So, when you're writing such a function, you'd rather assume that the user may be careless and not read the documentation, and make sure that there is (almost) no way for a user to accidentally pass an argument through ... with the same name as one of the function's own arguments. This is why mutate() actually calls these arguments .before and .after: it's very unlikely that those are column names in my data frames.
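The update scenario can be sketched with two hypothetical, simplified mutate()-like helpers (not dplyr's code). New columns arrive through ..., and the placement option is accepted but ignored to keep the sketch short. Only the name of that option differs.

```r
# Hypothetical helpers: columns come in through ..., placement is ignored
append_cols_v1 <- function(data, ..., after = NULL) {   # undotted option
  new <- list(...)
  if (length(new) > 0) data[names(new)] <- new
  data
}
append_cols_v2 <- function(data, ..., .after = NULL) {  # dotted option
  new <- list(...)
  if (length(new) > 0) data[names(new)] <- new
  data
}

df0 <- data.frame(x = 1:3)
names(append_cols_v1(df0, after = 4:6))   # `after` is swallowed by the option
#> [1] "x"
names(append_cols_v2(df0, after = 4:6))   # `after` becomes a column
#> [1] "x"     "after"
```

Arguments placed after ... can still be matched by their exact name, which is why the dot, rather than the position, is what keeps them out of the user's way.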

Andrzej · November 6, 2022, 8:25pm

Thank you for the detailed explanation. Much obliged for this.

As for reading the entire manual of every function: very rarely.

I do the same.

system · November 13, 2022, 8:25pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.