Understanding dplyr::distinct source code

Hi all,
Apologies if my question is naive, but I was trying to understand the source code of the distinct function from dplyr. It looks something like this:

distinct <- function(.data, ..., .keep_all = FALSE) {
  UseMethod("distinct")
}

Does this mean there is another function "distinct" that gets called by UseMethod?
In the docs for UseMethod I found this: "When a function calling UseMethod("fun") is applied to an object with class attribute c("first", "second"), the system searches for a function called fun.first and, if it finds it, applies it to the object. If no such function is found a function called fun.second is tried. If no class name produces a suitable function, the function fun.default is used, if it exists, or an error results."

Does that imply that "fun", or in our case "distinct", is an object? I am not sure how to understand this. Any help is appreciated.

Thanks,

That part is easy: everything in R is an object, including fun.
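A quick way to check this at the console, as a minimal sketch using only base R. The generic `fun` here is a made-up name for illustration, defined the same way as the distinct stub above; it is not part of dplyr.

```r
# A hypothetical generic named `fun`, written like the distinct() stub
fun <- function(x, ...) {
  UseMethod("fun")
}

# The generic is an ordinary R object whose class is "function"
is.function(fun)
class(fun)

# Like any other object, it can be stored in a list and retrieved again
fs <- list(generic = fun)
identical(fs$generic, fun)
```

Both checks return TRUE, and class(fun) is "function".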

The rest is something most users will never have to worry about when using the dplyr package, but it comes up because distinct is a generic, "which means that packages can provide implementations (methods) for other classes".

UseMethod is "a special function and it behaves differently from other function calls. The syntax of a call to it is UseMethod(generic, object), where generic is the name of the generic function, object is the object used to determine which method should be chosen" (R Language Definition §5.4).

In the case of dplyr::distinct, the call to UseMethod passes only the generic. When the object is omitted, UseMethod dispatches on the first argument of the enclosing function, here .data, which is usually a data frame or tibble within the tidyverse. This is method dispatch rather than recursion: distinct does not call itself, it hands its arguments over to a method such as distinct.data.frame, chosen by the class of .data. The methods that base R and its default packages define for data frames can be listed like this:
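To make the dispatch rule concrete, here is a minimal sketch in base R. The names `fun`, `fun.first`, `fun.second` and `fun.default` are hypothetical and simply mirror the wording of the UseMethod help page; none of them exist in dplyr.

```r
# Hypothetical generic: UseMethod() is given only the generic's name,
# so R dispatches on the class of the first argument, `x`
fun <- function(x, ...) {
  UseMethod("fun")
}

# Hypothetical methods, one per class, plus a fallback
fun.first   <- function(x, ...) "fun.first was called"
fun.second  <- function(x, ...) "fun.second was called"
fun.default <- function(x, ...) "fun.default was called"

obj_a <- structure(list(), class = c("first", "second"))
obj_b <- structure(list(), class = "second")

fun(obj_a)  # class vector starts with "first", so fun.first is used
fun(obj_b)  # no "first" in the class vector, so fun.second is used
fun(1:3)    # no method for an integer vector, so fun.default is used
```

Note that the generic never calls itself. Each call to fun() is replaced by a call to exactly one of the methods.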

methods(class = "data.frame")
#>  [1] [             [[            [[<-          [<-           $<-          
#>  [6] aggregate     anyDuplicated anyNA         as.data.frame as.list      
#> [11] as.matrix     by            cbind         coerce        dim          
#> [16] dimnames      dimnames<-    droplevels    duplicated    edit         
#> [21] format        formula       head          initialize    is.na        
#> [26] Math          merge         na.exclude    na.omit       Ops          
#> [31] plot          print         prompt        rbind         row.names    
#> [36] row.names<-   rowsum        show          slotsFromS3   split        
#> [41] split<-       stack         str           subset        summary      
#> [46] Summary       t             tail          transform     type.convert 
#> [51] unique        unstack       within       
#> see '?methods' for accessing help and source code

Created on 2021-01-10 by the reprex package (v0.3.0.9001)

In case you want to see more of the internals than only the method stub, you can look here:

"dplyr/distinct.R at master · tidyverse/dplyr · GitHub" https://github.com/tidyverse/dplyr/blob/master/R/distinct.R

Ahh, that makes sense!

> methods(generic.function = "distinct")
[1] distinct.data.frame* distinct.default*    distinct.sf*  

So UseMethod("distinct"), when called, looks at the class attribute of .data and searches for a matching method: distinct.data.frame, distinct.sf (in my case, since I have that package loaded), or distinct.default as the fallback.
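As a rough sketch of that lookup, here is a hypothetical stand-in generic called `my_distinct`. It is not dplyr's implementation: the data frame method just calls base R's unique(), and the class "my_tbl" is invented to show the fallback along the class vector.

```r
# Hypothetical stand-in for distinct(); dispatches on the class of .data
my_distinct <- function(.data, ...) {
  UseMethod("my_distinct")
}

# Method for data frames: drop duplicated rows with base R's unique()
my_distinct.data.frame <- function(.data, ...) {
  unique(.data)
}

# Fallback for every class without a method of its own
my_distinct.default <- function(.data, ...) {
  stop("no my_distinct() method for class ", class(.data)[1])
}

df <- data.frame(x = c(1, 1, 2), y = c("a", "a", "b"))
# Invented subclass: there is no my_distinct.my_tbl, so dispatch
# moves on to the next class in the vector, "data.frame"
class(df) <- c("my_tbl", "data.frame")

res <- my_distinct(df)
nrow(res)  # the duplicated first row is dropped, leaving 2 rows

# An integer vector has no matching method, so the default one signals an error
msg <- tryCatch(my_distinct(1:3), error = function(e) conditionMessage(e))
msg
```

Once methods like these are defined, methods(generic.function = "my_distinct") lists them in the same way as the output above.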

Got it, thanks!

Hey thanks, I did. My code in the initial post is from the source code distinct.R.
I was trying to understand how it was built.

That's great. Also, this book provides explanations of this issue, as well as many others. It might be a useful resource for you.
