Understanding dplyr::distinct source code

Hi all,
Apologies if my question is naive, but I was trying to understand the source code of the distinct function from dplyr. It looks something like this:

distinct <- function(.data, ..., .keep_all = FALSE) {
  UseMethod("distinct")
}

Does this mean there is another function "distinct" that gets called by UseMethod?
In the docs for UseMethod I found this: When a function calling UseMethod("fun") is applied to an object with class attribute c("first", "second"), the system searches for a function called fun.first and, if it finds it, applies it to the object. If no such function is found a function called fun.second is tried. If no class name produces a suitable function, the function fun.default is used, if it exists, or an error results.

Does that imply "fun", or in our case "distinct", is an object? I am not sure how to understand this; any help is appreciated.


That part is easy: everything in R is an object, including fun.
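To make "a function is an object" concrete, here is a minimal sketch in base R. The generic `fun` is a stand-in taken from the quoted documentation, not anything defined by dplyr.

```r
# A generic function is an ordinary object bound to a name.
# `fun` mirrors the placeholder name used in the UseMethod docs.
fun <- function(x) UseMethod("fun")

is.function(fun)   # TRUE
class(fun)         # "function"

# Because it is an object, it can be assigned to another name
# or passed as an argument like any other value.
same_fun <- fun
is.function(same_fun)
```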

The rest is something most users will never have to worry about when using the dplyr package, but it may come up because distinct is a generic, which means that packages can provide implementations (methods) for other classes.
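As a rough illustration of providing a method for another class, the sketch below adds a method to the base generic `format`. The class name `celsius` and the constructor `new_celsius` are hypothetical, invented only for this example.

```r
# `celsius` and `new_celsius` are hypothetical names for this sketch.
new_celsius <- function(x) structure(x, class = "celsius")

# A method for the existing generic format(), named <generic>.<class>.
format.celsius <- function(x, ...) {
  paste0(unclass(x), " C")
}

temps <- new_celsius(c(21.5, 30))

# format() calls UseMethod("format"), which finds format.celsius
# because class(temps) is "celsius".
format(temps)   # "21.5 C" "30 C"
```

A package would do the same thing for its own classes and register the method in its NAMESPACE, which is presumably how an sf method for distinct becomes available once that package is loaded.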

UseMethod is "a special function and it behaves differently from other function calls. The syntax of a call to it is UseMethod(generic, object), where generic is the name of the generic function, object is the object used to determine which method should be chosen" (R Language Definition §5.4).

In the case of dplyr::distinct, the call to UseMethod passes only the generic. When the object is omitted, dispatch uses the first argument of the enclosing function, here .data, which is usually a data frame or tibble within the tidyverse. It can look like recursion because the generic and its methods share the name distinct, but the generic simply hands off to a separate function, such as distinct.data.frame, chosen by the class of .data.
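The dispatch order described in the UseMethod documentation can be sketched with a toy generic. The name `describe` and its methods are made up for illustration; only `UseMethod` and `structure` are real base R functions here.

```r
# Hypothetical generic; `describe` is invented for this sketch.
describe <- function(x, ...) {
  # Only the generic name is given, so dispatch uses the class of
  # the first argument of the enclosing function, `x`.
  UseMethod("describe")
}

describe.first   <- function(x, ...) "method for class 'first'"
describe.second  <- function(x, ...) "method for class 'second'"
describe.default <- function(x, ...) "default method"

obj <- structure(list(), class = c("first", "second"))

describe(obj)                                  # describe.first is tried first
describe(structure(list(), class = "second"))  # no 'first', so describe.second
describe(1:3)                                  # no matching class, so describe.default
```

Each call enters `describe` once and then runs a different function, so there is no recursion involved, only a lookup by class.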

methods(class = "data.frame")
#>  [1] [             [[            [[<-          [<-           $<-          
#>  [6] aggregate     anyDuplicated anyNA         as.data.frame as.list      
#> [11] as.matrix     by            cbind         coerce        dim          
#> [16] dimnames      dimnames<-    droplevels    duplicated    edit         
#> [21] format        formula       head          initialize    is.na        
#> [26] Math          merge         na.exclude    na.omit       Ops          
#> [31] plot          print         prompt        rbind         row.names    
#> [36] row.names<-   rowsum        show          slotsFromS3   split        
#> [41] split<-       stack         str           subset        summary      
#> [46] Summary       t             tail          transform     type.convert 
#> [51] unique        unstack       within       
#> see '?methods' for accessing help and source code

Created on 2021-01-10 by the reprex package (v0.3.0.9001)

In case you wanted to see more of the internals than only the method stub, you can look here:

"dplyr/distinct.R at master · tidyverse/dplyr · GitHub" https://github.com/tidyverse/dplyr/blob/master/R/distinct.R

Ahh, that makes sense!

> methods(generic.function = "distinct")
[1] distinct.data.frame* distinct.default*    distinct.sf*  

So UseMethod("distinct"), when called, looks at the class attribute of .data and searches for a matching method: distinct.data.frame, distinct.sf (in my case, since I have that package loaded), or otherwise distinct.default.
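For anyone who wants to look at the method a generic would pick, here is a small sketch using the base generic `unique`, which appears in the methods(class = "data.frame") listing earlier in the thread. The asterisk in methods() output marks methods that are not exported; `getS3method` can still retrieve those.

```r
df <- data.frame(x = c(1, 1, 2))
class(df)   # "data.frame"

# Retrieve the method that unique() dispatches to for a data frame.
m <- utils::getS3method("unique", "data.frame")
is.function(m)   # TRUE

# Calling the generic and calling the method directly give the same result.
identical(unique(df), m(df))   # TRUE

# Assumption: with dplyr loaded, the analogous call would be
# getS3method("distinct", "data.frame"); it is not run here.
```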

Got it, thanks!

Hey thanks, I did. My code in the initial post is from the source code distinct.R.
I was trying to understand how it was built.

That's great. Also, this book provides explanations on this issue, as well as many others. It might be a useful resource to you.
