How non-trivial should a function be before it's worth suggesting as an inclusion to a major library?

stevecondylios · July 13, 2020, 3:57pm

A random example of a trivial helper function: %*% doesn't accept data.frames (only matrices and vectors), yet we often want to matrix-multiply data.frames. This helper lets us use %*% on data.frames:

mm <- function(x, y) {
  
 if(!is.matrix(x)) { x <- as.matrix(x) }  # coerce a data.frame (or vector) x to a matrix
 if(!is.matrix(y)) { y <- as.matrix(y) }  # same for y; a vector becomes a one-column matrix
 out <- .Primitive("%*%")(x, y)
 # out <- as.data.frame(out) # if data.frame output is desired
 out
 
}

# Example usage
x <- data.frame(a=c(1,2,3), b=c(5,6,7))
y <- c(2,2)
mm(x, y)

#      [,1]
# [1,]   12
# [2,]   16
# [3,]   20

Or as an infix:

'%mm%' <- function(x, y) {
  
 if(!is.matrix(x)) { x <- as.matrix(x) }  # coerce a data.frame (or vector) x to a matrix
 if(!is.matrix(y)) { y <- as.matrix(y) }  # same for y; a vector becomes a one-column matrix
 out <- .Primitive("%*%")(x, y)
 # out <- as.data.frame(out) # if data.frame output is desired
 out
 
}

x %mm% y

#      [,1]
# [1,]   12
# [2,]   16
# [3,]   20
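
For comparison, here is a minimal sketch of a slightly more defensive variant. The name mm2 and its as_df argument are hypothetical (they are not part of base R or dplyr). It assumes that only data.frames need coercing, that all columns are numeric, and that the caller may want the data.frame output mentioned in the commented-out line above:

```r
# Hypothetical variant: mm2() and as_df are illustrative names only.
mm2 <- function(x, y, as_df = FALSE) {
  # Coerce each operand independently; plain vectors are left for %*% to handle
  if (is.data.frame(x)) x <- as.matrix(x)
  if (is.data.frame(y)) y <- as.matrix(y)
  # Fail early if a data.frame contained non-numeric columns
  stopifnot(is.numeric(x), is.numeric(y))
  out <- x %*% y
  # Optionally convert the result back to a data.frame
  if (as_df) as.data.frame(out) else out
}

x <- data.frame(a = c(1, 2, 3), b = c(5, 6, 7))
y <- c(2, 2)

mm2(x, y)                # 3 x 1 matrix with the same values as mm(x, y)
mm2(x, y, as_df = TRUE)  # the same values as a one-column data.frame

# Plain %*% on a data.frame is expected to raise an error,
# which is the motivation for the helper in the first place
res <- tryCatch(x %*% y, error = function(e) "error")
```

Whether the extra argument is worth it is exactly the kind of judgment call the question is about: each option makes a trivial helper a little less trivial to maintain.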

Is it ever worth putting forward similarly trivial functions for consideration for inclusion in a library? (The above could potentially go in dplyr, since it aids math with data.frames, but this is only an example: other functions could do totally different things and belong in other packages.)

My guess for the above example is probably not, based on i) how useful it is (not hugely), ii) how many people would use it (not a huge number), and iii) the advantage of keeping libraries no more bloated than they must be, meaning trivial functions should be omitted.

In any case, I would be very interested to learn from others and their policies / views / ideas around how significant a function should be before making a suggestion.

I am also eager to find out what others do to solve this for themselves: when you make something useful (even if trivial), do you store it somewhere for future use by yourself and/or others, suggest it in a GitHub issue, send an email to a package maintainer, or simply not store it (or something else)?

stevecondylios · July 15, 2020, 1:52pm

Updating with some ideas on the topic from another forum:

If it's little effort I would just open the PR and a note on what problem it solves or how it might help others. No biggie if they don't merge it and others can use it on their own if they like.
If it's larger, then common courtesy would be to open an issue to discuss first.

We should think of open PRs as simply the sharing of an idea (without any need for it to be merged).

A lot of open PRs are not good. But maintainers will either merge or close the PR. People can still find it by searching, the same way they would find a closed GitHub issue; the PR just has the advantage of having the code attached.

Still very interested to hear the views of maintainers of packages in the R community. Are PRs seen as simply sharing an idea, or can they add clutter and be distracting/inconvenient?
