Recording package loads

mlevy · January 8, 2018, 5:25pm

I'd like to get a sense of how distasteful folks think it would be to send a record when a package is loaded. My company supports our open source work, but higher-ups would like to see usage metrics. Would trying to send a ping on load be totally beyond the pale in your opinion?

tyler · January 8, 2018, 6:11pm

Not a fan, and I imagine many IT departments in other organizations won't look kindly on unknown outgoing connections to an external server. However, if your package is on CRAN, you can get a sense of the overall usage with two packages: cranlogs (for raw downloads) and adjustedcranlogs (which adjusts the daily cranlogs record for CRAN-wide automated downloads and re-download spikes, giving a better idea of the number of actual downloads for a particular package).
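For anyone who wants to try the download-count route, here is a minimal sketch using cranlogs. It assumes cranlogs is installed and that you have network access. The helper names `total_by_package` and `fetch_totals` are made up for illustration, and the package names and dates in the example call are placeholders. As far as I know, these logs only cover the RStudio CRAN mirror, so treat the totals as a rough indicator rather than a full count.

```r
# Sum the daily counts in a data frame shaped like cran_downloads() output
# (columns: date, count, package), giving one total per package.
total_by_package <- function(dl) {
  totals <- tapply(dl$count, dl$package, sum)
  data.frame(
    package = names(totals),
    downloads = as.integer(totals),
    stringsAsFactors = FALSE
  )
}

# Fetch daily download counts for a date range and total them.
# Needs the cranlogs package and a network connection.
fetch_totals <- function(pkgs, from, to) {
  dl <- cranlogs::cran_downloads(packages = pkgs, from = from, to = to)
  total_by_package(dl)
}

# Example call (placeholder package names and dates):
# fetch_totals(c("dplyr", "data.table"), from = "2017-12-01", to = "2017-12-31")
```

Keeping the aggregation separate from the fetch makes it easy to check the arithmetic on a small hand-made data frame before hitting the server.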

edgararuiz · January 8, 2018, 7:09pm

It sounds like the number of downloads and the number of times a library is called could easily be padded to simulate more usage. Without knowing the true intent of the exec team, I would think that if you have Shiny apps in production, knowing how many times those are used and by how many unique users would be a better metric to track when it comes to the impact of R in production. If the intent is to figure out how much development is happening in R, in contrast with other products, then I would say that an analysis of the content in your code repositories may be a better KPI. If the analysis is for R packages (which is kind of weird to me that an exec is worried about that), then a text analysis of the code in your repos will tell you how many times a particular library is called. It's not perfect, but it may get you closer to the truth.
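The repo text analysis could be sketched roughly like this. It is a crude illustration, not a real parser: it only counts `library()` and `require()` calls, ignores `pkg::fun` usage, and will also count calls that sit inside comments. The function names `count_package_loads` and `count_loads_in_dir` are made up for this example.

```r
# Count how often each package appears in library()/require() calls
# in a character vector of source lines.
count_package_loads <- function(lines) {
  lines <- as.character(lines)
  # Matches library(pkg), library("pkg"), require(pkg), and so on.
  pattern <- "\\b(library|require)\\(\\s*[\"']?[A-Za-z][A-Za-z0-9.]*"
  hits <- unlist(regmatches(lines, gregexpr(pattern, lines, perl = TRUE)))
  # Strip everything before the package name.
  pkgs <- sub("^(library|require)\\(\\s*[\"']?", "", hits, perl = TRUE)
  sort(table(pkgs), decreasing = TRUE)
}

# Apply the count to every .R file under a directory (searched recursively).
count_loads_in_dir <- function(dir) {
  files <- list.files(dir, pattern = "\\.[Rr]$", recursive = TRUE,
                      full.names = TRUE)
  lines <- unlist(lapply(files, readLines, warn = FALSE))
  count_package_loads(lines)
}

# Example call (placeholder path):
# count_loads_in_dir("path/to/repo")
```

Running it over a checkout of each repository gives a per-package tally that can be compared across teams or over time.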

nwerth · January 8, 2018, 7:50pm

If somebody publishes work which used your package, they might also include it in a bibliography. As annoying as package start-up messages are, you could use one to suggest how to word a citation (implicitly suggesting they should cite your package). Then it becomes the problem of citation counting, which has multiple "solutions."
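A start-up message along those lines might look something like the sketch below. It assumes the function lives in the package's own source (conventionally a file such as R/zzz.R), and the wording of the message is only an example.

```r
# Runs when the package is attached with library().
# packageStartupMessage() lets users silence it with
# suppressPackageStartupMessages().
.onAttach <- function(libname, pkgname) {
  packageStartupMessage(
    "If you use ", pkgname, " in published work, please cite it: ",
    "see citation(\"", pkgname, "\")."
  )
}
```

Shipping an inst/CITATION file with the package lets you control what `citation()` prints for it.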

But I agree with @tyler on never doing the ping thing. If a package would send reports of my behavior, I wouldn't use it.

hoelk · January 9, 2018, 6:25am

This gave me a package idea:

Spyr - The opt-in spyware package. When you install it (and add something to your .Rprofile manually to prevent abuse) it logs all your package loads to an external website. Like last.fm but for R packages. Someone with access to a web server should do that :). That way we would have voluntary package use statistics for all packages, which would be nice.
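The opt-in part could be as simple as the sketch below. Everything here is hypothetical: the option name `spyr.opt_in` and the functions `opted_in` and `load_record` are invented for illustration, and the step that would actually send the record to a server is deliberately left out.

```r
# The user would enable reporting by hand, by adding this to .Rprofile:
#   options(spyr.opt_in = TRUE)

# TRUE only if the user explicitly set the (hypothetical) option to TRUE.
opted_in <- function() {
  isTRUE(getOption("spyr.opt_in", default = FALSE))
}

# Build the record that would be reported for one package load.
# Returns NULL, and so reports nothing, unless the user has opted in.
load_record <- function(pkg) {
  if (!opted_in()) {
    return(NULL)
  }
  list(
    package = pkg,
    loaded_at = format(Sys.time(), format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
  )
}
```

Defaulting to off means nothing leaves the machine unless the user has taken a deliberate step.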

jennybryan · January 9, 2018, 6:42am

This bit of the CRAN Repository Policy is relevant:

Packages should not send information about the R session to the maintainer’s or third-party sites without obtaining confirmation from the user.

mlevy · January 9, 2018, 6:16pm

Thanks, I didn't know about adjustedcranlogs. That will be much more useful than the raw download numbers.