Estimating the number of downloaders of a specified package for purposes of grant funding & reporting

In my home disciplines of economics and law, teachers need not worry much about reusing the same exam questions every few years, as both tend to change the answers regularly. In the same vein, I have noticed that many aspects of R programming and the R community evolve rapidly. I say this to justify asking a question that I asked a few years ago: I didn’t like the answer I got then, and I am hoping for a different one now.

I am writing a package that I hope will be widely useful to academics in the social sciences, students, and government officials, but the primary community I hope to serve consists of small public-interest advocacy nonprofits, and maybe journalists. Somewhere down the line, after it is finished and published (I hope in the next six months), I may seek grant funding for outreach, training, and the like (not for programming, which will already have been done). At that point, it would be very helpful if I could produce at least a very rough approximation of the total number of people who have downloaded it from all the CRAN mirrors. I do not mean the raw number of downloads, which I understand to be much higher than the number of downloaders, nor the downloaders from a single mirror, which I would expect to be much smaller than the number of downloaders from all mirrors. Still, suppose I had three estimates for a single widely used mirror such as CRAN or RStudio: total downloads of the package from that mirror, the approximate ratio of downloaders to total downloads there, and the ratio of total downloads from that mirror to total downloads from all U.S. mirrors. I could then multiply the package's downloads by the first ratio and divide by the second to get a somewhat defensible estimate.
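To make that arithmetic concrete, here is a minimal sketch in Python. The function name and all three input numbers are made up for illustration; none of them come from real logs, and the two ratios would have to be estimated separately.

```python
def estimate_downloaders(pkg_downloads_at_mirror, downloaders_per_download, mirror_share):
    """Rough estimate of distinct downloaders across all mirrors.

    pkg_downloads_at_mirror:  raw download count of the package at one mirror
    downloaders_per_download: assumed ratio of downloaders to downloads (0 to 1)
    mirror_share:             assumed share of all downloads served by that mirror (0 to 1)
    """
    if not 0 < downloaders_per_download <= 1:
        raise ValueError("downloaders_per_download must be in (0, 1]")
    if not 0 < mirror_share <= 1:
        raise ValueError("mirror_share must be in (0, 1]")
    # Step 1: shrink raw downloads to downloaders at this one mirror.
    downloaders_at_mirror = pkg_downloads_at_mirror * downloaders_per_download
    # Step 2: scale up from this mirror to all mirrors.
    return downloaders_at_mirror / mirror_share


# Hypothetical inputs: 12,000 downloads at the mirror, 0.4 downloaders per
# download, and the mirror serving 60% of all downloads.
estimate = estimate_downloaders(12_000, 0.4, 0.6)
print(f"Estimated downloaders across mirrors: {estimate:.0f}")
```

The estimate is only as good as the two ratios, and it assumes they are the same for this package as for the mirror as a whole.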

So my question is, has anyone done this, i.e. produced an estimate of downloaders for any package? If so, could someone point me to it or describe the methodology? Or suggest an alternative methodology to the one I suggest above that would be more feasible, given the data that I could in principle lay my hands on? Thanks!

Andrew

Hello @andrewH,

So for one satRday I actually downloaded a couple of years' worth of CRAN logs to see whether overall downloads had climbed for my country and to evaluate the trend across years, within years, etc.

From what I remember, in some countries you find outright anomalies: spikes of tens of thousands of downloads within a specific country on a single day, which is very out of place. That is likely due to some other process and not normal activity. The problem with the CRAN download logs in general is that you only get an ID tied to a person for that day, and even then the same person can appear under more than one ID if they use more than one device. There is no way around this, as the logs don't assign you a unique ID indefinitely.

So at least within a day you can start seeing patterns and see how the download rate for some compares with others on that day. I was even able to do some network analysis of which combinations of packages were likely to be downloaded together, because you can piece together these "broken" chains of what a person downloaded in a day and sort them into the right from -> to order. So there is definitely a way to quantify this and get some insights.
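A rough sketch of that within-day counting, in Python with only the standard library. The sample rows and the package name `mypkg` are invented; the column names are the ones used in the daily log files (`date`, `package`, `ip_id`, and so on). Because `ip_id` is only meaningful within a day, the sketch treats each (date, ip_id) pair as one "daily downloader" and does not try to link IDs across days.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical rows in the layout of a daily CRAN log file.
SAMPLE_LOG = """date,time,size,r_version,r_arch,r_os,package,version,country,ip_id
2020-01-01,00:00:01,1000,3.6.2,x86_64,mingw32,mypkg,0.1.0,US,1
2020-01-01,00:05:00,1000,3.6.2,x86_64,mingw32,mypkg,0.1.0,US,1
2020-01-01,00:05:02,2000,3.6.2,x86_64,mingw32,ggplot2,3.2.1,US,1
2020-01-01,09:00:00,1000,3.6.1,x86_64,linux-gnu,mypkg,0.1.0,ZA,2
2020-01-01,09:00:03,3000,3.6.1,x86_64,linux-gnu,dplyr,0.8.3,ZA,2
2020-01-02,10:00:00,1000,3.6.2,x86_64,mingw32,mypkg,0.1.0,US,1
2020-01-02,10:00:04,2000,3.6.2,x86_64,mingw32,ggplot2,3.2.1,US,1
2020-01-02,11:00:00,3000,3.6.2,x86_64,darwin15.6.0,dplyr,0.8.3,GB,3
"""


def summarize(rows, package):
    """Return (raw downloads, distinct daily IDs, co-download counts) for a package."""
    downloads = 0
    packages_by_daily_id = defaultdict(set)  # (date, ip_id) -> packages fetched
    for row in rows:
        key = (row["date"], row["ip_id"])
        packages_by_daily_id[key].add(row["package"])
        if row["package"] == package:
            downloads += 1
    # Keep only the daily IDs that fetched the package at least once.
    sessions = [pkgs for pkgs in packages_by_daily_id.values() if package in pkgs]
    co_downloads = Counter()
    for pkgs in sessions:
        co_downloads.update(pkgs - {package})
    return downloads, len(sessions), co_downloads


rows = list(csv.DictReader(io.StringIO(SAMPLE_LOG)))
downloads, daily_ids, co_downloads = summarize(rows, "mypkg")
print(downloads, daily_ids, co_downloads.most_common())
```

The count of distinct daily IDs will count someone who returns on another day again, so it is better read as "downloader-days" than as people.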

I think you will be able to demonstrably show others an estimate of how well the package has done, in terms of at minimum this good and at best this good (i.e. was my package only downloaded on its own, or was it downloaded with x or y, etc.).

All the logs are available here: http://cran-logs.rstudio.com/

You can demo this with a small package, similar to yours in scope and use, that was released on CRAN, and then see progressively what happened with that package. Another problem with the CRAN logs is that you need to be cognizant of when certain releases came out, as those can definitely drive up your average or overall number of downloads in a period.
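One simple way to account for releases is to compare average daily downloads inside a window after each release with the average on all other days. The sketch below is in Python; the daily counts, the release date, and the three-day window are all made up for illustration, and the window length is an arbitrary assumption you would want to vary.

```python
from datetime import date, timedelta
from statistics import mean


def split_by_release_window(daily_counts, release_dates, window_days=7):
    """Split daily counts into days within `window_days` of a release and the rest."""
    window = set()
    for release in release_dates:
        for offset in range(window_days):
            window.add(release + timedelta(days=offset))
    in_window = [n for day, n in daily_counts.items() if day in window]
    baseline = [n for day, n in daily_counts.items() if day not in window]
    return in_window, baseline


# Hypothetical daily download counts for ten consecutive days.
start = date(2020, 3, 1)
counts = [10, 12, 11, 9, 40, 35, 20, 12, 10, 11]
daily_counts = {start + timedelta(days=i): n for i, n in enumerate(counts)}

# Hypothetical release on the fifth day.
release_dates = [date(2020, 3, 5)]

in_window, baseline = split_by_release_window(daily_counts, release_dates, window_days=3)
print("mean daily downloads after a release:", mean(in_window))
print("mean daily downloads otherwise:      ", mean(baseline))
```

If the two means differ a lot, reporting the baseline separately from the release spikes gives a more honest picture than a single overall average.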
