What could we do with a random sample of 100 U.S. R users?

Suppose we could somehow pluck out a random sample of 100 R users and ask them a few questions. What could we do with that?

Suppose we could ask just three questions. For instance, if we asked:

  1. How many times they update R, across all devices,
  2. The principal interface they use, across all devices, and
  3. For those who use RStudio, how many times they update RStudio, across all devices

This information could be used together with raw download data for R and RStudio from U.S. CRAN and Bioconductor mirrors to produce several different (but, one hopes, similar) estimates of the total number of R users, of RStudio users, and of users of other R interfaces.
For instance, we could divide total R downloads by the average number of R downloads per user. Or we could divide total RStudio downloads by the average number of RStudio downloads per RStudio user, and then multiply by the inverse of the RStudio share in interface choice to get back to total R users.
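
To make the arithmetic concrete, here is a rough sketch in Python of the two estimates just described. The function names and all the numbers are made up for illustration; they are not real download counts or survey results.

```python
def estimate_users(total_downloads, mean_downloads_per_user):
    """Estimated users = total downloads / mean downloads per surveyed user."""
    if mean_downloads_per_user <= 0:
        raise ValueError("mean downloads per user must be positive")
    return total_downloads / mean_downloads_per_user


def estimate_r_users_via_rstudio(rstudio_downloads,
                                 mean_rstudio_downloads_per_user,
                                 rstudio_share):
    """Estimate RStudio users first, then scale up by 1 / RStudio share."""
    if not 0 < rstudio_share <= 1:
        raise ValueError("rstudio_share must be in (0, 1]")
    rstudio_users = estimate_users(rstudio_downloads,
                                   mean_rstudio_downloads_per_user)
    return rstudio_users / rstudio_share


# Hypothetical inputs (NOT real data): download totals would come from the
# mirror logs; the means and the share would come from the 100-person survey.
r_downloads = 1_250_000
mean_r_downloads = 2.5          # survey question 1
rstudio_downloads = 900_000
mean_rstudio_downloads = 3.0    # survey question 3
rstudio_share = 0.75            # survey question 2

via_r = estimate_users(r_downloads, mean_r_downloads)
via_rstudio = estimate_r_users_via_rstudio(
    rstudio_downloads, mean_rstudio_downloads, rstudio_share)
rstudio_users = estimate_users(rstudio_downloads, mean_rstudio_downloads)

print(f"R users, estimated from R downloads:       {via_r:,.0f}")
print(f"R users, estimated from RStudio downloads: {via_rstudio:,.0f}")
print(f"RStudio users:                             {rstudio_users:,.0f}")
print(f"Users of other interfaces (first estimate): {via_r - rstudio_users:,.0f}")
```

With these invented inputs the two routes to total R users do not agree exactly, which is the point: how far apart they land is a rough check on how much to trust either one.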

In addition, this information could be combined with Google Analytics or similar tools to provide estimates of the number of users of R, RStudio, and other R interfaces, by state and city.
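
One simple way this could work, sketched below in Python, is to split a national estimate across regions in proportion to each region's share of web traffic. This assumes that traffic share tracks user share, which may well not hold. The function name, region labels, and session counts are all hypothetical.

```python
def apportion_by_share(national_estimate, sessions_by_region):
    """Split a national user estimate in proportion to regional traffic.

    Assumption: a region's share of analytics sessions is a reasonable
    proxy for its share of users.
    """
    total_sessions = sum(sessions_by_region.values())
    if total_sessions <= 0:
        raise ValueError("need at least one session to apportion by")
    return {
        region: national_estimate * sessions / total_sessions
        for region, sessions in sessions_by_region.items()
    }


# Hypothetical session counts (NOT real analytics data).
sessions = {"CA": 500, "NY": 300, "TX": 200}
by_state = apportion_by_share(400_000, sessions)

for state, users in by_state.items():
    print(f"{state}: about {users:,.0f} users")
```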

Maybe RStudio or the R Foundation could offer a free T-shirt or mug or something to a small, randomly chosen subset of R users, if they agree to get a unique (but otherwise anonymous) identifier and to run a small bit of software on all their devices that would accurately count their downloads of updates.

With two additional questions:

  4. Do you use R primarily for work in bioinformatics? and
  5. How many times have you updated BiocInstaller, across all devices?

we could also distinguish bioinformatics users from all others, and get state and national user counts of each, with a version of the same methodology described above.

It would also be very nice to get a report of discipline (for students and academics) or industry (for non-academics). But that would take, I think, a considerably larger sample to produce even slightly reliable estimates. Similarly for demographics.
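
To give a feel for the sample-size problem, here is a small Python sketch using the usual normal-approximation margin of error for a proportion. It assumes a simple random sample and a 95% confidence level; the 5% discipline share is an invented example, and the approximation itself is shaky for shares this small at n = 100.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p with n respondents."""
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(p, target_margin, z=1.96):
    """Smallest n whose approximate margin of error is at most target_margin."""
    return math.ceil(z ** 2 * p * (1 - p) / target_margin ** 2)


# With n = 100, a 50/50 split is pinned down only to about +/- 10 points.
print(f"p = 0.50, n = 100: +/- {margin_of_error(0.50, 100):.3f}")

# For a hypothetical discipline held by 5% of users, the margin of error
# is nearly as large as the share itself.
print(f"p = 0.05, n = 100: +/- {margin_of_error(0.05, 100):.3f}")

# Respondents needed to estimate that 5% share to within 1 percentage point.
print(f"n needed for +/- 0.01 at p = 0.05: {required_sample_size(0.05, 0.01)}")
```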

If we used the software monitor trick described above, we could also get the set of installed packages each person has, at no additional burden to them. Not sure exactly what we could do with that, besides making pretty network pictures, but something, I bet.
