What metrics would help you identify packages? (Either from a "trust" perspective or a "usefulness" perspective?)


#1

What metrics would help you identify packages? (Either from a "trust" perspective or a "usefulness" perspective?)

R users and admins face a discoverability crisis. With more than 12K packages, how can you decide which packages to use and trust? In RStudio Package Manager, we're hoping to help new users discover packages by tracking which packages are used.

However, we'd also like to provide metrics for packages that aren't in the organization yet. There has been some interesting work in this space (here, here, and here). We also have some ideas of our own, such as adding badges based on a package's download frequency, the presence of vignettes and tests, and code coverage.
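Badges like these could be assigned with simple rules. The Python below is only a sketch of that idea: the `PackageInfo` fields, the badge names, and both thresholds (`POPULAR_DOWNLOADS`, `GOOD_COVERAGE`) are hypothetical. Real cut-offs would have to be tuned against actual download and coverage data.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds; real cut-offs would need tuning against actual data.
POPULAR_DOWNLOADS = 10_000   # downloads per month
GOOD_COVERAGE = 0.80         # fraction of lines covered by tests


@dataclass
class PackageInfo:
    name: str
    monthly_downloads: int      # hypothetical field: downloads over the last 30 days
    has_vignettes: bool
    has_tests: bool
    coverage: Optional[float]   # 0.0-1.0, or None if coverage is unknown


def assign_badges(pkg: PackageInfo) -> List[str]:
    """Return the (hypothetical) badges a package qualifies for."""
    badges = []
    if pkg.monthly_downloads >= POPULAR_DOWNLOADS:
        badges.append("popular")
    if pkg.has_vignettes:
        badges.append("vignettes")
    if pkg.has_tests:
        badges.append("tested")
    # Only award a coverage badge when coverage is actually known.
    if pkg.coverage is not None and pkg.coverage >= GOOD_COVERAGE:
        badges.append("well-covered")
    return badges


if __name__ == "__main__":
    example = PackageInfo("examplepkg", 25_000, True, True, 0.9)
    print(assign_badges(example))
```

Keeping each badge as an independent yes/no signal, rather than folding everything into one score, lets users weigh the signals themselves.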

What do you think?


#2

From a deployment and production perspective, I think some criteria for choosing between packages with similar functionality are:

  • Is it stable or not? Where is it in its lifecycle?
  • How old is the package?
  • How many issues are still open? Are there any pull requests? Have there been at least some maintenance releases?
  • What is the license?
  • Is it tested? What is the test coverage?
  • Are any of its dependencies problematic? (Heavy, like stringi? Unstable? Requiring uncommon system requirements?)
  • Is it maintained? Who is the developer? Is it an organization or a community?

#3

Those are good questions. The last one especially is something I look for when comparing two packages that do the same thing. If I use an abandoned package and happen to find a problem for my use case, any changes I make (via forking, fixing, and making a pull request) are unlikely to get into the package.


#4

I think you are going to struggle to do this in an automated way. For example, from a quick glance at the packagemetrics page in the link, it determines whether continuous integration is being used by checking whether the Travis or AppVeyor badge icons are present! This gives the wrong answer, even in the example it advertises. I've seen the opposite error too: a badge is present, but no continuous integration is actually in use.
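To make this failure mode concrete, the Python sketch below checks two separate signals: a CI badge in the README, and a CI config file in the repository's top-level files. The badge URL patterns and config file names are assumptions about typical Travis and AppVeyor setups. Even a config file doesn't prove that CI actually runs, which supports the point above.

```python
import re

# Assumed badge URL patterns for Travis and AppVeyor; real READMEs vary.
BADGE_PATTERNS = [
    re.compile(r"travis-ci\.(org|com)/[^)\s]+\.svg"),
    re.compile(r"ci\.appveyor\.com/api/projects/status"),
]

# Assumed top-level config file names for the same two services.
CI_CONFIG_FILES = {".travis.yml", "appveyor.yml", ".appveyor.yml"}


def has_ci_badge(readme_text):
    """True if the README contains something that looks like a CI badge."""
    return any(p.search(readme_text) for p in BADGE_PATTERNS)


def has_ci_config(repo_files):
    """True if any top-level file name looks like a CI config file."""
    return any(f in CI_CONFIG_FILES for f in repo_files)


def ci_signal_mismatch(readme_text, repo_files):
    """Describe a disagreement between the two signals, or return None."""
    badge = has_ci_badge(readme_text)
    config = has_ci_config(repo_files)
    if badge and not config:
        return "badge but no CI config"
    if config and not badge:
        return "CI config but no badge"
    return None
```

If the two signals agree, that only means neither check caught a problem. Confirming that CI really runs would require asking the CI service itself.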


#5

If I don’t know the developer, I’ll look at downloads and issues. If a package seems popular, and the outstanding issues won’t harm what I’m doing, I go for it. Often, though, I go by recommendations from podcasts or @mara’s Twitter feed.


#6

Eeep! For the record, I do no security profiling whatsoever. Not that any security investigation I did would be worth much, but I wanted to put that out there, FWIW. 🙂


#7

I was really thinking more of the usefulness side there. I think discovery can be the hardest thing sometimes.

For “trust”, I would look at the user count and read the issues on GitHub.


#8

Btw @mara, I really can’t thank you enough for what you do. I’ve really learned a lot and found many resources because of what you post.


#9

Most of the important qualities can't be automatically measured:

  • How easy is it to use?
  • Does it work well with other packages?
  • Does it have good documentation (official and unofficial)?
  • Can I extend it?

A collection of user reviews would be nice. To help people winnow down to likely winners, let people rate packages in different categories. They can decide what's important, get a short list, and read the reviews.
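Per-category ratings with reader-chosen weights could work roughly like the Python sketch below. The data model is hypothetical: star ratings from users, keyed by package and category, with category names taken from the questions above. The scoring is a plain weighted average of each category's mean rating, and readers choose the weights.

```python
from typing import Dict, List, Tuple

# Hypothetical data model: ratings[package][category] -> list of 1-5 star ratings.
Ratings = Dict[str, Dict[str, List[int]]]


def category_means(pkg_ratings: Dict[str, List[int]]) -> Dict[str, float]:
    """Mean rating per category, skipping categories nobody has rated."""
    return {cat: sum(v) / len(v) for cat, v in pkg_ratings.items() if v}


def weighted_score(pkg_ratings: Dict[str, List[int]], weights: Dict[str, float]) -> float:
    """Weighted average of category means, using the reader's own weights."""
    means = category_means(pkg_ratings)
    # Only count categories the package has been rated in and the reader cares about.
    used = {c: w for c, w in weights.items() if c in means and w > 0}
    total = sum(used.values())
    if total == 0:
        return 0.0
    return sum(means[c] * w for c, w in used.items()) / total


def shortlist(ratings: Ratings, weights: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Top-k packages by weighted score, highest first."""
    scored = [(name, weighted_score(r, weights)) for name, r in ratings.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

One design choice to note: a package that is missing a category is scored only on the categories it does have. A thinly reviewed package can therefore rank high, which is one more reason to read the reviews behind the shortlist.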