Why don't R packages get more stars on GitHub?

R is one of the top 10 programming languages in the world, as shown in the graph. Yet somehow even the major R packages don't get enough stars to reflect that. I took some snapshots.







These are snapshots of some big names in R; I am not even talking about ordinary or unknown packages. Not getting enough stars and issues on GitHub might discourage people from continuing. It happens all the time in the open source community: someone quits maintaining or developing a package after a while because they feel there isn't any need for it.

Now all I want to know is your personal view on why this happens and whether it has any impact at all.

2 Likes

I'm not all that surprised that there aren't many stars on GitHub R repositories. For comparison, let's look at a list of all repositories sorted by the number of stars:

https://github.com/search?q=stars%3A>1&ref=searchresults&type=Repositories&utf8=✓

That shows FreeCodeCamp at number one with 292,000 stars. Bootstrap is a distant second with only 126,000 stars. These are repositories aimed at a much broader audience than R is.

Let's also consider that R is a pretty niche piece of software. It may be widely used, but it's used for a specific purpose, whereas a lot of other languages are general purpose. As a corollary, I would imagine that the vast majority of R users are not doing a lot of package development. I'd say the typical R user is generally using packages and not spending a lot of time tracking down why things do (or don't) work. Consequently, the audience of R users with an interest in the details of R development is pretty small. I wouldn't expect to see 100,000 stars on package repositories.

2 Likes

Also of interest, according to this listing, R accounts for 0.2% of the repositories on GitHub, ranked as the 27th most popular language.

https://madnight.github.io/githut/#/pull_requests/2018/1

If the user base is proportional, you'd expect the top R repository to have about 584 stars. Most of the repositories you highlight have more than this. So it seems R is doing pretty well.
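
For anyone who wants to check that figure, here is a minimal sketch of the arithmetic in Python. The function name `expected_top_stars` is made up for illustration, and the inputs are simply the numbers quoted in this thread (292,000 stars for the top repository, a 0.2% share for R); it assumes stars scale linearly with a language's share of repositories, which is a rough guess at best.

```python
def expected_top_stars(top_repo_stars: int, language_share: float) -> float:
    """Scale the top repository's star count by a language's share of GitHub.

    Assumes star counts are proportional to the language's share of
    repositories (a simplification, not an established relationship).
    """
    return top_repo_stars * language_share


# Figures quoted in this thread: ~292,000 stars for the top repository
# and a ~0.2% share of GitHub repositories for R.
estimate = expected_top_stars(292_000, 0.002)
print(round(estimate))
```

Swapping in a different share or a different top repository shows how sensitive the estimate is to those two inputs.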

2 Likes

At the risk of appearing to be a cynic, I do not believe that the value of R packages can be accurately measured by number of GitHub stars or Facebook likes.

Would ggplot2 (the highest ranked R package at the moment) give you twice as much utility as it does now if it had six thousand likes? If so, I assure you they can be rather easily crowdsourced, just like social media followers are bought and sold in the open market.

2 Likes

The motivation of package developers is probably not measured by stars.

1 Like

Thanks for sharing your thoughts.

Today I happened to check ggplot2 on GitHub after a tweet by Hadley, and I saw this odd thing I had never noticed before. I thought I should say it somewhere, so I am saying it here. I am not pointing at anything or anybody.

Thanks again. I will remember that most R package users are just using them and that stars don't mean anything.

1 Like

Somebody recently wrote about the ratio of package downloads to GitHub stars, but I can't remember who it was >_<

1 Like

I would definitely like to read it. If you could show me where it is, that would be a great help.

Is it this blogpost?

4 Likes

Awesome post. It seems like Rcpp is the most downloaded package (maybe because opening an Rcpp notebook in RStudio installs it by default, or because most packages depend on it). It was a very insightful blog post to read.

Thanks for pointing it out.

:grinning::grinning::grinning:

GitHub stars are not a great measure of popularity for R packages. GitHub is used by software developers (or software engineers), and it's likely that many users of R don't use GitHub.

Consider the software Excel. How would you rate Excel using GitHub? You wouldn't because it's not open source. Even if it was how many GitHub stars would it have? Probably not that many considering that the Stack Overflow question trends for Excel are lower than R. Does this mean that Excel is unpopular? Hardly! Excel is probably the most important piece of software for business intelligence ever built. However, you wouldn't know it by Stack Overflow and obviously not GitHub stars. Note that I do not condone the use of Excel for data science or data analysis or in general because of issues with errors, which are well documented.

[Image: Excel and R Stack Overflow trends]

Back to R...

A better measure might be downloads of key R packages (dplyr, ggplot2) from CRAN. You can rest assured that the trends are pointing in the right direction.


[Image: dplyr downloads (daily frequency)]

Further, it's my feeling that the best is yet to come. R is certainly not mainstream (yet). I believe it will be once enterprises truly begin to embrace data. Think of all of the Excel users out there. They are likely to convert to R (a statistics-first language) once they take the leap into data.

In sum, GitHub stars are probably not the best measure of popularity for R given its non-programmer user base. Further, R is in the early stages of a massive movement, and the best is yet to come. If you want to read more of my thoughts on this subject, check out my blog article, 6 Reasons To Learn R For Business.

8 Likes

Bravo :clap::clap::clap::clap::clap::clap::clap:

Excellent reply. I too used Excel a lot and turned to R because I was fed up with using VBA for automating reports. I never felt like I had learned a programming language. It was exactly like using Excel.

I totally get it. I am glad I asked the question. It gave me an entirely different picture of R programming.

Thanks for taking the time to write the response.

1 Like

You're welcome. It's a great question to ask! I have pondered the same question before (clearly) and others probably have too. That's exactly why I'm glad you asked it!

1 Like

When I posted it, I had only 2 possibilities in mind:

  1. R users don't create stuff
  2. Most of them might not have a GitHub account

I asked because there have been times when I have seen people start a package but never finish it. In most cases these packages have very few stars on their Git repo.

But now I have seen differing ideas and opinions. I hope somebody can point out other reasons.

1 Like

Hi again,

I reread your answer this morning and found your logic wonderful, but I think there is a flaw in it.

I am not against your opinion. I do believe that GitHub or Stack Overflow cannot measure the popularity of a piece of software. But I believe you compared apples to oranges.

Excel, SAS, Power BI, Tableau, etc. are commercial software.
R, Python, Julia, Scala, etc. are open source programming languages.

Commercial software vendors don't keep their code on GitHub, but open source libraries do.

In commercial software you have only one use case and almost one way of achieving it.
Open source programming provides a lot of libraries that do almost the same things. That should be the reason for the low number of Stack Overflow questions.

I hope you understand what I mean now. I was trying to compare R with the open source programming paradigm, where GitHub stars do matter to most people. And newcomers check these things too.

But thanks for throwing light on the dual nature of R for me: R can behave wave-like in some situations and matter-like in others. (pun intended :rofl::rofl::rofl:)

Thanks for rereading my post and applying critical thinking to the logic! :+1:

In my response, what I want to point you to is where I say,

How would you rate Excel using GitHub? You wouldn't because it's not open source. Even if it was how many GitHub stars would it have? Probably not that many considering that the Stack Overflow question trends for Excel are lower than R.

What I'm saying is that it's not a direct comparison because Excel is not on GitHub, but we can use Stack Overflow as a proxy. Stack Overflow is used by millions of programmers and software engineers to figure out problems as they build code. So I'm using it in place of GitHub since it's the same audience.

If we agree that Stack Overflow (SO) is a proxy for GitHub, we can then use it to examine Excel's "popularity". Excel's SO trends are lower than R's, which people might take to mean it's less popular. This is definitely not the case. Excel just gets fewer questions asked on SO.

The same goes for R. R's user base is primarily non-software engineers. Python's user base is software engineers. Therefore, if mainly software engineers use GitHub, then the Python library stars will dominate and the R library stars will be very low even though downloads of popular R packages are in the millions.

Does this make sense?

3 Likes

Yes, it does. Thanks for replying.

What I have found over the years in the R community is that not every package is on CRAN. Some are on GitHub because the author didn't want to go through the trouble of CRAN, and some are on Bioconductor. They are well maintained too. We should advise people to search these options.

I agree with your point. I always did, and I mentioned it before as well. I was just a bit curious, and worried too.

But thanks for taking the time to explain things to me. It seems like we are fighting for the spot of Excel and SAS, not for Java and C.

Thanks again.

Another point is real importance. Likes on GitHub are worthless; utility is what matters in the end. I cannot believe that ggplot, a package for making nice plots the hard way, has almost ten times more stars than Rcpp. Well, I can believe it, sadly.

1 Like

Rcpp is a core package for thousands of other packages. You don't really need to have RStudio installed to have it, because anything you install will probably have it as a dependency. Actually, I bet there are more Rcpp installations than RStudio installations.

Of course, Rcpp is 'not cool'. It is just a really functional package, unusable directly by 91% of R users, as most people just use R as a tool and are not developers, just some kind of end users, even if they write code.

1 Like

I totally agree with your points that stars don't add value to code.

But for some reason the entire open source community, and especially people who are starting to learn programming, find them intuitive and treat them as a measure of support.

In the open source world, projects are sometimes started but then dropped (like ggvis or rCharts). This happens quite frequently, and then you have to learn another package to replace the older one. Stars give you a sense of security, and trust me, I am not talking about R alone; Python, Go, and other open source languages do that too.

And for newcomers it helps them decide which package to use if both have almost the same functionality.

One more thing: sometimes, only sometimes, when a package becomes famous and many companies start using it, that helps the programmer fund the project too. The R world has different parameters than other programming languages, but for most of them these stars actually matter. They show a trend.

I agree with all the points everybody made, but I don't think you should undervalue GitHub stars. For languages that don't have something like CRAN, they are valuable.

1 Like