Still don't understand why R fails hard at machine learning

R has been the default programming language for most stats departments around the world. If you look at the most recently published stats models, R normally has them ready before Python. Take Bayesian stats, for example: R has the brms and rstanarm packages, which do not have counterparts in Python.

My understanding is that machine learning builds on stats. However, it seems that nowadays Python has become the default programming language of machine learning, which really puzzles me.

I have two questions:

  1. Are there any technical features that make R inferior to Python in terms of machine learning?

  2. Are there any chances for R to catch up with Python in terms of machine learning?

Answering from my perspective:

  1. There are not. You can see that in many Kaggle competitions, similar results are achieved in R and Python using the same algorithm (e.g. XGBoost).

  2. Theoretically there are. It comes down to the people who are eager to push the language further and connect it with enterprise-relevant technology in production environments (such as Kafka, Spark, Docker, Kubernetes, etc.). When scientists come up with a new (state-of-the-art) algorithm, that algorithm is basically ready to use in Python by default, not in R. Look at BERT, an NLP algorithm developed by Google AI in 2018: it took some time before a wide variety of R users could use it in R. The number of people who contribute to the development of the R language is much smaller, and so the focus of that development is scattered.

There are many projects that point in the direction of further development. Just browse through the resource section of rstudio.com (tidymodels.org, putrinprod.com, Plumber API, etc.). And for more information on BERT in R, see here.

R does not fail at machine learning, see e.g.:

I think it is a matter of what is being used. For some reason, perhaps a simple misunderstanding, the notion is that "classical statistics" is the strength of R and that for anything ML you need Python.
