What language to learn after R

I get that both can be elegant, but the problem I have with the simile is that due to its use of connotation, poetry is almost by definition hard to fully comprehend. I don't want my code to be hard to comprehend; I want it to be concise, elegant and powerful, but immediately obvious in its denotation.

I guess what I'm saying is that I try to write code like a Calvin and Hobbes strip.

2 Likes

I think a lot of this goes back to how Ruby was envisioned by Matz, and how _why wrote about it in his Poignant Guide to Ruby:

My conscience won’t let me call Ruby a computer language. That would imply that the language works primarily on the computer’s terms. That the language is designed to accommodate the computer, first and foremost. That therefore, we, the coders, are foreigners, seeking citizenship in the computer’s locale. It’s the computer’s language and we are translators for the world.

But what do you call the language when your brain begins to think in that language? When you start to use the language’s own words and colloquialisms to express yourself. Say, the computer can’t do that. How can it be the computer’s language? It is ours, we speak it natively!

Less abstractly, the general premise behind Ruby is that it be written for humans - not computers - to understand.

2 Likes

I posted this above, but if you haven't read it before it's a good look at how R came to be: https://www.stat.auckland.ac.nz/~ihaka/downloads/Interface98.pdf

It's probably the S syntax that hides R's Lisp heritage from people.

2 Likes

You certainly can do most of these things with an R script; the challenge is often that R isn't very efficient from a computational perspective. This is why many teams re-engineer their R solutions in Python or another language that is more effective for building large-scale software applications.

To that end, I'd recommend looking into Ruby.

3 Likes

Are there really any areas where Python (or Ruby, for that matter) has speed advantages over a proper R implementation? I hear this argument often, but I consider it a rumor. At least for what I mostly do, data.table is still the fastest solution.

2 Likes

I'll admit I was thinking more about building large-scale data-driven products rather than data analyses. I'm not familiar with any major platform using the R stack for that purpose, while there are many using Ruby (Indiegogo, Kickstarter, Airbnb, etc.).

I am pretty sure Python is better for production stuff than R, but I am not convinced it is better for everyday data analysis, except maybe that it provides better interfaces to machine learning libraries (which is not relevant for me and which I don't know much about).

Airbnb, Facebook, and Google also all use R to some extent (though I don't really know what for :wink: )

There's a paper here on how Airbnb uses R:

2 Likes

My company implemented the majority of our codebase in Ruby, specifically the Ruby on Rails framework. This is exactly what you are talking about with Airbnb, etc. Except it is also totally not... Ruby is used for orchestrating the web pages, not the data processing or data products. My company, along with the ones you mention like Airbnb, also uses R and Python for a lot of data processing (aka data engineering, aka ETL) and for all recommendations and machine learning products. We're not single-language companies, even if the main application is written in a single language.

Ruby on Rails is an extremely useful tool for building a web application to manage users and content. However, it's rarely a wise choice outside of that domain. In addition, a ton of Ruby on Rails companies are now converting their monolithic application (single codebase, single database) to many smaller applications. This is typically called microservices, sometimes service-oriented architecture, because each codebase/database has a small unit of work to do, such as handling the creation of users, or processing orders, etc. For that setup, Ruby on Rails is no longer as tempting a technology choice, since you have even more incentive to pick the right tool for the service you're building.

Just because some company you like or are impressed with used a tool, doesn't mean that tool is right for you. You still need to do research and testing and figure out what will help you solve your problems as quickly and efficiently as possible.

And in my experience working on a large Ruby application, Ruby code can go either way:

  • when you want to implement some new feature and you realize it's only a couple lines of code = magic
  • when you spend three weeks hunting down a bug because your code is so implicit that it takes a full-time investigation to find where the misbehaving function is even defined

The Zen of Python addresses this issue directly (import this), particularly lines 2 and 12.

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
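
If you'd rather pull out those two lines than count them by eye, here's a minimal sketch. One assumption to flag loudly: it relies on CPython keeping the text ROT13-encoded in the `this.s` attribute, which is an implementation detail of the `this` module and not a documented API, so treat it as a party trick rather than something to build on.

```python
import codecs
import contextlib
import io

# Importing `this` prints the whole Zen on first import; swallow that output.
with contextlib.redirect_stdout(io.StringIO()):
    import this

# Assumption: CPython stores the text ROT13-encoded in `this.s`
# (an implementation detail, not a documented API).
text = codecs.decode(this.s, "rot13")

# Skip the title line and any blank lines, leaving only the aphorisms.
aphorisms = [line for line in text.splitlines()[1:] if line.strip()]

# Lines are numbered from 1 in the post above, so shift by one for indexing.
for number in (2, 12):
    print(f"{number}: {aphorisms[number - 1]}")
```

Run as a script, this should print the "Explicit is better than implicit." line and the "In the face of ambiguity..." line, which are the two that speak to the implicit-code problem.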

7 Likes

You could take advantage of Stack Overflow's developer survey for this purpose - spy on different coding communities' 'most loved' / 'most wanted' languages, language popularity by occupation and top-paying technologies by region, for example.

You can even dig in to the data yourself and see what languages are growing amongst your peers (folks in your industry/role/experience level).

Of course, this should be considered in tandem with personal testimonies like those being shared in this thread. Personally, I'm now learning JavaScript because I want to create more impactful data visualisations. I have existing needs for this in my work and personal projects, which means I will be able to practice and reinforce these new skills regularly. I wouldn't find the same opportunities to flex something like Scala at the moment, so that can wait. In short, learn something you have a use for...

5 Likes

Matlab / Octave - This language gets some things right that R doesn't. I personally think using Matlab results in "Do what I mean" more often than R does. Type safety is a bit more enforced than in R. Some topics such as optimization, image processing and signal processing are better supported (or easier to learn) in this language, just because of Matlab's engineering background. Not recommended for heterogeneous input data.

Racket - it's a Lisp dialect and a teaching language. It has evolved from PLT Scheme. I think R shares a mindset with Lisp, for instance the lazy-evaluation stuff that is used at a low level inside R's tidyverse packages. 1970s Lisp also has some advanced constructs that still haven't made it into modern programming languages. To learn Racket, I recommend taking the course "How to Code: Simple Data" on edX.org. It teaches the "How to Design Programs" methodology.

Perl 5 - just for its superior text processing capabilities (read the Unicode chapter in Tom Christiansen et al.'s book "Programming Perl"), which often come in very handy during data preprocessing tasks and during system administration. The language is designed to be as forgiving as possible (like R), and also to be compact at the same time. Some people hate it for that.

@raybuhr wanted to say thanks for this book recommendation. I looked at its website's "List of Command-line Tools" section a bit and loved the csvkit stuff immediately. My copy of the book got delivered today, can't wait to see what other gems it has! :smile:

1 Like

Agree with this. In my experience, SQL is very, very easy to pick up but massively useful as a data scientist.

3 Likes

Granted, you say you are not overly concerned with the practical applications, but as a data scientist, I think R, SQL and Python make a good triumvirate to have.

1 Like

Thanks for the comment. I would love to learn JavaScript and Bash. But wouldn't Python just replicate the same problems that R has? How does it help an R user?

My personal 2 cents.

It's pretty clear to me what interpreted languages I want to master after R. The first will be JavaScript/TypeScript, because of my need to create better interactive visualizations and frontends for my R models. If you invest some time in learning JS you can create very nice dashboards using Twitter Bootstrap + d3.js for the frontend and something like Rserve or, better, OpenCPU to feed models/data to the static pages. This setup is more useful to me than Shiny because I work at a tech company (rather than, e.g., a consultancy), and so this way of delivering my data science work allows me to somewhat bridge the gap between data scientists and software developers (my d3 visualizations can be reused/used as a template for the visuals in production, my R code can be exposed as an API and called from the C# backend, ...).
The second interpreted language will be Python because, well, machine learning and production.

What I find more difficult to decide upon is which compiled language I want to master. At some point I want to become proficient in a high-performance compiled language, and here the choices are multiple: C++, Scala, Go, Rust?

2 Likes

FWIW, if I were going to learn a compiled language, it'd probably be Fortran. I'm heavily biased by being a climate scientist, though: Global Climate Models are invariably written in Fortran, and if I were going to move into model development (not on my roadmap, but who knows) or into research support for climate science (somewhat more likely), I'd need to have some Fortran knowledge.

Global Climate Models are invariably written in Fortran

This really surprises me. You're talking about those enormously complex general circulation models that NASA and a couple dozen other international agencies run to ensemble into upcoming IPCC reports? I don't have any grounds to be surprised b/c I don't know much about how they're made, but I just figured they used C or C++.

Any idea why they use Fortran? Is it a legacy thing, or are there advantages to it? Is Fortran the foreseeable future of GCMs?

This isn't my area of expertise, but my understanding is that Fortran is generally a bit faster (and a bit more expressive) than C/C++ when it comes to the sorts of array maths required for simulating physical systems. C/C++ is powerful and fast generally, but those complex array operations aren't quite as fast and tend to require either complex code or OO abstraction that slows things down.

I swear I've seen a great article explaining this better, but I can't find it now >_>

Most climate scientists also learn Python (or sometimes R, depending on how the balance falls between physical and statistical modelling in their job) and would love to use that, but obviously it's not gonna be performant. Higher-level languages tend to either be used for pre-/post-processing or to script the running of climate models.

Maybe Julia or something one day, but then you also have a legacy code issue. Although coupled models are somewhat modular, I'm not sure to what extent new modules in future models could be written in another language.

EDIT: ahh, here it is! I mean, it's talking about physicists, but the same applies to climate science.

Thanks for your reply and the article link! The article covers things quite nicely and serves to remind me that there's a world of work going on outside of my little bubble of familiarity. :grinning:

2 Likes