What language to learn after R


#1

Hopefully this isn’t too off topic for this community…

I’m interested in becoming a more well rounded programmer and heard that some languages (e.g. Lisp, Haskell, Prolog) are particularly good choices toward this goal. Mainly because they are very different.

Are there any languages which are an especially good complement to R? I’m not overly concerned about practical applications - my goal would be to understand R at a deeper level by contrasting with another language.


#2

Python might be a good choice. With Jupyter notebooks and the rpy2 package, you can write R and Python code side by side in the same notebook.


#3

In terms of thinking about programming/algorithms, Lisp & Haskell aren’t very different from R: they’re all ‘functional’ (in spirit, if not in implementation).

Go is an increasingly popular language for systems programming: it’s strongly typed, compiled, has pointers, and supports concurrent paradigms (i.e. goroutines) right out-of-the-box. Unlike C/C++, however, there’s no pointer arithmetic and it has automatic memory allocation/reclamation, which helps to avoid some of the early pitfalls with learning C/C++.

On the other hand, if you learn C/C++ you can write fast, compiled routines that are super-easy to call from R (as described here, here, and here).


#4

Swift

Designed by top designers.

For example: you can write big numbers with underscores as thousands separators:
let x = 1_000_000_000


#5

Learning the first language is hardest because you also have to learn how to program. Second is still hard, but less so because you’re learning how a different language implements the same things, avoids others, and offers entirely new features. After the second, it’s more about learning how to write idiomatic code in a new language than learning new techniques.

This may not be popular, but why not look at source code from a bunch of languages and pick one that you think you might like writing in? Though many people recommend learning something completely different because it will expose you to more new things, that often makes it harder to comprehend. Maybe try picking something different, but not too different.

My recommendations are:

  • python
    • if you want to be able to work on data science projects where R isn’t standard
    • if you want to build MVP web applications but not have to immediately abandon them
    • sometimes has that same magic of R
    • easy to learn the basics, easy to read code
  • C
    • if you want to learn defensive programming and build a better intuition of what the computer is actually doing
    • if you want to learn (probably a little C++ as well) to write sections of R code for better performance
    • requires learning more about computers and memory allocation
    • language itself is easy to learn, though requires a lot of mental effort to remember to check everything
  • Go
    • if you want to quickly build APIs, CLIs, work in kubernetes land, or generally just get stuff done
    • benefits of being easy to share (compiles to a single binary, and can cross-compile for other operating systems)
    • easy to write programs that use concurrency in a safe way
    • great open source tools available to help you be more productive and write better code
    • not very elegant/pretty, but also pretty easy to learn because you don’t worry as much about writing elegant/pretty code
    • has a really good community

I would recommend the following only if your goals meet these:

  • JavaScript
    • you want to write custom data visualization and/or you want to do a lot of interactive bits on your apps
    • vanilla (base) JavaScript is fine, but the ecosystem is redonk huge and complicated and everyone is convinced their framework is the best, most simple solution so it’s hard to know who to trust
  • Scala
    • you want to work with Apache Spark and Apache Kafka a lot
    • you want to get paid big bucks because of the language you use since you kind of need it to debug the big data frameworks huge companies invested in
    • you want the freedom to program code in multiple paradigms (i.e. however you want) and a nice build tool (sbt)
  • Clojure
    • someone told you that you’d like Lisp ((((maybe) you) will) ?)
    • you want a modern functional language with the benefits of the JVM (i.e. you can use Java libraries)
  • Elixir
    • someone told you that you’d like Erlang
    • you want a modern functional language with great support for concurrency

If you got through all that and still aren’t sure, try Go. It will not replace R, but it complements it really well.


#6

The next language I’m planning to learn is Python.

After I’ve gotten the hang of Python, I’ll probably take up C#. Visual Studio 2015 (or 2016) and SQL Server 2016 have tools to integrate R routines within C# and SQL code. We use C# to define our database and UI here, so there’s a lot of potential for combining the things these languages do well to provide a better product to our users.


#7

I think there is a good argument to be made for learning bash/shell scripting pretty well next. I wish I had learned it much earlier. I just recently took the Unix Workbench class by @sean, and starting to feel somewhat competent at the command line has completely changed my workflow and made it so much more efficient.

Someone with more bash experience might want to weigh in on the benefits but to me:

  • No matter what language you pick up after this, you’ll need to manipulate things from the command line.
  • Custom bash scripts you can call for tasks you do often can save you tons of time.
  • If you want to use remote servers for R or any other language you pick up in the future, bash skills will be invaluable.
  • You can run it from Rmd files.
  • The word bash is fun to say.

I used to treat bash things kind of like git, where whenever I needed to do anything complex I just copied something from StackOverflow, tried to understand it for a minute, gave up, then ran it and hoped for the best. But now I get so excited whenever a situation comes up where I get to use it!


#8

If you use R as part of your profession, I think there are strong arguments for the following three languages:

  • Python. R and Python dominate data science work, so being able to work in both will help you fit into virtually any data science team.

  • JavaScript, the main language of the web, which will also help you create data visualizations which can be glued together with R nicely via the htmlwidgets package.

  • Bash, one of the most popular shell languages, which is always good to know as the basis for using the command line.


#9

That is something I still don’t get… can’t I do most if not all stuff that I can do with bash also from an R script? Do you have a few use cases of where that is not true?


#10

I’m probably not the best person to answer this since I’m fairly early in my bash journey but so far there are two ways I think about using bash:

Interactively

  • I also use Python, some JavaScript, etc. While you can do most of the same stuff via R (and each of these languages), having to remember different commands to meet the same goal in multiple languages is harder for me than just knowing it in one somewhat universal language.
  • When setting up a new project, creating directories, etc. I can do it from the command line instead of having to open an interpreter first.
  • bash scripts that can be used by a terminal command I can write once and use for any type of project. ex. I’m working on a script that will copy the n most recently downloaded files from Downloads to the current directory (since people email me csv files a lot). I can write this script once then use it regardless of the language of the current project.
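
A rough sketch of what such a helper could look like, not the actual script mentioned above. The function name `copy_recent` is made up for illustration, and it assumes a `~/Downloads` folder and file names without newlines (parsing `ls` output is fragile otherwise):

```bash
#!/bin/bash
# copy_recent [N] [SRC]: copy the N most recently modified files
# from SRC (default: ~/Downloads) into the current directory.
# Hypothetical helper; assumes file names contain no newlines.
copy_recent() {
  local n="${1:-1}"
  local src="${2:-$HOME/Downloads}"
  # ls -t sorts newest first; -p appends "/" to directories so grep can drop them
  ls -tp "$src" | grep -v '/$' | head -n "$n" | while IFS= read -r f; do
    cp -- "$src/$f" .
  done
}
```

After sourcing the file from a shell startup script, something like `copy_recent 3` would pull the three newest downloads into whatever project directory you happen to be in.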

Automation

  • Setting up a new environment is a hassle but if you record your steps, you can do it the next time just by running a bash script that will download R, install things like libcurl and libxml, add postgres, and run an R script that installs all the libraries you depend on.
  • When a shell script runs via crontab, you get a lot of control over how the R scripts you want run on a regular basis are executed.
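
To make the crontab point a bit more concrete, here is a minimal sketch of a wrapper that timestamps each run and keeps stdout and stderr in one log. The name `run_logged`, the paths, and the schedule in the comment are all hypothetical:

```bash
#!/bin/bash
# run_logged LOGFILE COMMAND [ARGS...]
# Run COMMAND, append its output (stdout and stderr) to LOGFILE with
# start/exit timestamps, and return COMMAND's exit status.
# Hypothetical crontab entry calling a script built on this (06:00 daily):
#   0 6 * * * /home/me/bin/nightly.sh
run_logged() {
  local log="$1"; shift
  local status=0
  {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] start: $*"
    "$@" || status=$?
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] exit status: $status"
  } >> "$log" 2>&1
  return "$status"
}
```

A scheduled script could then contain a line like `run_logged "$HOME/logs/report.log" Rscript report.R` (file names assumed), so a failed overnight run leaves a record of when it started and how it exited.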

There are probably really interesting better arguments to be made for doing it this way (or in strictly R) but this has worked for me so far. I’m excited to keep learning it!


#11

I would echo what some others are saying here as well; specifically what @sean recommends.

But it also sounds like you really want to deepen your knowledge of R rather than adding a programming language to your resume, so I’ll try to answer differently than the good advice already added above.

In your question you mention Lisp, Haskell, and Prolog–it sounds like you’re interested in a language that shifts the paradigm a little rather than a different implementation of C++ (I could be wrong). I don’t know if you’ve already looked at R on wikipedia and looked at its ‘influenced by’ section, but it seems like there are two interesting ways to go–with Scheme or Common Lisp.

Also, thank you for asking this question because it gave me a chance to read this: https://www.stat.auckland.ac.nz/~ihaka/downloads/Interface98.pdf

Lots of things I didn’t know before 👍

Scheme

The cool thing about this option is that you can also leverage your new knowledge of Scheme into JavaScript if you follow Douglas Crockford’s JavaScript: The Good Parts (http://shop.oreilly.com/product/9780596517748.do)

He recommends this book for Scheme https://mitpress.mit.edu/books/little-schemer

Lisp

I really tried going this route before using this book http://landoflisp.com/ but it is languishing in a directory somewhere. I still plan to learn some Lisp, just don’t know when I’ll do it!

Clojure

Like @raybuhr mentioned, you can also learn Clojure which is a Lisp, but slightly different. It’s also a bit controversial, as many are both for and against it. But there are companies that use Clojure a lot (I also think PNW is the Clojure mecca). Clojure can access the JVM (which is either a feature or a bug) and it has a statistical and graphics computing framework as well (http://incanter.org/). I don’t know a ton about it, but when I thought about learning it I started here: https://www.braveclojure.com/

Another interesting connection to another language is with ClojureScript https://clojurescript.org/

My two cents

Either of the languages above would be cool to learn (I would like to spend more time with both in the future). However, if Python at all interests you I think it’s also a good option even though it’s not really like R, S, or Lisp. Effectively, I think of Python as pedagogically useful for learning computer science and R as pedagogically useful for learning data analysis/statistics. Both were really important and useful to me personally. If this sounds good then I would recommend you use this site: https://interactivepython.org/courselib/static/thinkcspy/index.html rather than buy a book. I think it’s really clear in explaining how the language works.

Hope that helps–I ask myself this question a lot and am so grateful there’s a community here now to ponder ensemble


#12

Python is a practical choice, for all the reasons people have mentioned. However, since you say you’re not too worried about practical applications, I would recommend ruby.

Writing ruby code has been likened to writing poetry. It’s similar to python in terms of versatility, but I prefer ruby’s “feel” and how readable and simple well-written ruby code can be. The other big thing ruby has going for it is the accessible community/resources for newcomers, which I’m guessing you can appreciate since you’re here on this site : )

And some of my favorite ruby resources are applicable to general programming, even though they came out of the ruby ecosystem. Sandi Metz’s Practical Object-Oriented Design in Ruby (POODR) is a great example. It really changed the way I thought about developing a codebase from scratch.

I think a lot of people don’t think about ruby outside of ruby on rails. Which is unfortunate, because even though it’s a solid web framework, pure ruby has so much more to offer!


#13

One big plus for learning bash is you can run multiple R scripts at the same time. This can be really useful if you can think of it when planning how to solve a problem.

For example, if you are running a simulation and want to record results to a file you could do that in a single R script. Or you could write the script to only process one simulation at a time and write the results to a database (which allows multiple connections at a time). With the second method, writing a bash script to take advantage of command line tools like GNU parallel can let you use that simple R script to run 100 simulations at a time.
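
As a rough illustration of that fan-out idea, here is a sketch that uses `xargs -P` as a stand-in for GNU parallel, since `xargs` is usually already installed. The function name `run_batch` and the `simulate.R` script are hypothetical:

```bash
#!/bin/bash
# run_batch N JOBS COMMAND [ARGS...]
# Run COMMAND once for each number 1..N, passing the number as the
# last argument, with at most JOBS processes running at the same time.
# Uses xargs -P here instead of GNU parallel; the idea is the same.
run_batch() {
  local n="$1" jobs="$2"; shift 2
  seq 1 "$n" | xargs -P "$jobs" -I{} "$@" {}
}
```

Assuming a `simulate.R` that reads a simulation id from its command line arguments, `run_batch 100 8 Rscript simulate.R` would work through 100 simulations, eight at a time.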

Another good example would be web scraping. Instead of scraping one site at a time in a long script, you could just pass in a domain to the script via commandArgs and scrape multiple sites at once by running the R process in the background.

This simple bash script will run the scraper.R script on three different websites as background processes and log both stdout and stderr (the stuff that gets sent to console when you run interactively) to separate log files.

#!/bin/bash
Rscript scraper.R www.example.com > example.out 2>&1 &
Rscript scraper.R www.website.com > website.out 2>&1 &
Rscript scraper.R www.anothersite.com > anothersite.out 2>&1 &

Fwiw, I didn’t think of command line scripting as programming at all. I’m still not sure I would, but getting friendly with the command line can be a huge productivity booster, as well as help you unbreak/debug problems in programming. It’s totally worth spending time learning, though best over a long period of time in practical learning situations.

There’s a book related to this that I highly recommend:


#14

Good question, thanks for posting it to this community! I’ve learned a lot from the great answers so far.

I’m not so familiar with the three languages you listed, but I can offer you an answer from within my scope of knowledge. Your goals really drive this choice–I’m reading your goals to be 1) become a more well-rounded programmer, 2) better understand how R works.

I’m gonna go against popular opinion here and not recommend Python. Although Python is an outstanding language and pretty easy to quickly pick up and be productive in, I don’t see it meeting your goals as well as some other languages might. Instead, I recommend:

  • Scala–a really complex language that can be used as a functional language and/or in an object-oriented way. You can spend a really long time poking around this language learning how it works. There’s a lot of stuff you can do with it, and it’s an important part of the “big data” space, if you’re into that. Personally, I started using Scala third (after R and Python) but I learned more computer science & programming from Scala than the other two combined. And I learn more every day. And that sounds like the direction you want to head.
  • Go–reasons stated in other answers. It’s just unbelievably pragmatic, in my opinion.

And, although I have very little experience with it myself, I love the idea of C++ as a second language too, since it’s pretty ubiquitous and can integrate directly into R.


#15

I’d like to take up the cause of “if you like R, just keep learning R”. I’ve done some projects in Python, JS, C#, and Java, and while I think it’s worthwhile to get a cursory knowledge of how web development works, looking back on it, the time I spent learning other languages would probably have been better spent getting even better at the language I know best (R!).


#16

I’ll second the Ruby recommendation - it’s been an absolute joy learning the language, and so far it seems to complement R nicely.


#17

Well …

  1. R at its core is a dialect of Lisp with data types for scientific / statistical / graphical computing added. So if you know R, you already know Lisp.
  2. Prolog and Haskell share some of Lisp’s declarative spirit, but they come from different lineages and have different aims. Prolog is built around mathematical logic, including theorem proving, and Haskell is a purely functional language.
  3. So I would pick a programming paradigm - object-oriented, reactive, etc. - and learn a language that uses it as the dominant paradigm. Or pick a “massively multi-paradigm language”, like Scala. :wink:

#18

Huh, I didn’t know this.


#19

Maybe this is a bit off topic, but I am not a fan of the idea of “code is poetry”. This idea isn’t limited to Ruby. Poetry is poetry, and code is code. We are not wrapping metaphor and allusion into our while loops. Code is amazing and enables us to do so many incredible things! But it’s not poetry, and I don’t find much value in saying that it is.


#20

I don’t know if it makes one a better R programmer. But since the topic is “What language to learn after R”, I’d say that SQL and practical experience with databases are extremely important for independent work (but also for collaboration) as a data scientist/R programmer.