Question for office hour: R vs Python

I think this Python example is actually more consistent with how object-oriented programming is usually taught in university than the R one. In Python, you instantiate an object of a certain class. That class can have methods and attributes unique to it, just like in R. However, calling the fit method doesn't just pass arguments to a function and return a new result, as it would in R (which is why you need to save that result to an object in R); instead, it updates the instantiated object it is called on.
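A minimal sketch of that difference, using a hypothetical toy class MeanModel (not from any library) that follows the common scikit-learn-style convention of fit() storing the learned state on the instance and returning self. In R, the comparable pattern is assigning the result, e.g. fit <- lm(y ~ x, data = d).

```python
class MeanModel:
    """Hypothetical toy estimator: predicts the mean of the training targets."""

    def __init__(self):
        self.mean_ = None  # nothing learned yet

    def fit(self, y):
        # Store the learned state on the instance itself (in-place update).
        self.mean_ = sum(y) / len(y)
        return self  # conventional, but the state already lives on the object

    def predict(self, n):
        if self.mean_ is None:
            raise RuntimeError("call fit() first")
        return [self.mean_] * n


model = MeanModel()
model.fit([1.0, 2.0, 3.0])  # no assignment needed: `model` now holds the fit
print(model.mean_)          # 2.0
print(model.predict(2))     # [2.0, 2.0]
```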

I agree that this can be confusing. I also find the object-oriented aspects of R confusing. Take the print method: you get a different result depending on what you pass in. It looks like a regular function that takes arguments and returns an output, but it is not. Instead, depending on the class of what you pass to print, a different implementation of that function gets executed. This is really nice because you don't need to worry about whether you explicitly created an object that has a print method, but it is less obvious what you are going to get when you pass your object into print unless you specifically test its class beforehand. In Python, that confusion is less common with methods because you have to be explicit and create an object of a specific class first. That means those methods aren't accessible until an object of the class gets created, whereas in R you can call print from the get-go without instantiating anything.
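R's print is an S3 generic: it picks a method such as print.data.frame based on the argument's class, or falls back to print.default. Python's standard library has a rough analog of this function-first style in functools.singledispatch. The describe function below is hypothetical, just a sketch of class-based dispatch, not how Python's own print works.

```python
from functools import singledispatch


@singledispatch
def describe(x):
    # Fallback implementation, playing the role of print.default in R.
    return f"<object of type {type(x).__name__}>"


@describe.register
def _(x: list):
    # Chosen automatically when the argument is a list.
    return f"list of length {len(x)}"


@describe.register
def _(x: dict):
    # Chosen automatically when the argument is a dict.
    return f"dict with keys {sorted(x)}"


print(describe([1, 2, 3]))         # list of length 3
print(describe({"b": 1, "a": 2}))  # dict with keys ['a', 'b']
print(describe(42))                # <object of type int>
```

As in R, the caller writes one function name and the argument's type decides which implementation runs.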

In my opinion, it is a trade-off between ease of development and understanding exactly what is happening: R favors ease of development, and Python favors being explicit.

@raybuhr

I quarter-disagree about R's methods returning different types of values based on input. The quarter I disagree with is something that trips up a lot of new R users: some functions are meant for interactive use only.

Functions like summary, sapply, subset, and especially $ are simple to use in the console for exploring data. So users learn these functions, write scripts with them, and are in for a nasty surprise when the output is nothing like what they expected.
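To illustrate how the shape of a result can depend on the data rather than the code, here is a hypothetical Python helper, simplify_apply, loosely modeled on sapply's idea of simplifying only when every result has length one. It is a sketch of the pitfall, not a reimplementation of sapply's full rules.

```python
def simplify_apply(values, func):
    """Hypothetical helper, loosely modeled on R's sapply: apply func to each
    element and simplify to a flat list only when every result has length one."""
    results = [func(v) for v in values]
    if results and all(len(r) == 1 for r in results):
        return [r[0] for r in results]  # simplified: one plain value per input
    return results  # not simplified: a list of tuples


def split_commas(s):
    # Split a string on commas and return the pieces as a tuple.
    return tuple(s.split(","))


print(simplify_apply(["a", "b"], split_commas))    # ['a', 'b']
print(simplify_apply(["a", "b,c"], split_commas))  # [('a',), ('b', 'c')]
```

A script written after trying only the first call (say, calling .upper() on each element) breaks on the second, even though only the data changed.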

Back to the language war: Python has the same problem with unpredictable return values. It uses duck typing, just like R. An object's class is obvious in a procedural script, but functions don't have built-in type checking; they just assume everything's fine until an error is raised.
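A small sketch of that point (the function total_length is hypothetical; the list[str] annotation syntax needs Python 3.9+). Type hints document intent, but the interpreter does not check them at runtime, so any argument whose elements support len() is accepted.

```python
def total_length(items: list[str]) -> int:
    # The annotation is not enforced: Python only needs len() to work on each item.
    return sum(len(item) for item in items)


print(total_length(["ab", "cde"]))   # 5
print(total_length([[1, 2], (3,)]))  # 3, lists and tuples also have len()

try:
    total_length([1, 2])
except TypeError as err:
    # Only here does anything fail, because int has no len().
    print("fails only now:", err)
```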

100% agree with you, and I have learned all that the hard way as well. I wasn't trying to favor one language over another, just stating my thoughts on the difference in usage between the two. Like you alluded to, more error checking and writing tests to avoid these typing problems helps. I find that stuff slightly more difficult in R than it needs to be, and it's often overlooked by R tutorials, which ultimately means people programming in R use it less often.
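A minimal sketch of "more error checking plus a test", in Python. The function total_length_checked is hypothetical; in R, stopifnot() and the testthat package play similar roles.

```python
def total_length_checked(items):
    """Hypothetical example: validate input explicitly instead of trusting duck typing."""
    if not isinstance(items, (list, tuple)):
        raise TypeError(f"items must be a list or tuple, got {type(items).__name__}")
    for i, item in enumerate(items):
        if not isinstance(item, str):
            raise TypeError(f"items[{i}] must be str, got {type(item).__name__}")
    return sum(len(item) for item in items)


def test_total_length_checked():
    # Happy path.
    assert total_length_checked(["ab", "cde"]) == 5
    # Bad input now fails early, with a message pointing at the offending element.
    try:
        total_length_checked([[1, 2]])
    except TypeError as err:
        assert "items[0]" in str(err)
    else:
        raise AssertionError("expected TypeError")


test_total_length_checked()
```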

I get the impression that there is an incredible amount of work going on developing the R ecosystem (and thank you so so much everybody who's doing that!), but that R as a language itself isn't moving forward at the same pace.

That's what makes me worry about R in the long run.

If this isn't the case, I think some PR would be super valuable.

There is a simple explanation for the rapid growth of Python. Most people who are from a software engineering background prefer to use Python over R as they find it easier to learn. R is generally used by people who have a background in statistics (not to say that others don't use R).

The R language itself is getting some serious work done; it just isn't at the forefront of the news. Check out this blog post from Revolution (part of Microsoft) from last month. There's a good amount of funding going towards making the language itself better, but a lot of PR comes from companies like RStudio and Microsoft since they directly benefit from more people using R.

@Mike, I got a job on the back of R alone. I know no Python to speak of.

@Mike, there's the sparklyr work that was the subject of some good webinars recently.

Is that really the reason?

Haven't been here for a while, hope y'all are doing well.

@DaveRGP Congrats. Without knowing what you use R for, it's hard to say, just like with the Stack Overflow R vs. Python search post above.

I mean, it really depends. There are people who got jobs using Excel.

Take a case in point: I heard the government is still on Windows 2000 (don't quote me!). It's not that they don't want to upgrade or don't know how; they've just got too much stuff in the system to migrate.

Similarly, in pharmaceutical companies, if all the existing code is written in SAS, they will continue to use SAS. But if you look at the start-ups, many have chosen to switch away.

@danklotz It's my impression. What's your take?

There will always be use cases that lend themselves to different technologies. R has an amazing ecosystem of tools that many companies are investing in and will be around for years to come. R helped me get my current job in the financial services industry, and our firm loves hiring people who are fluent in rmarkdown, shiny and tidyverse. Regardless, I would say that statistics, ML concepts, general programming and software development skills that can be applied to different problems in a reliable way are more important than the language itself. A good R programmer can pick up Python quickly and vice versa. But a bad programmer is going to muck things up regardless of the language.

Ah, I don't know, I did not put too much energy into that question (yet).
I was hoping you had some insights/data supporting your hypothesis.

I've heard that some government agencies are indeed running on Windows XP.
And some on mainframes.

The conversation has pivoted in this direction a few times, so let me ask this explicitly:
Should you put all your resources into mastering R, or split them and start learning Python too?

What do you think?

Well, I speak from my own experience, so others may differ. I had an introduction to both R and Python, but I stuck with R until I had a good foundation and had built projects of my own in R. Later on, I found that what I learned was transferable: I could apply the split/apply/combine strategies I learned in R to pandas (almost) directly, and so on.
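The split/apply/combine pattern itself is language-agnostic. In dplyr it is group_by() %>% summarise(), and in pandas df.groupby("species")["value"].mean(). The sketch below spells out the three steps in plain Python on made-up data, so it runs without either library.

```python
from collections import defaultdict

# Made-up example data: one record per observation.
rows = [
    {"species": "a", "value": 1.0},
    {"species": "b", "value": 4.0},
    {"species": "a", "value": 3.0},
]

# Split: collect the values belonging to each group.
groups = defaultdict(list)
for row in rows:
    groups[row["species"]].append(row["value"])

# Apply + combine: summarise each group and gather the results in one dict.
means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
print(means)  # {'a': 2.0, 'b': 4.0}
```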

I think that if R floats your boat, and your company is OK with that, stick with R. If everybody at your company uses Python (which is my case now), well, you'll find help more easily if you use Python as well. And the learning curve won't be as steep.

The most important thing for me now is that I get to do data science :smile: I might manage to persuade the others to give R a shot for exploratory data analysis, because ggplot is awesome :wink:

Put your resources into general programming skills. Learn about refactoring/clean code, unit testing, git, how to document your code properly, and how to set up packages/libraries that are usable by other people. Except for syntax details, this will be almost identical between Python and R, and many other programming languages.
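As a taste of how similar unit testing looks across languages, here is a minimal sketch with Python's built-in unittest module; R's testthat follows the same idea of small named tests with expectations. The normalize function is hypothetical.

```python
import unittest


def normalize(values):
    """Hypothetical helper: rescale numbers so they sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [v / total for v in values]


class TestNormalize(unittest.TestCase):
    def test_sums_to_one(self):
        # Floating-point sums are compared approximately.
        self.assertAlmostEqual(sum(normalize([1, 2, 3])), 1.0)

    def test_zero_total_raises(self):
        # Bad input should fail loudly rather than return nonsense.
        with self.assertRaises(ValueError):
            normalize([0, 0])
```

Saved as, say, test_normalize.py, this can be run with `python -m unittest test_normalize`.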

I think this is very sound advice, and obvious to me these days (I just wish it had been obvious 2 years ago or more).
But even assuming you learn all the general concepts, you learn them while learning something else. In our example: R, Python, or both.

So, let's say over the next 5 years: would you prefer to double down on your R skills, or divert some of your resources to start learning Python (assuming you also learn clean code, git, reproducibility et cetera as you go)?

I guess there is also the law of diminishing returns in place, where at some point you know enough R that you don't gain too much from each extra hour/day/week learning more of R, and may start thinking about learning Python, but first you'd probably have to hit that plateau...

I sometimes wonder whether I should learn Python on the side, but then I realize I'm better off channeling my resources into R for the foreseeable future...

Python is a well-designed language, and it's probably a better vehicle for learning about programming than R (I only know Python from uni, though). For my current job (working mainly with tabular data that doesn't get too big), R is pretty much perfect and there is little incentive to switch. If I had the opportunity to use Python at work I would definitely try to get more into it, but we use R and SAS (shudder) exclusively.

I thought a lot about which programming language to learn next, and I ended up deciding on (plain) C. Why? Mainly because it teaches you a lot about computers. It also helps you understand the inner workings of R (and Python) better.

On the other end of the spectrum, JavaScript is probably also a good choice (if you are into visualizations).

@hoelk
I think that was a smart decision to learn C next. For one, it's a surprisingly simple language to learn in terms of syntax and basics. But you're absolutely right that it requires more thought, specifically handling memory errors, understanding pointers and compiling your programs before you can run them.

@taras
If you're looking to become a better programmer, my advice is to become professionally proficient in one language first -- i.e. in R I would expect you to be able to write applications, libraries, data processing scripts, use version control, manage packages/dependencies, and safely handle secrets/credentials. If you already feel very comfortable with those things, I think it makes sense to learn a second language.

I've spent some time programming projects in 6 languages so far (and another 4 that I started trying to learn and decided I didn't want to keep going) and I think it has made me a better programmer*. Not because the languages themselves allowed me to accomplish anything the others couldn't (though sometimes that was the case, e.g. JavaScript), but because they each have different people contributing to them so you get more diversity of thought behind programming. R is cool because you can learn and apply a few different programming paradigms, though they often feel different in other languages. I think it is worth trying them out because even if you don't like the language overall, you'll most likely find something you like that you can take back to R.

*Note: I am not saying I am proficient in all 6 languages, just that I have tried and completed projects that actually do something non-trivial.

For example, in Scala I built an application that uses the Twitter API with Spark Streaming to process all incoming retweets and store them in a Cassandra database. That is not a complex application, but it is one that requires some effort to implement.

I have a very opinionated take on what the second language should be for someone who only knows R, but I'm going to save it for a blog post. The tl;dr is that it really depends on what your goal is.

@raybuhr just out of curiosity, can I ask which the six languages are? About C: it definitely requires more thought, but it also requires writing much, much, much more code... I (sort of) learnt C 10 years ago, and I shudder at the thought of the quantity of boilerplate code I'd have to write to achieve anything like what I currently do. If one wants to learn a general-purpose, compiled programming language today, wouldn't it be better to go for Java?