I started learning R over two years ago without ANY coding background. I use R every single day and love it. I do all things data for my high school (I’m a teacher). Should I learn SQL as my next project? What would the community suggest?
SQL should be an easy extension: in R you often manipulate data by groups, which is exactly what you do in SQL. (For people coming from a database background, I often describe R as a database without a data store to help them get started.)
Now that dplyr has `show_query()`, SQL is a ridiculously low bar (see below). Where to go next depends on your career goals, but a possible extension to your current R skills would be building out better ways to share your work and tools, e.g. R Markdown, Shiny, and blogdown.
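```r
library(dplyr)

# `flights` is a lazy table on the "vwFlights" view (connection not shown)
flights %>% group_by(name) %>% tally() %>% show_query()
#> SELECT "name", COUNT(*) AS "n" FROM "vwFlights" GROUP BY "name"
```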
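If you want to try `show_query()` without access to a database server, a throwaway in-memory SQLite database works fine. This is a minimal sketch, assuming the DBI, RSQLite, dplyr, and dbplyr packages are installed; the `flights_df` data frame is made-up toy data standing in for a real table. It shows that the dplyr pipeline and a hand-written `GROUP BY` query return the same counts.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Throwaway in-memory SQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical toy data, copied into the database as a table called "flights"
flights_df <- data.frame(name = c("A", "A", "B", "C", "C", "C"))
dbWriteTable(con, "flights", flights_df)

# Lazy reference to the table: dplyr verbs are translated to SQL, not run in R
flights <- tbl(con, "flights")

# Print the SQL that dplyr generates for the grouped count
flights %>% group_by(name) %>% tally() %>% show_query()

# The dplyr result, pulled into R with collect() and sorted locally
via_dplyr <- flights %>% group_by(name) %>% tally() %>% collect() %>% arrange(name)

# The same question asked directly in SQL
via_sql <- dbGetQuery(con, "SELECT name, COUNT(*) AS n FROM flights GROUP BY name ORDER BY name")

dbDisconnect(con)
```

Printing `via_dplyr` and `via_sql` side by side is a quick way to convince yourself that `group_by()` + `tally()` and `GROUP BY` + `COUNT(*)` are the same idea.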
I’d recommend Jennifer Widom’s course on databases:
It teaches SQL and related tools and concepts through videos and auto-graded exercises.
I think SQL is a great suggestion because of how often analyses grow beyond what R can comfortably handle in memory.
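One reason this matters: with dbplyr, a pipeline on a database table stays lazy, so filtering and aggregation run inside the database and only the (small) result is copied into R when you call `collect()`. A minimal sketch, assuming DBI, RSQLite, dplyr, and dbplyr are installed; the `scores` table and its columns are hypothetical, and a temporary on-disk SQLite file stands in for a real database.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Temporary on-disk SQLite file standing in for a database too big for RAM
db_path <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), db_path)

# Hypothetical table of student scores
dbWriteTable(con, "scores", data.frame(
  school = rep(c("North", "South"), each = 5),
  score  = c(70, 80, 90, 60, 100, 55, 65, 75, 85, 95)
))

# Nothing is computed yet: this only builds a SQL query
summary_query <- tbl(con, "scores") %>%
  filter(score >= 60) %>%
  group_by(school) %>%
  summarise(n = n(), mean_score = mean(score, na.rm = TRUE))

# collect() runs the query in SQLite and returns only the summary rows to R
school_summary <- summary_query %>% collect() %>% arrange(school)

dbDisconnect(con)
```

With a real database the code looks the same; only the `dbConnect()` call changes.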
Other ideas: testing, reproducible reporting with R Markdown, and writing R packages. Perhaps you’re already familiar with these.
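If testing is new to you, the testthat package makes it a small step: you state what a function should return, and testthat complains when it doesn’t. A minimal sketch, assuming testthat is installed; `percent_correct()` is a hypothetical helper, not part of any package.

```r
library(testthat)

# Hypothetical helper: share of correct answers, as a percentage
percent_correct <- function(correct, total) {
  if (total <= 0) stop("total must be positive")
  round(100 * correct / total, 1)
}

# Each expectation signals a failure if the function misbehaves
test_that("percent_correct() handles typical and edge cases", {
  expect_equal(percent_correct(9, 10), 90)
  expect_equal(percent_correct(1, 3), 33.3)
  expect_error(percent_correct(1, 0), "total must be positive")
})
```

Inside a package, the same `test_that()` blocks would live in `tests/testthat/` and run together.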
I’ll also just throw out there that C++ is a pretty good complement if you’re up for it. There are a good number of resources out there to help you get started, many from Dirk Eddelbuettel and Romain François (two of the main creators of Rcpp), and Hadley covers it in Advanced R. I highly recommend reading that first.
Rcpp itself is an amazing tool for quickly testing C++ code right from R (using `sourceCpp()`). Sometimes I have code where loops are unavoidable, and when the computation done at each iteration is fairly simple, a C++ solution can save mind-boggling amounts of time.
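As an illustration of that kind of loop, the sketch below compiles a small function with `sourceCpp(code = ...)`: an exponentially weighted moving average, where each value depends on the previous result, so a plain loop is the natural way to write it. It assumes Rcpp and a working C++ toolchain are installed; the names `ema_cpp()` and `ema_r()` are made up for illustration, and the pure R loop is there for comparison.

```r
library(Rcpp)

# Hypothetical example: exponentially weighted moving average in C++
sourceCpp(code = "
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector ema_cpp(NumericVector x, double alpha) {
  int n = x.size();
  NumericVector out(n);
  if (n == 0) return out;
  out[0] = x[0];
  for (int i = 1; i < n; ++i) {
    // each step depends on the previous smoothed value
    out[i] = alpha * x[i] + (1.0 - alpha) * out[i - 1];
  }
  return out;
}
")

# The same recurrence as a plain R loop, for comparison
ema_r <- function(x, alpha) {
  out <- numeric(length(x))
  if (length(x) == 0) return(out)
  out[1] <- x[1]
  for (i in seq_along(x)[-1]) {
    out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
  }
  out
}

x <- runif(1e6)
all.equal(ema_cpp(x, 0.1), ema_r(x, 0.1))
```

Wrapping each call in `system.time()` is an easy way to compare the two versions on your own machine.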
I have found it a bit more difficult to debug my code, and I shamelessly crash RStudio far too often with it, but that’s more my lack of knowledge than anything else.
I did the relational algebra and SQL modules of that course to get started a long time ago, and found them simple and useful for understanding how joins work.
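To see how joins line up between R and SQL, here is a small sketch, assuming dplyr, DBI, and RSQLite are installed; the `students` and `grades` tables are invented for illustration. A dplyr `left_join()` and a SQL `LEFT JOIN` both keep every student and fill in a missing grade with `NA` (`NULL` in SQL), while an inner join drops the student without a grade.

```r
library(DBI)
library(dplyr)

# Hypothetical tables
students <- data.frame(id = c(1L, 2L, 3L), name = c("Ana", "Ben", "Cy"))
grades   <- data.frame(id = c(1L, 3L), grade = c(90, 75))

# dplyr: keep every student, attach a grade where one exists
joined_r <- left_join(students, grades, by = "id")

# dplyr: keep only students that have a grade (Ben is dropped)
inner_r <- inner_join(students, grades, by = "id")

# SQL: the same LEFT JOIN run in an in-memory SQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "students", students)
dbWriteTable(con, "grades", grades)
joined_sql <- dbGetQuery(con, "
  SELECT s.id, s.name, g.grade
  FROM students AS s
  LEFT JOIN grades AS g ON s.id = g.id
  ORDER BY s.id
")
dbDisconnect(con)
```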
I think where to go to broaden your programming skills also depends on what you want to do.
If you manipulate lots of data stored in a database, SQL is a very good choice. Yes, dplyr makes working with databases much easier, but it is nice to have some SQL skills to go further and know what can be done in the database. SQL is also useful for working with data outside the R world.
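For example, DBI lets you send any SQL your database understands, not just what dplyr generates. A minimal sketch, assuming DBI and RSQLite are installed (window functions need the bundled SQLite to be version 3.25 or newer); the `attendance` table is hypothetical. It ranks days within each class by attendance and uses a `?` placeholder instead of pasting the threshold into the query string.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical attendance counts per class and day
dbWriteTable(con, "attendance", data.frame(
  class   = c("A", "A", "A", "B", "B"),
  day     = c("Mon", "Tue", "Wed", "Mon", "Tue"),
  present = c(20, 25, 22, 18, 30)
))

# Window function: rank days within each class; WHERE filters before ranking
top_days <- dbGetQuery(con, "
  SELECT class, day, present,
         RANK() OVER (PARTITION BY class ORDER BY present DESC) AS rnk
  FROM attendance
  WHERE present >= ?
  ORDER BY class, rnk
", params = list(19))

dbDisconnect(con)
top_days
```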
I second the recommendation on C++: it is very useful if you need to improve performance, or if you want to use R as a friendly DSL around a C++ library that is useful to you.
As a data scientist, I find it useful to have some Python skills and understanding if you want to use machine learning frameworks such as Keras or TensorFlow, even though RStudio has made Python easy to use from R with reticulate and used it to bring [Keras](https://tensorflow.rstudio.com/keras/) and TensorFlow to the R community.
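A tiny taste of what reticulate makes possible, assuming reticulate is installed and can find a Python interpreter: R objects in the global environment are visible from Python as `r.<name>`, and Python variables come back into R through `py$<name>`. The `scores` vector is made up for illustration.

```r
library(reticulate)

# Hypothetical R data we want to hand to Python
scores <- c(78, 85, 92)

# Run Python code; R objects are reachable from Python as r.<name>
py_run_string("mean_score = sum(r.scores) / len(r.scores)")

# Python variables come back to R via py$<name>
py$mean_score
```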
These are the languages I am looking at to complement R and go further in situations where I have found myself limited.
Every time I ask myself this question, I wind up deciding to learn more R. There’s been so much growth in recent years and there are so many ways to apply it. Most recently, I learned how to use R’s mapping features to replace everything I used to do in QGIS (plus some additional stuff I couldn’t do in QGIS but now can in R).
Ultimately, what to learn depends on what projects you want to use it for, but if you’re just looking to experiment with data and grow as a programmer, I’d say stay within R and branch into a new area of data analysis and visualization.
Thanks for your reply. I don’t know how many times I’ve said to myself and others that I could live to be 100 and still never know all that R can do. As someone late to the party, it just “feels” like I should be stretching myself, but honestly I haven’t (as of yet) run into anything I needed to do that I couldn’t do with R alone. The R community is so great, and I am so grateful for the work of so many.