Seeking advise on learning SQL

database

#1

Hi

Attending the last RStudio conference strengthened my new-year resolution to learn SQL. Over the past few years, I have intermittently integrated ROracle data extract statements or procedures in my workflows, but I have only adapted to R the statements and procedures that were kindly created by my company's DBAs... So, I know how to connect to a DB and execute statements, but I don't know how to write said statements or procedures.

I would like to build my own DBs on my linux machine (a variant of Ubuntu) and learn SQL to interact with them using R

So my questions are the following:

  • which DB (preferably free) should I install?
  • which "flavor" of SQL would you recommend I learn?
  • I have this vague notion that Oracle SQL is different from other flavors of SQL. How much disservice would I do to myself by learning another flavor of SQL if Oracle SQL is the one I end up needing?
  • which resources among the gazillions of learning books / tutorials / learning websites would you recommend?

Thanks in advance for you insights.


#2

It depends on what you want to do... Does your company use any specific SQL DB?


#3

I am pretty sure we are using an Oracle DB which has existed prior to the purchase of mySQL by Oracle, so I am pretty sure that this is not a mySQL DB.

That said, I don't think IT will allow me to create my own schemas and tables in this DB just so I could learn.


#4

My vote goes to Postgresql, because it's fast, very customizable (e.g. you can use it as a spatial database with postgis), it's ANSI SQL compliant, allows to write stored procedures and functions in numerous programming languages (Including R and Python), you can define additional data types, It also can handle NoSQL data (e.g. JSON, XML) and It´s free.

Since both Postgresql and Oracle SQL are (more or less) ANSI SQL compliant, it should be easy to move your code from one to another with minimum changes. Also, there are books that can help you with the translation problems

This actually depends on how much sql you already know, but I like this book for featuring the key points of PostgreSQL.


#5

I concur with @andresrcs suggestion of Postgres. It adheres pretty closely to the ANSI standard, and so does Oracle. You will find the code very familiar.

There are differences - e.g. table partitioning or execution hints are better in Oracle, and on the other hand the PostGIS implementation of spatial features positively rocks - but these are pretty advanced features unlikely to be an issue for a user just starting his way in SQL.

As is usually the case stay away from Microsoft products unless you plan to use them exclusively - their SQL server is actually pretty good, but their implementation of ANSI SQL standards is rather loose and it is optimized to perform the best inside their walled garden.


#6

Thanks @jlacko and @andresrcs for your advises


#7

the Datacamp SQL courses are really good. And you can easily take them during the free 2 week trial period.


#8

For your first steps with SQL I think the DBMS doesn't matter much, the main thing is to know that there are differences so you don't panick when you need to adapt your knowledge and stack overflow will help you translate your queries very promptly.

Check out the package sqldf and you can practice some sql directly from R on R data frames. For more "real SQL" you can use package DBI with RSQLite (or pretty much any DB connection package), and play with dplyr on it with the show_query function, which will show you how things translate between dplyr / R language and SQL.

And the datacamp courses are indeed good, my friend had a lot of trouble with the simple stuff then i showed him the courses and he was super happy with it .