I was under the impression that the creators of dplyr are familiar with SQL [1] and did (and still do) use it as a direct inspiration [2, 3]. But SQL is not very well suited to data analysis [4], so the design of dplyr is about keeping the good parts and reformulating the rest with data analysis in mind [5, 6].
(not speaking with authority, but I have sources!)
1:
That said, I am very familiar with SQL
see: Disagree with Hadley's comment about databases
2:
SQL is the inspiration for dplyr’s conventions, so the translation is straightforward
source: https://r4ds.had.co.nz/relational-data.html
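To make the "straightforward translation" concrete, here is a minimal sketch in R. It assumes dplyr is installed, and the toy tables `x` and `y` are made up for illustration. The SQL in the comments follows the dplyr-to-SQL join table in that same R4DS chapter, as far as I remember it.

```r
library(dplyr)

# Hypothetical toy tables sharing a "key" column
x <- tibble(key = 1:3, val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1L, 2L, 4L), val_y = c("y1", "y2", "y3"))

# SQL: SELECT * FROM x INNER JOIN y USING (key)
inner <- inner_join(x, y, by = "key")  # keeps only keys 1 and 2

# SQL: SELECT * FROM x LEFT OUTER JOIN y USING (key)
left <- left_join(x, y, by = "key")    # keeps all of x; val_y is NA for key 3

inner
left
```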
3:
Thanks to Kirill Müller, dplyr has a new experimental family of row mutation functions inspired by SQL’s UPDATE, INSERT, UPSERT, and DELETE.
source: https://www.tidyverse.org/blog/2020/05/dplyr-1-0-0-last-minute-additions/
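For anyone who hasn't tried these row functions, here is a minimal sketch of how they line up with the SQL statements. It assumes dplyr 1.0.0 or later, and the tibbles `x` and `y` are made up for illustration. By default `rows_update()` errors on keys missing from `x` and `rows_insert()` errors on keys already in `x`, so each call below only gets the rows it can handle.

```r
library(dplyr)

# Hypothetical example data keyed by "id"
x <- tibble(id = 1:3, value = c("a", "b", "c"))
y <- tibble(id = c(2L, 4L), value = c("B", "D"))

upd <- rows_update(x, y[1, ], by = "id")        # like UPDATE: id 2 becomes "B"
ins <- rows_insert(x, y[2, ], by = "id")        # like INSERT: appends id 4
ups <- rows_upsert(x, y, by = "id")             # like UPSERT: updates id 2, inserts id 4
del <- rows_delete(x, y[1, "id"], by = "id")    # like DELETE: removes id 2

upd
ins
ups
del
```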
4:
for example, https://blog.exploratory.io/why-sql-is-not-for-analysis-but-dplyr-is-5e180fef6aa7
5:
If you’ve used a database before, you’ve almost certainly used SQL. If so, you should find the concepts in this chapter familiar, although their expression in dplyr is a little different. Generally, dplyr is a little easier to use than SQL because dplyr is specialised to do data analysis
source: https://r4ds.had.co.nz/relational-data.html
6:
[...] dplyr maybe might be better than SQL in some ways. But I think it is, because it's trying to solve a much, much smaller problem than SQL is trying to solve. [...] I think you can rethink the language and the interface, and of course, we've learned a bunch about programming and programming languages and the 40 years since SQL has been around. So I think there's some really nice things about dplyr that just make life a little bit more pleasant.
source: https://www.superdatascience.com/podcast/hadley-wickham-talks-integration-and-future-of-python-and-r