dplyr function names vs SQL

My impression is that the creators of dplyr are familiar with SQL [1] and used it (and still use it) as a direct inspiration [2,3]. But SQL is not very well suited to data analysis [4], so the design of dplyr takes the good parts and reformulates the rest with data analysis in mind [5,6].

(not speaking with authority, but I have sources!)
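
To make the "good parts, reformulated" idea a bit more concrete, here is a rough side-by-side sketch. It is my own illustration, not taken from any of the sources: it assumes the built-in mtcars data set, the result name mean_mpg_by_cyl is made up, and the SQL in the comment is only a plausible equivalent (it may differ from what dbplyr would actually generate).

```r
library(dplyr)

# Roughly equivalent SQL, for comparison (illustrative only):
#   SELECT cyl, AVG(mpg) AS mean_mpg
#   FROM mtcars
#   WHERE hp > 100
#   GROUP BY cyl
#   ORDER BY cyl;

# mean_mpg_by_cyl is a hypothetical name chosen for this example
mean_mpg_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                  # WHERE
  group_by(cyl) %>%                     # GROUP BY
  summarise(mean_mpg = mean(mpg)) %>%   # SELECT ... AVG(...) AS ...
  arrange(cyl)                          # ORDER BY
```

Some names carry over almost directly (group_by(), the *_join() family), while others are renamed or split up: filter() stands in for WHERE, arrange() for ORDER BY, and select() only picks columns, with mutate() and summarise() taking over the computing part of SQL's SELECT. The verbs are also written in the order in which they are applied.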


1:

That said, I am very familiar with SQL

see: Disagree with Hadley's comment about databases - #15 by hadley

2:

SQL is the inspiration for dplyr’s conventions, so the translation is straightforward

source: 13 Relational data | R for Data Science

3:

Thanks to Kirill Müller, dplyr has a new experimental family of row mutation functions inspired by SQL’s UPDATE, INSERT, UPSERT, and DELETE.

source: dplyr 1.0.0: last minute additions

4: for example, "Why SQL is not for Analysis, but dplyr is" by Kan Nishida (learn data science)

5:

If you’ve used a database before, you’ve almost certainly used SQL. If so, you should find the concepts in this chapter familiar, although their expression in dplyr is a little different. Generally, dplyr is a little easier to use than SQL because dplyr is specialised to do data analysis

source: 13 Relational data | R for Data Science

6:

[...] dplyr maybe might be better than SQL in some ways. But I think it is, because it's trying to solve a much, much smaller problem than SQL is trying to solve. [...] I think you can rethink the language and the interface, and of course, we've learned a bunch about programming and programming languages and the 40 years since SQL has been around. So I think there's some really nice things about dplyr that just make life a little bit more pleasant.

source: SuperDataScience
