Thanks everyone for the great advice on this thread!
I'll add my 2c too. I am in a slightly less difficult situation than you are, but some aspects are similar (I teach data wrangling/analysis to grad students who chose biology to avoid maths and who see statistics as a chore). So there are two obstacles:
1. getting them interested in computing statistics;
2. convincing them to do it with R (or, more generally, a programming language) rather than point-and-click interfaces.
I actually start with 2. To get them interested in programming, I appeal to everyone's inherent laziness. I choose a real problem, relevant to their field; I start from the raw data files, then ask them to time me while I solve it "manually" (copy/paste + formulas in Excel) and then in R. Of course, I choose the problem so that programming is faster and possibly simpler than Excel: it requires merging data from different sources, cleaning it up, repeating simple operations, etc. The goal is to show how coding will save them time and spare them convoluted/repetitive/dangerous operations; basically, that it will allow them to be lazy. The timing/competition aspect keeps them interested (I stop often to ask how I am doing on time, set time goals, etc.).
Also, I explain what I do while doing it, but I leave out the details so that, initially, R looks kind of "magical". Several functions have this "magic" component: separate() is simple but has a great wow factor, and so do facets in ggplot (the final result is a faceted plot, so it ends with a bang); ***_join() and group_by() + summarise() are more difficult to grasp, but since I have just done the same thing manually in Excel, they can appreciate the automation; finally, group_by() + filter() where the condition depends on the data (e.g. extracting the lower half of the data, where the value of the median is different in every group) leaves them a bit bewildered but conscious of the power of the method. The goal here is to reach a point where (some) figure out: "we can learn how to do that in just a few hours of class? cool!". Of course, I still insist on the fact that nothing is magical, just logical!
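To give an idea of what that sequence looks like, here is a minimal sketch. The data, the column names (sample, count, depth_m) and the object names are all made up for illustration; they are not my actual class material. It assumes dplyr, tidyr and ggplot2 are installed.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up data, for illustration only
counts <- tibble(
  sample = c("siteA_2019", "siteA_2020", "siteB_2019", "siteB_2020", "siteB_2021"),
  count  = c(10, 30, 5, 7, 9)
)
sites <- tibble(site = c("siteA", "siteB"), depth_m = c(10, 50))

# separate(): split one column into two; left_join(): merge a second source
tidy <- counts %>%
  separate(sample, into = c("site", "year"), sep = "_") %>%
  left_join(sites, by = "site")

# group_by() + summarise(): one summary row per group
per_site <- tidy %>%
  group_by(site) %>%
  summarise(mean_count = mean(count))

# group_by() + filter(): the condition is computed within each group,
# so the median used as the threshold differs between sites
lower_half <- tidy %>%
  group_by(site) %>%
  filter(count <= median(count)) %>%
  ungroup()

# Facets: one panel per site (the plot is built here but not printed)
p <- ggplot(tidy, aes(x = year, y = count)) +
  geom_col() +
  facet_wrap(~ site)
```

In class the same steps are applied to real files, which is where the time saved compared with Excel becomes obvious.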
It can take 30 to 45 min in the first class but I find it worth it if it "hooks" some of them. It took me a few years to find a good problem (something that most would want to solve but is not obvious) and refine it to be demonstrative (and in the meantime I've had to adjust from aggregate() to ddply() to group_by() + summarise()).
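For those who have not followed that evolution, here is a rough sketch of the same grouped mean written the three ways. The data frame and its columns (site, rain) are invented for the example, and the plyr version only runs if plyr happens to be installed.

```r
library(dplyr)

# Invented data, for illustration only
df <- data.frame(site = c("A", "A", "B", "B"), rain = c(1, 3, 10, 20))

# Base R: aggregate() with a formula
by_aggregate <- aggregate(rain ~ site, data = df, FUN = mean)

# plyr: ddply(), called through its namespace to avoid masking dplyr functions
if (requireNamespace("plyr", quietly = TRUE)) {
  by_ddply <- plyr::ddply(df, "site", plyr::summarise, rain = mean(rain))
}

# dplyr: group_by() + summarise()
by_dplyr <- df %>%
  group_by(site) %>%
  summarise(rain = mean(rain))
```

All three produce one row per site with the mean of rain; only the syntax changed over the years.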
Regarding 1, for me, it all rests on finding good examples. I agree with the posters above that having them work on their own data helps, but it is more difficult. One thing that works well for me is getting them to find stuff about each other from the data. In our case, it is trying to highlight their personal bias in sorting some biological samples taxonomically (yeah, not terribly fun... but it works!). In your case it can be even more engaging if you base it on social-related questions.
I also work with data on which everyone has a preconceived opinion and try to focus on cases where "statistics" can show it is wrong. In my case it is weather records: some places are supposed to be rainy, others sunny, while in fact the total precipitation is the same (the distribution of rain is different, which gives the initial impression), same for peak temperatures compared to average ones, etc.
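A toy version of the rain example could look like the sketch below. The two cities and all the numbers are invented (they are not real weather records); the point is only that the totals match while the number of rainy days does not.

```r
library(dplyr)

# Invented daily precipitation, for illustration only
rain <- tibble(
  city      = rep(c("Drizzleton", "Stormville"), each = 6),
  day       = rep(1:6, times = 2),
  precip_mm = c(5, 5, 5, 5, 5, 5,    # a little rain every day
                0, 0, 30, 0, 0, 0)   # one big storm
)

# Same total, very different distribution
rain_summary <- rain %>%
  group_by(city) %>%
  summarise(
    total_mm   = sum(precip_mm),
    rainy_days = sum(precip_mm > 0)
  )
```

Here both cities get 30 mm in total, but one has 6 rainy days and the other only 1, which is what feeds the "rainy place" reputation.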
Good luck!
PS: Regarding read_csv() and the GUI in RStudio, I also use the GUI initially but insist that they then copy-paste the code into their script and change the path to one relative to the project. They learn the options through the GUI, by trial and error. Some even get tired of clicking the buttons and start to write the functions directly, but even if some don't, they still end up with a self-contained, reproducible script, which is what matters most.
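As a sketch of that edit: the file name, folder layout and absolute path below are hypothetical, and the first lines only create a small file so that the example runs on its own. It assumes the working directory is the project root, as it is when the RStudio project is open.

```r
library(readr)

# Set-up for the example only: a hypothetical data file inside the project
dir.create("data", showWarnings = FALSE)
write_csv(data.frame(sample = c("a", "b"), count = c(3, 5)), "data/samples.csv")

# Pasted code with an absolute path (hypothetical), which only works on one computer:
# samples <- read_csv("/home/me/projects/sorting/data/samples.csv")

# After editing: the path is relative to the project, so the script is portable
samples <- read_csv(
  "data/samples.csv",
  col_types = cols(sample = col_character(), count = col_double())
)
```

The col_types argument is written out here to show that the options they picked by clicking end up as ordinary function arguments in the script.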
PPS: Small rant: it should be easy for the GUI to make the path relative on its own, when one is in a project and the data file is within the project, and that would be more demonstrative of the advantage of using an RStudio project! I should report that somewhere.