Hello. This might be a bit off-topic, but I hope I've found the right place to ask this question. I'm a teaching assistant in a BA-level psychology class at university. The professor has never worked with R, so I'm basically trying to put together a good introductory syllabus for the students.
The main focus is less on proper programming and more on providing students with analytical tools.
There are tons of resources online, but I'm inclined to go straight to the tidyverse approach, and I was wondering what your thoughts are. I started by learning base R, like most courses out there do, but I feel it is much more valuable to teach students the tidyverse right from the beginning.
I have about 10 in-class hours throughout the semester dedicated to R. I think I'll spread them over 5 face-to-face meetings, plus assignments, of course.
My goal is to reach the point where we go through things like mutate(across(where(is.*), as.*)) and even the mighty map(), which is pretty darn advanced in my opinion and provides great tools.
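For readers unfamiliar with the pattern, here is a minimal sketch of what mutate(across(where(is.*), as.*)) and map() look like in practice. The toy survey tibble is made up for illustration (numbers stored as text, as often happens after importing survey exports); it assumes dplyr and purrr are installed and R >= 4.1 for the native pipe.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data: numeric answers stored as text
survey <- tibble(
  id    = c("1", "2", "3"),
  age   = c("21", "34", "19"),
  group = factor(c("control", "treatment", "control"))
)

# where(is.character) selects every character column;
# as.numeric is applied to each selected column in one step
survey_num <- survey |>
  mutate(across(where(is.character), as.numeric))

# A data frame is a list of columns, so map_*() applies a function to each column
col_classes <- map_chr(survey_num, class)
col_means   <- map_dbl(select(survey_num, where(is.numeric)), mean)

col_classes
col_means
```

The same template covers the other is.*/as.* pairs, e.g. where(is.numeric) with as.character.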
I've written so much that I might as well have put everything into GPT-4 and seen what it had to say about it.
Thank you for your genuine advice. This is an invitation to newbies, advanced and professional users alike; all opinions are welcome!
Moderators, I hope I haven't broken any rules here...
--
I actually consulted ChatGPT (not GPT-4) after posting. It insisted on doing dplyr and ggplot before introducing students to data frames, so I'm not sure it'll replace your honest advice that quickly.
Here's one approach, but I don't know if I've under-estimated or over-estimated the readiness of the general run of contemporary undergraduates.
Introduction
Everyone already knows how to use R in theory: it's school algebra, f(x) = y, where x is some set of data, y is some information to be extracted from it, and f is one or more functions that turn x into y.
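As a sketch of that mental model in code (using the built-in mtcars data set that the script skeleton below also uses), x is a column of data, f is a function, and y is the value it returns:

```r
# x: some data, here the mpg column of the built-in mtcars data set
x <- mtcars$mpg

# f: a function; y: the information f extracts from x
y <- mean(x)
y

# Functions compose like f(g(x)) in algebra: the inner call runs first
round(mean(x), 1)
```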
To put that mental model to use, we will be discussing set-up today, including
a. Installing the R programming language
b. Installing the RStudio wrapper that provides a browser-like way to use R
c. Installing the tidyverse suite of packages and the {ds4psy} package
d. Naming of parts: source and console
e. Console as calculator: + - / *, and the concept of operator precedence
f. Hello, World
g. Typical session using a script skeleton
# name_of_script.R
# description
# author: who wrote it
# Date: 2023-04-20
# libraries
# functions
# constants
# data
d <- mtcars
# preprocessing
# main
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
str(mtcars)
#> 'data.frame': 32 obs. of 11 variables:
#> $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
#> $ disp: num 160 160 108 258 360 ...
#> $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
#> $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#> $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
#> $ qsec: num 16.5 17 18.6 19.4 17 ...
#> $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
#> $ am : num 1 1 1 0 0 0 0 0 0 0 ...
#> $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
#> $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
summary(mtcars)
#> mpg cyl disp hp
#> Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
#> 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
#> Median :19.20 Median :6.000 Median :196.3 Median :123.0
#> Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
#> 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
#> Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
#> drat wt qsec vs
#> Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
#> 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
#> Median :3.695 Median :3.325 Median :17.71 Median :0.0000
#> Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
#> 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
#> Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
#> am gear carb
#> Min. :0.0000 Min. :3.000 Min. :1.000
#> 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
#> Median :0.0000 Median :4.000 Median :2.000
#> Mean :0.4062 Mean :3.688 Mean :2.812
#> 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
#> Max. :1.0000 Max. :5.000 Max. :8.000
complete.cases(mtcars)
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [31] TRUE TRUE
stem(mtcars$mpg) # or mtcars[[1]]; mtcars[1] is a one-column data frame, which stem() rejects
#>
#> The decimal point is at the |
#>
#> 10 | 44
#> 12 | 3
#> 14 | 3702258
#> 16 | 438
#> 18 | 17227
#> 20 | 00445
#> 22 | 88
#> 24 | 4
#> 26 | 03
#> 28 |
#> 30 | 44
#> 32 | 49
fivenum(mtcars$mpg)
#> [1] 10.40 15.35 19.20 22.80 33.90
quantile(mtcars$mpg)
#> 0% 25% 50% 75% 100%
#> 10.400 15.425 19.200 22.800 33.900
hist(mtcars$mpg)
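The "console as calculator", "operator precedence" and "Hello, World" items from the outline could be demonstrated with something like the following; the variable names are illustrative only.

```r
# Console as calculator: * and / bind tighter than + and -
without_parens <- 2 + 3 * 4    # 3 * 4 is evaluated first
with_parens    <- (2 + 3) * 4  # parentheses override precedence
without_parens
with_parens

# ^ binds tighter than unary minus, a classic surprise: this is -(2^2)
-2^2

# Hello, World
print("Hello, World")
```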
Taking the response by @Technocrat as a foundation, consider preparing tibbles ahead of time with data relevant to the topics covered in the course. The students can then open their R script template (in an RStudio project) and work on the data visualization.
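One possible way to do the preparation, sketched here with mtcars standing in for course data: the instructor tidies the data once and saves it as an .rds file, and the students' template loads it with a single line. The file name course_data.rds and the transmission column are hypothetical; in a real project the file would live in something like a data/ folder rather than tempdir().

```r
library(dplyr)

# Instructor side: tidy up a data set once, before class
course_data <- mtcars |>
  tibble::rownames_to_column("model") |>
  as_tibble() |>
  mutate(transmission = factor(am, levels = c(0, 1),
                               labels = c("automatic", "manual")))

# Hypothetical path; tempdir() is used so this sketch runs anywhere
path <- file.path(tempdir(), "course_data.rds")
saveRDS(course_data, path)

# Student side: one line in the script template loads the prepared tibble
d <- readRDS(path)
d
```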
Which of the course learning objectives and outcomes can be achieved through programming? Work your R lessons into supporting those objectives and outcomes.
Thank you for your insights.
I did not know the Data Science for Psychologists (ds4psy) text, and it is really great.
I actually shared this topic with some other TAs so they can get inspired and update their syllabi if needed.
(Sorry, I don't think I had sent this reply to the list.)
Why do you want your students to know R? Do you want them to become programmers, do you want them to analyse their own data, or do you want them to run statistical analyses on large, messy and complex human behavioural data? I suspect it's mainly the latter rather than the former, and I would agree that going for the tidyverse is the way to go (http://varianceexplained.org/r/teach-tidyverse/; see also this perspective from Roger Peng on base vs tidyverse: https://simplystatistics.org/posts/2018-07-12-use-r-keynote-2018/).
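To make the base vs tidyverse contrast concrete, here is the same task (mean miles per gallon per number of cylinders in mtcars) written both ways. It is only an illustration of the style difference, not an argument from the linked posts.

```r
library(dplyr)

# Base R: formula interface to aggregate()
base_version <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# tidyverse (dplyr): a pipeline of verbs that reads top to bottom
tidy_version <- mtcars |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg))

base_version
tidy_version
```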
Ten hours is not a lot of time. Have a look at the teaching materials for psychology students in Glasgow from Emily Nordmann and Lisa DeBruine (and others). They are very nice (both the materials and their authors), and at the very least they should give you a good foundation to adapt or develop your materials:
From my perspective teaching R to unsuspecting biology students, the big thing is to show them a quick way to a big payoff. All the talk about reproducibility, collaboration, open source and power doesn't mean much to them (unfortunately), and they will not engage if you begin with vectors, data frames, etc. In my first class, the only one where I demonstrate things rather than live code, I literally solve several assignments from their previous years' classes using R, all with good visualisations and full reproducibility in an R Notebook. ggplot2 is the gateway drug here: show them something cool that they can do in R and that is relevant to their study right away. Andy Heiss' materials (https://datavizm20.classes.andrewheiss.com/) and Claus Wilke's (https://clauswilke.com/dataviz/) are very good.
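A minimal sketch of that kind of quick payoff with ggplot2, again using mtcars as a stand-in for data relevant to the students' own field: a few lines produce a labelled scatter plot with a fitted linear trend.

```r
library(ggplot2)

# One data set, two aesthetics, a point layer and a linear smoother
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

p
```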
I was just thinking that I could show them a simple Shiny dashboard with a real analysis we are conducting in my lab. Not necessarily to teach them Shiny (well, necessarily not), but to show them the power of R and the cool things you can do with it, especially, as you mentioned, how quickly they can clean and analyze real-world data.
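For a sense of how little code such a demo needs, here is a minimal Shiny sketch; it is not the lab dashboard described above, just a hypothetical app where a slider controls the histogram of mtcars$mpg.

```r
library(shiny)

# UI: a slider input and a plot output
ui <- fluidPage(
  titlePanel("Fuel economy in mtcars"),
  sliderInput("bins", "Number of bins:", min = 3, max = 20, value = 8),
  plotOutput("mpg_hist")
)

# Server: redraw the histogram whenever the slider changes
server <- function(input, output, session) {
  output$mpg_hist <- renderPlot({
    hist(mtcars$mpg, breaks = input$bins, main = NULL, xlab = "Miles per gallon")
  })
}

app <- shinyApp(ui = ui, server = server)
# Launch it interactively with runApp(app)
```

Note that hist() treats a single number for breaks as a suggestion, so the bar count may differ slightly from the slider value.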