I am learning R programming by way of reviewing basic stats with Julian Faraway's book "Linear Models with R". Very early in the book he gives example code to implement a Permutation Test on one of his datasets (included in the "faraway" package).
Am I correct that using a for() looping construct in this manner is a poor habit to get into when programming in R?
In theory I'd think the sort of thing he's doing is best implemented with one of the apply-family functions. But for some reason I seem to have a total mental block about how exactly one passes parameters into and out of a non-built-in function supplied to the apply().
Here's the way he does it. I just spent an hour-plus looking at various examples of apply() and can't seem to get my mind around it. Not sure what it is about apply() that I find so opaque.
As see you use tidyverse . In the tidyverse, there is the purrr that helps with everything that is iteration. You'll find a function purrr::rerun that will repeat a function n times. I think this is what you are looking for.
library(faraway)
library(tidyverse)
data(gala, package="faraway")
lmod <- lm( Species ~ Nearest + Scruz, gala)
lms<-summary(lmod)
nreps <- 4000
set.seed(123)
# Use rerun to repeat the operation nreps times
fstats <- purrr::rerun(nreps, {
lmods <- lm(sample(Species)~Nearest+Scruz,gala)
summary(lmods)$fstat[[1]]
}) %>%
# flatten the list to a vector
flatten_dbl()
# get the mean search
mean(fstats > lms$fstat[[1]])
#> [1] 0.55825
Wow, that's better advice than I'd even hoped for. Since I'm learning R nearly from scratch I've been trying to whenever possible go ahead and learn the tidy/dplyr/ggplot2/etc. forms of things rather than sticking to base R. I will take the second suggestion (using broom) under advisement!
In the meanwhile I ended up slightly elaborating your suggested approach by moving the comparison to the original fitted model F-statistic inside the loop. So the vector we end up with has 1's and 0's that can be piped directly into mean().
The drawback would be that I'm basically discarding the actual F-stats so in the big picture brooming everything into a summary "tibble" is going to be worth doing.
# Use rerun to repeat the operation nreps times
fstats_gt <- purrr::rerun(nreps, {
lmods <- lm(sample(Species)~Nearest+Scruz,gala)
summary(lmods)$fstat[[1]]>lms$fstat[1]
}) %>%
# flatten the list to a vector
flatten_dbl() %>%
mean()
broom is really awesome to get tidy model ouput.
You should also take a look at tidymodels
And If your question's been answered, would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it: