Oneway() residuals

pollycp · October 23, 2018, 12:36am

I have read through the documentation for the oneway() function in the lattice package, but I am unclear whether the residuals it returns are computed from the full (pooled) data set or within each group defined by the factor.
Does anyone have insight on this? Thanks!!

joels · October 23, 2018, 5:36am

I haven't used the oneway function before, but we can compare it with the same model fitted with aov (ANOVA) and lm (linear regression). All three report exactly the same residuals and fitted values, and in each case the residual is the difference between an observation and the model fit for that observation. Since Species is the only term in the model, the fit is the same for every observation within a given level of Species (it is the mean of that group), so the residuals are computed within groups rather than from the pooled data.

library(tidyverse)
library(lattice)

# Models
f = formula(Sepal.Length ~ Species)
m = list(oneway = oneway(f, data=iris),
         aov = aov(f, data=iris),
         lm = lm(f, data=iris))

# Extract residuals and fitted values
d = map_df(m, ~data.frame(fitted=fitted(.x), resid=resid(.x)), .id="model")

# Plot only the distinct (model, fitted, resid) combinations
ggplot(distinct(d),
       aes(fitted, resid, colour=model, size=model)) +
  geom_point() + 
  scale_colour_viridis_d() +
  scale_size_manual(values=c(4,3,2)) +
  theme_classic()

[Plot: resid vs. fitted for the oneway, aov and lm models]

A direct check, for all three models, that the reported residuals equal Sepal.Length (the observations) minus the model's fitted values:

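d %>% 
  # Rows are stacked model by model, so repeat the observations once per model
  mutate(Sepal.Length = rep(iris$Sepal.Length, 3)) %>% 
  group_by(model) %>% 
  # all.equal() compares whole vectors, so evaluate it once per model
  summarise(check = isTRUE(all.equal(Sepal.Length - fitted, resid)),
            n = n())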

  model  check     n
1 aov    TRUE    150
2 lm     TRUE    150
3 oneway TRUE    150
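
To tie this back to the original question, here is a minimal sketch that builds both candidate residuals by hand and compares them with what oneway returns. It assumes oneway's default location (the mean); the names resid_grouped, resid_pooled, matches_grouped and matches_pooled are made up for illustration.

```r
library(lattice)

y = iris$Sepal.Length
g = iris$Species

# Residuals about each group's own mean (ave() returns the group mean for every row)
resid_grouped = y - ave(y, g)

# Residuals about the single grand mean of all observations
resid_pooled = y - mean(y)

fit = oneway(Sepal.Length ~ Species, data = iris)

# unname() drops any names so that only the values are compared
matches_grouped = isTRUE(all.equal(unname(resid(fit)), resid_grouped))
matches_pooled  = isTRUE(all.equal(unname(resid(fit)), resid_pooled))

matches_grouped
matches_pooled
```

If the residuals are within-group, as the comparison with aov and lm suggests, matches_grouped should come back TRUE and matches_pooled FALSE.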