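I often use geom_smooth() to add smooths to plots of my data. I just discovered that when the y-axis is transformed (e.g. with scale_y_log10()), geom_smooth() unexpectedly fits its smooth to the transformed data rather than to the raw data. This gives a biased smooth compared with smoothing the raw data: in the reprex below, the geom_smooth() curve matches a GAM fitted to log10(y) and back-transformed, not a GAM fitted to y itself. As far as I can see, the documentation does not warn that this happens.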
```r
library(ggplot2)
library(dplyr)
library(mgcv)

set.seed(1)
# Simulated data: strictly positive y with multiplicative noise
df <- data.frame(x = seq(0, 1, 0.01)) %>%
  mutate(y = exp(runif(n()) + 6 * (x * (1 - x) * (0.5 - x) + 0.1 * x)))

# GAM smooth fitted to the raw y values
mod <- gam(y ~ s(x, bs = "cs"), data = df, method = "REML")
df$pred <- predict(mod)

# Same GAM fitted to log10(y), then back-transformed to the original scale
mod2 <- gam(y ~ s(x, bs = "cs"), data = df %>% mutate(y = log10(y)), method = "REML")
df$pred2 <- 10 ^ predict(mod2)

df %>%
  ggplot() +
  labs(colour = "Smooth") +
  geom_point(aes(x = x, y = y)) +
  geom_smooth(aes(x = x, y = y, colour = "geom_smooth"), method = "gam", size = 4) +
  geom_line(aes(x = x, y = pred, colour = "normal space"), size = 1) + # does not match geom_smooth
  geom_line(aes(x = x, y = pred2, colour = "log space"), size = 1) +   # matches geom_smooth
  scale_y_log10()
#> `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'
```
Created on 2021-12-20 by the reprex package (v2.0.1)
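A numeric check, rather than a visual one, is to pull the layer's computed values out with ggplot2's layer_data() and compare them against hand-fitted GAMs. This is a minimal sketch under a few assumptions: it assumes ggplot2 ≥ 3.3.0, where stat_smooth() with method = "gam" calls mgcv::gam() with method = "REML" unless told otherwise; it rebuilds the same simulated data in base R instead of dplyr; and the object names (p, smooth_data, gap_log, gap_raw) are purely illustrative. If the stat really works on the transformed scale, the smooth's y values (which layer_data() reports on the log10 scale) should agree with the log10-space fit up to rounding and differ from the raw-scale fit.

```r
library(ggplot2)
library(mgcv)

# Same simulated data as in the reprex, built in base R
set.seed(1)
df <- data.frame(x = seq(0, 1, 0.01))
df$y <- exp(runif(nrow(df)) + 6 * (df$x * (1 - df$x) * (0.5 - df$x) + 0.1 * df$x))

# geom_smooth() on a log10 y scale; the formula is spelled out explicitly
# (it is the default that the reprex message reports for method = "gam")
p <- ggplot(df, aes(x, y)) +
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs")) +
  scale_y_log10()

# layer_data() returns the values computed by the stat for layer 1
smooth_data <- layer_data(p, 1)

# Hand-fitted GAMs: one on raw y, one on log10(y)
# (assumes stat_smooth() uses REML for gam, as in ggplot2 >= 3.3.0)
mod_raw <- gam(y ~ s(x, bs = "cs"), data = df, method = "REML")
mod_log <- gam(log10(y) ~ s(x, bs = "cs"), data = df, method = "REML")

# Predict both hand fits on the x grid that geom_smooth() used
grid <- data.frame(x = smooth_data$x)
pred_raw <- as.numeric(predict(mod_raw, newdata = grid))
pred_log <- as.numeric(predict(mod_log, newdata = grid))

# Largest absolute gap between the geom_smooth() curve and each hand fit,
# measured on the log10 scale that layer_data() reports
gap_log <- max(abs(smooth_data$y - pred_log))
gap_raw <- max(abs(smooth_data$y - log10(pred_raw)))
print(c(gap_log = gap_log, gap_raw = gap_raw))
```

A near-zero gap_log together with a clearly larger gap_raw would say the same thing as the plot: the smooth was fitted to log10(y), not to y.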