Why are the results from the linear spline regression showing physiologically implausible coefficients?
For example, consider the following formula:
model_nstatin <- lm(ldl ~ bs(pa, knots = c(600, 2400, 4200), degree = 1) +
bs(age, knots = c(45, 55, 65), degree = 1) + sex +
education + smoking + bs(mds, knots = c(4, 7), degree = 1),
data = chol_nstatin)
This model yields an intercept of 3.42 and spline coefficients for physical activity (measured in MET min/week) as follows:
- -0.047 for values less than 600
- -0.091 for values between 600 and 2400
- -0.112 for values between 2400 and 4200
- -0.102 for values greater than or equal to 4200
These coefficients are for LDL cholesterol measured in mmol/L. However, the predictions and the plot provided below are mathematically and physiologically reasonable. Why do the spline coefficients appear incorrect physiologically?
Here's the code used for prediction and plotting:
nstatin <- data.frame(
pa = seq(min(chol_nstatin$pa), max(chol_nstatin$pa), length.out = 117283),
age = mean(chol_nstatin$age),
sex = factor("Male", levels = levels(chol_nstatin$sex)),
education = factor("Vocational", levels = levels(chol_nstatin$education)),
smoking = factor("Ex-smoker", levels = levels(chol_nstatin$smoking)),
mds = mean(chol_nstatin$mds)
)
predictions_nstatin <- predict(model_nstatin, newdata = nstatin, interval = "confidence")
nstatin$ldl_pred <- predictions_nstatin[, "fit"]
nstatin$lower <- predictions_nstatin[, "lwr"]
nstatin$upper <- predictions_nstatin[, "upr"]
knots <- c(600, 2400, 4200)
x_limit <- 6000
x_interval <- 600
ggplot(nstatin, aes(x = pa, y = ldl_pred)) +
geom_line(linewidth = 1.2, color = "red") +
geom_ribbon(aes(x = pa, ymin = lower, ymax = upper), fill = "lightblue", alpha = 0.2) +
geom_vline(xintercept = knots, linetype = "dashed", color = "blue") +
scale_x_continuous(
limits = c(0, x_limit),
breaks = seq(0, x_limit, by = x_interval),
labels = seq(0, x_limit, by = x_interval)
) +
labs(title = "Linear Spline Fit with Confidence Intervals",
x = "Physical activity (MET min/week)", y = "LDL cholesterol (mmol/L)") +
theme_minimal()