Why are the results from the linear spline regression showing physiologically implausible coefficients?

Why are the results from the linear spline regression showing physiologically implausible coefficients?

For example, consider the following formula:


model_nstatin <- lm(ldl ~ bs(pa, knots = c(600, 2400, 4200), degree = 1) + 
              bs(age, knots = c(45, 55, 65), degree = 1) + sex + 
              education + smoking + bs(mds, knots = c(4, 7), degree = 1), 
            data = chol_nstatin)

This model yields an intercept of 3.42 and spline coefficients for physical activity (measured in MET min/week) as follows:

  • -0.047 for values less than 600
  • -0.091 for values between 600 and 2400
  • -0.112 for values between 2400 and 4200
  • -0.102 for values greater than or equal to 4200

These coefficients are for LDL cholesterol measured in mmol/L. However, the predictions and the plot provided below are mathematically and physiologically reasonable. Why do the spline coefficients appear incorrect physiologically?

Here's the code used for prediction and plotting:


nstatin <- data.frame(
  pa = seq(min(chol_nstatin$pa), max(chol_nstatin$pa), length.out = 117283),
  age = mean(chol_nstatin$age),
  sex = factor("Male", levels = levels(chol_nstatin$sex)),
  education = factor("Vocational", levels = levels(chol_nstatin$education)),
  smoking = factor("Ex-smoker", levels = levels(chol_nstatin$smoking)),
  mds = mean(chol_nstatin$mds)
)

predictions_nstatin <- predict(model_nstatin, newdata = nstatin, interval = "confidence")

nstatin$ldl_pred <- predictions_nstatin[, "fit"]
nstatin$lower <- predictions_nstatin[, "lwr"]
nstatin$upper <- predictions_nstatin[, "upr"]

knots <- c(600, 2400, 4200)
x_limit <- 6000
x_interval <- 600

ggplot(nstatin, aes(x = pa, y = ldl_pred)) +
  geom_line(linewidth = 1.2, color = "red") +
  geom_ribbon(aes(x = pa, ymin = lower, ymax = upper), fill = "lightblue", alpha = 0.2) +
  geom_vline(xintercept = knots, linetype = "dashed", color = "blue") +
  scale_x_continuous(
    limits = c(0, x_limit),
    breaks = seq(0, x_limit, by = x_interval),
    labels = seq(0, x_limit, by = x_interval)
  ) +
  labs(title = "Linear Spline Fit with Confidence Intervals", 
       x = "Physical activity (MET min/week)", y = "LDL cholesterol (mmol/L)") +
  theme_minimal()


This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.