Best way to graph a regression??

EphraMP · March 16, 2022, 5:35am

I've fitted a simple polynomial regression. For some reason, this is the plot I'm getting:

[Image: Rplot]

This is my code:


  mod = lm(D$Life.expectancy ~ poly(D$Adult.Mortality, 2))
  summary(mod)
  
  x4 = seq(min(D$Adult.Mortality), max(D$Adult.Mortality),
           length.out=length(D$Adult.Mortality))
  
  y4 = predict(mod, data.frame(x4), interval = "confidence")
  
  plot(D[,5], D[,4], col = as.factor(D$Status), pch = 16, 
     ylab = "Life expectancy", xlab = "Adult mortality")
  lines(x4, y4[,1], col = "blue", lwd = 3)

What's wrong with my code? Why am I getting such a weird graph? Also, is there a better way to graph the model? For some reason, I feel that my code is not the best.

rwalker · March 16, 2022, 6:26am

First, the best way to get help is to provide a reproducible example (reprex). The help pages here in the RStudio Community explain how to make one; see the FAQ "What's a reproducible example (`reprex`) and how do I create one?"

Second, I am pretty sure the problem is the lm call. Try changing your lm call to use the formula-plus-data form:

mod = lm(Life.expectancy ~ poly(Adult.Mortality, 2), data=D)

You may also need to name the input so that it matches the variable in the formula, so alter the definition of x4:

x4 = data.frame(Adult.Mortality = seq(min(D$Adult.Mortality), max(D$Adult.Mortality),
           length.out=length(D$Adult.Mortality)))

And, since x4 is now already a data frame, remove the extra data.frame() call inside predict():

y4 = predict(mod, x4, interval = "confidence")

That should fix it.
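Putting the three changes together, here is a minimal end-to-end sketch. The poster's data set was not shared, so `D` below is a synthetic stand-in: the column names match the thread, but the values and coefficients are invented purely so the example runs. Because `x4` is now a data frame, the sketch passes its numeric column to `lines()`, and it also draws the confidence limits that `predict()` already returns.

```r
# Synthetic stand-in for the poster's data frame D (all values are made up).
set.seed(1)
n <- 200
D <- data.frame(Adult.Mortality = runif(n, 10, 500))
D$Life.expectancy <- 80 - 0.03 * D$Adult.Mortality -
  0.00004 * D$Adult.Mortality^2 + rnorm(n, sd = 3)
D$Status <- factor(ifelse(D$Adult.Mortality < 150, "Developed", "Developing"))

# Formula + data form: variables are referenced by bare name.
mod <- lm(Life.expectancy ~ poly(Adult.Mortality, 2), data = D)

# newdata must use the same variable name as the formula.
x4 <- data.frame(Adult.Mortality = seq(min(D$Adult.Mortality),
                                       max(D$Adult.Mortality),
                                       length.out = 200))

# Matrix with columns "fit", "lwr", "upr", one row per row of x4.
y4 <- predict(mod, newdata = x4, interval = "confidence")

plot(D$Adult.Mortality, D$Life.expectancy, col = D$Status, pch = 16,
     ylab = "Life expectancy", xlab = "Adult mortality")
lines(x4$Adult.Mortality, y4[, "fit"], col = "blue", lwd = 3)  # fitted curve
lines(x4$Adult.Mortality, y4[, "lwr"], col = "blue", lty = 2)  # lower limit
lines(x4$Adult.Mortality, y4[, "upr"], col = "blue", lty = 2)  # upper limit
```

Because the grid in `x4` is sorted and the predictions are computed on that grid, the fitted curve is drawn left to right as a single smooth line.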

EphraMP · March 17, 2022, 5:12pm

Thanks!!

This was very useful and worked as expected.
Thank you very much.

Is there some book you would recommend about the best R programming practices?

rwalker · March 17, 2022, 6:44pm

Glad to help. If you think your question was answered, can you mark it as such so that the question is shown as answered?

The logic behind the formula/data syntax is that, in a formula, the variables are bare names that get looked up in a context, the data frame. Written as df$x, a variable is tied to one particular data frame and can't be separated from it. That causes a problem when we try to replace the values to predict: R only knows to look for df$x as a pair, not for a variable name in whatever new context we supply in the newdata= argument to predict, so the new values are ignored. That said, I learned this because a student tried it, it failed, and they asked why. I don't know of a reference for it.
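A small sketch of that behaviour, using made-up data (`df`, `new_x`, and the coefficients are hypothetical, chosen only for illustration). With the `df$x` form, `predict()` does not use `newdata` and returns the fitted values for the original rows. With the formula/data form, it predicts at the new values.

```r
set.seed(42)
df <- data.frame(x = 1:20)
df$y <- 2 + 0.5 * df$x + rnorm(20)

# Three new points at which we want predictions.
new_x <- data.frame(x = c(100, 200, 300))

bad  <- lm(df$y ~ df$x)       # variables tied to df via `$`
good <- lm(y ~ x, data = df)  # variables looked up by name in `data`

# R warns here that 'newdata' had 3 rows but the variables found have 20.
p_bad  <- suppressWarnings(predict(bad, newdata = new_x))
p_good <- predict(good, newdata = new_x)

length(p_bad)   # 20: newdata was not used
length(p_good)  # 3: one prediction per row of new_x

# The "bad" predictions are simply the fitted values on the original data.
all.equal(unname(p_bad), unname(fitted(bad)))
```

In the original question the new grid had exactly as many points as the data, so R had no length mismatch to warn about. The predictions would then come back in the order of the original, unsorted rows while being plotted against a sorted grid, which is a likely explanation for the odd-looking line.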
