Best way to graph a regression?

I've fitted a simple polynomial regression. For some reason, this is the plot I'm getting:

[Image: Rplot]

This is my code:


  mod = lm(D$Life.expectancy ~ poly(D$Adult.Mortality, 2))
  summary(mod)
  
  x4 = seq(min(D$Adult.Mortality), max(D$Adult.Mortality),
           length.out=length(D$Adult.Mortality))
  
  y4 = predict(mod, data.frame(x4), interval = "confidence")
  
  plot(D[,5], D[,4], col = as.factor(D$Status), pch = 16, 
     ylab = "Life expectancy", xlab = "Adult mortality")
  lines(x4, y4[,1], col = "blue", lwd = 3)

What's wrong with my code? Why am I getting such a weird graph? Also, is there a better way to graph the model? For some reason, I feel that my code is not the best.

First, the best way to get help is to provide a reproducible example (reprex). The help section here in the RStudio Community has references on how to do it; see "FAQ: What's a reproducible example (`reprex`) and how do I create one?"

Second, I am pretty sure the problem is the lm call. Try modifying your lm call to use the formula and data form:

mod = lm(Life.expectancy ~ poly(Adult.Mortality, 2), data=D)

You may also need to name the input so that its column name matches the predictor in the formula, so alter the definition of x4:

x4 = data.frame(Adult.Mortality = seq(min(D$Adult.Mortality), max(D$Adult.Mortality),
           length.out=length(D$Adult.Mortality)))

And remove the duplicate data.frame() call, since x4 is now already a data frame:

y4 = predict(mod, x4, interval = "confidence")

That should fix it.
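
For anyone who wants to try the steps above end to end, here is a minimal sketch. The original data set was not posted, so the data frame D below is simulated and its values are purely hypothetical; only the column names are taken from the question. The 100-point grid and the dashed confidence bands are my own additions, not part of the original code.

```r
# Simulated stand-in for the asker's data frame D (hypothetical values;
# only the column names come from the question).
set.seed(1)
n <- 200
D <- data.frame(Adult.Mortality = runif(n, 50, 500))
D$Life.expectancy <- 85 - 0.08 * D$Adult.Mortality +
  0.00005 * D$Adult.Mortality^2 + rnorm(n, sd = 3)
D$Status <- factor(ifelse(D$Adult.Mortality < 200, "Developed", "Developing"))

# Fit with the formula/data form so the predictor is a plain variable name.
mod <- lm(Life.expectancy ~ poly(Adult.Mortality, 2), data = D)

# Prediction grid: the column name must match the predictor in the formula.
x4 <- data.frame(Adult.Mortality = seq(min(D$Adult.Mortality),
                                       max(D$Adult.Mortality),
                                       length.out = 100))

# predict() returns a matrix with columns "fit", "lwr" and "upr".
y4 <- predict(mod, newdata = x4, interval = "confidence")

# Scatter plot of the data, coloured by Status.
plot(D$Adult.Mortality, D$Life.expectancy,
     col = as.integer(D$Status), pch = 16,
     xlab = "Adult mortality", ylab = "Life expectancy")

# Fitted curve plus the confidence band (dashed).
lines(x4$Adult.Mortality, y4[, "fit"], col = "blue", lwd = 3)
lines(x4$Adult.Mortality, y4[, "lwr"], col = "blue", lty = 2)
lines(x4$Adult.Mortality, y4[, "upr"], col = "blue", lty = 2)
```

Note that x4 is a data frame here, so as far as I can tell the lines() call should be given the column x4$Adult.Mortality rather than x4 itself.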

Thanks!!

This was very useful and worked as expected.
Thank you very much.

Is there some book you would recommend about the best R programming practices?

Glad to help. If you think your question was answered, can you mark it as such so that the question is shown as answered?

The logic behind the formula/data syntax is that the variables are defined in a context (the data frame), whereas with something like df$x they are defined in a way that can't be separated from their context. This causes a problem when we try to replace them in order to predict: R only knows to look for df$x as a pair, rather than for a variable name in whatever new context we supply in the newdata= argument to predict. That said, I learned this because a student tried it, it failed, and they asked why. I don't know of a reference for it.
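
A small sketch of the difference described above, using made-up data (the names D, x, y and new are hypothetical). With the D$x form, predict() cannot find anything called D$x in newdata, so it falls back to the original D$x and returns one prediction per original row. As far as I know, R only warns about this when the row counts differ. In the question the grid had the same length as the data, which would explain why nothing complained and the curve came out scrambled.

```r
# Made-up data for illustration only.
set.seed(2)
D <- data.frame(x = runif(50, 0, 10))
D$y <- 1 + 2 * D$x - 0.1 * D$x^2 + rnorm(50)

# Five new x values at which we would like predictions.
new <- data.frame(x = seq(0, 10, length.out = 5))

# D$ form: the model term is literally "D$x", which newdata cannot replace.
bad <- lm(D$y ~ poly(D$x, 2))
p_bad <- suppressWarnings(predict(bad, newdata = new))
length(p_bad)   # 50: predictions for the original rows, not for `new`

# Formula/data form: x is looked up in newdata, as intended.
good <- lm(y ~ poly(x, 2), data = D)
p_good <- predict(good, newdata = new)
length(p_good)  # 5: one prediction per row of `new`
```

Removing suppressWarnings() shows the warning R gives in the mismatched case.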
