error: abline only using first 2 of six regression coefficients in R

John2 · April 6, 2020, 8:38pm

I have a data frame of 392 rows with 7 independent variables; mpg is the dependent variable. I'm not using the last variable, 'name', in my model because it is a factor.

Here is the top of the table:

  mpg cylinders displacement horsepower weight acceleration year origin                      name
1  18         8          307        130   3504         12.0   70      1 chevrolet chevelle malibu
2  15         8          350        165   3693         11.5   70      1         buick skylark 320
3  18         8          318        150   3436         11.0   70      1        plymouth satellite
4  16         8          304        150   3433         12.0   70      1             amc rebel sst
5  17         8          302        140   3449         10.5   70      1               ford torino
6  15         8          429        198   4341         10.0   70      1          ford galaxie 500

Here is the structure of the data frame:

data.frame:	392 obs. of  9 variables:
 $ mpg         : num  18 15 18 16 17 15 14 14 14 15 ...
 $ cylinders   : int  8 8 8 8 8 8 8 8 8 8 ...
 $ displacement: num  307 350 318 304 302 429 454 440 455 390 ...
 $ horsepower  : int  130 165 150 150 140 198 220 215 225 190 ...
 $ weight      : int  3504 3693 3436 3433 3449 4341 4354 4312 4425 3850 ...
 $ acceleration: num  12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
 $ year        : int  70 70 70 70 70 70 70 70 70 70 ...
 $ origin      : int  1 1 1 1 1 1 1 1 1 1 ...
 $ name        : Factor w/ 304 levels "amc ambassador brougham",..:

For variable origin: 1 American, 2 German, 3 Japanese
I left this variable as integer for the lm model.

I ran lm and reduced the model to the significant variables:

 Call:
   lm(formula = auto.mpg$mpg ~ auto.mpg$displacement + auto.mpg$horsepower + 
    auto.mpg$weight + auto.mpg$year + auto.mpg$origin)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.4882 -2.1157 -0.1645  1.8650 13.0544 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)           -1.669e+01  4.120e+00  -4.051 6.16e-05 ***
auto.mpg$displacement  1.137e-02  5.536e-03   2.054   0.0406 *  
auto.mpg$horsepower   -2.192e-02  1.078e-02  -2.033   0.0428 *  
auto.mpg$weight       -6.324e-03  5.685e-04 -11.124  < 2e-16 ***
auto.mpg$year          7.484e-01  5.089e-02  14.707  < 2e-16 ***
auto.mpg$origin        1.385e+00  2.772e-01   4.998 8.80e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

All of these variables are either numeric or integer.
The only factor variable is "name" and that is not used in the lm model.

Once I have the lm model, I plot the residuals, which look fine (random, with roughly equal variance).

When I try to add the best fit line, I get this warning:

Warning message:
In abline(auto.mpg.linear, col = "red") :
  only using the first two of 6 regression coefficients

My research suggests this warning is caused by a factor variable in the linear model,
but none of my regression inputs is a factor.

Any suggestions on how to get this to work?

FYI - I'm new to R programming and am trying to make this work with base functions in R.

technocrat · April 7, 2020, 9:52pm

Please see the FAQ: What's a reproducible example (`reprex`) and how do I do one? A reprex, complete with representative data, tends to attract more and faster answers. This question is more conceptual than coding, however, so it isn't strictly needed.

Let's consider what plot draws. It's a two-dimensional representation of data along two axes, x and y. By convention, the independent variable goes on the x axis and the dependent variable on the y axis, so the fitted line from lm shows how y responds to x.

This works well with a dependent variable for y and a single independent variable for x. The question it answers is: for every one-unit increment in x, what is the change in y?
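
As a minimal sketch of the one-predictor case, here is a simple regression on the built-in `mtcars` data, used as a stand-in because the original `auto.mpg` data frame isn't posted (the choice of `wt` as the predictor is only for illustration). With two coefficients, `abline()` has exactly what it needs:

```r
# Stand-in data: built-in mtcars (an assumption; auto.mpg is not available here).
fit_simple <- lm(mpg ~ wt, data = mtcars)

# Two coefficients: the intercept and the slope for wt.
coef(fit_simple)

# Scatterplot of the response against the single predictor,
# with the fitted line drawn from those two coefficients.
plot(mpg ~ wt, data = mtcars)
abline(fit_simple, col = "red")
```

Passing `data = mtcars` instead of writing `mtcars$mpg ~ mtcars$wt` also keeps the coefficient names short.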

But what happens with five independent variables (six coefficients, counting the intercept) that have different scales, where one (origin) is really categorical and the others are continuous?

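Two dimensions won't cut it. With five predictors plus the response, the fitted model is a hyperplane in 6-space, not a line, and there is no single trend line to draw in 2-space. That is what the warning is telling you: abline takes only the intercept and the first slope (here, displacement) and ignores the rest.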
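
To make that concrete, here is a hedged sketch using the built-in `mtcars` data as a stand-in for `auto.mpg` (the three predictors are picked only for illustration). It also shows what is probably wanted on a residual plot: a horizontal reference line at zero rather than a line built from the model coefficients.

```r
# Stand-in data: built-in mtcars (an assumption; auto.mpg is not available here).
fit_multi <- lm(mpg ~ disp + hp + wt, data = mtcars)

# More than two coefficients: the intercept plus one slope per predictor.
length(coef(fit_multi))

# Residuals against fitted values.
plot(fitted(fit_multi), resid(fit_multi),
     xlab = "Fitted values", ylab = "Residuals")

# Calling abline(fit_multi) here would use only the intercept and the
# slope of disp, and warn that the other coefficients are ignored.

# A horizontal line at zero needs no model coefficients at all.
abline(h = 0, col = "red", lty = 2)
```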

My admittedly limited understanding of intermediate to advanced multiple linear regression is that the alternative is to display the partial residuals one predictor at a time.
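
One way to do this with base R only is `termplot` from the stats package. The sketch below again uses `mtcars` as a stand-in for `auto.mpg`, with illustrative predictors, so treat it as a starting point rather than the definitive approach:

```r
# Stand-in data: built-in mtcars (an assumption; auto.mpg is not available here).
fit_multi <- lm(mpg ~ disp + hp + wt, data = mtcars)

# Partial residuals as a matrix: one row per observation, one column per term.
partial <- residuals(fit_multi, type = "partial")
dim(partial)

# One panel per predictor: the fitted term with its partial residuals.
op <- par(mfrow = c(1, 3))
termplot(fit_multi, partial.resid = TRUE, se = TRUE)
par(op)
```

Each panel shows how the response varies with one predictor after accounting for the others, which is the closest thing to a "best fit line" per variable in a multiple regression.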

A good resource (so far) for me is Harrell's Regression Modeling Strategies.
