Can I use a multiple linear regression to predict a certain variable?

I am trying to fit a linear regression to predict the sale price of some cars from the "imports85" dataset (loaded from the randomForest package). My code is as follows:


library(tidyverse)
library(rpart)
library(rpart.plot)
library(randomForest)
data("imports85")
db<-imports85
View(db)
db<-db[,-1 ]
db<-db[,-1 ]
set.seed(0)

library(fastDummies)
library(naniar)

vis_miss(db)
db <- na.omit(db)
vis_miss(db)

db2 <- dummy_cols(db,
                  select_columns = c("make", "fuelType", "aspiration", "numOfDoors",
                                     "bodyStyle", "driveWheels", "engineLocation",
                                     "engineType", "numOfCylinders", "fuelSystem"),
                  remove_first_dummy = TRUE,
                  remove_selected_columns = TRUE)


ind <- sample(2, nrow(db2), replace = TRUE, prob = c(0.5, 0.5))

train2 <- db2[ind==1,]
test2 <- db2[ind==2,]

model <- lm(price ~ ., data = train2)
summary(model)

classPred2 <- predict(object = model, test2)
classPred2

My first question is about model <- lm(price ~ ., data = train2). Since the data has a lot of columns, I cannot write all of them on the right side of the ~. Written this way, am I using the price to predict the price? Should I remove it from the right-hand side somehow?

My second question is about classPred2 <- predict(object = model, test2). I don't understand how the prediction works, since test2 includes the price, which is the variable I am trying to predict. Should I remove that column?

Any answer is appreciated.
Best regards.

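model <- lm(price ~ ., data = train2) runs an ordinary least squares regression of price against all other variables in train2 (that is what ~ . means) and stores the result in an object named model. The dot expands to every column except the response, so price is not used to predict itself and nothing needs to be removed. Whether all of those variables are useful depends on the data.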
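To see what ~ . expands to, here is a minimal sketch. It uses the built-in mtcars data rather than imports85 (an assumption made only so the example runs without extra packages), with mpg playing the role of price; the object names are illustrative.

```r
# Illustration on the built-in mtcars data, not the imports85 data from the question.
# mpg stands in for price as the response.
fit_dot <- lm(mpg ~ ., data = mtcars)

# Names of the predictors that the dot expanded to
predictors <- attr(terms(fit_dot), "term.labels")
print(predictors)

# Check whether the response appears among the predictors
print("mpg" %in% predictors)

# The same model with every predictor written out by hand
fit_explicit <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
                   data = mtcars)

# Compare the coefficients of the two fits
print(all.equal(coef(fit_dot), coef(fit_explicit)))
```

The same check on the model from the question would be attr(terms(model), "term.labels").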

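classPred2 <- predict(object = model, test2) applies model to the rows of test2 and returns a predicted price for each one. predict() only reads the predictor columns, so the price column in test2 is ignored and does not have to be removed. Keeping it is actually useful: comparing the predictions with the actual prices in test2 is how you assess how well the model predicts, using some goodness-of-fit measure.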
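A small sketch of that comparison, again on the built-in mtcars data as a stand-in for imports85 (an assumption for the sake of a self-contained example). The split size, the two predictors, and the choice of RMSE as the error measure are illustrative, not something taken from the question.

```r
# Illustration on mtcars; mpg stands in for price.
set.seed(1)
idx   <- sample(nrow(mtcars), size = 22)  # rows used for training
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]                   # remaining rows held out for testing

fit <- lm(mpg ~ wt + hp, data = train)

# Predict with the response column present in the new data ...
pred_with <- predict(fit, newdata = test)

# ... and with the response column dropped
test_no_mpg  <- test[, setdiff(names(test), "mpg")]
pred_without <- predict(fit, newdata = test_no_mpg)

# Check whether the two sets of predictions agree
print(all.equal(pred_with, pred_without))

# Root mean squared error of the predictions against the actual values
rmse <- sqrt(mean((test$mpg - pred_with)^2))
print(rmse)
```

For the code in the question, the equivalent would be sqrt(mean((test2$price - classPred2)^2)).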

Creating a model with lm is very straightforward; however, interpreting the model, especially if it has many coefficients, can be difficult.

The {olsrr} package provides tools for evaluating the results. The {rms} package and accompanying text provide advanced tools for the same job.

Thanks for the answer, now I understand it all much better.
