Hi, I'm working on the House Price Prediction problem (dataset from Kaggle), and I want to use lasso regression and ridge regression. The problem is that after using model.matrix, the number of columns is reduced. I've already checked my dataset and it has no NA values. Is there any other reason this could happen? Can someone please help me resolve this problem? Thanks a lot!
Here is what I did to the train dataset (and the same to the test dataset): I kept only the numeric variables and removed the ones with a lot of NA values. I just want to practice lasso and ridge regression.
Then, here is what I do to convert my data frame to a matrix:
train_x <- model.matrix(SalePrice~., data = PBdata[, -SalePrice])
train_y <- PBdata$SalePrice
Hi, thank you for your response. I tried your code. I couldn't find any problem with the dropped variables, and even after I deleted those dropped variables and ran the transformation again, some other variables were still dropped. So I'm totally confused now.
These are the variables dropped the first time:
"TotalBsmtSF" "X1stFlrSF" "X2ndFlrSF" "LowQualFinSF"
Then I deleted them, and these are the variables dropped the second time:
"GrLivArea" "BsmtFullBath" "BsmtHalfBath" "FullBath"
I'm looking at this, and the syntax seems very unusual to me.
Is PBdata a conventional data.frame or something else?
Inside PBdata[, -SalePrice], R looks up SalePrice in your workspace, not among the columns of PBdata. So the only way I can think of that your code might run is if SalePrice is not only a column in PBdata but also a separate object, perhaps an integer vector of length 4, that identifies 4 columns to drop each time you run this code.
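To illustrate the suspicion above, here is a minimal sketch. It assumes a stray vector called SalePrice exists in the workspace. The toy data frame, its values, and the contents of the stray vector are all made up for the demonstration and are not taken from the real Kaggle data.

```r
# Toy data frame standing in for PBdata (hypothetical columns and values)
PBdata <- data.frame(
  SalePrice = c(200, 150, 320, 275, 180, 240),
  LotArea   = c(8, 6, 12, 10, 7, 9),
  YearBuilt = c(2001, 1975, 2010, 1999, 1980, 2005),
  GrLivArea = c(1500, 1100, 2400, 2000, 1200, 1800),
  FullBath  = c(2, 1, 3, 2, 1, 2)
)

# Hypothetical stray object in the workspace that shares the column's name
SalePrice <- c(2L, 3L)

# The negative index drops columns 2 and 3 by position, not the SalePrice column
dropped <- setdiff(names(PBdata), names(PBdata[, -SalePrice]))
print(dropped)

# Passing the whole data frame lets the formula pick out the response itself;
# [, -1] removes the intercept column that model.matrix adds
train_x <- model.matrix(SalePrice ~ ., data = PBdata)[, -1]
train_y <- PBdata$SalePrice
print(colnames(train_x))
```

If this is what is happening, it would also explain why a different set of columns disappears after you delete the first set: the same positions are dropped again from the smaller data frame.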
SalePrice is the name of a column in PBdata, and PBdata is the name I gave to the dataset.
So should I rename the SalePrice column, or try to find the object named SalePrice and delete it?
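One way to check for such an object, as a rough sketch: the stray SalePrice vector and the toy PBdata below are created only for the demonstration and are assumptions, not the real data. Renaming the column should not be necessary. Removing the stray object, or dropping the column by name, would likely be enough.

```r
# Hypothetical stray object, created here only so the check has something to find
SalePrice <- c(2L, 3L)

# Is there a standalone object called SalePrice, separate from the column?
had_stray <- exists("SalePrice")
if (had_stray) {
  str(SalePrice)  # inspect it before deleting
  rm(SalePrice)   # removes the object only; the column in PBdata is untouched
}

# Toy stand-in for PBdata (hypothetical values)
PBdata <- data.frame(
  SalePrice = c(200, 150, 320),
  LotArea   = c(8, 6, 12),
  FullBath  = c(2, 1, 3)
)

# Dropping a column by name does not depend on any object in the workspace
predictors <- PBdata[, setdiff(names(PBdata), "SalePrice"), drop = FALSE]
print(names(predictors))
```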