Thank you so much @technocrat, you have been so helpful. I have rewritten the code and I think it makes sense now. My goal is to see those customers whose status is either 1 or 0. Since my RAM wasn't enough to perform the assigned task, I had to resort to removing some columns that slowed down my machine.
"Then, if pred_value is supposed to be a predicted value of status in a logistic model, you would need to dig out the log likelihood. (See my post here, which is based on the standard text.) If, on the other hand, it's supposed to be the estimates of the independent variables, those need to be extracted from the model output."
Could you elaborate on the log-likelihood part and the estimates?
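For context, here is my current understanding of the two things the quote distinguishes, as a minimal sketch. It uses the built-in mtcars data as a stand-in for my data (the 0/1 column `am` plays the role of Status, and `fit` is just an illustrative name), so please correct me if this is not what you meant:

```r
# Stand-in data: mtcars ships with R; `am` (0/1) plays the role of Status.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Log-likelihood of the fitted model
ll <- logLik(fit)
print(ll)

# Coefficient estimates (log-odds scale) with standard errors and p-values
est <- coef(summary(fit))
print(est)

# The same estimates expressed as odds ratios
print(exp(coef(fit)))

# Predicted probabilities that am == 1, one per row of the data
probs <- predict(fit, type = "response")
print(head(probs))
```

As far as I can tell, `logLik()` and `coef()` cover the "log likelihood" and "estimates" parts, while `predict(..., type = "response")` gives the per-row predicted values.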
Below is my new code:
# Remove unwanted columns (1 to 7 and 9)
model_input_df <- ml[, -c(1:7, 9)]
glimpse(model_input_df)
#Preliminary casting to the appropriate data type.
model_input_df$Status <- as.factor(model_input_df$Status)
model_input_df$Feeder <- as.character(model_input_df$Feeder)
model_input_df$group_cons <- as.factor(model_input_df$group_cons)
#...........................................................................
#...........................................................................
#BUILDING THE MACHINE LEARNING MODEL/partitioning the data
# Set the seed before partitioning so the split is reproducible
set.seed(2017)
intrain <- createDataPartition(model_input_df$Status, p = 0.75, list = FALSE)
training<- model_input_df[intrain,]
testing<- model_input_df[-intrain,]
#memory.limit(size = 56000)
#............................................................................
#Confirm the splitting is correct:
dim(training); dim(testing)
#Fitting the Logistic Regression Model:
LogModel <- glm(Status ~ .,data=training,family=binomial, maxit=100)
print(summary(LogModel))
#...............................................................................
#colnames(model_input_df)
#LogModel <- c(1, 2, 3, 4, 5,6,7,8,9)
# binding them together using rbind function of Base R
#final_df <- rbind(ml[, c(-1, -2,-3,-4,-5,-6,-7)], "pred_values" = LogModel)
#head(final_df)
#saveRDS(LogModel, "logmodel.rds")
#..............................................Adding Acc No back.....................
#Feature Analysis:
anova(LogModel, test="Chisq")
head(testing)
#Assessing the predictive ability of the Logistic Regression model
#testing$Status <- as.character(testing$Status)
#testing$Status[testing$Status=="0"] <- "0"
#testing$Status[testing$Status=="1"] <- "1"
fitted.results <- predict(LogModel,newdata=testing,type='response')
fitted.results <- ifelse(fitted.results > 0.5,1,0)
misClassificError <- mean(fitted.results != testing$Status)
print(paste('Logistic Regression Accuracy', 1 - misClassificError))
#class(testing$Average.Consumption)
final_df <- rbind(ml[, c(1,2,3,4,5,6,7,9)],"Pred_values"=fitted.results)
But the last line throws the following warning messages:
Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = 0) :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, ri, value = 0) :
invalid factor level, NA generated
3: In `[<-.factor`(`*tmp*`, ri, value = 0) :
invalid factor level, NA generated
4: In `[<-.factor`(`*tmp*`, ri, value = 0) :
invalid factor level, NA generated
5: In `[<-.factor`(`*tmp*`, ri, value = 0) :
invalid factor level, NA generated
6: In `[<-.factor`(`*tmp*`, ri, value = 0) :
invalid factor level, NA generated
7: In `[<-.factor`(`*tmp*`, ri, value = 0) :
invalid factor level, NA generated
I just want to merge the predicted outcome with the list of customers in this case.
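My guess is that the warnings come from `rbind()` trying to append the predictions as new rows, which pushes numeric values into factor columns, whereas what I want is a new column on the rows that were actually predicted. A minimal sketch of that idea, assuming a made-up `customers` table in place of `ml` (the names `acc_no`, `usage` and `test_idx` are hypothetical):

```r
# Hypothetical stand-in for `ml`: an account ID plus one predictor and Status.
set.seed(2017)
customers <- data.frame(
  acc_no = sprintf("ACC%03d", 1:200),
  usage  = rnorm(200),
  stringsAsFactors = FALSE
)
customers$Status <- factor(rbinom(200, 1, plogis(customers$usage)))

# Keep the row index of the test set so the ID columns can be recovered later
test_idx <- sample(nrow(customers), 50)
training <- customers[-test_idx, c("usage", "Status")]
testing  <- customers[test_idx,  c("usage", "Status")]

fit  <- glm(Status ~ usage, data = training, family = binomial)
pred <- ifelse(predict(fit, newdata = testing, type = "response") > 0.5, 1, 0)

# cbind() adds a column; take the same test rows from the full table
final_df <- cbind(customers[test_idx, ], Pred_values = pred)
print(head(final_df))
```

In my own code the equivalent would presumably be something like `cbind(ml[-intrain, c(1:7, 9)], Pred_values = fitted.results)`, since the predictions only exist for the testing rows, but I have not confirmed that this is the right way to do it.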