Thank you so much, @technocrat. I can see the data-scrubbing hurdles you had to go through. A few observations, though:
Using an indicator derived from Average_Consumption > Availability (1, 0) shouldn't suffice, as that isn't the response variable. The response variable is Status, as I have indicated before.
It still does not answer my question, which is how to output a list of the customers who bypassed, based on the results for the test set.
I have cleaned the dataset on my GitHub page; please revisit it for the new dataset. I really need to get past this and move on to building a Shiny dashboard based on the results of the model.
Shiny deployment, I can't help you with. As it is typically used, it's data science in the same way that PowerPoint is rhetoric: p-hacking for the masses. It's great as an EDA tool for users who know what they are about.
A model is a description of a population, not of an observation. When we say that a patient has a 0.02 probability of dying of COVID-19 exposure, that does not mean that an observed patient is 0.02 dead. The patient is either dead or not, 0 or 1. Only if the patient is a random observation from a normal population can we say anything useful about their status.
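The point about population versus observation can be illustrated with a small simulation. This is only a sketch in Python with made-up numbers: the 0.02 risk comes from the example above, while the sample size and the random seed are arbitrary assumptions.

```python
import random

# Assumption: every patient in this simulated population shares the same 0.02 risk.
RISK = 0.02
N_PATIENTS = 10_000  # arbitrary sample size, chosen only for illustration

rng = random.Random(0)  # fixed seed so the run is repeatable

# Each individual outcome is either 0 (survived) or 1 (died), never 0.02.
outcomes = [1 if rng.random() < RISK else 0 for _ in range(N_PATIENTS)]

# The probability only shows up as a rate across the whole population.
observed_rate = sum(outcomes) / N_PATIENTS

print("distinct individual outcomes:", sorted(set(outcomes)))
print("population death rate:", observed_rate)
```

No single entry in `outcomes` equals 0.02; only the average over many patients comes close to it.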
What you may need to be looking at is classification methods. Given an observation of meter readings and estimates of distribution line capacity, what is the likely status of a meter? For that, see the Irizarry text.
Classification or prediction is a matter of definition and intent. In my case, I am first trying to classify existing customers based on their known energy theft history and bypass outcome. But it doesn't stop there: I want to use the classification model to predict the outcome for a new customer.
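A minimal sketch of that two-step workflow, written in Python purely for illustration and under loud assumptions: the data are synthetic (not the GitHub dataset), `customer_id` is a hypothetical column, the relationship between Status and consumption is invented, and a hand-rolled logistic regression stands in for whatever classifier is actually used. It fits on labelled customers, lists the test-set customers flagged as likely bypass, then scores one new customer.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=300):
    # Plain batch gradient descent on the logistic log-loss.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errs = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
                for xi, yi in zip(X, y)]
        w = [wj - lr * sum(e * xi[j] for e, xi in zip(errs, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b

# --- Synthetic stand-in data (ASSUMPTION: not the real dataset) ---
rng = random.Random(1)
customers = []
for i in range(200):
    availability = rng.uniform(5.0, 20.0)
    status = 1 if rng.random() < 0.3 else 0  # 1 = bypass, 0 = no bypass
    # Invented pattern: bypassing customers show lower metered consumption.
    share = rng.gauss(0.4, 0.1) if status == 1 else rng.gauss(0.8, 0.1)
    customers.append({
        "customer_id": f"C{i:03d}",  # hypothetical identifier column
        "Average_Consumption": share * availability,
        "Availability": availability,
        "Status": status,
    })

# --- Train/test split ---
rng.shuffle(customers)
train, test = customers[:150], customers[150:]

# --- Standardise features using training-set statistics only ---
FEATURES = ["Average_Consumption", "Availability"]
means = [sum(c[f] for c in train) / len(train) for f in FEATURES]
sds = [math.sqrt(sum((c[f] - m) ** 2 for c in train) / len(train))
       for f, m in zip(FEATURES, means)]

def features(customer):
    return [(customer[f] - m) / s for f, m, s in zip(FEATURES, means, sds)]

# --- Fit on the response variable Status ---
w, b = fit_logistic([features(c) for c in train], [c["Status"] for c in train])

def bypass_probability(customer):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features(customer))) + b)

# --- List test-set customers flagged as likely bypass ---
THRESHOLD = 0.5  # assumption: tune this to the cost of false alarms
scored = [(c["customer_id"], bypass_probability(c)) for c in test]
flagged = sorted((t for t in scored if t[1] >= THRESHOLD), key=lambda t: -t[1])
for customer_id, p in flagged:
    print(f"{customer_id}: predicted bypass probability {p:.2f}")

# --- Score a new, unlabelled customer (made-up readings) ---
new_customer = {"customer_id": "NEW", "Average_Consumption": 4.0, "Availability": 12.0}
new_prob = bypass_probability(new_customer)
print(f"new customer: predicted bypass probability {new_prob:.2f}")
```

The list in `flagged` is the kind of table a dashboard could display; the threshold decides who appears in it, so it is worth choosing deliberately rather than leaving it at 0.5.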
I understand you're busy and have a lot on your plate; that's why I am so appreciative of your efforts so far. I just need to get this part done. For the Shiny dashboard, I already have a template in mind.
We just need to move forward and get this part completed.