I have a classification model that is trained, tested and working fine. As part of the exercise, I rescaled the numeric features so that they are all between 0 and 1. From reading about rescaling, I understand that min(x) maps to 0, max(x) maps to 1, and everything in between is scaled proportionately between those two.
Now I want to use the model to score real time data. My question is - how do I scale that data? The dataset I want to score is a single row.
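Once your model has been trained, new data must be scaled the same way before it can serve as input to the model. You do this by plugging each new value into the same formula, scaled = (x - min) / (max - min), using the min and max of the training data rather than of the new data. This is also why a single row is not a problem: no statistics are computed from the row itself.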
Example: 3 (min = 1, max = 4) --> (3 - 1) / (4 - 1) = 0.67
There is one caveat here: If the min and max values are not the natural limits of the data, then new values might be larger than the max of the training or smaller than the min. In that case you'll end up with a scaled new value > 1 or < 0, respectively. You need to clip these to 1 or 0 before putting them into your model.
Example: 5 --> 1.33 (needs to be clipped) --> 1.00
Example: 0 --> -0.33 (needs to be clipped) --> 0.00
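The three examples above can be reproduced with a minimal sketch in plain Python. The function names `fit_min_max` and `scale_new_value` and the `clip` flag are made up for illustration and are not taken from any particular library; the training values `[1, 2, 3, 4]` are just an assumed feature whose min is 1 and max is 4.

```python
def fit_min_max(train_values):
    """Record the min and max of one feature from the TRAINING data only."""
    return min(train_values), max(train_values)


def scale_new_value(x, train_min, train_max, clip=True):
    """Scale a single new value using the stored training min and max.

    With clip=True, results outside [0, 1] are clipped to the nearest bound.
    """
    if train_max == train_min:
        raise ValueError("Feature is constant in training; cannot rescale.")
    scaled = (x - train_min) / (train_max - train_min)
    if clip:
        scaled = min(1.0, max(0.0, scaled))
    return scaled


# Hypothetical training feature with min = 1 and max = 4.
train_min, train_max = fit_min_max([1, 2, 3, 4])

for new_value in (3, 5, 0):
    raw = scale_new_value(new_value, train_min, train_max, clip=False)
    clipped = scale_new_value(new_value, train_min, train_max)
    print(f"{new_value} --> {raw:.2f} --> {clipped:.2f}")
```

The important design point is that the min and max are stored once at training time and reused at scoring time, so the same values have to be saved alongside the model for every numeric feature.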
There is a recipe step that can do this for you. It gets the range from the training set and applies that range transformation to any data (i.e. train, test, unknowns, etc.).