How do you rescale data to be scored?

jflanner · October 6, 2019, 11:23am

Folks:

I have a classification model that is trained, tested and working fine. As part of the exercise, I rescaled the numeric features so that they all fall between 0 and 1. From reading about rescaling, I understand that min(x) maps to 0, max(x) maps to 1, and everything in between is scaled proportionally between those two.

Now I want to use the model to score real-time data. My question is: how do I scale that data? The dataset I want to score is a single row.

Any help would be appreciated.

pieterjanvc · October 6, 2019, 12:40pm

Hello,

If you scale data to have values between 0 and 1, you usually use the min-max normalization formula:

x' = (x - min(x)) / (max(x) - min(x))

Example: 1, 2, 1, 4 (min = 1, max = 4) --> 0.00, 0.33, 0.00, 1.00
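A minimal Python sketch of the formula, applied to the example values. The function name `min_max_scale` is hypothetical, and mapping a constant column (max = min) to 0.0 is an assumption added to avoid dividing by zero.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] using its own min and max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption: all values identical, so there is no spread; map to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


scaled = min_max_scale([1, 2, 1, 4])
print([round(v, 2) for v in scaled])  # [0.0, 0.33, 0.0, 1.0]
```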

Once your model has been trained, new data needs to be scaled the same way before it can serve as input to the model. You do this by plugging the new values into the same formula, but using the min and max values of the data you used for training.

Example: 3 (min = 1, max = 4) --> 0.67
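For the single-row case in the question, the key point is to save the training min and max per feature and reuse them. A rough Python sketch, assuming each row is a dict of feature name to number; the helper names and the feature name `x` are hypothetical.

```python
def fit_ranges(rows):
    """Record the min and max of each numeric feature in the training rows."""
    ranges = {}
    for row in rows:
        for name, value in row.items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges


def scale_row(row, ranges):
    """Scale one new row with the stored training min/max, not its own."""
    scaled = {}
    for name, value in row.items():
        lo, hi = ranges[name]
        # Assumption: a feature that was constant in training maps to 0.0
        scaled[name] = (value - lo) / (hi - lo) if hi > lo else 0.0
    return scaled


training = [{"x": 1}, {"x": 2}, {"x": 1}, {"x": 4}]
ranges = fit_ranges(training)  # {'x': (1, 4)}
new_row = {"x": 3}
print(scale_row(new_row, ranges))  # {'x': 0.6666666666666666}
```

A single row has no meaningful min or max of its own, which is why the ranges must come from the training data.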

There is one caveat here: if the min and max values are not the natural limits of the data, new values might be larger than the training max or smaller than the training min. In that case you'll end up with a scaled value > 1 or < 0, respectively. You need to clip these to 1 or 0 before putting them into your model.

Example: 5 --> 1.33 (needs to be clipped) --> 1.00
Example: 0 --> -0.33 (needs to be clipped) --> 0.00
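The clipping step can be sketched in Python as scaling with the training bounds and then clamping to [0, 1]. The function name `scale_and_clip` is hypothetical; the bounds 1 and 4 come from the examples above.

```python
def scale_and_clip(value, lo, hi):
    """Min-max scale with training bounds lo/hi, then clip into [0, 1]."""
    scaled = (value - lo) / (hi - lo)
    return min(max(scaled, 0.0), 1.0)


print(scale_and_clip(5, lo=1, hi=4))  # 1.0  (unclipped would be 1.33)
print(scale_and_clip(0, lo=1, hi=4))  # 0.0  (unclipped would be -0.33)
```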

Hope this helps,
PJ

Max · October 7, 2019, 3:25pm

There is a recipe step, `step_range()` in the recipes package, that can do this for you. It estimates the range from the training set and applies that same range transformation to any data (e.g., train, test, unknowns).
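The recipe-step pattern boils down to "estimate once on the training set, apply anywhere." As a language-neutral illustration only, here is a hypothetical `RangeScaler` class in Python. It is not the recipes API. The `clip` option mirrors the clipping advice earlier in the thread. In Python, scikit-learn's `MinMaxScaler` follows the same fit-then-transform idea.

```python
class RangeScaler:
    """Hypothetical fit-once, apply-anywhere min-max scaler.

    fit() learns per-column min/max from the training rows only;
    transform() reuses those stored values for any later rows.
    """

    def __init__(self, clip=True):
        self.clip = clip
        self.mins = None
        self.maxs = None

    def fit(self, rows):
        columns = list(zip(*rows))  # transpose rows into columns
        self.mins = [min(col) for col in columns]
        self.maxs = [max(col) for col in columns]
        return self

    def transform(self, rows):
        if self.mins is None:
            raise RuntimeError("call fit() on the training data first")
        out = []
        for row in rows:
            scaled = []
            for value, lo, hi in zip(row, self.mins, self.maxs):
                # Assumption: a column constant in training maps to 0.0
                s = (value - lo) / (hi - lo) if hi > lo else 0.0
                if self.clip:
                    s = min(max(s, 0.0), 1.0)
                scaled.append(s)
            out.append(scaled)
        return out


train = [[1, 10], [2, 20], [1, 30], [4, 50]]
scaler = RangeScaler().fit(train)
result = scaler.transform([[3, 60]])  # score a single new row
print(result)  # [[0.6666666666666666, 1.0]]
```

The second feature's new value (60) is above the training max (50), so it scales to 1.25 and is clipped to 1.0.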

system · October 28, 2019, 3:25pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.