Transformation of training dataset

fcas80 · December 30, 2022, 6:51pm

Hi. May I get some general advice about when a transformation of the training data does or does not get applied to the testing data that is used to test the accuracy of a forecast?

If I am transforming independent variables such as by doing 1/x, √x, or e^(-x), I assume I want to do this to both the training data and the testing data.
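A minimal sketch of this case, assuming NumPy and made-up numbers (the function name `transform_features` and the data are hypothetical, chosen only for illustration). Because 1/x, √x, and e^(-x) have no parameters estimated from data, the same function can simply be applied to each split:

```python
import numpy as np

def transform_features(x):
    """Apply fixed, parameter-free transforms; safe to reuse on any split."""
    x = np.asarray(x, dtype=float)
    # Columns: 1/x, sqrt(x), exp(-x)
    return np.column_stack([1.0 / x, np.sqrt(x), np.exp(-x)])

# Hypothetical positive-valued predictor, already split into train and test
x_train = np.array([1.0, 2.0, 4.0, 8.0])
x_test = np.array([3.0, 5.0])

# The identical transformation is applied to both splits
X_train = transform_features(x_train)
X_test = transform_features(x_test)
```

The model is then fit on `X_train` and evaluated on `X_test`, so both splits live in the same feature space.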

If I am transforming the dependent variable such as by doing √y, arcsin √y, or ln(y), I assume I want to do this to both the training data and the testing data, but then the predicted values for the testing data must be back-transformed with the inverse of the transformation.
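One way to picture this, as a hedged sketch with invented data and a plain least-squares line from `np.polyfit` standing in for whatever model is actually used: fit on ln(y) from the training set, predict on the log scale, back-transform with exp, and score against the untransformed test values.

```python
import numpy as np

# Hypothetical data following y = 2 * exp(0.5 * x) exactly (illustration only)
x_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_train = 2.0 * np.exp(0.5 * x_train)
x_test = np.array([6.0, 7.0])
y_test = 2.0 * np.exp(0.5 * x_test)

# Fit a straight line to the log-transformed training target
slope, intercept = np.polyfit(x_train, np.log(y_train), deg=1)

# Predict on the log scale, then invert the transformation
pred_log = slope * x_test + intercept
pred = np.exp(pred_log)

# Error is measured on the original scale of y
rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
```

Whether to score on the original or the transformed scale is a choice tied to the accuracy metric. With noisy data, a plain exp back-transform of a log-scale prediction tends to estimate the conditional median rather than the mean, so a bias correction may be worth considering.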

If I am reducing the number of independent variables such as by doing a principal component analysis, I assume I want to do this to both the training data and the testing data.
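A sketch of how this is usually arranged, assuming NumPy and random stand-in data (all names and sizes here are hypothetical): the centering vector and the principal directions are estimated from the training set only, and the test set is projected with those same training-derived quantities rather than with a PCA refit on the test rows.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 4))  # hypothetical training predictors
X_test = rng.normal(size=(10, 4))   # hypothetical test predictors

# Estimate the PCA from the training data only
train_mean = X_train.mean(axis=0)
_, _, vt = np.linalg.svd(X_train - train_mean, full_matrices=False)
components = vt[:2]  # top 2 principal directions, shape (2, 4)

# Project both splits using the training mean and training components
Z_train = (X_train - train_mean) @ components.T
Z_test = (X_test - train_mean) @ components.T
```

The same pattern applies to any transformation with estimated parameters, such as centering and scaling: estimate on the training data, then apply to the test data without re-estimating.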

If I am adjusting the sampling of the training data due to an imbalance of the distribution of the dependent variable (not really a transformation?), I assume this is an instance where I would NOT adjust the testing data but rather I would apply the model to the original unadjusted testing data.
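As an illustration only, here is a sketch of random oversampling with NumPy on invented data (class sizes and variable names are assumptions). The minority class is resampled with replacement in the training set, while the test set keeps its original class balance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: 18 negatives and 2 positives in training
X_train = rng.normal(size=(20, 3))
y_train = np.array([0] * 18 + [1] * 2)
X_test = rng.normal(size=(10, 3))
y_test = np.array([0] * 9 + [1] * 1)

# Oversample the minority class in the TRAINING set only
minority_idx = np.flatnonzero(y_train == 1)
majority_idx = np.flatnonzero(y_train == 0)
extra = rng.choice(
    minority_idx,
    size=len(majority_idx) - len(minority_idx),
    replace=True,
)
balanced_idx = np.concatenate([majority_idx, minority_idx, extra])
X_bal, y_bal = X_train[balanced_idx], y_train[balanced_idx]

# X_test and y_test are left untouched, so evaluation reflects
# the class distribution the model would actually encounter
```

A model would be fit on `X_bal` and `y_bal` and then evaluated on the unmodified `X_test` and `y_test`.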

Are there other common examples where a transformation of the training data does not require an adjustment to the testing data?

Also, is all of the above independent of the choice of the accuracy metric?

Thank you.

technocrat · December 30, 2022, 8:10pm

This looks tricky. See Kuhn & Johnson.
