Transformation of training dataset

Hi. Could I get some general advice on when a transformation applied to the training data should, or should not, also be applied to the test data used to assess the accuracy of a forecast?

If I am transforming independent variables, for example with 1/x, √x, or e^(-x), I assume I want to do this to both the training data and the test data.
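A minimal Python sketch of this case, assuming the transforms are fixed functions like the three listed, with no parameters estimated from the data. Because nothing is learned from the training set, the same function is simply applied to each set. The `transform_column` helper and the values are hypothetical. (A transform with a parameter estimated from data, such as centering by the mean, would instead be estimated on the training set and reused unchanged on the test set.)

```python
import math

# Stateless predictor transforms: each is a fixed function with no
# parameters estimated from data, so the same function can be applied
# to the training and test predictors independently.
TRANSFORMS = {
    "reciprocal": lambda x: 1.0 / x,
    "sqrt": math.sqrt,
    "neg_exp": lambda x: math.exp(-x),
}

def transform_column(values, name):
    """Apply the named transform to every value in a predictor column (hypothetical helper)."""
    f = TRANSFORMS[name]
    return [f(v) for v in values]

# Hypothetical predictor values for illustration only.
x_train = [1.0, 4.0, 9.0]
x_test = [16.0, 25.0]

x_train_t = transform_column(x_train, "sqrt")
x_test_t = transform_column(x_test, "sqrt")  # same function, nothing refitted
print(x_train_t, x_test_t)
```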

If I am transforming the dependent variable, for example with √y, arcsin √y, or ln(y), I assume I want to do this to both the training data and the test data, but then apply the inverse transformation to the predicted values for the test data so that they are back on the original scale.
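A minimal Python sketch of the ln(y) case, assuming a simple one-predictor linear model (`fit_simple_ols` is a hypothetical helper, and the data are made up so that ln(y) is exactly linear in x). The model is fitted on the log scale, test predictions are back-transformed with exp, and they are then scored against the untransformed test responses.

```python
import math

def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Hypothetical data: y grows exponentially with x, so ln(y) is linear in x.
x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [math.exp(0.5 * xi) for xi in x_train]
x_test = [4.0, 5.0]
y_test = [math.exp(0.5 * xi) for xi in x_test]

# 1. Transform the training response and fit the model on that scale.
a, b = fit_simple_ols(x_train, [math.log(v) for v in y_train])

# 2. Predict for the test predictors on the transformed (log) scale.
pred_log = [a + b * xi for xi in x_test]

# 3. Back-transform with the inverse (exp) so predictions are on the
#    same scale as the original test responses.
pred = [math.exp(p) for p in pred_log]

# 4. Score against the original-scale test responses.
mae = sum(abs(p - t) for p, t in zip(pred, y_test)) / len(y_test)
print(pred, mae)
```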

If I am reducing the number of independent variables, for example with principal component analysis, I assume I want to do this to both the training data and the test data.
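A minimal pure-Python sketch of the PCA case, assuming only the first principal component is kept and computing it by power iteration (a swapped-in method chosen to avoid library dependencies; all helper names and data are hypothetical). The key point it illustrates: the column means and the loading vector are estimated on the training data only, and the test rows are projected with those same training estimates rather than by refitting PCA on the test set.

```python
import math

def column_means(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def covariance(rows, means):
    """Sample covariance matrix of the rows around the given means."""
    n, p = len(rows), len(means)
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - means[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return cov

def first_component(cov, iters=200):
    """Leading eigenvector (unit length) of a covariance matrix via power iteration."""
    p = len(cov)
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return v

def project(rows, means, loading):
    """Score each row on the component using the TRAINING means and loading."""
    return [sum((r[j] - means[j]) * loading[j] for j in range(len(means))) for r in rows]

# Hypothetical two-predictor data lying near the line x2 = x1.
train = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8]]
test = [[5.0, 5.1], [0.0, 0.2]]

means = column_means(train)                   # estimated on training data only
loading = first_component(covariance(train, means))
train_scores = project(train, means, loading)
test_scores = project(test, means, loading)   # reuse; do not refit on test
print(loading, test_scores)
```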

If I am resampling the training data because the distribution of the dependent variable is imbalanced (not really a transformation?), I assume this is a case where I would NOT adjust the test data, but would instead apply the model to the original, unadjusted test data.
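A minimal Python sketch of this case, assuming simple random oversampling of the minority class (one of several rebalancing methods; `oversample_minority` and the data are hypothetical). Only the training rows are rebalanced; the test set keeps its original class mix.

```python
import random

def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes match
    the largest one. Intended for the TRAINING set only."""
    rng = random.Random(seed)
    by_class = {}
    for r, y in zip(rows, labels):
        by_class.setdefault(y, []).append(r)
    target = max(len(v) for v in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for r in members + extra:
            out_rows.append(r)
            out_labels.append(y)
    return out_rows, out_labels

# Hypothetical imbalanced data: 6 negatives and 2 positives in training.
x_train = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
y_train = [0, 0, 0, 0, 0, 0, 1, 1]
x_test = [[0.15], [0.35], [0.55], [0.95]]
y_test = [0, 0, 0, 1]

x_bal, y_bal = oversample_minority(x_train, y_train)
# The model would be fitted on (x_bal, y_bal) and evaluated on
# (x_test, y_test) as-is, so the test set keeps its original class mix.
print(y_bal.count(0), y_bal.count(1), len(y_test))
```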

Are there other common examples where a transformation of the training data does not require an adjustment to the testing data?

Also, is all of the above independent of the choice of the accuracy metric?
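One place where the metric clearly interacts with these choices is the dependent-variable transformation: the same predictions can be scored on the original scale or on the transformed scale, and the two give different pictures. A small Python sketch with hypothetical numbers, using RMSE on the original scale and on the ln scale:

```python
import math

def rmse(pred, actual):
    """Root mean squared error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

# Hypothetical test responses and back-transformed predictions.
y_test = [10.0, 100.0, 1000.0]
pred = [12.0, 90.0, 1100.0]

rmse_original = rmse(pred, y_test)
rmse_log = rmse([math.log(p) for p in pred], [math.log(a) for a in y_test])
print(rmse_original, rmse_log)
```

On the original scale the error on the largest observation dominates; on the ln scale each error counts roughly in proportion to its relative size, so the 20% miss on the smallest observation weighs the most.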

Thank you.

This looks tricky. See Kuhn & Johnson.
