Prior to PCA, center/scale vs. Box-Cox

recipes
caret
#1

This is more of a general question. I am fitting a negative binomial regression model across multiple nested data sets, with a recipe that centers the data, scales it, and then runs PCA (step_center, step_scale, then step_pca). I apologize if this is a novice question, but I haven't been able to find many references on the topic. Is there a preferred choice between center/scale and a Box-Cox transformation, or should you apply Box-Cox first and then center and scale (which seems like it could be overkill)? From my reading, it seems that center/scale ends up doing the work of Box-Cox, plus putting everything on the same scale.

I am looking into this because one of my data sets gives the error "Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x'". I don't have any missing data, and from looking at svd, it seems that some part of the preprocessing is generating infinite values. So I'm trying to figure out how to get rid of those infinite values while maintaining the integrity of the preprocessing.

#2

You would have to apply Box-Cox before centering, since the Box-Cox transformation requires strictly positive data and centering would make a lot of the values negative. I think this is what is inducing the infinite values.
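
A minimal sketch of that ordering, assuming the recipes package is installed. The built-in mtcars data and the choice of carb as the count outcome are stand-ins for illustration, not the original poster's data, and num_comp = 3 is an arbitrary choice.

```r
library(recipes)

# Stand-in data (assumption): carb is the count outcome, and the
# selected predictors are all strictly positive, as Box-Cox requires.
dat <- mtcars[, c("carb", "mpg", "disp", "hp", "drat", "wt", "qsec")]

rec <- recipe(carb ~ ., data = dat) %>%
  step_BoxCox(all_numeric_predictors()) %>%   # transform while values are still positive
  step_center(all_numeric_predictors()) %>%   # centering introduces negative values
  step_scale(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors(), num_comp = 3)

baked <- bake(prep(rec), new_data = NULL)

# Check that nothing non-finite would reach the model.
stopifnot(all(is.finite(as.matrix(baked))))
```

Running the same is.finite() check on each nested data set may help locate the one that fails. If a predictor is constant within one of those sets, scaling would divide by a standard deviation of zero, so adding step_zv() before step_scale() is one possible guard.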

Alternatively, you could use the Yeo-Johnson transformation, which is basically Box-Cox without the positivity constraint, so it also accepts zero and negative values.
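
A rough sketch of the Yeo-Johnson variant, again assuming the recipes package and using mtcars as stand-in data. The shift applied to qsec is hypothetical and only there to produce some negative values, which Box-Cox could not handle.

```r
library(recipes)

# Stand-in data (assumption): carb is the count outcome.
dat <- mtcars[, c("carb", "mpg", "disp", "hp", "drat", "wt", "qsec")]

# Hypothetical shift so that one predictor contains negative values.
dat$qsec <- dat$qsec - 18

rec_yj <- recipe(carb ~ ., data = dat) %>%
  step_YeoJohnson(all_numeric_predictors()) %>%  # no positivity requirement
  step_center(all_numeric_predictors()) %>%
  step_scale(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors(), num_comp = 3)

baked_yj <- bake(prep(rec_yj), new_data = NULL)

# Check that the transformed data contain only finite values.
stopifnot(all(is.finite(as.matrix(baked_yj))))
```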