Prior to PCA, center/scale vs. Box-Cox

More of a general question, but I am running a negative binomial regression model across multiple nested data sets. I have a recipe set up to center the data, scale it, and then run PCA (step_center, step_scale, then step_pca). I apologize if this is rather novice, but I haven't been able to find many references on this topic. Is there a preferred choice between center/scale and a Box-Cox transformation, or should you apply Box-Cox and then center and scale (which seems like it could be overkill)? From my reading, it seems that center/scale ends up doing the work of Box-Cox (plus putting everything on the same scale).

I am looking into this because one of my data sets is giving me the error "Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x'". I don't have any missing data, and looking at the svd call, it seems that one of the pre-processing steps is generating infinite values. So I'm trying to figure out how to get rid of those infinite values while maintaining the integrity of the preprocessing.
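One way to narrow this down is to scan the pre-processed matrix for non-finite cells before it reaches the PCA step. The sketch below is plain Python rather than recipes code, and everything in it is hypothetical: the helper `find_non_finite`, the column names, and the toy values are made up for illustration only.

```python
import math


def find_non_finite(rows, column_names):
    """Return (row_index, column_name, value) for every NaN or infinite cell."""
    problems = []
    for i, row in enumerate(rows):
        for name, value in zip(column_names, row):
            if not math.isfinite(value):
                problems.append((i, name, value))
    return problems


# Hypothetical pre-processed data: two columns, three rows.
columns = ["x1", "x2"]
preprocessed = [
    [0.5, -1.2],
    [float("inf"), 0.3],
    [-0.7, float("nan")],
]

bad_cells = find_non_finite(preprocessed, columns)
for row_index, name, value in bad_cells:
    print(f"row {row_index}, column {name}: {value}")
```

Knowing which columns contain the offending cells usually points to the step that produced them.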

You would have to apply the Box-Cox transformation before centering, since Box-Cox requires strictly positive data and centering makes many of the values negative. I think that is what is inducing the infinite values.
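To make the ordering issue concrete, here is a minimal sketch of the Box-Cox formula itself, written in plain Python rather than with recipes. The function `box_cox`, the toy data, and the fixed `lam = 0.5` are assumptions for illustration (in practice lambda is estimated from the data). The function returns NaN for non-positive inputs to mark them as undefined; in R, for comparison, `log(0)` evaluates to `-Inf` and `log(-1)` to `NaN`.

```python
import math
import statistics


def box_cox(x, lam):
    """Box-Cox transform of a single value; only defined for x > 0."""
    if x <= 0:
        # Assumption: mark undefined results as NaN instead of raising.
        return float("nan")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam


raw = [1.0, 2.0, 4.0, 9.0]   # strictly positive toy data
lam = 0.5                    # fixed here; normally estimated

# Wrong order: centering first pushes values to zero or below.
mean_raw = statistics.mean(raw)
centered = [x - mean_raw for x in raw]            # [-3.0, -2.0, 0.0, 5.0]
on_centered = [box_cox(x, lam) for x in centered]

# Right order: Box-Cox on the positive data, then center and scale.
on_raw = [box_cox(x, lam) for x in raw]
mu = statistics.mean(on_raw)
sd = statistics.stdev(on_raw)
standardized = [(v - mu) / sd for v in on_raw]

print("Box-Cox after centering:", on_centered)
print("Box-Cox, then center/scale:", standardized)
```

With the steps in the first order, three of the four values are undefined; in the second order every value stays finite and the result is still centered and scaled.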

Alternatively, you could use the Yeo-Johnson transformation, which is basically Box-Cox without the positivity constraint: it also accepts zero and negative values.
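For reference, a minimal sketch of the Yeo-Johnson formula in plain Python (not recipes code), applied to values that have already been centered. The function name `yeo_johnson`, the toy data, and the fixed `lam = 0.5` are assumptions for illustration; in practice lambda would be estimated from the data.

```python
import math


def yeo_johnson(x, lam):
    """Yeo-Johnson transform of a single value; defined for any real x."""
    if x >= 0:
        if lam == 0:
            return math.log1p(x)
        return ((x + 1) ** lam - 1) / lam
    # Negative branch mirrors the positive one with exponent 2 - lam.
    if lam == 2:
        return -math.log1p(-x)
    return -(((1 - x) ** (2 - lam) - 1) / (2 - lam))


centered = [-3.0, -2.0, 0.0, 5.0]   # toy data with negative and zero values
lam = 0.5                           # fixed here; normally estimated

transformed = [yeo_johnson(x, lam) for x in centered]
print(transformed)
```

Every output is finite, zero maps to zero, and the ordering of the values is preserved, so this transformation can be applied even when some values are zero or negative.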
