Yes, the former. The main code is below. I am building a linear model to predict home prices from Redfin.com data (Y = price), and I pass my entire data frame, named 'bham', to preProcess:
library(caret)
bham <- read.csv('redfin_2021-05-03-13-50-46.csv', header = TRUE)
range(na.omit(bham$price))  ## [130880, 3500000]: range of the original outcome variable, prior to preProcess
trans <- preProcess(bham, method = c("center", "scale", "BoxCox", "spatialSign"))
transformed <- predict(trans, bham)
m4 <- lm(price ~ poly(square.feet, 3) + poly(year.built, 2) + latitude + poly(longitude, 3), data = transformed)
P <- predict(m4, newdata = newdata, type = 'response')  ## predict the outcome variable
range(P)  ## [-912424226099, -487374100]: range of the transformed outcome variable
So, this is where the trouble starts. I need to transform my predicted response back to the original scale, and I'm unsure in what order to reverse the transformations relative to the order I passed in the method argument of preProcess above. preProcess doesn't seem to tell me which lambda was used for which variable. I also don't think a spatial-sign transformation needs a back-transform, correct?
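To make the question concrete, here is a minimal base-R sketch of the inverse I think I need for price. It assumes Box-Cox is applied first, then centering, then scaling, so the inverse runs in the opposite order. The helper names (box_cox, inv_box_cox, back_transform), the lambda value, and the middle three prices are all made up for illustration; lambda, mu, and sigma are placeholders for whatever preProcess actually estimated for price. The sketch leaves out spatialSign entirely, which is the part I am least sure about.

```r
# Hypothetical helpers, not part of caret: forward and inverse Box-Cox.
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

inv_box_cox <- function(z, lambda) {
  if (abs(lambda) < 1e-8) exp(z) else (lambda * z + 1)^(1 / lambda)
}

# Forward direction (assumed order): Box-Cox, then center, then scale.
price  <- c(130880, 250000, 410000, 975000, 3500000)  # illustrative prices
lambda <- 0.1                                         # illustrative lambda
bc     <- box_cox(price, lambda)
mu     <- mean(bc)   # placeholder for the centering value
sigma  <- sd(bc)     # placeholder for the scaling value
z      <- (bc - mu) / sigma

# Backward direction: undo the scaling, undo the centering, then invert Box-Cox.
back_transform <- function(z, lambda, mu, sigma) {
  inv_box_cox(z * sigma + mu, lambda)
}

recovered <- back_transform(z, lambda, mu, sigma)
```

If this is the right idea, then applying back_transform to P with the real lambda, mean, and standard deviation for price should give predictions in dollars, provided the spatial-sign step does not get in the way.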