Is there an easy way to unscale coefficients with tidymodels?

Hi all,

I was wondering, do any of you know any easy, possibly out-of-the-box methods for unscaling coefficients from a model using tidymodels?

For this exercise, I'm using a ridge regression, if that matters. And apologies if this is a dumb question, I haven't found a good answer yet in my searching.

Thanks!

Not a dumb question but I don't have a great answer.

If you are using a recipe to normalize, you can use the tidy() method to get the means and standard deviations needed to unscale the coefficients. If you let the model do the scaling, then you'll need to get them from the model object.
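A minimal sketch of the recipe route, assuming a toy recipe on the built-in mtcars data rather than the data from this thread (the variables and the name norm_stats are purely illustrative). It preps a recipe with step_normalize() and calls tidy() on that step to list the stored means and standard deviations:

```r
library(recipes)

# Toy recipe: mtcars stands in for the real data (an assumption for illustration)
rec <- recipe(mpg ~ disp + hp, data = mtcars) %>%
  step_normalize(disp, hp) %>%
  prep()

# step_normalize() is step number 1 in this recipe; tidy() on a prepped
# step returns one row per term and statistic ("mean" or "sd")
norm_stats <- tidy(rec, number = 1)
norm_stats
```

With a fitted workflow, the prepped recipe can usually be pulled out first with extract_recipe() (pull_workflow_prepped_recipe() in older versions of workflows), and the number argument should match the position of step_normalize() in that recipe.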

Thanks for the reply, Max.

I was wondering if you [or anybody else] could give me a bit of insight into what units the coefficients in my model are in, and possibly how to return the coefficients to their unscaled form, since I'm kinda confused by my current interpretation.

I set standardize = FALSE in my glmnet model, but I normalized my data in the recipe, so I'd assume my coefficients are only standardized now.

So to "unscale" the coefficient, I assume that I'd first have to multiply the coefficient by its unscaled standard deviation, and then add its unscaled mean, right? At least according to this it seems like that should be the case.

But when I do that, the coefficient makes no sense.

It's a bit tricky for me to post the data one would need to reproduce my results, but here's my code:

library(tidymodels)
library(tidyverse)

# preps data for model
myrecipe <- mydata %>%
  recipe(transactionrevenue ~ sessions + channelgrouping + month + new_user_pct + is_weekend) %>%
  step_novel(all_nominal(), -all_outcomes()) %>%
  step_dummy(month, channelgrouping, one_hot = TRUE) %>%
  step_zv(all_predictors()) %>%
  step_normalize(sessions, new_user_pct) %>%
  step_interact(terms = ~ sessions:starts_with("channelgrouping") + new_user_pct:starts_with("channelgrouping"))
  
# creates the model
mymodel <- linear_reg(penalty = 10, mixture = 0.2) %>%
  set_engine("glmnet", standardize = FALSE)

wf <- workflow() %>%
  add_recipe(myrecipe)

model_fit <- wf %>%
  add_model(mymodel) %>%
  fit(data = mydata)
  
# posts coefficients
tidy(model_fit)

If it would help, here's some information that might be useful:

The variable that I'm really focusing on is "sessions." In the model, the coefficient for sessions is 2543.094882, and the intercept for the model is 1963.369782. The penalty is also 10.

The unscaled mean for sessions is 725.2884 and the standard deviation is 1035.381.

When I multiply the coefficient of sessions by the standard deviation and then add back the mean, I get 2633797.41042 which makes no sense in terms of this situation -- it should probably be a number from 0 to 5.

Hopefully I'm just doing something dumb, because this is driving me a bit cuckoo. Any insight would be very much appreciated.

What intuition do you rely on to make this assumption?

Well, the 0-5 range is from running an unscaled OLS model. So it's not necessarily a perfect assumption, but that's around where the coefficient is through that framework.

But given the units of the unscaled independent and dependent variables of interest, the number definitely shouldn't be 2633797.41042, because that just doesn't add up.

After some trial and error, it looks like the trick is to just divide the scaled coefficient by the standard deviation, which I'm not really sure I understand. I would think, if anything, I should be multiplying by the standard deviation? But, oh well, it works.
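One way to see why dividing is the right move: multiplying by the standard deviation and adding the mean undoes the scaling of a data value, not of a slope. If z = (x - mean) / sd, then b0 + b1 * z = (b0 - b1 * mean / sd) + (b1 / sd) * x, so the slope per raw unit of x is b1 / sd and the mean only shifts the intercept. The sketch below checks this with plain lm() on simulated data (the simulated values and all variable names are assumptions for illustration, not the data from this thread), then applies the same division to the numbers posted above:

```r
set.seed(1)

# Simulated data (hypothetical): the true slope on the raw scale is 2.5
x <- rnorm(200, mean = 725, sd = 1035)
y <- 100 + 2.5 * x + rnorm(200, sd = 50)

mu <- mean(x)
s  <- sd(x)
z  <- (x - mu) / s          # same centering and scaling as step_normalize()

fit_scaled <- lm(y ~ z)     # model on the normalized predictor
fit_raw    <- lm(y ~ x)     # model on the original predictor

b0 <- unname(coef(fit_scaled)[1])
b1 <- unname(coef(fit_scaled)[2])

# y = b0 + b1 * (x - mu) / s = (b0 - b1 * mu / s) + (b1 / s) * x
slope_raw     <- b1 / s
intercept_raw <- b0 - b1 * mu / s

# Both comparisons should agree up to numerical precision
all.equal(slope_raw, unname(coef(fit_raw)["x"]))
all.equal(intercept_raw, unname(coef(fit_raw)["(Intercept)"]))

# The sessions coefficient and standard deviation quoted in this thread
thread_slope <- 2543.094882 / 1035.381   # roughly 2.46
thread_slope
```

The same algebra applies to the coefficients of a penalized fit, although a ridge model fit on normalized data will generally not match one fit on raw data. In the recipe above, sessions also appears in interaction terms created after normalization, so those interaction coefficients would presumably need the same division, and the mean shift would land in the intercept and the matching dummy-variable coefficients.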
