Stepwise regression in RStudio: can you run stepwise regression with the condition that it throws out any coefficients greater than 1,000 in value?

rstudio

#1

Looking for some help on whether this is possible. Essentially, I would like to run a stepwise regression in RStudio with the added condition that it throws out all coefficients that turn out to be greater than 1,000.

Thank you,


#2

To clarify, R is the language and RStudio is the IDE. So even if you use RStudio to work with R (which is great, yay!), the language you're using is R. See the FAQ below for further disambiguation:

Since it's not an issue with the IDE itself, I'm going to remove the rstudio tag.

For stepwise regression, I recommend checking out this brief Stepwise Regression Essentials guide.

You'll need to filter or subset to drop the cases with coefficients > 1,000. This can be done in any number of ways; for how to do this using base R, see Filtering and subsetting in R. Personally, I tend to do my wrangling/filtering in the tidyverse. In this case you'd be using dplyr::filter(), which you can learn about here:
https://dplyr.tidyverse.org/reference/filter.html
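Applied to the question's goal of throwing out coefficients above 1,000, here is a minimal sketch of what filtering a coefficient table looks like once a model has already been fitted. It is written in Python rather than R, and the term names and values are made up for illustration. It assumes "greater than 1,000" means in absolute value and that the intercept is kept regardless.

```python
# Hypothetical fitted coefficients keyed by term name (illustrative values only).
coefs = {"(Intercept)": 12.5, "x1": 4200.0, "x2": -0.8, "x3": -1500.0, "x4": 37.0}

CAP = 1000.0  # threshold from the question

# Keep the intercept plus every term whose |coefficient| is at most CAP.
kept = {term: b for term, b in coefs.items()
        if term == "(Intercept)" or abs(b) <= CAP}

# Terms removed by the filter, in sorted order.
dropped = sorted(set(coefs) - set(kept))
```

Note that this only hides terms from a model that was fitted with them: the remaining coefficients were estimated alongside x1 and x3 and would generally change if the model were refitted without them.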

You might also want to take a look at this webinar, Data wrangling with R and RStudio, to get familiar with some of these tasks.
https://www.rstudio.com/resources/webinars/data-wrangling-with-r-and-rstudio/


#3

Hi Mara,

Thank you for the response. Maybe I am misunderstanding your answer or my question was misinterpreted.

I am trying to run a stepwise regression, and while it's looking for the most "optimal" model from a non-intuitive perspective, I want the stepwise procedure to throw out any final regression coefficients that are greater than 1,000. I don't want to filter out the data itself. I just want the stepwise regression to produce the most "optimal" model under the constraint that no coefficient in the final output can be larger than 1,000.
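Below is a minimal sketch of the procedure described above. It is written in Python (standard library only) rather than R so that it stays self-contained. It performs forward stepwise selection by AIC, and it skips any candidate step whose resulting model has a slope coefficient larger than the cap in absolute value.

Some assumptions to flag:

- The cap applies to |b|, and the intercept is exempt.
- Selection is forward-only.
- `solve`, `ols`, `aic` and `forward_capped` are hypothetical helpers, not functions from R or any package.

Because the search is greedy, it is not guaranteed to find the best model that satisfies the cap.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular normal equations")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y, cols):
    """Least squares of y on an intercept plus the columns in `cols`.
    Returns (coefficients with the intercept first, residual sum of squares)."""
    rows = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return beta, rss


def aic(n, rss, k):
    """Gaussian AIC up to a constant: n*log(RSS/n) + 2*k (k = number of coefficients)."""
    return n * math.log(max(rss, 1e-300) / n) + 2 * k  # guard against log(0)


def forward_capped(X, y, cap=1000.0):
    """Forward stepwise selection by AIC that never accepts a model in which
    any slope coefficient exceeds `cap` in absolute value (intercept exempt)."""
    n, p = len(y), len(X[0])
    selected = []
    _, rss = ols(X, y, selected)
    best = aic(n, rss, 1)  # intercept-only model
    while True:
        step = None
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            try:
                beta, rss = ols(X, y, cols)
            except ValueError:  # collinear candidate: skip it
                continue
            if any(abs(b) > cap for b in beta[1:]):
                continue  # violates the coefficient constraint
            score = aic(n, rss, len(cols) + 1)
            if score < best:
                best, step = score, cols
        if step is None:  # no admissible variable improves AIC
            break
        selected = step
    return selected, ols(X, y, selected)[0]


# Toy data (made up): column 1 has a true slope of 5000.
X = [[i / 10, (7 * i % 11) / 10, (5 * i % 13) / 10] for i in range(30)]
y = [2 * a + 5000 * b + 3 * c + ((3 * i % 7) - 3) / 10
     for i, (a, b, c) in enumerate(X)]
capped_sel, capped_coefs = forward_capped(X, y)
free_sel, free_coefs = forward_capped(X, y, cap=math.inf)
```

With this toy data, every model that contains column 1 breaks the cap. The capped search therefore never admits column 1, while the uncapped run selects it.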

Is this what your answer helps solve?

Thank you,


#4

I think this might be a big enough divergence from standard stepwise regression approaches (which focus on the statistical significance of variables or on overall fit) that you may not be able to rely on an existing package.
For example, an estimated coefficient that's too big is usually not a good reason to exclude it. (I know some analysts exclude variables whose coefficients are quite small and "practically insignificant".)
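For context on what "overall fit" means here: R's `step()` compares candidate `lm` fits using `extractAIC()`. With the default penalty `k = 2`, that function computes the quantity below, where n is the number of observations, RSS is the residual sum of squares, and edf is the number of estimated coefficients including the intercept.

$$\mathrm{AIC} = n \log\!\left(\frac{\mathrm{RSS}}{n}\right) + 2\,\mathrm{edf}$$

Nothing in this criterion depends on how large any individual coefficient is. A size cap therefore has to be imposed as a separate rule on top of the search.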

If you don't have too many independent variables, you might try an "All Possible Regression", and only consider models with coefficients below your desired threshold?
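Below is a small-scale sketch of that suggestion, again in Python (standard library only) rather than R. It fits every subset of predictors (2^p models), discards any model in which a slope coefficient exceeds the cap in absolute value, and returns the admissible model with the lowest AIC.

As before, the intercept is assumed to be exempt from the cap. `solve`, `ols`, `aic` and `all_subsets_capped` are hypothetical helpers.

```python
import math
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular normal equations")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y, cols):
    """Least squares of y on an intercept plus `cols`; returns (coefs, RSS)."""
    rows = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return beta, rss


def aic(n, rss, k):
    """Gaussian AIC up to a constant: n*log(RSS/n) + 2*k."""
    return n * math.log(max(rss, 1e-300) / n) + 2 * k  # guard against log(0)


def all_subsets_capped(X, y, cap=1000.0):
    """Fit every subset of predictors; among models whose slope coefficients
    all satisfy |b| <= cap, return (columns, coefficients) with the lowest AIC."""
    n, p = len(y), len(X[0])
    best = None  # (aic, cols, coefficients)
    for size in range(p + 1):
        for cols in combinations(range(p), size):
            try:
                beta, rss = ols(X, y, list(cols))
            except ValueError:  # collinear subset: skip it
                continue
            if any(abs(b) > cap for b in beta[1:]):
                continue  # violates the coefficient constraint
            score = aic(n, rss, size + 1)
            if best is None or score < best[0]:
                best = (score, list(cols), beta)
    return best[1], best[2]  # intercept-only model is always admissible


# Toy data (made up): column 1 has a true slope of 5000.
X = [[i / 10, (7 * i % 11) / 10, (5 * i % 13) / 10] for i in range(30)]
y = [2 * a + 5000 * b + 3 * c + ((3 * i % 7) - 3) / 10
     for i, (a, b, c) in enumerate(X)]
capped_sel, capped_coefs = all_subsets_capped(X, y)
free_sel, free_coefs = all_subsets_capped(X, y, cap=math.inf)
```

Because every subset is evaluated, this search does find the lowest-AIC model that satisfies the cap. The cost doubles with each added predictor, so it is only practical when p is small.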


#5

That wouldn't work. I have >300 independent variables. I have enough data; I just need to find the best automated approach to achieve what I'm looking for.
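A quick count shows why exhaustive search is out of reach at this scale, assuming exactly p = 300 candidate predictors (more only widens the gap). All-possible regression fits every subset of predictors. Forward selection adds one variable per step, so it fits at most p + (p - 1) + ... + 1 models.

$$2^{300} \approx 2.04 \times 10^{90} \qquad \text{versus} \qquad \sum_{j=1}^{300} j = \frac{300 \cdot 301}{2} = 45{,}150$$

A greedy stepwise search that applies the coefficient cap at each step therefore stays in the tens of thousands of fits. Unlike the exhaustive search, however, it is not guaranteed to find the best capped model.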