Essentially, I want to write code that automatically runs separate regressions based on a 'key'. For instance, in the above example we would have 2 regression equations (since there are 2 keys: 1 and 2). I need to run 500 separate regressions, one for each unique identifier; call these identifiers keys. It would also be nice to have the option to do the same with stepwise regression for additional insights. I want to accomplish this in RStudio.
I guess you may want to take a look at the plm package and treat your key as a fixed-effect factor. If you really want to run separate regressions for each group, there are at least 2 ways to do that. First, use a for loop to create a list containing all the regression models. Second, work on the data frame with dplyr and use group_by and mutate to create the columns you are interested in (coefficients, R-squared, etc.).
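A minimal sketch of the for-loop option in base R. The data frame `df` and its columns `key`, `y`, `x1` and `x2` are hypothetical stand-ins for the OP's sample data, which isn't reproduced here.

```r
# Hypothetical stand-in for the OP's data: 3 keys, response y, predictors x1 and x2
set.seed(1)
df <- data.frame(
  key = rep(1:3, each = 20),
  x1  = rnorm(60),
  x2  = rnorm(60)
)
df$y <- 2 + 0.5 * df$x1 - 1 * df$x2 + rnorm(60, sd = 0.3)

# Fit one lm() per key and store the models in a list named by key
models <- list()
for (k in unique(df$key)) {
  # keep this key's rows and drop the key column (it is constant within a group)
  piece <- df[df$key == k, setdiff(names(df), "key")]
  models[[as.character(k)]] <- lm(y ~ ., data = piece)
}

# Pull one statistic out of every model, e.g. R-squared
sapply(models, function(m) summary(m)$r.squared)
```

Each element of `models` is an ordinary `lm` object, so `summary(models[["2"]])` works as usual; with 500 keys the list simply has 500 elements.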
This is a common analysis pattern that is often called “split, apply, combine” because you are splitting your data into groups based on a key value, applying a model to each group, and then combining the results back together for further inspection. As @Peter_Griffin mentioned, there are a few different ways to write code that does this. This link goes through one tidyverse approach in detail: http://stat545.com/block024_group-nest-split-map.html
The following example uses purrr to solve a fairly realistic problem: split a data frame into pieces, fit a model to each piece, compute the summary, then extract the R².

```r
library(purrr)
mtcars %>%
  split(.$cyl) %>% # from base R
  map(~ lm(mpg ~ wt, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared")
#>         4         6         8
#> 0.5086326 0.4645102 0.4229655
```
In all cases, the first step is to write modeling code that does what you want for one piece of your dataset. Can you post an example of the model code you have in mind? That would make it a lot easier for people to help you with splitting-applying-combining that model to your data.
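Following that advice, here is a sketch that first writes a function for one piece, checks it on a single key, and only then splits, applies and combines. It also covers the stepwise request from the original question by optionally passing the fit to `stats::step()`, which does AIC-based selection (with no `scope` given it only drops terms). The function name `fit_one` and the simulated `df` (columns `key`, `y`, `x1` to `x3`) are hypothetical.

```r
# Hypothetical data standing in for the OP's sample: key, response y, predictors x1..x3
set.seed(42)
df <- data.frame(key = rep(c(1, 2), each = 30),
                 x1 = rnorm(60), x2 = rnorm(60), x3 = rnorm(60))
df$y <- 1 + 2 * df$x1 + rnorm(60)

# Step 1: modelling code for ONE piece of the data.
# stepwise = TRUE runs AIC-based selection with stats::step().
fit_one <- function(piece, stepwise = FALSE) {
  piece$key <- NULL                          # key is constant within a piece, so drop it
  fit <- lm(y ~ ., data = piece)
  if (stepwise) fit <- step(fit, trace = 0)  # trace = 0 silences step()'s printout
  fit
}

# Check it on a single key first
fit_one(df[df$key == 1, ])

# Step 2: split by key, apply to every piece, combine the results in lists
pieces    <- split(df, df$key)
full_fits <- lapply(pieces, fit_one)
step_fits <- lapply(pieces, fit_one, stepwise = TRUE)

# Which terms did the stepwise search keep for each key?
lapply(step_fits, function(m) names(coef(m)))
```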
P.S. You also might attract more interested helpers if you changed the title of your post to something more specific to your problem.
Thank you for your help. Based on the sample data in the OP, say I have this model.
```r
library(dplyr)
fit <- lm(y ~ ., data = select(filter(df, key == 1), -key))  # filter() picks rows; select() drops key
```
This will select key = 1 and fit a regression on all the x variables. How do I automate this process for all keys, say 500 of them, and produce a summary of each model, for example in the form of a matrix?
For example, a summary would look like this. It doesn't have to be exactly that, since I need the p-values as well, but it should be something I can then use to quickly calculate results for different values of the x variables:
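One base-R way to get such a summary, sketched under the assumption of a data frame `df` with a `key` column, a response `y` and numeric predictors (the data below is simulated and hypothetical). It builds a long table of estimates and p-values, a wide key-by-term matrix of estimates, and predictions from every key's model at the same new x values.

```r
# Hypothetical data: 4 keys, response y, predictors x1 and x2
set.seed(7)
df <- data.frame(key = rep(1:4, each = 25), x1 = runif(100), x2 = runif(100))
df$y <- 3 + 1.5 * df$x1 - 0.8 * df$x2 + rnorm(100, sd = 0.2)

# Fit one model per key; key itself is removed before fitting
models <- lapply(split(df[, names(df) != "key"], df$key),
                 function(piece) lm(y ~ ., data = piece))

# Estimate, standard error and p-value for every term of every key's model
coef_long <- do.call(rbind, lapply(names(models), function(k) {
  tab <- coef(summary(models[[k]]))
  data.frame(key = k, term = rownames(tab),
             estimate = tab[, "Estimate"], std.error = tab[, "Std. Error"],
             p.value = tab[, "Pr(>|t|)"], row.names = NULL)
}))

# Wide matrix of estimates: one row per key, one column per term
est <- t(sapply(models, coef))

# Predictions from every key's model at the same new x values
new_x <- data.frame(x1 = 0.5, x2 = 0.5)
sapply(models, predict, newdata = new_x)
```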
Hi there @g3lo, I think you are looking for something along the lines of this code, where you nest the data by key, fit a model to each dataset, and then extract the summaries using a combination of purrr and broom::glance.
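A minimal sketch of that nest-and-map approach, assuming the dplyr, tidyr, purrr and broom packages and a hypothetical simulated `df` with columns `key`, `y`, `x1` and `x2`. Because the question asks for p-values of individual coefficients, it uses `broom::tidy()` alongside `broom::glance()`: `glance()` returns one row of model-level statistics per model, while `tidy()` returns one row per coefficient.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical data: 3 keys, response y, predictors x1 and x2
set.seed(3)
df <- tibble(key = rep(1:3, each = 20), x1 = rnorm(60), x2 = rnorm(60)) %>%
  mutate(y = 1 + x1 - 2 * x2 + rnorm(60))

by_key <- df %>%
  group_by(key) %>%
  nest() %>%                                  # one row per key; `data` holds that key's rows
  mutate(model   = map(data, ~ lm(y ~ ., data = .x)),
         glanced = map(model, glance),        # model level: r.squared, p.value, AIC, ...
         tidied  = map(model, tidy))          # coefficient level: estimate, std.error, p.value

# One row per key
model_stats <- by_key %>% select(key, glanced) %>% unnest(glanced)

# One row per key and term
coef_stats <- by_key %>% select(key, tidied) %>% unnest(tidied)
```

The same pipeline scales to 500 keys unchanged; `coef_stats` can be filtered or pivoted wider if a key-by-term layout is preferred.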