Imbalanced Data in a Regression Model

Hi R community,

I have a question about building a machine learning regression model where I have more observations for the independent variables than for the dependent variable (i.e., the independent variables have ~30,000 rows but the dependent variable has only ~7,000 rows).

My questions are:

  • Would it be possible to build a regression model with this kind of data?
  • If yes, which method should I use to model it?
  • Can tidymodels be used in this case?

Thanks so much in advance :grinning:

Best regards,
Dat

This article might help:

https://www.tandfonline.com/doi/full/10.1080/00223891.2018.1530680#:~:text=This%20is%20also%20known%20as,with%20other%20variables%20as%20predictors).
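
In case a concrete starting point is useful, here is a minimal base R sketch of one common approach: fit the model only on the rows where the dependent variable is observed, then predict it for the remaining rows. The data are simulated and every name (`dat`, `x1`, `x2`, `y`, `y_pred`) is hypothetical, since the real data set was not shared. It also assumes that whether the outcome is missing does not depend on the outcome itself once the predictors are accounted for; if that does not hold, the predictions may be biased.

```r
# Simulated stand-in for the data described above (all names are hypothetical):
# 30,000 rows of predictors, but the outcome y is observed for only 7,000 of them.
set.seed(1)
n <- 30000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 2 + 1.5 * dat$x1 - 0.5 * dat$x2 + rnorm(n)
dat$y[sample(n, n - 7000)] <- NA   # hide the outcome for 23,000 rows

# Split into rows with and without an observed outcome.
labelled   <- dat[!is.na(dat$y), ]
unlabelled <- dat[is.na(dat$y), ]

# Fit on the labelled rows only (lm() would also drop NA rows by default).
fit <- lm(y ~ x1 + x2, data = labelled)

# Predict the outcome for the rows where it was never observed.
unlabelled$y_pred <- predict(fit, newdata = unlabelled)

nrow(labelled)    # 7000
nrow(unlabelled)  # 23000
```

The same fit-on-labelled, predict-on-unlabelled pattern should carry over to tidymodels, where a model is fitted with `fit()` on the labelled rows and applied to the others with `predict()`. Linear regression is used here only for illustration; it is not a recommendation for this particular data set.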