Imbalanced Data in a Regression Model

Hi R community,

I have a question about building a machine learning regression model where I have more observations for the independent variables than for the dependent variable (i.e., the independent variables have ~30,000 rows and the dependent variable has only ~7,000 rows).

My questions are:

  • Would it be possible to build a regression model with this kind of data?
  • If yes, which method should I use to model it?
  • Can tidymodels be used in this case?

Thanks so much in advance :grinning:

Best regards,

This article might help (the linked passage ends with "with other variables as predictors").
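
For what it's worth, one common way to handle this is to fit the model only on the rows where the dependent variable is observed, check it on a held-out part of those rows, and then predict the remaining rows. Here is a minimal sketch with tidymodels. The data are simulated, and the column names (`x1`, `x2`, `y`) and the linear model are assumptions for illustration only, so swap in your own data and model:

```r
library(tidymodels)

set.seed(123)

# Simulated stand-in for the real data (an assumption, not the poster's data):
# 30,000 rows of predictors, outcome y observed for only 7,000 of them.
n <- 30000
dat <- tibble(
  x1 = rnorm(n),
  x2 = rnorm(n),
  y  = 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n)
)
dat$y[sample(n, n - 7000)] <- NA

# Rows with an observed outcome are used for fitting;
# rows without one can only be predicted.
labelled   <- dat %>% filter(!is.na(y))
unlabelled <- dat %>% filter(is.na(y))

# Hold out part of the labelled rows to estimate performance.
split <- initial_split(labelled, prop = 0.8)

wf <- workflow() %>%
  add_formula(y ~ x1 + x2) %>%
  add_model(linear_reg() %>% set_engine("lm"))

fit_wf <- fit(wf, data = training(split))

# RMSE, R-squared and MAE on the held-out labelled rows.
test_metrics <- predict(fit_wf, new_data = testing(split)) %>%
  bind_cols(testing(split)) %>%
  metrics(truth = y, estimate = .pred)

# Predictions for the rows where y was never observed.
preds <- predict(fit_wf, new_data = unlabelled)
```

Whether those predictions can be trusted depends on why the dependent variable is missing. If the 7,000 observed rows differ systematically from the other 23,000, a model fitted on them may not carry over well.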