Does the themis package feature functions for dealing with continuous data imbalance?

Dear community,

I recently discovered the themis package, which is part of the tidymodels framework. I really like its approach, as it adds useful steps to the already extensive recipes package.

The themis package is great when one needs to account for categorical data imbalances; however, it is not immediately clear to me how it might be used for continuous data.

Here is a good source on the issues that arise with continuous data imbalances: https://towardsdatascience.com/strategies-and-tactics-for-regression-on-imbalanced-data-61eeb0921fca

My question is: Does the themis package have functions that are relevant for dealing with continuous data imbalance? If not, are there similar functions in other packages that can be added as recipe steps to account for continuous data imbalances?

None of the implemented methods in {themis} can deal with unbalanced regression problems natively.

I did some searching and found a SMOTE-related method called SMOGN (http://proceedings.mlr.press/v74/branco17a/branco17a.pdf) that tries to do what SMOTE does, but for regression problems. I'll add an issue and hopefully get to it soon.
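
To make the idea a bit more concrete, here is a rough sketch of SMOTE-style oversampling for a continuous target. It is plain Python rather than R, it is not the SMOGN algorithm from the paper, and it is not anything that exists in {themis}. The function name `oversample_rare` and the `threshold` argument are made up for illustration, and treating a case as "rare" simply because its target is at or above a cutoff is an assumption of this sketch (the paper uses a relevance function for that). Each synthetic case is placed on the line between a rare case and its nearest rare neighbour, and the target is interpolated with the same weight.

```python
import random


def oversample_rare(rows, targets, threshold, n_new, seed=0):
    """Return n_new synthetic (features, target) pairs built from rare cases.

    Assumption for this sketch: a case is "rare" when target >= threshold.
    """
    rng = random.Random(seed)
    rare = [i for i, y in enumerate(targets) if y >= threshold]
    if len(rare) < 2:
        raise ValueError("need at least two rare cases to interpolate")

    def sq_dist(i, j):
        # Squared Euclidean distance between the feature vectors of two cases.
        return sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rare)
        # Nearest other rare case to the chosen base case.
        neighbour = min((j for j in rare if j != base),
                        key=lambda j: sq_dist(base, j))
        t = rng.random()  # position along the segment base -> neighbour
        new_x = [a + t * (b - a) for a, b in zip(rows[base], rows[neighbour])]
        new_y = targets[base] + t * (targets[neighbour] - targets[base])
        synthetic.append((new_x, new_y))
    return synthetic


if __name__ == "__main__":
    # Toy data: most targets are near 1, two cases have large targets.
    rows = [[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.2, 5.0], [0.2, 0.0]]
    targets = [1.0, 1.2, 10.0, 12.0, 0.9]
    for x, y in oversample_rare(rows, targets, threshold=9.0, n_new=3):
        print(x, y)
```

The synthetic rows would be appended to the training data only, never to the test set, in the same way SMOTE is used for classification.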


Dear Emil,

Thanks. Yes, it would be wonderful to have a themis update in tidymodels that handles unbalanced regression problems. I'm excited already. Let me know when it's ready :slight_smile: I appreciate your response!