Hello guys,

I have a data frame in which some parameters have a high coefficient of variation (COV), above 100%. I tried several models for regression analysis, including linear regression, MARS, and random forest. The independent variables with high COV happen to be the most significant factors in explaining the variation in the response variable.
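For readers who want to reproduce the quantity being discussed, here is a minimal sketch of how a per-column COV could be computed, taking COV as 100 × sample standard deviation / |mean|. The column names `x1` and `x2` and their values are made up for illustration and are not from the data in question.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample COV in percent: 100 * sd / |mean|."""
    mean = statistics.fmean(values)
    if mean == 0:
        # The ratio is undefined when the mean is exactly zero.
        raise ValueError("COV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / abs(mean)

# Hypothetical columns standing in for the real data frame.
columns = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],   # evenly spread values
    "x2": [0.1, 0.2, 0.1, 9.0, 0.3],   # one large value dominates
}

cv_by_column = {
    name: coefficient_of_variation(vals) for name, vals in columns.items()
}

for name, cv in cv_by_column.items():
    print(f"{name}: COV = {cv:.1f}%")
```

In this toy data, a single large value is enough to push `x2` above 100%, so it may be worth looking at the distribution behind a high COV before drawing conclusions from the number alone.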

Does the high COV impact the robustness of my models? Other than overfitting issues, does it make my data questionable?

For regression using machine learning algorithms, is there a way to find an optimal number of data points? In my case, increasing the sample size decreases the R-squared.
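One common way to look at the sample-size question is a learning curve: fit on growing subsets and compare R-squared on the training rows with R-squared on rows held out from fitting. The sketch below uses synthetic data and a plain one-predictor least-squares line. The data-generating process, the noise level, and the subset sizes are all assumptions chosen for illustration, not the models or data described above.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(xs, ys, slope, intercept):
    """R-squared of the given line on the given points."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Synthetic data (assumption): linear signal plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(600)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 3.0) for x in xs]

# The last 100 rows are never used for fitting.
test_x, test_y = xs[500:], ys[500:]

learning_curve = []
for n in (5, 20, 100, 500):
    slope, intercept = fit_line(xs[:n], ys[:n])
    train_r2 = r_squared(xs[:n], ys[:n], slope, intercept)
    holdout_r2 = r_squared(test_x, test_y, slope, intercept)
    learning_curve.append((n, train_r2, holdout_r2))
    print(f"n={n:4d}  train R2={train_r2:.3f}  holdout R2={holdout_r2:.3f}")
```

If the R-squared that drops with more data is the training R-squared, that can simply mean the model has less room to fit noise. The holdout column is usually the more informative one when deciding whether more data is helping.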

Thanks

Hello guys,

Any feedback?

I'm finding your question a little too vague to wrestle with.

Does the high COV impact the robustness of my models?

Well, it makes fitting models non-trivial. If all your variables were constant, what would there be to make a model out of? If you outlaw high-variance variables (whatever that means), where does that leave you?
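To make the "whatever that means" point concrete, here is a small sketch with made-up numbers. Adding a constant to a predictor changes its COV a great deal, because the mean moves while the spread does not, yet the R-squared of a straight-line fit against the response is unchanged. The values in `x` and `y` and the shift of 100 are hypothetical.

```python
import statistics

def cv_percent(values):
    """Sample COV in percent: 100 * sd / |mean|."""
    return 100.0 * statistics.stdev(values) / abs(statistics.fmean(values))

def pearson_r(xs, ys):
    """Pearson correlation; r**2 equals R-squared of a one-predictor OLS fit."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical predictor and response.
x = [0.2, 0.5, 1.0, 4.0, 9.0]
y = [1.1, 1.9, 3.2, 8.7, 19.5]

# Same predictor measured from a different origin.
x_shifted = [v + 100.0 for v in x]

cv_raw = cv_percent(x)
cv_shifted = cv_percent(x_shifted)
r2_raw = pearson_r(x, y) ** 2
r2_shifted = pearson_r(x_shifted, y) ** 2

print(f"COV raw={cv_raw:.1f}%  shifted={cv_shifted:.1f}%")
print(f"R2  raw={r2_raw:.4f}  shifted={r2_shifted:.4f}")
```

So a COV above 100% says something about where the mean sits relative to the spread, not necessarily about whether the variable is usable as a predictor.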

Everything to do with model building is context dependent, i.e. why are you building a model? Do you need to understand something scientifically and explain it to people? Do you want to detect a disease in patients, and are you willing to accept some false positives in exchange for more true positives, so that fewer people slip through the net and miss treatment?

What is the intent of the model building? Almost everything in the model-building world involves tradeoffs, so you need to be oriented to a goal in order to have a chance of picking a tradeoff that is favourable to it. If you are goalless, then it's almost arbitrary what model you make and will be satisfied with. At least in my understanding.
