Thanks for the thoughtful post!
Yes, absolutely. I can add a function to parsnip that produces weights inversely proportional to the class frequencies (or that follows a custom weighting scheme). Any other suggestions are very welcome.
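To make the scheme concrete: tidymodels is R, but here is a language-agnostic sketch in Python of what "inversely proportional to the class frequencies" usually means, normalized so the weights average to 1. The helper name `inverse_frequency_weights` is just for illustration, not an actual parsnip function.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each observation inversely to its class frequency,
    normalized so the average weight across the data set is 1."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    # each class contributes equal total weight: n / k per class
    return [n / (k * counts[y]) for y in labels]

labels = ["a"] * 8 + ["b"] * 2
weights = inverse_frequency_weights(labels)
# majority-class rows get weight 0.625, minority-class rows get 2.5
```

Each class then contributes the same total weight to the fit, which is the usual default for imbalance correction.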
One thing I hope to work on more this year is making our interface to the various cost-sensitive learning models better and more consistent (though that is unrelated to case weights).
Importance weights, especially for class imbalances, should not directly interact with performance measures. As with subsampling methods, the holdout data used to measure performance should represent the data as seen "in the wild": it should not be subsampled or altered by methods that emphasize particular data points.
The weights affect performance indirectly since the model is influenced by them.
However, it is important to make sure the right metric is being used (as you said above). If you want to strengthen the model for the minority class, optimizing overall accuracy might negate the effect of the case weights. We can add some documentation about that (and it is already on my list of updates for the second edition of Applied Predictive Modeling).
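A quick illustration of why metric choice matters here, sketched in Python with made-up data: a degenerate model that always predicts the majority class can score very well on overall accuracy while being useless for the minority class, which per-class recall (sensitivity/specificity) exposes immediately.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_recall(y_true, y_pred):
    """Recall computed separately for each observed class."""
    classes = set(y_true)
    return {
        c: sum(t == p == c for t, p in zip(y_true, y_pred))
           / sum(t == c for t in y_true)
        for c in classes
    }

# hypothetical 95/5 imbalanced holdout set
y_true = ["maj"] * 95 + ["min"] * 5
y_pred = ["maj"] * 100  # a "model" that always predicts the majority class

accuracy(y_true, y_pred)          # 0.95, which looks great
per_class_recall(y_true, y_pred)  # {"maj": 1.0, "min": 0.0}
```

Optimizing accuracy alone would happily select this model; a class-sensitive metric would not.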
Very good point. It would be difficult to come up with a guideline for which approach to use when, but we can try a few things and show, for specific data sets, whether there actually is a difference between, say, down-sampling and down-weighting (which might achieve the same effect). Again, more material for vignettes and APM.