Hi! I've recently been going through the Azure ML documentation and came across their article on data drift calculation for deployed models: link.
As part of that feature they calculate the following metrics, which seem very comprehensive:
- Measures the magnitude of data drift, called the drift coefficient.
- Measures the data drift contribution by feature, informing which features caused data drift.
- Measures distance metrics. Currently Wasserstein and Energy Distance are computed.
- Measures distributions of features. Currently kernel density estimation and histograms.
- Sends alerts on data drift by email.
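
The distance metrics in that list are at least easy to reproduce per feature. Here is a minimal sketch with SciPy, assuming two numeric sample arrays with the same columns. The names `feature_distances`, `baseline`, and `target` are my own, and the data is synthetic. This only shows the per-feature Wasserstein and energy distances; it is not Azure's drift coefficient, which is exactly the part I can't find documented.

```python
import numpy as np
from scipy.stats import wasserstein_distance, energy_distance


def feature_distances(baseline, target):
    """Return {column_index: (wasserstein, energy)} for two 2-D sample arrays."""
    baseline = np.asarray(baseline, dtype=float)
    target = np.asarray(target, dtype=float)
    if baseline.shape[1] != target.shape[1]:
        raise ValueError("baseline and target must have the same number of features")
    return {
        j: (
            wasserstein_distance(baseline[:, j], target[:, j]),
            energy_distance(baseline[:, j], target[:, j]),
        )
        for j in range(baseline.shape[1])
    }


# Synthetic example: the second feature is shifted by 1, the first is untouched.
rng = np.random.default_rng(0)
baseline = rng.normal(size=(1000, 2))
target = baseline.copy()
target[:, 1] += 1.0

distances = feature_distances(baseline, target)
for j, (w, e) in distances.items():
    print(f"feature {j}: wasserstein={w:.3f}, energy={e:.3f}")
```

Both distances are computed one feature at a time, so they say nothing about drift in the joint distribution, and they don't give a single overall magnitude on their own.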
In the R ecosystem there's also an experimental package that tries to capture data drift:
applicable, but it's still in its infancy.
Nevertheless, I've spent quite some time searching online for a clear answer on how exactly the drift coefficient is calculated, and how the impact of individual features can then be decomposed from it. Could any of you point me to some good materials on that?