Goal

Compare methods of handling outliers in the training data.

We probably won’t be able to meaningfully predict extreme spikes regardless, so it makes sense to filter them out of the training data.

Winsorization is the preferred method. When tested in the DRW comp, it significantly improved the stability and predictive power of the model (although that result was measured with Spearman, not weighted Spearman).
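
For reference, a minimal sketch of winsorization: values beyond chosen quantiles are clipped to those quantiles rather than dropped. The function name `winsorize` and the 1%/99% cutoffs are illustrative assumptions, not the settings used in the DRW comp.

```python
import numpy as np

def winsorize(values, lower_q=0.01, upper_q=0.99):
    """Clip values to the [lower_q, upper_q] quantile range.

    The cutoffs are hypothetical defaults; tune them for the data at hand.
    """
    values = np.asarray(values, dtype=float)
    # Quantiles are computed while ignoring NaNs; NaNs pass through np.clip unchanged.
    lo, hi = np.nanquantile(values, [lower_q, upper_q])
    return np.clip(values, lo, hi)

# One extreme spike among otherwise small values.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1000.0])
clipped = winsorize(x, lower_q=0.1, upper_q=0.9)
```

Note that quantiles computed over the full training set use information from every row. To avoid leakage, the cutoffs would need to be computed only from data available at each step.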


Result

The method used is an iterative cross-sectional standard scaler, which prevents data leakage. Iterative winsorization took too much time and was cumbersome to implement, so it was discarded.
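
A rough sketch of cross-sectional standard scaling, assuming "cross-sectional" means that each time step is standardized using only the values observed at that same time step, so no statistics from other periods leak in. The array layout (rows are time steps, columns are instruments) and the name `cross_sectional_standardize` are assumptions made for illustration; the actual implementation may differ.

```python
import numpy as np

def cross_sectional_standardize(panel):
    """Standardize each row (one time step) with its own mean and std.

    Assumes `panel` has shape (n_time_steps, n_instruments).
    """
    panel = np.asarray(panel, dtype=float)
    # Per-row statistics only: a row never sees data from other time steps.
    mean = np.nanmean(panel, axis=1, keepdims=True)
    std = np.nanstd(panel, axis=1, keepdims=True)  # population std (ddof=0)
    # Avoid division by zero for constant rows; such rows become NaN.
    std = np.where(std == 0, np.nan, std)
    return (panel - mean) / std

panel = np.array([[1.0, 2.0, 3.0],
                  [10.0, 20.0, 1000.0]])
scaled = cross_sectional_standardize(panel)
```

Because every row is scaled independently, a spike at one time step does not change the scaled values at any other time step.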