Data Processing Hypothesis v0.1.0
Goal
Compare methods of handling outliers in the training data.
We probably won't be able to meaningfully predict extreme spikes regardless, so it makes sense to filter them out of the training data.
Winsorization is the preferred method: when tested in the DRW comp, it significantly improved the stability and predictive power of the model. Note that the result there was measured with plain Spearman correlation, not weighted Spearman.
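For reference, a minimal sketch of winsorization, assuming simple quantile-based clipping. The function names, the 10%/90% thresholds and the toy data are illustrative assumptions, not the setup used in the DRW comp. The bounds are computed from the training data only, so they can be reused unchanged on later data.

```python
import numpy as np

def fit_winsor_bounds(train, lower_q=0.01, upper_q=0.99):
    """Compute clipping bounds from the training data only (hypothetical helper)."""
    lo, hi = np.nanquantile(train, [lower_q, upper_q])
    return float(lo), float(hi)

def winsorize(values, lo, hi):
    """Clip values to [lo, hi]; rows are kept, only extremes are capped."""
    return np.clip(values, lo, hi)

# Toy training column with two extreme spikes (25.0 and -30.0).
train = np.array([0.1, -0.2, 0.05, 0.3, -0.1, 25.0, 0.0, -0.15, 0.2, -30.0])

# Quantile levels here are illustrative, chosen so the toy spikes get capped.
lo, hi = fit_winsor_bounds(train, lower_q=0.1, upper_q=0.9)
train_w = winsorize(train, lo, hi)
```

Unlike dropping outlier rows, clipping keeps the sample count unchanged, which matters when rows have to stay aligned with targets or weights.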
Result
The Iterative Cross-Sectional Standard Scaler was selected, since it prevents data leakage. Iterative Winsorization took too much time and was cumbersome to implement, so it was discarded.
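A minimal sketch of cross-sectional standard scaling, assuming the data is arranged as a panel with one row per timestamp and one column per instrument. That layout, the function name and the `eps` guard are assumptions for illustration; the actual implementation is not shown here. Each row is standardized using only its own mean and standard deviation, so no statistics from other timestamps leak into it.

```python
import numpy as np

def cross_sectional_standardize(panel, eps=1e-12):
    """Standardize each timestamp (row) across instruments (columns).

    panel: array-like of shape (n_timestamps, n_instruments).
    Only same-row statistics are used, so future rows cannot influence past ones.
    """
    panel = np.asarray(panel, dtype=float)
    mean = np.nanmean(panel, axis=1, keepdims=True)
    std = np.nanstd(panel, axis=1, keepdims=True)
    # Guard against constant rows: divide by 1 instead of 0.
    safe_std = np.where(std < eps, 1.0, std)
    return (panel - mean) / safe_std

# Toy panel: 3 timestamps x 3 instruments; the last row is constant.
panel = np.array([
    [1.0, 2.0, 3.0],
    [10.0, 20.0, 60.0],
    [5.0, 5.0, 5.0],
])
scaled = cross_sectional_standardize(panel)
```

Because every row is processed independently, the same function can be applied one timestamp at a time as new data arrives, without refitting on history.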