Infrastructure

Concept

Features selected on 30-70% of the data
Initial model selected from an array of candidates on 30-70% of the data
Grid search with CV on 70-100% of the data

General strategy: Features and models are selected independently, on overlapping data. Models are compared on the same data so that the comparison is fair. The best model is chosen from ~15 architectures and then grid searched. This is a baseline approach, adopted because data and time were scarce.
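
The split can be sketched as fractional slices of a time-ordered dataset. This is a minimal illustration, not code from the repository: it assumes "30-70%" and "70-100%" mean the rows between those fractional positions, and the helper name fraction_slice and the sample rows are hypothetical.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def fraction_slice(rows: Sequence[T], start: float, end: float) -> Sequence[T]:
    """Return the rows between two fractional positions of a time-ordered sequence."""
    if not 0.0 <= start < end <= 1.0:
        raise ValueError("expected 0 <= start < end <= 1")
    n = len(rows)
    # round() avoids losing a row to floating-point truncation (e.g. 299.999...).
    return rows[round(n * start):round(n * end)]


# Hypothetical stand-in for time-ordered observations.
rows = list(range(1000))

# Assumption: feature selection and initial model selection share this slice.
selection_rows = fraction_slice(rows, 0.30, 0.70)
# Assumption: grid search with CV runs on the final slice.
tuning_rows = fraction_slice(rows, 0.70, 1.00)
```

Because feature selection and initial model selection read the same slice, every candidate model is scored on identical rows, which is what makes the comparison fair.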

backtester.py

Workflow

Allows comparison of:

Sets of features
Models

Across:

Number of coins
Data size
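
The comparison above amounts to a grid of runs. A minimal sketch of enumerating that grid follows; every name in it (feature set names, model names, coin counts, data fractions, build_runs) is hypothetical and not taken from backtester.py.

```python
from itertools import product
from typing import Dict, List, Sequence

# Hypothetical experiment axes, for illustration only.
feature_sets: Dict[str, List[str]] = {
    "momentum": ["ret_1h", "ret_24h"],
    "volume": ["vol_z"],
}
model_names = ["ridge", "gbm"]
coin_counts = [5, 20]
data_fractions = [0.5, 1.0]


def build_runs(
    feature_set_names: Sequence[str],
    models: Sequence[str],
    n_coins: Sequence[int],
    fractions: Sequence[float],
) -> List[dict]:
    """Enumerate every combination of feature set, model, coin count and data size."""
    return [
        {"features": fs, "model": m, "n_coins": c, "data_fraction": d}
        for fs, m, c, d in product(feature_set_names, models, n_coins, fractions)
    ]


runs = build_runs(list(feature_sets), model_names, coin_counts, data_fractions)
```

Each dictionary describes one backtest, so results can be grouped by any single axis while the others are held fixed.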

Validation: 5-fold CV returning a Spearman value and an optional p-value per symbol. The Spearman value is calculated as a decaying average over 96 periods (24 hours of 15-minute intervals). The decay accounts for the increasing likelihood of noise over time.
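
A minimal sketch of the decayed Spearman score, under stated assumptions: one Spearman correlation is available per period, the decay is exponential, and the half-life of 24 periods is an arbitrary illustration. The actual decay shape and parameters in backtester.py are not specified here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def _ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one input is constant
    return cov / math.sqrt(vx * vy)


def decayed_average(per_period: Sequence[float], half_life: float = 24.0) -> float:
    """Weighted mean whose weights halve every `half_life` periods (assumed shape)."""
    weights = [0.5 ** (h / half_life) for h in range(len(per_period))]
    return sum(w * v for w, v in zip(weights, per_period)) / sum(weights)


# Hypothetical per-period correlations over 96 periods (24 h of 15-minute bars).
early_signal = [0.2] * 48 + [0.0] * 48
late_signal = [0.0] * 48 + [0.2] * 48
early_score = decayed_average(early_signal)
late_score = decayed_average(late_signal)
```

With decreasing weights, a signal concentrated in the early periods scores higher than the same signal arriving late, which matches treating later periods as more likely to be noise.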

fsa_feature_importance.py

Workflow

The Boruta method was judged too complex and too computationally demanding. FSA was chosen because its different architecture does not require iterating over N shadow features with repeated RF fits.

feature_test.py (Legacy)

Feature Selection Method:

Feature importance
Permutations

Implementation:

Input feature functions to backtester
Process selected coins through features
Validate features for 96-step prediction

Validation:

Rolling-window permutation tests on changes in MI (mutual information)
Returns the proportion of permuted samples where MI > baseline, as a p-value
Jaccard Stability
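
The validation steps above can be sketched as follows. This is an illustration under assumptions, not the code in feature_test.py: MI is taken to be mutual information, inputs are assumed to be already discretised so a plug-in estimate applies, and Jaccard stability is assumed to mean the mean pairwise Jaccard similarity of the feature sets selected in different windows. All function names and defaults are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations
from typing import Hashable, Iterable, List, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in mutual information (nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items()
    )


def permutation_p_value(x, y, n_permutations: int = 200, seed: int = 0) -> float:
    """Proportion of target shuffles whose MI exceeds the unshuffled baseline."""
    rng = random.Random(seed)
    baseline = mutual_information(x, y)
    shuffled = list(y)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any dependence between x and y
        if mutual_information(x, shuffled) > baseline:
            exceed += 1
    return exceed / n_permutations


def rolling_p_values(x, y, window: int, step: int, **kwargs) -> List[float]:
    """Run the permutation test on each rolling window."""
    return [
        permutation_p_value(x[s:s + window], y[s:s + window], **kwargs)
        for s in range(0, len(x) - window + 1, step)
    ]


def jaccard_stability(selections: Iterable[Iterable[str]]) -> float:
    """Mean pairwise Jaccard similarity of the selected feature sets."""
    sets = [set(s) for s in selections]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 1.0  # a single selection is trivially stable
    return sum(len(a & b) / len(a | b) if a | b else 1.0 for a, b in pairs) / len(pairs)
```

A small p-value means shuffling rarely matches the observed MI, so the feature carries information about the target in that window. A Jaccard stability near 1 means roughly the same features are selected from window to window.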