Team 2

Round 1

Infrastructure

Concept: Feature selection and initial model selection run independently on the same 30-70% slice of the data; the grid search with CV uses the later 70-100% slice.

Step | Data Range | Purpose
1    | 30-70%     | Feature selection
2    | 30-70%     | Initial model selection
3    | 70-100%    | Grid search with CV
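
The split above can be sketched as fractional slices over a time-ordered dataset. This is a minimal illustration, not the team's actual code: `slice_by_fraction` and the `rows` list are hypothetical names, and the sketch assumes the percentages refer to row positions in chronological order.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def slice_by_fraction(rows: Sequence[T], start: float, end: float) -> Sequence[T]:
    """Return the rows whose position falls in [start, end) of the series."""
    if not 0.0 <= start < end <= 1.0:
        raise ValueError("expected 0 <= start < end <= 1")
    n = len(rows)
    # round() avoids off-by-one errors from float products such as 0.7 * n.
    return rows[round(n * start):round(n * end)]


# Hypothetical stand-in for a chronologically sorted dataset.
rows = list(range(1000))

selection_rows = slice_by_fraction(rows, 0.30, 0.70)  # steps 1 and 2 share this slice
tuning_rows = slice_by_fraction(rows, 0.70, 1.00)     # step 3: grid search with CV
```

Steps 1 and 2 read the same slice, while step 3 only sees later rows, so the hyperparameter search is not tuned on data that already drove the feature and model choices.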

Scripts:

File                      | Purpose                 | Details
backtester.py             | Compare features/models | 5-fold CV, Spearman per symbol, decaying avg over 96 periods
fsa_feature_importance.py | Feature selection       | FSA method (faster than Boruta)
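
The scoring described for backtester.py can be illustrated with a small, dependency-free sketch. It is not the script's actual implementation: the function names are hypothetical, and the exponential weighting with a `half_life` of 24 periods is an assumption, since the page only states "decaying avg over 96 periods".

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def spearman(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(pred), average_ranks(actual))


def decaying_average(scores: Sequence[float], window: int = 96,
                     half_life: float = 24.0) -> float:
    """Weighted mean of the last `window` scores; a score's weight halves
    every `half_life` periods of age (half_life is an assumed parameter)."""
    recent = list(scores)[-window:]
    if not recent:
        raise ValueError("scores must not be empty")
    newest = len(recent) - 1
    weights = [0.5 ** ((newest - i) / half_life) for i in range(len(recent))]
    return sum(w * s for w, s in zip(weights, recent)) / sum(weights)
```

One plausible use is to compute `spearman` per symbol for each period and pass the resulting series of scores to `decaying_average`, so that recent periods count more than old ones.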

Workflow Details


Research

Topic               | Link                     | Status
Preprocessing       | Notes                    | Z-score, Winsorization, Denoising explored
Model Selection     | Notes                    | TimeGPT, LSTM, RF, custom models
Coins Grouping      | Hypothesis · Findings    | Clustering by behavior
Exogenous Variables | Hypothesis               | On-chain data, funding rates
Regimes             | Hypothesis · Moments     | Market regime identification
Data Processing     | Hypothesis · NA Strategy | Missing data handling
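
Two of the preprocessing steps named in the table, Winsorization and Z-scoring, can be sketched as follows. This is a generic illustration rather than the code behind the linked notes: the function names, the 5%/95% clipping limits, and the nearest-rank quantile rule are all assumptions.

```python
import statistics
from typing import List, Sequence


def winsorize(values: Sequence[float], lower: float = 0.05,
              upper: float = 0.95) -> List[float]:
    """Clip values to the empirical [lower, upper] quantiles (nearest rank)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    lo = ordered[round(lower * last)]
    hi = ordered[round(upper * last)]
    return [min(max(v, lo), hi) for v in values]


def zscore(values: Sequence[float]) -> List[float]:
    """Standardize to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


# Hypothetical feature column with one extreme outlier.
raw = [float(v) for v in range(20)] + [1000.0]
clean = zscore(winsorize(raw))
```

Clipping before standardizing keeps a single extreme value from inflating the standard deviation and compressing every other observation toward zero.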

Takeaways

Full Takeaways

Infrastructure:

  • S3 bucket for CUDA Docker caching
  • GPU optimization
  • Reusable scripts directory
  • Multiprocess feature selection
  • Digital Ocean persistent server ($4/month)
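
The multiprocess feature selection takeaway can be illustrated by scoring each feature in a separate worker process. This is a sketch under stated assumptions: the score used here is a plain absolute Pearson correlation as a stand-in, not the FSA method, and `abs_correlation` and `rank_features` are hypothetical names.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Dict, List, Tuple


def abs_correlation(task: Tuple[str, List[float], List[float]]) -> Tuple[str, float]:
    """Stand-in score: |Pearson correlation| between one feature and the target."""
    name, feature, target = task
    n = len(feature)
    mf, mt = sum(feature) / n, sum(target) / n
    cov = sum((f - mf) * (t - mt) for f, t in zip(feature, target))
    vf = sum((f - mf) ** 2 for f in feature)
    vt = sum((t - mt) ** 2 for t in target)
    score = abs(cov) / (vf * vt) ** 0.5 if vf and vt else 0.0
    return name, score


def rank_features(features: Dict[str, List[float]], target: List[float],
                  workers: int = 2) -> List[Tuple[str, float]]:
    """Score every feature in a process pool and sort by descending score."""
    tasks = [(name, column, target) for name, column in features.items()]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        scored = list(pool.map(abs_correlation, tasks))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data; real inputs would come from the feature pipeline.
    target = [1.0, 2.0, 3.0, 4.0, 5.0]
    features = {
        "trend": [2.0, 4.0, 6.0, 8.0, 10.0],
        "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
    }
    for name, score in rank_features(features, target):
        print(f"{name}: {score:.3f}")
```

The worker function lives at module level so it can be pickled, and the `__main__` guard keeps child processes from re-running the pool on platforms that spawn rather than fork.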

Process:

  • More time for meaningful research
  • Run long calculations in the background with: nohup python script.py > output.log 2>&1 &

Round 2

Infrastructure

Details · Changes from R1

Strategy Ideas

All Ideas

Idea           | Link  | Result
ML Research    | Notes | Poor results, abandoned
Coin Reduction | Notes | Volume-based filtering
Cointegration  | Notes | Regime-based ECM approach