
Feature Engineering

Fourier Features

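import numpy as np

def add_fouriers(df, t_col='time', periods=(7, 30, 365), harmonics=3):
    # Add sin/cos pairs for each period P and harmonic k. t_col must be numeric
    # (e.g. days since start) and expressed in the same units as the periods.
    out = df.copy()
    for p in periods:
        for k in range(1, harmonics + 1):
            angle = 2 * np.pi * k * out[t_col] / p  # phase of harmonic k at period p
            out[f'fourier_P{p}_k{k}_sin'] = np.sin(angle)
            out[f'fourier_P{p}_k{k}_cos'] = np.cos(angle)
    return out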
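
As a quick sanity check on what these columns encode, the sketch below builds a single sin/cos pair by hand and confirms that it repeats after one full period and lies on the unit circle. The 14-step time index and the choice of period 7 with harmonic 1 are illustrative assumptions, not values taken from the pipeline.

```python
import numpy as np
import pandas as pd

# Illustrative inputs (assumptions): two weeks of integer time steps,
# a weekly period, and the first harmonic only.
t = np.arange(14)
period, k = 7, 1

feats = pd.DataFrame({
    "time": t,
    "sin": np.sin(2 * np.pi * k * t / period),
    "cos": np.cos(2 * np.pi * k * t / period),
})

# The second week should reproduce the first week exactly (up to rounding).
first_week = feats.iloc[:7][["sin", "cos"]].to_numpy()
second_week = feats.iloc[7:][["sin", "cos"]].to_numpy()
repeats = np.allclose(first_week, second_week)

# Each (sin, cos) pair is a point on the unit circle.
unit_norm = np.allclose(feats["sin"] ** 2 + feats["cos"] ** 2, 1.0)

print(repeats, unit_norm)
```

Because each pair sits on the unit circle, a model sees the end of one cycle as adjacent to the start of the next, which a raw day-of-week integer would not convey.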

Rolling Windows

Track the rolling mean, standard deviation, and z-score of each feature over several window lengths:

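wins = [3, 7, 21, 63]  # window lengths, in rows
for w in wins:
    for col in feature_cols:
        roll = df[col].rolling(w, min_periods=1)
        df[f'{col}_mean{w}'] = roll.mean()
        # Sample std needs two observations, so the first row is NaN (and so is its z-score)
        df[f'{col}_std{w}']  = roll.std()
        # 1e-9 guards against division by zero on flat windows
        df[f'{col}_z{w}']    = (df[col] - df[f'{col}_mean{w}']) / (df[f'{col}_std{w}'] + 1e-9)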
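
To make the rolling z-score concrete, here is a minimal sketch on a five-point toy series with a single window of 3. The series values and the name `s` are made up for illustration; the formula mirrors the loop above.

```python
import numpy as np
import pandas as pd

# Toy series (assumption, for illustration only).
s = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0], name="x")
w = 3

roll = s.rolling(w, min_periods=1)
mean = roll.mean()
std = roll.std()                 # sample std (ddof=1): NaN when only one value is in the window
z = (s - mean) / (std + 1e-9)    # how far the current value sits from its trailing mean

print(pd.DataFrame({"x": s, "mean": mean, "std": std, "z": z}))
```

The first row has a NaN standard deviation and therefore a NaN z-score, since a sample standard deviation is undefined for a single observation. Downstream models that cannot handle missing values need those rows filled or dropped.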

SHAP Analysis

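import shap
import xgboost as xgb

model = xgb.XGBRegressor(n_estimators=200, learning_rate=0.05, random_state=0)
# eval_set only monitors test-set error here; no early stopping is configured
model.fit(X_train_s, y_train, eval_set=[(X_test_s, y_test)], verbose=False)
explainer = shap.Explainer(model, X_train_s)  # X_train_s serves as background data
shap_values = explainer(X_test_s)             # per-sample, per-feature attributions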

Parameters: n_estimators=200 and learning_rate=0.05 were chosen as optimal for the small training set. SHAP values were computed on each TimeSeriesSplit fold and averaged across folds.


Additional Tests

Test                 Purpose
ACF/PACF             Applied to Y1 and Y2
Lasso Coefficients   Linear association
Granger Causality    Variable correlations at lags up to 79999
Periodogram          Seasonality patterns in Y1 & Y2
PCA                  Feature importance
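
As an illustration of the periodogram check listed above, the sketch below recovers the dominant period of a synthetic noisy weekly signal using NumPy's FFT. The series `y` is a made-up stand-in for Y1 or Y2, and the sampling step of one unit per row is an assumption.

```python
import numpy as np

# Synthetic stand-in for a target series (assumption): a period-7 sine plus noise.
rng = np.random.default_rng(0)
n = 364                          # 52 full cycles of length 7
t = np.arange(n)
y = np.sin(2 * np.pi * t / 7) + 0.3 * rng.standard_normal(n)

# Raw periodogram: squared magnitude of the FFT of the mean-centred series.
y_centered = y - y.mean()
power = np.abs(np.fft.rfft(y_centered)) ** 2 / n
freqs = np.fft.rfftfreq(n, d=1.0)    # cycles per row

# Skip the zero-frequency bin, then convert the peak frequency to a period.
peak = np.argmax(power[1:]) + 1
dominant_period = 1.0 / freqs[peak]

print(dominant_period)
```

A peak at a given period suggests including that period in the `periods` argument of `add_fouriers`. On real data the peaks are usually broader and may call for detrending first.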

Workflow Details