Highlights
- Built a mixed-frequency financial forecasting pipeline spanning January 1991 to October 2025, combining daily market variables such as VIX, Treasury yields, and yield spreads with monthly CPI-derived inflation releases.
- Aligned macroeconomic variables using ALFRED vintage dates to reduce look-ahead bias, then evaluated models across chronological train, validation, and test windows covering COVID-19, post-pandemic, and Fed-tightening regimes.
- Implemented and compared TFT, LSTM, and ARIMAX baselines, then added domain-specific TFT modifications including directional-diversity penalties, regime-conditional outputs, multi-task classification, and VIX-conditioned attention gates.
- Found that regime-aware attention improved weekly directional accuracy from 57.9% to 59.4% and Sharpe ratio from 1.05 to 1.22, while learned gates dampened attention in low-volatility regimes and amplified it in high-volatility regimes.
- Used attention gates, variable-selection weights, gradient-flow plots, prediction variance, and directional-bias diagnostics to explain when the model learned useful regime structure and when the output layer collapsed.
Overview
This project investigates whether Temporal Fusion Transformers can forecast S&P 500 returns when market data arrives at different temporal resolutions. Daily variables such as VIX update every trading day, while macroeconomic variables such as CPI-derived inflation arrive monthly and can become stale before the next release.
The portfolio version emphasizes the engineering story: building the dataset correctly, avoiding look-ahead bias, comparing strong baselines, diagnosing model collapse, and adding regime-aware mechanisms that make the model behavior more interpretable.
Problem and constraints
- Financial returns are noisy and index prices behave much like a random walk, so even small directional improvements must be treated carefully and evaluated over multiple market regimes.
- Mixed-frequency features create a stale-information problem: a 30-day-old macro release should not be treated with the same freshness as yesterday's volatility signal.
- Market non-stationarity creates regime shifts where relationships between volatility, yields, inflation, and returns change across bull, bear, crisis, and tightening periods.
- The project is framed as an evaluation and interpretability system, not a claim of deployable trading alpha.
Data pipeline
The dataset spans January 1991 to October 2025 and combines S&P 500 returns with daily market indicators and lower-frequency macroeconomic releases. Features include VIX, 10-year Treasury yield, 10Y-2Y yield spread, and CPI-derived inflation with publication lag handling.
To reduce look-ahead bias, macroeconomic records were aligned using ALFRED vintage dates so each historical prediction only sees data that would have been available at that point in time.
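The vintage alignment described above amounts to an as-of lookup: for each trading day, take the most recent release whose publication date is not in the future. The sketch below is a minimal standard-library illustration, not the project's actual code. The `releases` records, their values, and the `as_of` helper are hypothetical, and treating a release as usable on its publication day is an assumption (a stricter pipeline might require it to be published strictly before the prediction date).

```python
from bisect import bisect_right
from datetime import date

# Hypothetical macro releases: (vintage_date, reference_month, value).
# vintage_date is when the figure became public, not the month it describes.
# Values are illustrative placeholders, sorted by vintage_date ascending.
releases = [
    (date(2024, 2, 13), "2024-01", 3.1),
    (date(2024, 3, 12), "2024-02", 3.2),
    (date(2024, 4, 10), "2024-03", 3.5),
]

def as_of(releases, day):
    """Return (value, staleness_days) for the latest release published on or before `day`."""
    vintage_dates = [r[0] for r in releases]
    idx = bisect_right(vintage_dates, day) - 1
    if idx < 0:
        return None, None  # nothing had been published yet
    vintage, _, value = releases[idx]
    return value, (day - vintage).days

# The day before the March release, only the February vintage is visible.
value, staleness = as_of(releases, date(2024, 3, 11))
```

Returning the staleness in days alongside the value gives the model an explicit signal of how old the macro reading is, which speaks to the stale-information constraint noted earlier.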
- Chronological split: train on 1991-2015, validate on 2015-2020, test on 2020-2025.
- Validation covers COVID-era disruption; testing covers post-pandemic conditions and 2022-2023 Fed-tightening regimes.
- VIX-based regime labels are used for interpretability, attention gating, and rolling-window robustness analysis.
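A chronological split and a threshold-based regime label can be sketched as follows. This is an illustration under stated assumptions: the half-open boundary convention, the toy dates, and the VIX threshold of 20 are all placeholders, since the exact cut-off dates and regime thresholds used in the project are not given here.

```python
from datetime import date

def chronological_split(rows, train_end, val_end):
    """Split (day, payload) rows by date using half-open ranges.

    train: day < train_end; val: train_end <= day < val_end; test: day >= val_end.
    """
    train = [r for r in rows if r[0] < train_end]
    val = [r for r in rows if train_end <= r[0] < val_end]
    test = [r for r in rows if r[0] >= val_end]
    return train, val, test

def vix_regime(vix, threshold=20.0):
    """Label a day 1 (high volatility) if VIX is at or above the threshold, else 0.

    The threshold is a hypothetical default, not the project's value.
    """
    return 1 if vix >= threshold else 0

# Toy rows: (date, VIX close). Dates and values are placeholders.
rows = [
    (date(2001, 6, 1), 15.0),
    (date(2002, 6, 1), 25.0),
    (date(2003, 6, 1), 18.0),
    (date(2004, 6, 1), 40.0),
]
train, val, test = chronological_split(rows, date(2002, 1, 1), date(2004, 1, 1))
labels = [vix_regime(v) for _, v in rows]
```

Half-open ranges guarantee that no date lands in two windows, which matters when adjacent windows share a boundary year.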
Modeling approach
- Temporal Fusion Transformer: LSTM encoder/decoder for local temporal structure plus interpretable multi-head attention for longer-range dependencies and variable selection.
- Quantile forecasting: predicts seven quantiles (0.02, 0.10, 0.25, 0.50, 0.75, 0.90, 0.98) using quantile loss to represent asymmetric uncertainty.
- Baselines: ARIMAX for statistical comparison and a three-layer LSTM baseline for recurrent deep learning comparison on the same prediction task.
- Training controls: constrained TFT hidden size after larger hidden dimensions produced degenerate positive-return predictions with low output variance.
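For reference, the quantile (pinball) loss behind the seven-quantile output can be written in a few lines. This is a minimal scalar sketch of the standard definition, not the project's training code; the helper names are illustrative.

```python
QUANTILES = (0.02, 0.10, 0.25, 0.50, 0.75, 0.90, 0.98)

def pinball_loss(y_true, y_pred, q):
    """Quantile loss for one observation.

    Under-prediction (y_true > y_pred) costs q per unit of error;
    over-prediction costs (1 - q) per unit of error.
    """
    diff = y_true - y_pred
    return max(q * diff, (q - 1.0) * diff)

def mean_quantile_loss(y_true, preds_by_quantile):
    """Average pinball loss across the quantile levels for one target value."""
    losses = [pinball_loss(y_true, p, q) for q, p in zip(QUANTILES, preds_by_quantile)]
    return sum(losses) / len(losses)

# For the 0.90 quantile, under-predicting by 1.0 costs 0.9, over-predicting by 1.0 costs 0.1.
under = pinball_loss(1.0, 0.0, 0.90)
over = pinball_loss(0.0, 1.0, 0.90)
```

The asymmetric penalty is what pushes each output toward its own quantile level, so the spread between the outer quantiles reflects asymmetric uncertainty rather than a symmetric error bar.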
Regime-aware attention
The central architectural experiment adds lightweight VIX-conditioned attention gates. Each attention head is multiplied by a learned regime-specific gate, letting the model specialize attention behavior under low- and high-volatility conditions with only four learned parameters for two regimes and two heads.
The resulting gates learned intuitive behavior: the low-volatility gate settled at about 0.46, dampening attention in calm periods, while the high-volatility gate settled at about 0.57, amplifying attention relative to calm periods. In the weekly setting this lifted directional accuracy from 57.9% to 59.4% and improved Sharpe ratio from 1.05 to 1.22.
- Low-volatility regime: attention is dampened, suggesting the model relies less on sharp historical re-weighting when markets are calmer.
- High-volatility regime: attention is amplified, suggesting the model searches harder for regime-specific temporal signals during stress periods.
- Feature-weight analysis showed VIX becoming dominant during the 2022 bear market and yield spread becoming more important in later bull-market windows.
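The gating idea can be illustrated with a small table lookup. This sketch assumes the gate scales each head's output vector and that regimes are indexed 0 (low volatility) and 1 (high volatility). The gate values reuse the approximate per-regime figures quoted above, and giving both heads the same value is an assumption made for illustration. In the real model these four numbers would be learned parameters.

```python
# Gate table indexed as GATES[regime][head]: 2 regimes x 2 heads = 4 parameters.
# Values are illustrative; in training they would be learned, not fixed.
GATES = [
    [0.46, 0.46],  # regime 0: low volatility
    [0.57, 0.57],  # regime 1: high volatility
]

def gate_heads(head_outputs, regime):
    """Scale each head's output vector by the gate for the current regime."""
    return [
        [GATES[regime][h] * x for x in vec]
        for h, vec in enumerate(head_outputs)
    ]

heads = [[1.0, 2.0], [0.5, -1.0]]  # toy outputs for two attention heads
low = gate_heads(heads, regime=0)
high = gate_heads(heads, regime=1)
```

Because the gate is a per-head scalar chosen by the regime label, it adds almost no capacity, which makes the learned values easy to read as a statement about how much the model leans on attention in each regime.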
Failure analysis and diagnostics
A major lesson from the project was that validation loss alone was not a sufficient model-selection signal. Multiple experiments converged to similar validation loss while producing very different downstream behavior, including prediction collapse.
Gradient-flow diagnostics showed cases where encoder and decoder layers continued learning useful structure while the output layer collapsed early, indicating a mismatch between representation learning and the final prediction objective.
- Tracked prediction variance, directional bias, gradient norms, quantile loss, directional accuracy, hit rate, Sharpe ratio, RMSE, MAE, and max drawdown.
- Added directional-diversity and anti-collapse penalties to discourage batches where almost all predictions share the same sign.
- Used multi-task classification to test whether the encoder could learn VIX-based regime structure even when daily direction prediction remained noisy.
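The collapse checks in the list above can be approximated with two batch-level statistics: prediction variance and the share of positive predictions. The sketch below is hypothetical, including the `diversity_penalty` helper and its `min_minority_share` default. Sign counts are not differentiable, so a penalty used during training would need a smooth surrogate; this version is only suitable as a diagnostic.

```python
from statistics import pvariance

def collapse_diagnostics(preds):
    """Summarise a batch of return predictions: variance and share of positive signs."""
    positive_share = sum(1 for p in preds if p > 0) / len(preds)
    return {"variance": pvariance(preds), "positive_share": positive_share}

def diversity_penalty(preds, min_minority_share=0.2):
    """Hypothetical penalty on one-sided batches.

    Zero while the less common sign makes up at least `min_minority_share`
    of the batch, growing linearly as the batch becomes more one-sided.
    """
    positive_share = sum(1 for p in preds if p > 0) / len(preds)
    minority = min(positive_share, 1.0 - positive_share)
    return max(0.0, min_minority_share - minority)

collapsed = [0.010, 0.020, 0.015, 0.012]   # every prediction is positive
balanced = [0.010, -0.010, 0.020, -0.020]  # signs are evenly mixed
report = collapse_diagnostics(collapsed)
```

A batch like `collapsed` can still score a reasonable quantile loss on a market that mostly drifts upward, which is why these statistics are worth logging next to the loss.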
Results and interpretation
- Weekly regime-aware attention improved directional accuracy from 57.9% to 59.4%, a 1.5 percentage-point lift in a very noisy forecasting task.
- The same weekly setting improved Sharpe ratio from 1.05 to 1.22, suggesting that the regime-conditioned attention mechanism improved the quality of selected signals, not just the raw classification metric.
- Variable-selection weights became more regime-adaptive: VIX rose to about 0.74 during the 2022 bear market compared with about 0.31 in the baseline visualization, while yield spread dominated the 2024 bull-market window at about 0.54.
- The project ultimately shows both promise and limits: transformer encoders can learn meaningful regime structure, but noisy financial targets require careful evaluation, interpretation, and failure-mode analysis.
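For readers who want to reproduce the headline metrics, one common way to compute directional accuracy and an annualized Sharpe ratio is sketched below. The definitions are assumptions, not the project's confirmed methodology: the strategy here takes a long or short position equal to the sign of the prediction, and weekly returns are annualized with 52 periods per year.

```python
from math import sqrt
from statistics import mean, stdev

def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def directional_accuracy(preds, actuals):
    """Fraction of periods where the predicted and realised return share a sign."""
    hits = sum(1 for p, a in zip(preds, actuals) if sign(p) == sign(a))
    return hits / len(actuals)

def annualized_sharpe(preds, actuals, periods_per_year=52):
    """Sharpe ratio of a sign-following strategy, assuming a zero risk-free rate."""
    strategy = [sign(p) * a for p, a in zip(preds, actuals)]
    spread = stdev(strategy)
    if spread == 0:
        return float("nan")  # undefined when strategy returns never vary
    return mean(strategy) / spread * sqrt(periods_per_year)

# Toy weekly predictions and realised returns (placeholders).
preds = [0.01, -0.02, 0.03, 0.01]
actuals = [0.02, -0.01, -0.01, 0.03]
accuracy = directional_accuracy(preds, actuals)
sharpe = annualized_sharpe(preds, actuals)
```

Under these definitions the two metrics can move independently, because the Sharpe ratio weights each correct or incorrect call by the size of the realised return.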
Engineering contribution
- Designed reproducible data preprocessing and chronological evaluation workflows for mixed-frequency financial time series.
- Implemented TFT, LSTM, and ARIMAX experiments with comparable inputs and evaluation metrics.
- Built diagnostics to expose output collapse, gradient disconnect, prediction diversity, and regime-specific attention behavior.
- Produced clear visualizations for attention gates, variable-selection weights, rolling regimes, and architecture-level modifications.
Next steps
- Add multi-horizon and multi-target forecasting across S&P 500 constituents rather than a single index-level target.
- Compare additional architectures such as Informer, Autoformer, and non-stationarity-aware time-series models on the same vintage-aligned dataset.
- Add a Streamlit or Next.js demo that allows users to inspect a forecast window, attention weights, feature weights, and regime classification side-by-side.
- Turn the notebook/report artifacts into a fully reproducible experiment package with locked data snapshots and automated figure generation.