Mini-LLM Pretraining Framework
Self-contained PyTorch codebase for transformer LLMs from scratch: RoPE/NoPE, MoE, KV cache, LoRA, mixed-precision training. Config-driven scaling from small to billion-parameter models.
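A minimal sketch of one of the framework's components, rotary position embeddings (RoPE), in plain PyTorch; the tensor layout and function name are illustrative, not the repo's actual API.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary position embeddings for queries/keys.

    x: (batch, n_heads, seq_len, head_dim), head_dim even.
    Rotates channel pairs by position-dependent angles so attention
    scores depend only on relative positions.
    """
    b, h, t, d = x.shape
    half = d // 2
    # Per-pair frequencies: theta_i = base^(-i / (d/2))
    freqs = base ** (-torch.arange(half, device=x.device) / half)
    angles = torch.arange(t, device=x.device)[:, None] * freqs[None, :]  # (t, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # 2-D rotation applied pairwise (rotate-half convention)
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```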
Themes & Publications
My work spans causal inference, model evaluation, and large-scale ML systems. The throughline is using rigorous statistical methodology to make black-box models easier to trust and reason about.
With Andrea Montanari at Granica. We estimate leave-one-out (LOO) Hessians via low-rank updates of the full-sample Hessian and mask eigen-directions associated with negative curvature in the LOO Hessian. For single-index data-generating processes with a mismatched teacher–student setup, the resulting prediction-error estimates stay within single-digit percent.
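A toy sketch of the core computation, assuming a GLM-style per-example loss so the leave-one-out update is rank-one; the names and masking threshold are illustrative.

```python
import numpy as np

def loo_hessian_masked(H_full: np.ndarray, x_i: np.ndarray, curv_i: float,
                       eps: float = 0.0):
    """LOO Hessian via a rank-one downdate, with negative-curvature
    eigen-directions masked before inversion.

    H_full: full-sample Hessian sum_j curv_j * x_j x_j^T   (p x p)
    x_i:    held-out example's features                    (p,)
    curv_i: per-example loss curvature at x_i^T theta
    """
    # Remove example i's contribution (rank-one downdate).
    H_loo = H_full - curv_i * np.outer(x_i, x_i)
    # Keep only eigen-directions with positive curvature.
    vals, vecs = np.linalg.eigh(H_loo)
    keep = vals > eps
    # Pseudo-inverse restricted to the positive-curvature subspace.
    H_loo_pinv = (vecs[:, keep] / vals[keep]) @ vecs[:, keep].T
    return H_loo, H_loo_pinv
```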
Foundation models for tabular data that incorporate column semantics, with dataset generalization measured via per-example permutation of columns.
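A minimal sketch of the permutation probe, assuming the model consumes (column-name, value) pairs; the representation is a placeholder.

```python
import numpy as np

def permute_columns_per_example(X: np.ndarray, col_names: list[str],
                                rng: np.random.Generator) -> list:
    """Independently shuffle column order for each example, keeping
    (name, value) pairs intact. A model that truly uses column semantics
    should be invariant to this; a position-dependent one will degrade."""
    batch = []
    for row in X:
        perm = rng.permutation(len(col_names))
        batch.append([(col_names[j], row[j]) for j in perm])
    return batch
```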
Core contributor to LLM-as-judge evaluation. Designed a self-critique framework with embedded rubrics that achieves inter-rater reliability comparable to human annotators; proposed model-assisted estimation to combine human and model scores into an unbiased, lower-variance estimator.
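One way to write such a combined score is the classical difference estimator from survey sampling; a sketch assuming a uniformly random human-labeled subset (the paper's exact estimator may differ).

```python
import numpy as np

def model_assisted_mean(model_all: np.ndarray,
                        human_sub: np.ndarray,
                        model_sub: np.ndarray) -> float:
    """Cheap model scores on every item, plus a bias correction from a
    random human-labeled subset. Unbiased for the mean human score when
    the subset is uniform; variance shrinks as the model tracks humans."""
    return float(model_all.mean() + (human_sub - model_sub).mean())
```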
Quasi-experiment with Guido Imbens. Conditioning on bookmaker spreads absorbs anticipated effects, so residual performance reflects unanticipated ones; teams visiting cities with higher nightlife indices consistently underperform the spread. The effect replicates across the NBA and MLB.
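The design can be sketched as a regression of score margin on the closing spread plus the host city's nightlife index: if the spread absorbs everything the market anticipated, a nonzero nightlife coefficient is an unanticipated effect. Column and file names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

games = pd.read_csv("away_games.csv")  # hypothetical game-level data

# Conditioning on the spread absorbs anticipated factors; the nightlife
# coefficient then measures systematic under/over-performance.
fit = smf.ols("margin ~ spread + nightlife_index", data=games).fit(
    cov_type="cluster", cov_kwds={"groups": games["host_city"]})
print(fit.params["nightlife_index"], fit.bse["nightlife_index"])
```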
Summer research at Lawrence Livermore with Kaiser Research: machine learning and Bayesian models for sepsis trajectory and prognosis from EHR signals.
Draft paper from a summer of research with Reza Zadeh at Stanford on a distributed algorithm for graph min-cut.
Quasi-experimental causal estimate using walk-on entry and mid-career injury as sources of within-subject identification. Heterogeneous treatment effects by sport and entering SAT score.
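A sketch of the within-subject design as a two-way fixed-effects regression; the outcome and variable names are placeholders, since the blurb leaves them unspecified.

```python
import pandas as pd
import statsmodels.formula.api as smf

panel = pd.read_csv("athlete_terms.csv")  # hypothetical person-by-term panel

# Person fixed effects absorb stable traits; term fixed effects absorb
# common shocks. `active` flips at walk-on entry or mid-career injury,
# so the coefficient is identified within person.
fit = smf.ols("outcome ~ active + C(person_id) + C(term)", data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["person_id"]})
print(fit.params["active"])
```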