Where I've worked and what I've built.

Capital One

Apr 2024 — Present
Principal Data Scientist, Applied Research

I lead inference optimization initiatives for production large language model systems, focusing on custom kernel development and system-level performance improvements.

  • Coordinate technical efforts across Applied Research, Infrastructure, and customer organizations, bridging research innovation with production deployment.
  • Led optimization work for Capital One's first customer-facing AI product, achieving a 43% latency reduction and a 34% throughput gain through custom kernels and serving infrastructure improvements.
  • Guide senior engineers on GPU kernel development, profiling techniques, and production inference engineering.
Senior Data Scientist, Applied Research

I worked on production inference systems for large language models, developing optimization techniques that powered customer-facing AI applications.

  • Set up optimized inference pipelines using Triton Inference Server and TensorRT-LLM for production deployment.
  • Reduced latency by 30% for a business-critical agentic application through batching, prefix-cache routing, and pipeline optimizations, enabling deployment of a system supporting $80M in operational value.
  • Developed custom CUDA and Triton kernels for speculative decoding, achieving a 40% training speedup through fused GPU operations and reduced memory-bound overhead.