Capital One
Apr 2024 — Present
Principal Data Scientist, Applied Research
I lead inference optimization initiatives for production large language model systems, focusing on custom kernel development and system-level performance improvements.
- Coordinate technical efforts across Applied Research, Infrastructure, and customer organizations, bridging research innovation with production deployment.
- Led optimization work for Capital One's first customer-facing AI product, achieving a 43% latency reduction and a 34% throughput gain through custom kernels and serving infrastructure improvements.
- Guide senior engineers on GPU kernel development, profiling techniques, and production inference engineering.
Senior Data Scientist, Applied Research
I worked on production inference systems for large language models, developing optimization techniques that powered customer-facing AI applications.
- Set up optimized inference pipelines using Triton Inference Server and TensorRT-LLM for production deployment.
- Reduced latency by 30% for a business-critical agentic application through batching, prefix-cache routing, and pipeline optimizations, enabling deployment of a system supporting $80M in operational value.
- Developed custom CUDA and Triton kernels for speculative decoding, achieving a 40% training speedup through fused GPU operations and reduced memory-bound overhead.