Work | Ayushman Singh

Sesame AI

Machine Learning Engineer, Inference·May 2026 — Present

Working on inference and streamlining infrastructure.

—Setting up an automated latency benchmarking pipeline.

Capital One

Apr 2024 — May 2026

Principal Associate, Applied Research·Jul 2025 — May 2026

I led inference optimization initiatives for production large language model systems, focusing on custom kernel development and system-level performance improvements.

—Coordinated technical efforts across Applied Research, Infrastructure, and customer organizations, bridging research innovation with production deployment.
—Led optimization work for Capital One's first customer-facing AI product, achieving a 43% latency reduction and 34% throughput gains through custom kernels and serving infrastructure improvements.
—Guided senior engineers on GPU kernel development, profiling techniques, and production inference engineering.

Senior Associate, Applied Research·Apr 2024 — Jul 2025

I worked on production inference systems for large language models, developing optimization techniques that powered customer-facing AI applications.

—Set up optimized inference pipelines using Triton Inference Server and TensorRT-LLM for production deployment.
—Reduced latency by 30% for a business-critical agentic application through batching, prefix-cache routing, and pipeline optimizations, enabling deployment of a system supporting $80M in operational value.
—Developed custom CUDA and Triton kernels for speculative decoding, achieving a 40% training speedup through fused GPU operations and reduced memory-bound overhead.