Work | Raghav Kachroo

Work

Jan 2026 - Present

Hao AI Lab, UC San Diego

Student Researcher
- Built a clean-room, model-agnostic adaptive caching module for video-diffusion inference (residual-skip heuristic, no per-model adapters): 22–47% latency reduction on Wan2.1 across an SSIM 0.875–0.946 quality frontier; also runs on Wan2.2 MoE, where the per-model baseline crashes.
- Audited an LLM-agent GPU profiler’s 34 skills against ground-truth Nsight Systems traces (H100 + L40S workloads), landing 2 maintainer-confirmed upstream bug fixes.
- Delivered the first software port of FastVideo to NVIDIA’s DGX Spark (GB10/Blackwell): 4 models running, FlashAttention-2 built from source for sm_121, and two fixes shipped upstream: a Cosmos 2.5 sampling-preset fix (sharpness 92→431) and a Wan VAE-precision flip (1.3× decode).
- Cut Cosmos 2.5 inference latency 15.6% on A100 by quantizing decoded frames on-GPU before a 500 MB device-to-host transfer; output-identical (SSIM = 1.0), cross-validated on H100/Wan.

Jun 2025 - Sep 2025

Amazon

Software Development Engineer Intern
- Built a distributed log indexing and query service over 42M+ log entries/hour using parallelized binary search, reducing incident triage latency from 15+ minutes to under 45 seconds.
- Automated on-call SOPs using AWS Step Functions and Lambda, saving 12+ engineer-hours per week.
- Integrated the log query service with internal diagnostic tooling via MCP, adding caching and query batching to maintain sub-2s response times under concurrent incident response load.

Apr 2023 - Sep 2024

Aark Global

Software Developer, AI/ML
- Developed an async document ingestion pipeline processing 18,000+ pages/day, distributing tasks via Azure Queue Storage to a VM worker pool.
- Implemented a read/write routing layer during a datastore migration (reads from Cosmos DB replicas, writes to MongoDB primary), maintaining sub-100ms P95 latency throughout cutover.
- Designed a full-text search pipeline ingesting scanned PDFs through OCR into indexed Elasticsearch documents, enabling sub-180ms query latency over previously unsearchable content.

Jun 2022 - Mar 2023

Concentrix

Data Engineer
- Replaced sequential scrapers with Airflow-orchestrated distributed ingestion jobs streamed through Kafka, increasing throughput by 60%.
- Migrated aggregations from batch to Kafka streaming, reducing data freshness lag from 3 days to 6 hours.