Work
  • Jan 2026 – Present
    Hao AI Lab, UC San Diego
    Student Researcher
    • Built a clean-room, model-agnostic adaptive caching module for video-diffusion inference (residual-skip heuristic, no per-model adapters), cutting latency 22–47% on Wan2.1 across an SSIM 0.875–0.946 quality frontier; it also runs on Wan2.2 MoE, where the per-model baseline crashes.
    • Audited an LLM-agent GPU profiler’s 34 skills against ground-truth Nsight Systems traces (H100 + L40S workloads), landing 2 maintainer-confirmed upstream bug fixes.
    • Delivered the first software port of FastVideo to NVIDIA's DGX Spark (GB10/Blackwell): brought up 4 models, built FlashAttention-2 from source for sm_121, and shipped two fixes upstream, a Cosmos 2.5 sampling-preset fix (sharpness 92→431) and a Wan VAE-precision flip (1.3× faster decode).
    • Cut Cosmos 2.5 inference latency by 15.6% on A100 by quantizing decoded frames on-GPU before a 500 MB device-to-host transfer; output is identical (SSIM = 1.0), and the gain was cross-validated on H100 and Wan.
  • Jun 2025 – Sep 2025
    Amazon
    Software Development Engineer Intern
    • Built a distributed log indexing and query service handling 42M+ log entries per hour using parallelized binary search, reducing incident triage time from 15+ minutes to under 45 seconds.
    • Automated on-call SOPs using AWS Step Functions and Lambda, saving 12+ engineer-hours per week.
    • Integrated the log query service with internal diagnostic tooling via MCP, adding caching and query batching to maintain sub-2s response times under concurrent incident response load.
  • Apr 2023 – Sep 2024
    Aark Global
    Software Developer, AI/ML
    • Developed an async document ingestion pipeline processing 18,000+ pages/day, distributing tasks via Azure Queue Storage to a VM worker pool.
    • Implemented a read/write routing layer during a datastore migration (reads from Cosmos DB replicas, writes to the MongoDB primary), maintaining sub-100ms P95 latency throughout the cutover.
    • Designed a full-text search pipeline ingesting scanned PDFs through OCR into indexed Elasticsearch documents, enabling sub-180ms query latency over previously unsearchable content.
  • Jun 2022 – Mar 2023
    Concentrix
    Data Engineer
    • Replaced sequential scrapers with Airflow-orchestrated distributed ingestion jobs streamed through Kafka, increasing throughput by 60%.
    • Migrated aggregations from batch processing to Kafka streaming, reducing data freshness lag from 3 days to 6 hours.