I'm a software engineer drawn to operational problems, the kind where something is slow, manual, or breaking under load. Most of my work has been finding those friction points and building systems that fix them structurally: faster log search, automated on-call workflows, distributed ingestion pipelines. Lately that has pulled me toward AI systems, where the same reliability and latency problems show up but the infrastructure is less mature.
The pattern is consistent: profile, find the friction point, ship a measured fix. The friction is usually a deferred cost attributed to the wrong stage, a hidden default nobody benchmarked, or a tool whose own findings are wrong. Recent examples from the Hao AI Lab, on FastVideo (an open-source video-diffusion framework): a 15.6% cut in Cosmos 2.5 inference latency (a "slow stage" was absorbing a deferred half-gigabyte GPU→CPU transfer); 22–47% off Wan2.1 via a model-agnostic adaptive caching module that needs no per-model adapters; and the first port of the framework to NVIDIA's DGX Spark (GB10/Blackwell). Same instinct outside the lab: a Go log indexing engine sustaining 42M rows/hour with sub-microsecond lookups, and an LLM-powered anomaly detection system that scores its own outputs against labeled HDFS data.