Blog
2026
- From 15 minutes to 45 seconds: rebuilding an Amazon on-call tool in Go
  How I rebuilt a production log triage system: binary search over a sorted index, a channel-based ingestion pool, a write-ahead log for crash recovery, and what I'd do differently at scale.
- Building Flare: LLM-Powered Incident Detection on Real Log Data
  How I built an end-to-end log anomaly detection pipeline that combines classical ML with LLM summarization, and what I learned about evaluating LLM output in production-adjacent systems.