A distributed systems reliability glossary

A Crowd-Sourced Reliability Glossary

Antithesis have opened an A-to-Z of distributed-systems reliability terms: concise definitions of “Byzantine fault,” “gray failure,” “coordinated omission,” “write amplification,” and dozens more. Each entry cites a canonical paper and tags the relevant failure mode, so post-mortem debates can skip the semantics and jump straight to root cause. Even better, the repo accepts pull requests, letting ops teams contribute new jargon as it appears. #distributedsystems

A list of key concepts for building and testing reliable distributed systems, with basic definitions and deep references.