Alex has just written a refactoring of some website backend code. Since it was a small task, Alex commits it and moves on to the next feature. When the code is deployed to production two weeks later, it takes the entire site down. A one-character typo that the automated tests missed has caused a failure cascade reminiscent of the bad old days at Twitter. It takes eight hours of downtime to isolate the problem, produce a one-character …
via Timothy Fitz