Incident Response with Emil Stolarsky

Preparedness: Enabling Accurate and Quick Incident Response

As a system becomes more complex, the chance of failure increases. At a large enough scale, failures are inevitable. Incident response is the practice of preparing for and effectively recovering from these failures.

In this interview, Emil argues that we need to move beyond tribal knowledge and adopt practices such as an incident command system and rigorous use of checklists. He suggests leaving behind the “move fast and break things” mindset in favor of more deliberate preparation.

Podcast Transcript

“But the reality it turns out is that humans, while we might be good at solving these complex problems, we’ll often forget the basics, or we’ll often forget something that’s easily overlooked, but it’s really important to the recovery.” – Emil Stolarsky

“The only way we can fight those biases and do an effective analysis of what went wrong is by having other people point them out.” – Emil Stolarsky

“Because postmortems, retrospectives, are super valuable. You don’t want to repeat the same mistake.” – Emil Stolarsky

Links:

Strange Loop

SRE Handbook

The Power of Scriptable Load Balancers
