Handling unforeseen issues

The unpredictable happens. Whether it is a production outage or a dependent team's shifting priorities rippling into your deliverables, your next steps are important for building trust and correcting course.

As soon as you notice an issue, take a moment to gather information. Get the facts by understanding what is happening and why. Estimate the blast radius, or the reach, of the issue. Which groups are impacted: internal users, a subset of customers, all customers, dependent services? How does this degrade the customer experience? Is revenue being lost?
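The questions above can be kept as a small checklist so that nothing is skipped under pressure. The following is only a sketch of one way to do that in Python: the `BlastRadius` and `Audience` names, the fields, and the checkout scenario are all made up for illustration and are not part of any standard tool.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set


class Audience(Enum):
    """Groups named in the text; extend as needed (hypothetical type)."""
    INTERNAL_USERS = "internal users"
    SUBSET_OF_CUSTOMERS = "a subset of customers"
    ALL_CUSTOMERS = "all customers"
    DEPENDENT_SERVICES = "dependent services"


@dataclass
class BlastRadius:
    """Facts gathered so far about an issue (hypothetical structure)."""
    what_is_happening: str
    why: str = "unknown"
    impacted: Set[Audience] = field(default_factory=set)
    customer_experience: str = "unknown"
    revenue_being_lost: Optional[bool] = None  # None means "not yet known"

    def open_questions(self) -> List[str]:
        """Return the questions from the text that are still unanswered."""
        questions = []
        if self.why == "unknown":
            questions.append("Why is this happening?")
        if not self.impacted:
            questions.append("Which groups are impacted?")
        if self.customer_experience == "unknown":
            questions.append("How does this degrade the customer experience?")
        if self.revenue_being_lost is None:
            questions.append("Is revenue being lost?")
        return questions


# Made-up example: only the symptom and one impacted group are known so far.
radius = BlastRadius(
    what_is_happening="Checkout requests are timing out",
    impacted={Audience.SUBSET_OF_CUSTOMERS},
)
for question in radius.open_questions():
    print(question)
```

Listing what is still unknown is as useful as listing what is known: it tells you what to find out before you brief anyone.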

With a clearer picture, assess your options, determine priority, and work as a team to create actionable next steps. If you haven’t already, inform your project lead. If you are the lead engineer, notify stakeholders as soon as possible and prepare options. A short message is enough to start: “We ran into an issue with X. Can we jump on a call to discuss?” As a leader, you should be able to navigate obstacles, document workarounds, and articulate trade-offs. Have a recommendation and provide evidence to support your reasoning. Talking with your stakeholders early prevents surprises and establishes trust. Be transparent and work together on a path forward that fits your timeline and the business requirements.

Follow up with team members as they carry out the agreed-upon steps through to the end result. Stay close to stakeholders and keep them informed of updates. Transparency can’t be stressed enough.

Once the issue is resolved or stabilized, hold a retrospective. The sooner the team is in a position to have one, the better, while the issue is still fresh in their minds. Even if the issue was minor, take a moment with at least one other person to review the timeline of events from discovery to resolution. Capture all feedback. Look for opportunities to streamline processes, add detail to logging, refine metrics, improve alerting, implement backups, polish documentation or playbooks, and so forth. We can learn so much from failure.

Catalog the issue, detailing the events from start to finish along with your retrospective. This is an incredible learning opportunity that pays dividends for you and your team. Letting new team members read about past issues gives them better context to understand where you are today and what still needs to be done.
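A catalog entry like the one described above can be as simple as a timeline plus the retrospective notes. Here is a minimal sketch in Python, assuming you want a plain-text record that can live next to your other documentation. The `IncidentRecord` and `TimelineEvent` names, the fields, and the sample data are hypothetical and only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class TimelineEvent:
    at: datetime
    note: str


@dataclass
class IncidentRecord:
    """One cataloged issue: timeline plus retrospective notes (hypothetical)."""
    title: str
    timeline: List[TimelineEvent] = field(default_factory=list)
    feedback: List[str] = field(default_factory=list)
    follow_ups: List[str] = field(default_factory=list)

    def log(self, at: datetime, note: str) -> None:
        """Add an event and keep the timeline in chronological order."""
        self.timeline.append(TimelineEvent(at, note))
        self.timeline.sort(key=lambda event: event.at)

    def duration(self) -> Optional[timedelta]:
        """Time from the first logged event to the last, if there are two."""
        if len(self.timeline) < 2:
            return None
        return self.timeline[-1].at - self.timeline[0].at

    def to_text(self) -> str:
        """Render the record as plain text for the team's catalog."""
        lines = [self.title, "", "Timeline"]
        lines += [f"  {e.at:%Y-%m-%d %H:%M}  {e.note}" for e in self.timeline]
        lines += ["", "Feedback"] + [f"  - {item}" for item in self.feedback]
        lines += ["", "Follow-ups"] + [f"  - {item}" for item in self.follow_ups]
        return "\n".join(lines)


# Made-up data, from discovery to resolution.
record = IncidentRecord(title="Example incident (illustrative data)")
record.log(datetime(2024, 1, 8, 9, 5), "Issue discovered")
record.log(datetime(2024, 1, 8, 11, 35), "Issue resolved")
record.log(datetime(2024, 1, 8, 9, 20), "Project lead and stakeholders informed")
record.feedback.append("Alert fired after users had already noticed")
record.follow_ups.append("Improve alerting")
print(record.to_text())
```

Keeping the record as plain text is a deliberate choice in this sketch: it is easy for new team members to read and easy to store alongside playbooks.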

Going forward, prepare for the worst, correct weaknesses in your systems, and add redundancy to limit the chance of future unforeseen issues. Be transparent and allow others to learn from past mistakes.

Software Engineer / Architect. Opinions are my own. https://www.linkedin.com/in/philtobias/