No-fault go-arounds

In aviation, if you think there’s going to be a problem with a landing or a take-off, you stop, pull away, and try again. There are two somewhat conflicting quotes that I love:

If you can walk away from a landing, it’s a good landing. If you use the airplane the next day, it’s an outstanding landing.
– Chuck Yeager

A superior pilot uses his superior judgment to avoid situations which require the use of his superior skill.
– Frank Borman

The Yeager quote is funny, and I think it’s great for test pilots (in technology terms, development and test environments): do what you can and try it out, because the data and equipment aren’t sacred. But if you’re flying Part 121 (scheduled airline service), there is no room for tomfoolery. And the proof is in the pudding: we’ve had exactly two Part 121 deaths since 2009.

I work in DevOps, and I recently had a significant disagreement over tactics with a product owner who really wanted to just get the thing deployed, because $BusinessReason. She was taking the Yeager approach in production, and didn’t want to hear what I had to say. It was a swap-type deployment, and my methodology was: “We flipped the switch and it doesn’t work out of the box, so it’s time to go around. Flip the switch back and try again once we have a better understanding of the system.” But the product owner was so fearful of a rollback that she termed it the “R-word”. My response was that the “S-word” (severe incident) is worse, and that we were taking unplanned and undocumented risk by playing with the system.
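One way to take the fear out of the “R-word” is to make the go-around part of the deployment script itself. Here is a minimal Python sketch of that idea. It is not the tooling from the story above: the function name `swap_with_go_around`, the `route_traffic_to` and `is_healthy` callbacks, and the slot names “blue” and “green” are all hypothetical stand-ins for whatever your platform actually provides.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SwapResult:
    live: str                 # slot serving traffic when we finished
    went_around: bool         # True if we flipped the switch back
    log: List[str] = field(default_factory=list)


def swap_with_go_around(
    current: str,
    candidate: str,
    route_traffic_to: Callable[[str], None],  # hypothetical: flips the switch
    is_healthy: Callable[[str], bool],        # hypothetical: post-swap check
    checks: int = 3,
) -> SwapResult:
    """Flip traffic to `candidate`; flip back to `current` on the first failed check."""
    result = SwapResult(live=current, went_around=False)

    route_traffic_to(candidate)
    result.live = candidate
    result.log.append(f"swapped {current} -> {candidate}")

    for i in range(checks):
        if not is_healthy(candidate):
            # Go-around: no debugging in production, restore the known-good slot.
            route_traffic_to(current)
            result.live = current
            result.went_around = True
            result.log.append(f"check {i + 1} failed, rolled back to {current}")
            return result
        result.log.append(f"check {i + 1} passed")

    return result


if __name__ == "__main__":
    routed = []  # records every flip of the switch
    outcome = swap_with_go_around("blue", "green", routed.append, lambda slot: False)
    print(outcome.live, outcome.went_around)  # blue True
```

The design choice that matters is that the rollback is an ordinary return path, not an exception or an emergency procedure. Nobody has to argue for it in the moment, because it was decided before the switch was flipped.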

The point is that a “rollback” should be just like a go-around: if you’re not sure, pause and try again. So in your organization, I encourage you to take the Borman approach for production.