The art of successful Kubernetes failures

In the real world, things don't always go the way you want them to. Even when you've designed your Kubernetes cluster and the services it hosts to be highly available, scalable, and resilient, they sometimes fail anyway. Used correctly, these failures can give you a deep understanding of how your system works and can serve as tools for spreading knowledge throughout your engineering community. In this session, we cover expert techniques for defining and reviewing cluster-, node-, and pod-level metrics, and for monitoring Kubernetes-based services so you can catch problems before they fail. You also learn how to perform post-failure analyses that drive learning and meaningful improvement.

Speaker:
Mitch Beaumont, Principal Solutions Architect, Amazon Web Services

Download presentation:
The art of successful Kubernetes failures

Resource:
Operational Insights for Containers and Containerised Applications