
Chaos Engineering: Falling over without falling over

As applications move online and automation extends to control more of the world around us, software failures have an increasing impact on business outcomes and safety. We need to develop more resilient systems, and that can’t be left as an operational concern. Resilience needs to be architected into the application code, and operability is one of the most important attributes of a resilient system. We’ve seen many examples of failures escalating: a small initial problem causes poorly designed and poorly tested error-handling code and procedures to fail in ways that magnify the problem and take out the whole system. What can we do about this? To start with, building and operating systems that are observable, controllable, and resilient is a shared responsibility. With the integration of roles that DevOps practices bring, and the automation that cloud providers offer, we need to adapt the concepts and terminology that already exist in resilient systems design to cloud-native architectures.

Presenter:
Adrian Cockcroft, VP Cloud Architecture Strategy, Amazon Web Services

Previous Video
Chaos Engineering: Enabling collaboration with reliable realtime services

In this session you will hear from Canva on how they enable users to collaborate with each other by introdu...

Next Video
Chaos Engineering: Towards operational excellence

Once systems are designed, implemented, and tested, we come to what is arguably one of the hardest aspects ...