© 2019, DigitalOnUs

SRE & Chaos Engineering


SRE and Chaos Engineering - the DOU Edge

SRE rests on the following core tenets:

Observability

To conduct experiments, one must have deep insight into how the system behaves internally.

Experimentation

Tightly define the scope, timing, and duration of each experiment. Choose experiments where the risk/reward ratio is in your favor.

Reporting

Chaos engineers should dive deep into codebases to find the sources of problems, then work with engineers to fix them and increase reliability.

Culture

Chaos engineering, like DevOps, is a cultural paradigm shift that provides incentives for engineers to design systems with reliability in mind.

Reliability is key

DigitalOnUs is proud to raise its delivery standards to the next level of Site Reliability Engineering: Chaos Engineering.

Chaos Engineering brings a major paradigm shift: the design focus moves to reliability as the key measure, rather than to systems that merely perform routine tasks.

Chaos Engineering increases reliability and uptime by surgically attacking the infrastructure to expose weak spots, so they can be fixed and resilience to service degradation increases.

This goes a step beyond the conventional incident response and prevention lifecycle. Experiments are run, data is collected, and fixes are made. Instead of hoping that disaster recovery and failover work as expected, Chaos Engineering actively tests those assumptions and clarifies what works and what does not during an outage.
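
As a rough illustration of the loop described here (define a tightly scoped experiment, inject a fault, collect data, roll back), the following Python sketch may help. It is not DigitalOnUs tooling: `Experiment`, `Report`, `run_experiment`, and `ToyCluster` are hypothetical names invented for this example, and the "cluster" is an in-memory stand-in that assumes failover works whenever at least one replica is healthy.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Experiment:
    """A hypothetical, tightly scoped chaos experiment."""
    name: str
    targets: List[str]      # scope: which replicas the fault may touch
    max_steps: int          # duration: hard cap on observation steps
    max_error_rate: float   # abort threshold (steady-state assumption)


@dataclass
class Report:
    """Data collected while the fault was active."""
    name: str
    observations: List[float] = field(default_factory=list)
    aborted: bool = False

    @property
    def hypothesis_held(self) -> bool:
        return not self.aborted


def run_experiment(
    exp: Experiment,
    inject: Callable[[List[str]], None],
    rollback: Callable[[List[str]], None],
    measure_error_rate: Callable[[], float],
) -> Report:
    report = Report(exp.name)
    inject(exp.targets)
    try:
        for _ in range(exp.max_steps):
            rate = measure_error_rate()
            report.observations.append(rate)
            if rate > exp.max_error_rate:
                report.aborted = True  # stop early: the risk now outweighs the reward
                break
    finally:
        rollback(exp.targets)  # always restore the system, even on errors
    return report


class ToyCluster:
    """In-memory stand-in for real infrastructure (an assumption of this sketch)."""

    def __init__(self, replicas: List[str]) -> None:
        self.all = set(replicas)
        self.healthy = set(replicas)

    def stop(self, targets: List[str]) -> None:
        self.healthy -= set(targets)

    def start(self, targets: List[str]) -> None:
        self.healthy |= set(targets) & self.all

    def error_rate(self) -> float:
        # Assumes failover works: requests fail only if no replica is healthy.
        return 0.0 if self.healthy else 1.0


cluster = ToyCluster(["a", "b", "c"])
exp = Experiment("stop-one-replica", targets=["a"], max_steps=3, max_error_rate=0.01)
report = run_experiment(exp, cluster.stop, cluster.start, cluster.error_rate)
print(report.hypothesis_held, report.observations)
```

In a real system, `inject`, `rollback`, and `measure_error_rate` would be backed by actual fault-injection and monitoring tools. The point of the sketch is only that scope, duration, and an abort condition are declared up front, and that rollback runs no matter how the experiment ends.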