SCE Tools – Implementing Security Chaos Engineering 4/4

There are numerous tools for Chaos Engineering, such as Gremlin, the Chaos Automation Platform, or Chaos Monkey. However, these Chaos Engineering tools are of limited use for Security Chaos Engineering (SCE) because companies cannot use them to run all relevant experiment types. SCE is a relatively new approach, so not many tools are available for enterprise implementation yet. In this blog post, we briefly discuss four potential tools for SCE: ChaoSlingr, Kirvis, CloudStrike, and AWS Fault Injection Simulator (AWS FIS).
ChaoSlingr was the first open-source tool for SCE available on GitHub.

UnitedHealth Group, led by Aaron Rinehart, developed this tool for Amazon Web Services (AWS).

The tool can be customized to the needs of each organization. It relies on Python code and AWS Lambda functions. It uses an opt-in model, meaning testers must actively select the assets for the experiment.

ChaoSlingr (Source: Rinehart, Aaron, and Kelly Shortridge – Security Chaos Engineering (2020), p. 66)
Each experiment consists of four major components.

Generatr identifies the target environment and selects a random target that carries an opt-in tag. Slingr runs the experiment by applying the configured change. Trackr tracks the changes made during the experiment and sends notifications about them. The fourth component is a documentation file that describes the experiment, including the input and output parameters of the Lambda functions. In addition, the tool provides sample code for a port change experiment. The tool is no longer maintained or updated.
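To make the interplay of the components more tangible, the following is a minimal sketch in plain Python. It is not ChaoSlingr's actual code: it does not call AWS, and the tag name `chaos-opt-in`, the `SecurityGroup` class, and the function signatures are assumptions made for illustration. It only mirrors the roles described above: Generatr picks a random opted-in target, Slingr applies a port change, and Trackr records the change and produces a notification.

```python
import random
from dataclasses import dataclass

OPT_IN_TAG = "chaos-opt-in"  # hypothetical tag name, not taken from ChaoSlingr


@dataclass
class SecurityGroup:
    """Simplified stand-in for a cloud security group (illustrative only)."""
    name: str
    tags: dict
    open_ports: set


def generatr(groups, rng):
    """Select one random target among the groups that carry the opt-in tag."""
    candidates = [g for g in groups if g.tags.get(OPT_IN_TAG) == "true"]
    return rng.choice(candidates) if candidates else None


def slingr(target, port):
    """Apply the configured change: open the port if closed, close it if open."""
    if port in target.open_ports:
        target.open_ports.remove(port)
        action = "closed"
    else:
        target.open_ports.add(port)
        action = "opened"
    return {"target": target.name, "port": port, "action": action}


def trackr(change, log):
    """Record the change and return the notification text."""
    log.append(change)
    return f"{change['action']} port {change['port']} on {change['target']}"


def run_experiment(groups, port, seed=None):
    """Run one port change experiment; returns (change log, notification)."""
    rng = random.Random(seed)
    log = []
    target = generatr(groups, rng)
    if target is None:  # nothing opted in, so nothing is touched
        return log, None
    change = slingr(target, port)
    return log, trackr(change, log)


groups = [
    SecurityGroup("web-sg", {OPT_IN_TAG: "true"}, {443}),
    SecurityGroup("db-sg", {}, {5432}),  # not opted in, never selected
]
log, message = run_experiment(groups, port=2222)
print(message)  # opened port 2222 on web-sg
```

The opt-in filter in `generatr` is the important design point: assets without the tag can never become a target, which keeps the blast radius of an experiment under the tester's control.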

Aaron Rinehart, along with Matas Kulkovas, has developed another tool called Kirvis.

It is also an open-source SCE tool available on GitHub, and it was developed specifically for Kubernetes. Each experiment is built like an application that can be deployed into a Kubernetes cluster. The tool is written in Go. Rinehart and Kulkovas first talked about their tool at KubeCon + CloudNativeCon Europe in May 2022. Hence, not much information is available yet.

Kennedy Torkura et al. developed another SCE tool called CloudStrike.

It can be used with both AWS and Google Cloud Platform (GCP). All components are written in JavaScript and do not rely on specific functions, unlike ChaoSlingr, which depends on Lambda functions. The following figure shows the high-level architecture of CloudStrike.

Cloud Infrastructure and SCE (Source: Torkura – CloudStrike: Chaos Engineering for Security and Resiliency in Cloud Infrastructure, p. 8)
It consists of various components:

The Chaos Controller coordinates the experiments. It receives requests with the necessary parameters and passes them to the Chaos Manager. The Chaos Manager manages the attacks. It selects a subset of assets based on the desired attack intensity and on rules from the Fault Engine. The Fault Engine contains all the knowledge about cloud compliance, best practices, and similar topics, and translates this information into actionable code. The Fault Injector implements the experiments compiled by the Chaos Manager. The Chaos Monitor monitors the progress of the experiments and is responsible for recovering the target system after an experiment has been terminated or completed. Finally, the Chaos Analyzer analyzes the vulnerabilities found and generates reports. It also forwards the results to the Chaos Controller to initiate remediation. CloudStrike is still unreleased.
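Since CloudStrike has not been released, the following is only a rough sketch of how the component roles described above could fit together. It is written in Python for illustration, not in CloudStrike's own language. All names, signatures, the `Asset` class, the single `make_storage_public` fault, and the `detector` callback are hypothetical, and nothing here touches a real cloud account.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Asset:
    """Simplified cloud asset; `public` is its simulated configuration."""
    name: str
    public: bool = False


def make_public(asset):
    """Hypothetical fault: expose a storage asset publicly."""
    asset.public = True


# Fault Engine: security knowledge translated into executable faults.
FAULT_ENGINE = {"make_storage_public": make_public}


def chaos_manager(assets, intensity, rng):
    """Select a random subset of assets; intensity is a fraction in (0, 1]."""
    count = max(1, math.ceil(len(assets) * intensity))
    return rng.sample(assets, count)


def fault_injector(targets, fault):
    """Remember the original state, then apply the fault to every target."""
    snapshot = {a.name: a.public for a in targets}
    for asset in targets:
        fault(asset)
    return snapshot


def chaos_monitor(targets, snapshot):
    """Recover the targets to their pre-experiment state."""
    for asset in targets:
        asset.public = snapshot[asset.name]


def chaos_analyzer(targets, detected):
    """Report which injected faults the defenders' tooling noticed."""
    return {a.name: "detected" if a.name in detected else "missed"
            for a in targets}


def chaos_controller(assets, fault_name, intensity, detector, seed=None):
    """Coordinate one experiment from target selection to the final report."""
    rng = random.Random(seed)
    targets = chaos_manager(assets, intensity, rng)
    snapshot = fault_injector(targets, FAULT_ENGINE[fault_name])
    detected = detector(assets)  # names the security tooling flagged
    chaos_monitor(targets, snapshot)
    return chaos_analyzer(targets, detected)


assets = [Asset(f"bucket-{i}") for i in range(4)]
# Stand-in for real detection tooling: flags every publicly exposed asset.
flag_public = lambda items: {a.name for a in items if a.public}
report = chaos_controller(assets, "make_storage_public", 0.5, flag_public)
print(report)
```

With an intensity of 0.5 and four assets, two assets are attacked. The report maps each attacked asset to `detected` or `missed`, which is the kind of result the Chaos Analyzer would hand back for remediation.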

In addition to all these external tools, AWS itself has released a chaos engineering service that can be used for SCE, the AWS Fault Injection Simulator.

AWS FIS is a fully managed service that runs fault injection experiments on AWS systems and reports on them. The following figure illustrates how AWS FIS works.

AWS Fault Injection Simulator (Source: https://aws.amazon.com/fis/?nc1=h_ls)
AWS has published a user guide that covers the topics relevant to implementing this service efficiently.

It covers experiment planning, instructions, experiment templates, a description of how to run experiments, and monitoring information. The experiment template contains all actions, targets, and stop conditions relevant to the particular experiment and its execution. While the experiment is running, the organization can track its status. However, AWS FIS cannot test every resource. It currently focuses on experiments on Amazon EC2, ECS, EKS, and RDS.
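As an illustration of the three template parts named above, here is a sketch of an experiment template as a Python dictionary, together with a small consistency check. The field names follow the AWS FIS template format as we understand it and should be verified against the user guide. The tag `chaos-opt-in`, the account ID, the ARNs, and the `validate_template` helper are made up for this example, and no AWS API is called.

```python
REQUIRED_SECTIONS = ("targets", "actions", "stopConditions")

# Example template: stop one tagged EC2 instance and restart it after 5 minutes.
template = {
    "description": "Stop one opted-in EC2 instance",
    "targets": {
        "optedInInstances": {
            "resourceType": "aws:ec2:instance",
            "resourceTags": {"chaos-opt-in": "true"},  # hypothetical tag
            "selectionMode": "COUNT(1)",
        }
    },
    "actions": {
        "stopInstance": {
            "actionId": "aws:ec2:stop-instances",
            "parameters": {"startInstancesAfterDuration": "PT5M"},
            "targets": {"Instances": "optedInInstances"},
        }
    },
    # The experiment is halted if this (made-up) CloudWatch alarm fires.
    "stopConditions": [
        {
            "source": "aws:cloudwatch:alarm",
            "value": "arn:aws:cloudwatch:eu-central-1:111122223333:alarm:HighErrorRate",
        }
    ],
    "roleArn": "arn:aws:iam::111122223333:role/FisExperimentRole",  # made up
}


def validate_template(tpl):
    """Return a list of problems; an empty list means the sketch is consistent."""
    problems = []
    for section in REQUIRED_SECTIONS:
        if not tpl.get(section):
            problems.append(f"missing section: {section}")
    defined = set(tpl.get("targets", {}))
    for name, action in tpl.get("actions", {}).items():
        # Every action must point at a target that the template defines.
        for ref in action.get("targets", {}).values():
            if ref not in defined:
                problems.append(
                    f"action {name} references unknown target {ref}")
    return problems


print(validate_template(template))  # []
```

Checking the template locally before submitting it is a cheap safeguard: a missing stop condition, in particular, would leave an experiment without an automatic brake.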

Each tool has its own advantages and disadvantages.

Kirvis offers an SCE solution for Kubernetes, an area that had not been addressed so far. However, it focuses only on this area, and not much information is available yet. CloudStrike is broadly applicable: it serves AWS as well as GCP and does not rely on specific functions in the background. However, it has not been released yet. ChaoSlingr offers a complete description, including code for an example experiment, which gives the user a lot of guidance. Nevertheless, it is no longer updated. AWS FIS is a fully managed service through which companies can easily implement their first experiments. It includes a detailed description of each component and tutorials for first experiments, and it is easy to use. However, so far AWS FIS only covers specific resources of the AWS environment. Each company must decide individually which tool is most appropriate for its own organization.

To summarize the SCE series: we first discussed the relevance of novel security concepts. Increasing complexity in distributed systems, especially clouds, results in a lack of human system comprehension and changes the demands regarding security. Next, we discussed SCE as a potential solution.

SCE aims at the proactive detection of vulnerabilities through automated experimentation, uncovering issues that would otherwise have remained undetected.

The differences between SCE and traditional approaches are manifold, but the most important are its continuity, its automation, and its focus on learning new things about the system. In this last post, we discussed SCE tools as well as their advantages and disadvantages for an implementation of SCE. All in all, SCE is a complex but promising approach for addressing changing security demands, which will be of great importance for cloud security in the near future.

If you are interested in more details on how A&B security experts can help establish a Security Chaos Engineering culture in your company, have a look at our SCE Program or contact us at Alice&Bob.Company!

Resources used and interesting content on this topic:

  1. Rinehart, Aaron, and Kelly Shortridge – Security Chaos Engineering (2020)
  2. https://github.com/Optum/ChaoSlingr/blob/master/README.md (last accessed 15.06.2022)
  3. https://www.youtube.com/watch?v=wLlME4Ve1go (last accessed 15.06.2022)
  4. https://www.youtube.com/watch?v=BLRb-E0G5zk (last accessed 15.06.2022)
  5. https://github.com/Optum/ChaoSlingr (last accessed 15.06.2022)
  6. https://github.com/nilement/kirvis (last accessed 15.06.2022)
  7. Torkura, Kennedy A., et al. “Cloudstrike: Chaos engineering for security and resiliency in cloud infrastructure.” IEEE Access 8 (2020): 123044-123060.
  8. Torkura, Kennedy A., et al. “Security chaos engineering for cloud services: Work in progress.” 2019 IEEE 18th International Symposium on Network Computing and Applications (NCA). IEEE, 2019.
  9. https://www.youtube.com/watch?v=9uzexriaXj4 (last accessed 15.06.2022)
  10. CAST AI (2022) (https://cast.ai/blog/chaos-engineering-and-kirvis-for-kubernetes-at-kubecon-europe-2022/ last accessed 15.06.2022)
  11. Amazon Web Services (2022) (https://docs.aws.amazon.com/fis/latest/userguide/fis-guide.pdf#what-is last accessed 15.06.2022)
  12. Amazon Web Services (https://aws.amazon.com/fis/?nc1=h_ls last accessed 15.06.2022)