CloudFormation is the AWS-native way of describing Infrastructure as Code. It is a simple way to deploy almost all types of resources while AWS takes care of their lifecycle. For example, we can create, update or delete an EC2 instance by changing properties in a CloudFormation template. Unfortunately, in some circumstances CloudFormation cannot delete AWS resources: an S3 bucket that still contains objects, for example.
At Automation Logic (AL), we like to think we go above and beyond for our clients. With this in mind, when a client team is having an issue, our engineers’ curiosity comes into play and they do everything they can to find a resolution. A number of internal teams were using CloudFormation extensively and were rewriting the same code across the company to clean up their resources. This manual effort was introducing inconsistencies, so the teams asked AL to develop a better system.
One of the most common development workflows is gitflow. It is based on feature branches, and the client teams test each branch independently by deploying a CloudFormation stack per branch and running a suite of tests before merging. This prevents broken code from being merged and lets developers have their own stacks, so they can work without interference from other changes on shared infrastructure.
So far this is a well-adopted workflow, but there are three edge cases:
- S3 buckets can only be deleted automatically by CloudFormation when they are completely empty (including old versions when versioning is enabled).
- CloudFormation stacks deployed by a CodePipeline pipeline are not implicit dependencies of the pipeline or of the stack that deploys the pipeline.
- CloudWatch log groups are also not deleted by CloudFormation, and some resources create them automatically without any implicit dependency on a stack.
To solve this, multiple project teams created custom resources within their CloudFormation templates so they could clean up all the resources left behind. Initially, this was a snippet of code in one or two projects running in a Lambda function. Nowadays it’s in dozens of CloudFormation templates. However, there are still a few drawbacks:
- The code has been duplicated across dozens of projects, often with local changes, so there are now several versions to maintain.
- New projects need to work out which version suits them, because some features are not available in all versions (deleting versioned S3 buckets, for instance).
- Patching or fixing issues can’t be done easily across all projects. The most common case is upgrading the runtime version of a Lambda function: we need to go through all these templates, updating and testing each one individually.
- There are no cross-account capabilities, for instance to delete stacks deployed in other accounts.
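For readers unfamiliar with the pattern, a Lambda-backed custom resource of the kind described above might look roughly like the sketch below. It is an illustration, not the teams' actual snippet: `handle`, `build_response`, `send_response` and the injected `cleanup` callable are hypothetical names, while the event and response fields follow the CloudFormation custom resource request and response format.

```python
import json
import urllib.request


def build_response(event, status, physical_id, reason):
    """Build the JSON document CloudFormation expects back from a custom resource."""
    return {
        "Status": status,
        "Reason": reason,
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }


def send_response(url, body):
    """PUT the response to the pre-signed URL supplied in the event."""
    data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, method="PUT", headers={"Content-Type": ""}
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def handle(event, cleanup, send=send_response):
    """Run `cleanup` only when the stack is deleted, then report the outcome.

    `cleanup` is a hypothetical callable that receives the resource properties.
    """
    physical_id = event.get("PhysicalResourceId") or "cleanup-" + event["LogicalResourceId"]
    try:
        if event["RequestType"] == "Delete":
            cleanup(event.get("ResourceProperties", {}))
        body = build_response(event, "SUCCESS", physical_id, "Cleanup handled")
    except Exception as error:
        # Always answer, otherwise the stack operation waits until it times out.
        body = build_response(event, "FAILED", physical_id, str(error))
    send(event["ResponseURL"], body)
    return body
```

Copying a handler like this into every template is exactly what produces the drift listed above: each copy carries its own runtime, its own fixes and its own feature set.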
To begin with, we reviewed several projects and spoke with multiple teams across the company, specifically those starting new projects, to get as broad a view as possible and find out what could be done to help them.
We came up with a few requirements:
- The solution must be easy to use, ideally needing only one line of code to mark a resource for deletion.
- Prevent mistakes/deletions in production environments.
- No extra resources in their templates.
- Easy to patch or upgrade.
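One way the first two requirements could fit together is an opt-in marker combined with a production guard. The sketch below is purely illustrative: the tag names `auto-cleanup` and `environment`, the set of protected values and the function name are all assumptions, not details of the solution that was built.

```python
# Hypothetical environment names that must never be cleaned up automatically.
PROTECTED_ENVIRONMENTS = {"prod", "production"}


def should_clean_up(tags):
    """Return True only for resources that opted in and are not in production.

    `tags` is a plain dict of tag name to tag value.
    """
    environment = tags.get("environment", "").strip().lower()
    if environment in PROTECTED_ENVIRONMENTS:
        # The production guard wins even if the resource was marked for cleanup.
        return False
    # Cleanup is opt-in: without the marker, nothing is deleted.
    return tags.get("auto-cleanup", "").strip().lower() == "true"
```

Under these assumptions, marking a resource would be a single tag in the template, and a resource tagged as production is skipped regardless of the marker.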
Based on these requirements, we started experimenting. First, we combined all the available versions and created a serverless application. We then shared our work with one of the client’s project teams that had requested it, so we could get feedback as we went. After this we were ready to distribute the feature and, since we had started with a serverless application, the AWS Serverless Application Repository (SAR) was the most obvious way to do so.
When we planned to share this with everyone, the initial idea was to deploy the Lambda function in one central account and allow everyone to invoke it. This would make it easier to manage and upgrade. However, it would require the central Lambda function to have access to every account, and that access could be exploited to delete resources in other accounts. We solved this by adding a new requirement to our list: the Lambda function must rely on the IAM roles and policies already in place, and any custom security logic should be avoided.
After this experimentation, we realized that the best way would be to deploy the Lambda function in every single account, and for that we could leverage AWS CloudFormation StackSets.
It’s worth pointing out again that shared components can also be exploited to bypass security. In this shared component we had to rely on IAM roles and their trust relationships. The best advice is to rely on policies set up in AWS and not to re-implement them in any shape or form in the code itself.
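As an illustration of leaving the decision to IAM, a cleanup function that needs to act in another account could simply try to assume a role there and let the role's trust relationship accept or reject the call. This is a sketch under assumptions, not the implementation described here: the function name and default session name are hypothetical, and `sts_client` is assumed to behave like `boto3.client("sts")`.

```python
def credentials_for_role(sts_client, role_arn, session_name="stack-cleanup"):
    """Assume `role_arn` and return keyword arguments for a boto3 client.

    No allow-list or account check is implemented here on purpose: if the
    role's trust policy does not permit the caller, STS raises an error and
    the cleanup stops.
    """
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    credentials = response["Credentials"]
    return {
        "aws_access_key_id": credentials["AccessKeyId"],
        "aws_secret_access_key": credentials["SecretAccessKey"],
        "aws_session_token": credentials["SessionToken"],
    }
```

The returned dictionary can be passed straight to a client constructor, for example `boto3.client("cloudformation", **credentials_for_role(boto3.client("sts"), role_arn))`. The code contains no security rules of its own, so tightening or loosening access remains a change to IAM policy rather than a code release.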
Extending CloudFormation in one template with Lambda functions is simple and straightforward, although maintaining it in a big organization can be harder.
AWS SAR is a good way to share Lambda functions across the whole organization, to keep good code governance, and to give every team the freedom to use shared components as well as contribute to them.
The whole journey of writing and sharing this functionality also helped us to provide better support, and new ideas for shared components emerged: for instance, code to handle webhooks from GitHub, which was used by several teams.