IAM a Mess
When looking through CloudFormation example templates online, you will often notice that IAM roles and policies are defined alongside the resources they are attached to, embedded in the same template.
In most circumstances, that is perfectly fine: keeping IAM resources in the same CloudFormation template as your other resources is a quick, clean way to see which resources use those roles and policies.
However, is this approach scalable? When we are building infrastructure to support multiple applications, can we centralise these IAM roles and policies?
IAM Identifying a Crisis
At a government client, a migration project moving applications from a private hosting platform onto AWS took a model centred around the applications.
Each application had its own CloudFormation stack containing all of its resources, such as ECS, RDS and CloudWatch, alongside the IAM roles and policies that managed permissions for those resources, all specific to the application.
The applications were architected on AWS in much the same way. They all required ECS. The Docker image for each container was stored in ECR. CodePipeline deployed the CloudFormation templates to every environment using exactly the same process.
The applications also deployed the same IAM roles and policies.
From a security perspective, that is perfect. Each application has been given a granular set of permissions that allow it access to AWS services, and nothing more.
However, across these applications, we noticed that there was one key problem.
The policy statements were exactly the same.
The only difference between these roles and policies was in how they were named: the application name was added as a prefix to the role or policy name, and that was it!
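To make the duplication concrete, here is a minimal sketch of what such a per-application template might look like. It is not the client's actual template: the `ApplicationName` parameter, the resource names, and the permissions listed are all assumptions chosen for illustration.

```yaml
# Hypothetical excerpt of a per-application pipeline template.
# Parameter, resource names, and permissions are illustrative assumptions.
AWSTemplateFormatVersion: "2010-09-09"

Parameters:
  ApplicationName:
    Type: String

Resources:
  CodeBuildServiceRole:
    Type: AWS::IAM::Role
    Properties:
      # The application name prefix is the only per-application difference.
      RoleName: !Sub "${ApplicationName}-codebuild-service-role"
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: codebuild.amazonaws.com
            Action: sts:AssumeRole

  CodeBuildServicePolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      ManagedPolicyName: !Sub "${ApplicationName}-codebuild-service-policy"
      Roles:
        - !Ref CodeBuildServiceRole
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          # The same statements would be repeated in every application's template.
          - Effect: Allow
            Action:
              - logs:CreateLogGroup
              - logs:CreateLogStream
              - logs:PutLogEvents
            Resource: "*"
```

Deploying this once per application yields a new role and policy each time, even though nothing but the name changes.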
Resources shared across applications had not been collated so that they could be imported from the management layer, and eventually we reached a hard service limit set by AWS: the number of policies that can be attached to a single role, which is 20.
This approach was not scalable. It meant that at most 20 applications could be hosted in this AWS estate if we carried on attaching one user policy per application to a single role.
The first step was to pick a fairly simple template as our starting point and work out how its roles and policies were constructed and how they were used.
Image: A typical pipeline template’s IAM resources
We looked at a template that creates a CI/CD pipeline using AWS CodePipeline and AWS CodeBuild. Each template created a CodePipeline service role and a CodeBuild service role, along with their respective policies. We compared these roles and policies with those of another application, and a few key points stood out:
- Roles and policy names were set with the application name.
- The permissions were exactly the same for the service and user policies.
Perfect! The pattern was easy to spot: the naming of the roles and policies was the main difference between the templates. The permissions in the policies were the same across all of the other applications.
A quick check in the IAM management console for the UserDeveloper role confirmed that each policy was, indeed, exactly the same, save for the name of the policy.
Image: One role’s policies
We’ve identified what can be extracted from each of these templates:
- Service roles
What is left is to find a new “home” for these roles and policies and to generalise them, i.e. strip the application name from their names.
Since these applications are deployed by CodePipeline, which currently runs in the management layer, it made perfect sense to create a new IAM-specific stack for these roles and policies.
Image: New implemented solution of inheriting roles and policies from shared services account
One role each for AWS CodePipeline and AWS CodeBuild, one policy for each of those roles, and one policy for the UserDeveloper role now cover the permissions that every pipeline needs.
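A shared IAM stack along these lines could look roughly like the sketch below. The role names, policy names, export names, and permissions are assumptions for illustration, and publishing the role ARNs as stack exports is one possible mechanism rather than a record of what was deployed. The sketch also assumes the UserDeveloper role already exists under that name.

```yaml
# Sketch of a shared IAM stack, deployed once to the management layer.
# Role names, export names, and permissions are illustrative assumptions.
AWSTemplateFormatVersion: "2010-09-09"
Description: Shared service roles and policies for all application pipelines

Resources:
  CodePipelineServiceRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: codepipeline-service-role  # no application prefix
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: codepipeline.amazonaws.com
            Action: sts:AssumeRole

  CodeBuildServiceRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: codebuild-service-role
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: codebuild.amazonaws.com
            Action: sts:AssumeRole

  CodePipelineServicePolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      ManagedPolicyName: codepipeline-service-policy
      Roles:
        - !Ref CodePipelineServiceRole
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow  # placeholder permissions
            Action:
              - codebuild:StartBuild
              - codebuild:BatchGetBuilds
            Resource: "*"

  CodeBuildServicePolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      ManagedPolicyName: codebuild-service-policy
      Roles:
        - !Ref CodeBuildServiceRole
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow  # placeholder permissions
            Action:
              - logs:CreateLogGroup
              - logs:CreateLogStream
              - logs:PutLogEvents
            Resource: "*"

  UserDeveloperPipelinePolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      ManagedPolicyName: user-developer-pipeline-policy
      Roles:
        - UserDeveloper  # assumes this role already exists under this name
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow  # placeholder permissions
            Action:
              - codepipeline:GetPipelineState
              - codepipeline:StartPipelineExecution
            Resource: "*"

Outputs:
  CodePipelineServiceRoleArn:
    Value: !GetAtt CodePipelineServiceRole.Arn
    Export:
      Name: shared-codepipeline-service-role-arn
  CodeBuildServiceRoleArn:
    Value: !GetAtt CodeBuildServiceRole.Arn
    Export:
      Name: shared-codebuild-service-role-arn
```

Because the template sets explicit IAM names, deploying it requires the `CAPABILITY_NAMED_IAM` capability. With this layout the UserDeveloper role carries a single pipeline policy no matter how many applications are added.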
The next step was to remove these roles and policies from the pipeline templates across all applications!
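Once the IAM resources are gone, a pipeline template only needs to reference the shared role. One way to do that, sketched below, is `Fn::ImportValue`, which reads a value exported by another stack in the same account and region. The export name `shared-codebuild-service-role-arn`, the `ApplicationName` parameter, and the build environment settings are hypothetical; we are assuming the shared IAM stack publishes the role ARN under that export name.

```yaml
# Hypothetical pipeline template excerpt after the IAM resources were removed.
# The export name and build settings are illustrative assumptions.
AWSTemplateFormatVersion: "2010-09-09"

Parameters:
  ApplicationName:
    Type: String

Resources:
  BuildProject:
    Type: AWS::CodeBuild::Project
    Properties:
      Name: !Sub "${ApplicationName}-build"
      # Shared role ARN, assumed to be exported by the central IAM stack.
      ServiceRole: !ImportValue shared-codebuild-service-role-arn
      Artifacts:
        Type: CODEPIPELINE
      Source:
        Type: CODEPIPELINE
      Environment:
        Type: LINUX_CONTAINER
        ComputeType: BUILD_GENERAL1_SMALL
        Image: aws/codebuild/standard:7.0
```

The application name still appears in the project name, but no role or policy is created per application any more.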
This optimisation brought a number of benefits:
- No more duplicated code: about 100-150 lines of YAML for generating policy statements are saved.
- Centralised service and user roles, with policies deployed as a separate stack onto the management layer.
From this optimisation task, it is easy to see how we can extend this to the other service roles, such as ECR, EC2/ECR and S3, and implement the same solution!