You only push once - Cross Region Replication for AWS ECR

Why setting up Cross Region Replication in AWS ECR is worth exploring, with examples of common Docker -> ECR patterns.

A common requirement is to push your Docker image once and make it readily available to multiple servers in multiple regions.

Patterns

Over time I've seen the following patterns take shape:

Pattern 1: Single Image/Multiple ECR per Region

Illustration of docker image pushed to each region's ECR - which is read by servers

Pros:

  • Fast instance creation times - Each server instance has fast network access to the ECR in its own region.
  • Explicit Architecture.

Cons:

  • DevOps Maintenance
    • Must write code that handles pushing to each region.
    • More moving parts/mental load.
    • More error prone - Need to retry and babysit the deploy if the push to one region is disrupted.
  • Cost - Per-region ECR billing (1 USD a month per ECR as of writing).
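To make the maintenance cost concrete, here is a minimal sketch of the per-region push loop this pattern needs. The account ID, repository name, tag and regions are made up for illustration, and a real pipeline would also need a `docker login` per region plus retry handling.

```python
from typing import List


def ecr_image_uri(account_id: str, region: str, repo: str, tag: str) -> str:
    """Build the private ECR image URI for one region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


def push_commands(local_image: str, account_id: str, regions: List[str],
                  repo: str, tag: str) -> List[List[str]]:
    """Return the docker CLI invocations needed to publish one image to every region.

    Each region needs its own tag + push (and, not shown here, its own login).
    """
    commands = []
    for region in regions:
        target = ecr_image_uri(account_id, region, repo, tag)
        commands.append(["docker", "tag", local_image, target])
        commands.append(["docker", "push", target])
    return commands


if __name__ == "__main__":
    # Hypothetical values: two regions means two tags and two pushes.
    for cmd in push_commands("my-app:latest", "123456789012",
                             ["ap-southeast-2", "eu-west-2"], "my-app", "1.0.0"):
        print(" ".join(cmd))
```

Every extra region adds another tag/push pair that can fail independently, which is where the babysitting comes from.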

Pattern 2: Single Image/Single Region ECR Source

Illustration of docker image being pushed to a single Region ECR and other regions referencing the image

Pros:

  • Simpler DevOps/Mental load: Docker image is only pushed to a single ECR.
  • Cheaper: Single ECR to manage.

Cons:

  • Slower instance creation times: Because of the geographical distance and its effect on network speeds, instances in other regions take longer to create/spin up. This is most evident when autoscaling latency is crucial.

Conclusion

Pattern 1 has the advantage of quick instance creation from a Docker definition due to geographical redundancy, but has too many moving parts from a DevOps experience perspective.

Pattern 2 is simple to grok from a DevOps perspective. However, the lack of geographical redundancy might be a deal breaker - imagine tasks in the UK region that need to scale up quickly but have to fetch the Docker definition from the source region in Australia.

Single ECR/CRR - The best of all patterns

This approach takes the best of the solutions we explored above - and leverages AWS Cross Region Replication to fill in the cracks.

It's important to grok that CRR is a per-region setting (for private and public repositories) rather than a per-repository one, so we need to filter which ECRs get replicated by name.
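As a rough sketch of what that looks like outside of IaC, the snippet below builds the payload that boto3's `put_replication_configuration` call on the ECR client expects, using a `PREFIX_MATCH` repository filter to select ECRs by name. The helper names, regions, account ID and repository prefix are assumptions for illustration; the client is passed in so the payload can be inspected without touching AWS.

```python
from typing import List, Tuple


def build_replication_config(destinations: List[Tuple[str, str]],
                             repo_prefixes: List[str]) -> dict:
    """Build a registry-level replication configuration.

    destinations:  (region, registry/account id) pairs to replicate into.
    repo_prefixes: repository name prefixes that should be replicated.
    """
    rule = {
        "destinations": [
            {"region": region, "registryId": registry_id}
            for region, registry_id in destinations
        ],
        "repositoryFilters": [
            {"filter": prefix, "filterType": "PREFIX_MATCH"}
            for prefix in repo_prefixes
        ],
    }
    return {"rules": [rule]}


def apply_replication_config(ecr_client, config: dict) -> None:
    """Apply the configuration to the registry the client points at (the source region)."""
    ecr_client.put_replication_configuration(replicationConfiguration=config)
```

In use, the client would be created for the source region, for example `boto3.client("ecr", region_name="ap-southeast-2")`, and the destinations would list every other region (or account) that needs the image.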

Illustration of docker image being pushed into a single source ECR - and Cross Region Replication copying it automatically to other regions

Pros:

  • Less IaC/easy to grok (single region/ECR push).
  • Free: Just pay for ECR storage costs - and it uses the AWS backbone for network transfer/replication.
  • Fast replication: I personally found it takes around 15 seconds for a 500 MB image to become available on the other side of the world.
  • Supports Cross Account replication
    • Great for Disaster Recovery.
    • Great for multiple environments (e.g. develop and production).
  • No need to manually create corresponding ECR in other regions - CRR creates them for you (although as outlined below - if using IaC, it's recommended to create them first before enabling CRR).
  • ECS handles that slight delay in replication by re-querying ECR until the image is there.
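If something other than ECS consumes the image (a deploy script, for example), a similar wait can be sketched by hand. The helper below is a hypothetical example, not what ECS does internally: it polls boto3's `describe_images` on a client for the destination region until the tag shows up or a timeout expires. The timeout and poll interval defaults are arbitrary.

```python
import time


def wait_for_image(ecr_client, repo: str, tag: str,
                   timeout_s: float = 120.0, poll_s: float = 5.0,
                   sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll the destination-region ECR until repo:tag exists.

    Returns True once the image is visible, False if the timeout expires first.
    Other errors (e.g. a missing repository) are left to propagate.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            ecr_client.describe_images(
                repositoryName=repo,
                imageIds=[{"imageTag": tag}],
            )
            return True
        except ecr_client.exceptions.ImageNotFoundException:
            # Not replicated yet: give up after the deadline, otherwise retry.
            if clock() >= deadline:
                return False
            sleep(poll_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting.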

Cons:

  • CRR happens behind the scenes, so the originating region has to be found in the Terraform or in the AWS console by inspecting the settings of the repo.

IaC gotchas

  • It's better to explicitly create the ECRs in each region required first and then apply CRR.
    • This is because IaC may become confused and try to create the ECRs in the other regions when you later need to reference them (e.g. in deployment code).
  • Remember that CRR is a per-region setting - and you should specify CRR infrastructure code in one place.
    • I was caught out when we had CRR IaC code in two repositories. This created an unfortunate situation where each repository's deployments overrode the CRR settings of the other.
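The override happens because applying a replication configuration replaces the whole per-region setting rather than adding to it. A minimal sketch of the "one place" idea, assuming each team hands over its rules as plain dicts in the shape boto3's `put_replication_configuration` expects: merge them into a single configuration and apply that once. The function name and the sample rules are hypothetical.

```python
from typing import Dict, List


def merge_replication_rules(*rule_sets: List[Dict]) -> Dict:
    """Combine rule lists from several owners into one configuration.

    Exact duplicate rules are dropped; order of first appearance is kept.
    """
    merged: List[Dict] = []
    for rules in rule_sets:
        for rule in rules:
            if rule not in merged:
                merged.append(rule)
    return {"rules": merged}


if __name__ == "__main__":
    # Hypothetical rules owned by two different repositories/teams.
    team_a = [{
        "destinations": [{"region": "eu-west-2", "registryId": "123456789012"}],
        "repositoryFilters": [{"filter": "service-a", "filterType": "PREFIX_MATCH"}],
    }]
    team_b = [{
        "destinations": [{"region": "eu-west-2", "registryId": "123456789012"}],
        "repositoryFilters": [{"filter": "service-b", "filterType": "PREFIX_MATCH"}],
    }]
    print(len(merge_replication_rules(team_a, team_b)["rules"]))
```

Whichever tool applies the result, the point is that only one deployment owns the final configuration.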

Takeaway notes

It's definitely worth exploring CRR!

  • We found our build times were cut by 20 minutes (billable) - In our case that time covered an image build for each and every region, plus the final push.
  • Improved DevOps experience - we no longer have to painstakingly handle each region individually, or worry about network issues breaking things and requiring a retry. We only push once, and the image becomes available in every ECR/region we require.
  • Reduced lines of code relating to multiple regions (only need to reference the source region).