
Extending AWS Backup with off-site capabilities

05.05.2020

AWS Backup is a convenient tool to manage backup life-cycles. Extending this with AWS Lambda grants even more features, resiliency and security.

AWS Backup is a very convenient service which allows customers to easily manage backups in one unified tool for RDS Snapshots, EBS and EFS volumes, as well as DynamoDB tables. It does not even need any setup, except for the definition of your intended backup plans.

In this article, we want to highlight how to utilize AWS Backup for the heavy lifting and extend it where necessary to accommodate your specific additional requirements.

The basis is a backup plan which has to cater to different requirements and purposes regarding backups, e.g.:

1. Ad-hoc backups prior to potentially dangerous operations (e.g. major upgrades) – easily done with an on-demand backup

2. Frequent and short-lived backups to minimize losses in daily operations

3. Long-lasting backups (often even legally mandated depending on your industry with set retention times)

4. Off-site backups to decouple risks between source systems and backups

Most of these goals can be addressed directly with AWS Backup, but some requirements might need additional support, either because a feature is missing or because an internal or external policy mandates distinct requirements for your backups.

Laying the foundation

To get started, you simply create a backup plan and assign resources to it. The backup plan defines how a backup is taken, how often, and in which time window. It also lets you specify when backups expire. This life-cycle management directly addresses the second and third requirements stated above. One plan can trigger daily backups and expire them quite quickly, and a second plan can create a monthly backup which is stored for several years. Both plans can and should be used side by side.

5 year retention plan example
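As a minimal sketch of the two plans described above, the following Python builds the plan documents in the shape the AWS Backup `create_backup_plan` API accepts. The plan names, schedules, time windows and the vault name `Default` are illustrative assumptions, not values from our setup.

```python
def build_backup_plan(name, schedule, retention_days, vault="Default"):
    """Return a BackupPlan document with a single rule.

    All names and numbers passed in are illustrative assumptions.
    """
    return {
        "BackupPlanName": name,
        "Rules": [
            {
                "RuleName": f"{name}-rule",
                "TargetBackupVaultName": vault,
                # AWS cron format: minute hour day-of-month month day-of-week year
                "ScheduleExpression": schedule,
                "StartWindowMinutes": 60,
                "CompletionWindowMinutes": 180,
                # Expire the recovery point after this many days
                "Lifecycle": {"DeleteAfterDays": retention_days},
            }
        ],
    }


# Frequent and short-lived: every day at 03:00 UTC, kept for one week
daily_plan = build_backup_plan("daily-short-lived", "cron(0 3 * * ? *)", 7)

# Long-lasting: first day of each month at 04:00 UTC, kept for roughly five years
monthly_plan = build_backup_plan("monthly-five-years", "cron(0 4 1 * ? *)", 5 * 365)
```

Each document could then be passed to `boto3.client("backup").create_backup_plan(BackupPlan=daily_plan)`, assuming the vault already exists.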

Which resources a plan covers is expressed using assignments. For testing and PoC purposes, you can manually add a resource to a plan. In all other environments, this will not be sufficient. Instead, AWS Backup can assign resources to plans by using tags, which is much easier and safer to implement at scale. For instance, if your backup requirements mandate keeping production database backups for at least five years, you would create a plan which expires DB backups after five years and create an assignment which subjects all RDS and EC2 instances tagged with ‘key: environment, value: production’ to this plan. This assignment does not have to be exclusive to one plan. You can copy it to another backup plan which implements the frequent and short-lived backups in parallel.

Declarative resource assignment
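A tag-based assignment like the one above could be sketched as follows. The selection document follows the shape of the AWS Backup `create_backup_selection` API; the function names, the role ARN and the plan IDs are hypothetical, and the client is passed in so the sketch does not depend on a live account.

```python
def build_tag_selection(name, iam_role_arn, tag_key, tag_value):
    """Return a BackupSelection that matches resources by a single tag."""
    return {
        "SelectionName": name,
        "IamRoleArn": iam_role_arn,
        "ListOfTags": [
            {
                "ConditionType": "STRINGEQUALS",
                "ConditionKey": tag_key,
                "ConditionValue": tag_value,
            }
        ],
    }


def assign_to_plans(backup_client, plan_ids, selection):
    """Attach the same selection to several plans and return the selection IDs."""
    return [
        backup_client.create_backup_selection(
            BackupPlanId=plan_id, BackupSelection=selection
        )["SelectionId"]
        for plan_id in plan_ids
    ]
```

With a real client this might be called as `assign_to_plans(boto3.client("backup"), [daily_id, monthly_id], selection)`, which mirrors copying one assignment to both the short-lived and the long-lasting plan.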

Handling off-site backups is more difficult and might come with some very specific requirements. Since January 2020, AWS Backup has had the option to copy backups into different regions. To enable this, simply select a target region in your backup plans. This feature was long missed and is exceptionally useful for establishing an off-site backup. Having all relevant data in another region enables a much quicker recovery if your main region suffers a total outage: simply restore your database in your standby region and reroute your traffic. But calling this an off-site backup is not the entire truth. Although it establishes independence regarding location, it does not decouple the backup from the source account. Backups can be protected using vault policies. However, should the (root) account be compromised, even a backup in another location could be in danger.
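When defined through the API rather than the console, the cross-region copy is expressed as a copy action on a backup rule. The following is a sketch only: the vault ARN, account ID, region and rule name are placeholders we made up for illustration.

```python
def cross_region_copy_action(destination_vault_arn, retention_days):
    """Return a CopyActions entry that copies each backup to another vault."""
    return {
        "DestinationBackupVaultArn": destination_vault_arn,
        "Lifecycle": {"DeleteAfterDays": retention_days},
    }


# Hypothetical vault in a second region; account ID and names are placeholders
STANDBY_VAULT_ARN = "arn:aws:backup:eu-west-1:111111111111:backup-vault:standby"

rule_with_copy = {
    "RuleName": "daily-with-offsite-copy",
    "TargetBackupVaultName": "Default",
    "ScheduleExpression": "cron(0 3 * * ? *)",
    "Lifecycle": {"DeleteAfterDays": 7},
    # The copy lands in the standby region but still in the same account
    "CopyActions": [cross_region_copy_action(STANDBY_VAULT_ARN, 7)],
}
```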

Unfortunately, there is no option available to share a backup with another account yet. But this will not stop us from doing it anyway.

Going beyond

(To understand the following section: a separated account is an account which is able to access relevant accounts of yours but cannot be interacted with in the other direction. Think of your IT Security department or a contracted auditing company. To establish this separation, no admin user of a source account should ever be able to interact with the separated account.)

Using RDS as an example, we will outline the steps necessary to extend AWS Backup with the ability to export snapshots to a separated account. Normally you cannot interact with snapshots in AWS Backup directly. They can only be restored to a database instance, where you could then run an export or dump job manually. They are, however, visible in the RDS console as regular snapshots for some time after creation. Working with a regular snapshot at this point, we can leverage all the features and integrations from RDS directly.

First, we have to set up a trigger that reacts to RDS snapshot events by invoking a Lambda function. This can be done using either RDS Event Notifications or CloudWatch Event Rules. We found the latter to be easier to work with in automation, but the former to be more human-friendly.
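For the CloudWatch Event Rules variant, a rough sketch of the event pattern and a Lambda entry point could look like this. The pattern matches RDS snapshot events in general. The `awsbackup:` identifier prefix used to pick out snapshots created by AWS Backup is an assumption you should verify against the events in your own account, and the handler name and return values are purely illustrative.

```python
import json

# Pattern for a CloudWatch Event Rule that matches RDS snapshot events
SNAPSHOT_EVENT_PATTERN = {
    "source": ["aws.rds"],
    "detail-type": ["RDS DB Snapshot Event"],
}


def is_backup_snapshot_event(event, prefix="awsbackup:"):
    """Return True for RDS snapshot events whose identifier has the given prefix.

    The default prefix is an assumption about how AWS Backup names snapshots.
    """
    detail = event.get("detail", {})
    return (
        event.get("source") == "aws.rds"
        and event.get("detail-type") == "RDS DB Snapshot Event"
        and detail.get("SourceIdentifier", "").startswith(prefix)
    )


def handler(event, context):
    """Lambda entry point: ignore everything except AWS Backup snapshots."""
    if not is_backup_snapshot_event(event):
        return {"skipped": True}
    return {"skipped": False, "snapshot": event["detail"]["SourceIdentifier"]}


# Serialized form, as expected by the EventPattern parameter of put_rule
pattern_json = json.dumps(SNAPSHOT_EVENT_PATTERN)
```

The rule itself could be created with `boto3.client("events").put_rule(Name=..., EventPattern=pattern_json)`. RDS may emit separate events for the start and the completion of a snapshot, so a production handler would probably also inspect the event message or ID before acting.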

In order to be able to export these snapshots, we have to do some preparations.

First, we have to set up two KMS encryption keys, one dedicated to exporting and one for the separated account for importing. There are several reasons for this:

  1. You cannot share an encrypted snapshot which uses the RDS default key
  2. Using a dedicated key for exporting allows you to define a much narrower set of required privileges (especially important when granting access to other accounts)
  3. The separated account should specify an import key to enable and secure access independently from the source key after the snapshot has been imported.
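To illustrate the second point, the key policy of the export key might contain statements along these lines. This is a sketch modelled on the pattern AWS documents for sharing encrypted snapshots; the function name, the statement IDs and the exact action list are assumptions that you should trim or extend to match your own policy requirements.

```python
def cross_account_key_statements(separated_account_id):
    """Return key policy statements that let another account use the export key."""
    principal = {"AWS": f"arn:aws:iam::{separated_account_id}:root"}
    return [
        {
            "Sid": "AllowUseOfExportKey",
            "Effect": "Allow",
            "Principal": principal,
            "Action": [
                "kms:Decrypt",
                "kms:DescribeKey",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
            ],
            # In a key policy, "*" refers to the key the policy is attached to
            "Resource": "*",
        },
        {
            "Sid": "AllowGrantsForAwsResources",
            "Effect": "Allow",
            "Principal": principal,
            "Action": ["kms:CreateGrant", "kms:ListGrants", "kms:RevokeGrant"],
            "Resource": "*",
            # Only allow grants that AWS services create on the account's behalf
            "Condition": {"Bool": {"kms:GrantIsForAWSResource": "true"}},
        },
    ]
```

Because the statements name only the export key's own policy, access for the separated account stays limited to this one key rather than to any key in the source account.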

Second, we need appropriate access permissions. One feature of AWS Backup enables a lean definition of permissions for both the RDS snapshots and the KMS key: we’ll use the recovery-point-tagging feature to apply a specific tag to all backup snapshots.

Tag all backups with a specific encryption key ID.
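Expressed through the API, this tagging corresponds to the `RecoveryPointTags` field of a backup rule. The sketch below adds the tag to every rule of an existing plan document; the tag key `encryption_key` matches the one used in this article, while the plan contents and the key ID placeholder are hypothetical.

```python
import copy


def tag_recovery_points(plan, key_id, tag_key="encryption_key"):
    """Return a copy of the plan whose rules tag every recovery point."""
    updated = copy.deepcopy(plan)
    for rule in updated["Rules"]:
        rule.setdefault("RecoveryPointTags", {})[tag_key] = key_id
    return updated


# Hypothetical plan and key ID placeholder, for illustration only
example_plan = {
    "BackupPlanName": "daily-short-lived",
    "Rules": [{"RuleName": "daily", "TargetBackupVaultName": "Default"}],
}
tagged_plan = tag_recovery_points(example_plan, "EXPORT_KEY_ID")
```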

With all backup snapshots automatically tagged with this key-value pair, we can narrow the permissions to modify, share, copy and clean up snapshots to only those carrying these exact tags. An example IAM permission statement could look like this:

        {
            "Effect": "Allow",
            "Action": [
                "rds:ModifyDBSnapshotAttribute",
                "rds:CopyDBSnapshot"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "rds:snapshot-tag/encryption_key": "123456ab-abcd-.."
                }
            }
        }

Now a Lambda function will be restricted to only interact with snapshots marked with the KMS ID of the export key. It can now carry out the copy and sharing operations (where sharing means granting other accounts the permission to create a copy themselves).

Unfortunately, the RDS/CloudWatch events used for indicating the creation of a snapshot do not trigger on snapshot copy creation. The copy operation is non-blocking, meaning the Lambda simply issues the copy command and continues. But before we can continue with sharing and importing the snapshot on the separated account, we must check the copy progress and only proceed once it has finished. Depending on the size of your databases, you can either pause and wait (which increases the run-time costs of your Lambda) or you can simply trigger another Lambda function periodically throughout the day to scan for finished copies (which leads to a delay).
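The copy and the "pause and wait" variant could be sketched as below, using the RDS `copy_db_snapshot` and `describe_db_snapshots` calls. The function names, the `-export` suffix, the retry numbers and the use of `CopyTags` are our assumptions for illustration. The identifier clean-up reflects that snapshot identifiers may only contain letters, digits and hyphens, so a colon in the source name has to be replaced.

```python
import time


def export_identifier(source_identifier, suffix="export"):
    """Derive a valid target name, since ':' is not allowed in identifiers."""
    return f"{source_identifier.replace(':', '-')}-{suffix}"


def start_export_copy(rds_client, source_identifier, export_key_id):
    """Issue the non-blocking copy, re-encrypting with the export key."""
    target = export_identifier(source_identifier)
    rds_client.copy_db_snapshot(
        SourceDBSnapshotIdentifier=source_identifier,
        TargetDBSnapshotIdentifier=target,
        KmsKeyId=export_key_id,
        # Assumption: keep the tag that the IAM condition checks on the copy
        CopyTags=True,
    )
    return target


def wait_until_available(rds_client, snapshot_identifier, attempts=10,
                         delay_seconds=30, sleep=time.sleep):
    """Poll the copy; return True once available, False after all attempts."""
    for _ in range(attempts):
        response = rds_client.describe_db_snapshots(
            DBSnapshotIdentifier=snapshot_identifier
        )
        if response["DBSnapshots"][0]["Status"] == "available":
            return True
        sleep(delay_seconds)
    return False
```

For the periodic variant, the same `describe_db_snapshots` check can run in a scheduled Lambda without the sleep loop. boto3 also ships a `db_snapshot_available` waiter which could replace the hand-written polling.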

Once the copy is done, you share it by specifying the account ID of the separated account in the modify_db_snapshot_attribute API call. This conveniently allows the authorized account to create a copy of this specific snapshot without the need to formulate IAM permissions about the entities inside the separated account (which you might not even be allowed to know about). Unlike the export copy creation, sharing does not require an IAM statement of its own; you only need to be allowed to modify the list of accounts the snapshot is shared with.
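A minimal sketch of the sharing step, assuming a boto3 RDS client is passed in, could look like this. The function name is ours; the `restore` attribute is the one RDS uses to list the accounts allowed to copy or restore a manual snapshot.

```python
def share_snapshot(rds_client, snapshot_identifier, separated_account_id):
    """Allow another account to copy the snapshot; return the accounts now allowed."""
    response = rds_client.modify_db_snapshot_attribute(
        DBSnapshotIdentifier=snapshot_identifier,
        AttributeName="restore",
        ValuesToAdd=[separated_account_id],
    )
    attributes = response["DBSnapshotAttributesResult"]["DBSnapshotAttributes"]
    for attribute in attributes:
        if attribute["AttributeName"] == "restore":
            return attribute["AttributeValues"]
    return []
```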

To import this export snapshot, the Lambda function in the source account notifies the separated account that the export snapshot is available. This can be easily set up with SNS and SQS. Another Lambda function in the separated account can now attempt to import the snapshot by simply creating a local copy. Remember that the export KMS key must also allow decryption by the Lambda execution role in the separated account, and that this role needs permission to use the local import key. Once the copy is finished, you should have a manual snapshot copy in your separated account.
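The notification and import steps might be sketched as follows. The message format (a JSON object with `snapshot_arn` and `source_account`), the function names and the `-import` suffix are hypothetical choices of ours. The parsing assumes the SQS queue is subscribed to the SNS topic and handles both wrapped and raw message delivery.

```python
import json


def build_notification(snapshot_arn, source_account_id):
    """Message body the source account publishes to SNS (format is our assumption)."""
    return json.dumps(
        {"snapshot_arn": snapshot_arn, "source_account": source_account_id}
    )


def snapshot_arns_from_sqs_event(event):
    """Extract snapshot ARNs from an SQS-triggered Lambda event."""
    arns = []
    for record in event.get("Records", []):
        envelope = json.loads(record["body"])
        # Without raw message delivery, SNS wraps the payload in "Message"
        if "Message" in envelope:
            payload = json.loads(envelope["Message"])
        else:
            payload = envelope
        arns.append(payload["snapshot_arn"])
    return arns


def import_snapshot(rds_client, snapshot_arn, import_key_id):
    """Create a local copy of the shared snapshot, encrypted with the import key."""
    target = snapshot_arn.rsplit(":", 1)[-1] + "-import"
    rds_client.copy_db_snapshot(
        # Shared snapshots from another account are referenced by their ARN
        SourceDBSnapshotIdentifier=snapshot_arn,
        TargetDBSnapshotIdentifier=target,
        KmsKeyId=import_key_id,
    )
    return target
```

On the source side, the message could be sent with `sns_client.publish(TopicArn=..., Message=build_notification(...))`. As with the export copy, the import copy is non-blocking, so its completion needs to be checked separately.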

At this point, you have securely extended AWS Backup with the ability to automatically export RDS snapshots into a separated backup account while letting AWS Backup do the heavy lifting of backup and life-cycle management in the source account. From there on, you can utilize and store the backups in your off-site account as you see fit or are required to do. For example, you could export the snapshot data to S3 (in some regions), where you could enable your legal department or auditors to run ad-hoc queries using Amazon Athena on historic database copies.

To really run this in production environments, you should also deploy monitoring, alerting and verification for the backups, but this is a topic for another blog post.
