Sharing Our Passion for Technology & Continuous Learning

Save Money by Scaling Off Hours

By Paul Rowe | September 06, 2023 | Development

Source Allies, like many organizations, has several AWS environments. In addition to production, we also have a dev and a qual environment. An application deployed to all three environments runs three copies of its infrastructure. If that architecture includes RDS databases, EC2 instances, ECS tasks, or other compute, we are billed for every minute those services are running. Our team only uses the non-production environments while actively testing, so the rest of that running time is a wasted expense, and the two non-production copies could account for two-thirds of a project's overall AWS spend.

Creating a scheduled job to stop these resources during off-hours isn't a new idea. Generally this involves a Lambda with a bit of code that makes the appropriate AWS calls. Since Step Functions added support for calling almost any AWS API natively, we can instead use a State Machine to shut down our database without writing any code. In CloudFormation it looks like this:

AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Parameters:
  ScaleDownOffHours:
    Type: String
    Default: "false"
Conditions:
  ConfigureScaleDownOffHours: !Equals [ "true", !Ref ScaleDownOffHours ]
Resources:
  ...
  ScaleDownOffHoursStateMachine:
    Condition: ConfigureScaleDownOffHours
    Type: AWS::Serverless::StateMachine
    Properties:
      Definition:
        StartAt: ScaleDown
        States:
          ScaleDown:
            Type: Task
            Resource: "arn:aws:states:::aws-sdk:rds:stopDBCluster"
            Parameters:
              DbClusterIdentifier: !Ref DatabaseCluster
            End: true
  ...

We're using an AWS::Serverless::StateMachine rather than an AWS::StepFunctions::StateMachine. This configuration leverages the Serverless transform, which lets us inline the additional pieces needed to run this on a schedule. First, we need an IAM Role that gives the State Machine permission to stop the database. We can get one by adding a Policies property to the resource, and the Serverless transform will expand it into a full Role at deploy time:

  ScaleDownOffHoursStateMachine:
    Condition: ConfigureScaleDownOffHours
    Type: AWS::Serverless::StateMachine
    Properties:
      ...
      Policies:
        - Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - rds:StopDBCluster
                - rds:StartDBCluster
              Resource:
                - !GetAtt DatabaseCluster.DBClusterArn

We want to scale down every day at 5 PM Central. We can add an Events property, and the transform will expand it into the additional resources that kick off the state machine on the appropriate schedule.

ScaleDownOffHoursStateMachine:
  Condition: ConfigureScaleDownOffHours
  Type: AWS::Serverless::StateMachine
  Properties:
    ...
    Events:
      ScaleDown:
        Type: ScheduleV2
        Properties:
          ScheduleExpressionTimezone: America/Chicago
          ScheduleExpression: "cron(0 17 * * ? *)"
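
The cron expression is easier to read once the six fields are labeled. The fragment below repeats the same event with comments, and shows a weekday-only variant as a commented alternative. That variant is an assumption on our part rather than something from the template above; it relies on the scheduler's cron syntax, where the day-of-month field becomes `?` when a day-of-week is given.

```yaml
ScaleDownOffHoursStateMachine:
  Condition: ConfigureScaleDownOffHours
  Type: AWS::Serverless::StateMachine
  Properties:
    ...
    Events:
      ScaleDown:
        Type: ScheduleV2
        Properties:
          # The expression is evaluated in this timezone, not UTC
          ScheduleExpressionTimezone: America/Chicago
          # Fields: minutes hours day-of-month month day-of-week year
          # "0 17 * * ? *" means minute 0 of hour 17, every day of every month
          ScheduleExpression: "cron(0 17 * * ? *)"
          # Assumed alternative, Monday through Friday only:
          # ScheduleExpression: "cron(0 17 ? * MON-FRI *)"
```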

If we stop here, we have a single resource we can add to our template that automatically shuts down the database every day at 5 PM. Additional states can be added to the state machine to stop other resources as well (such as an EC2 instance). One downside to this approach is that our environment is never started back up; we would have to do that manually. We can modify the definition of our state machine so that it starts resources as well. Replace the Definition element with:

ScaleDownOffHoursStateMachine:
  Condition: ConfigureScaleDownOffHours
  Type: AWS::Serverless::StateMachine
  Properties:
    ...
    Definition:
      StartAt: DetermineDirection
      States:
        DetermineDirection:
          Type: Choice
          Choices:
            - Variable: "$$.Execution.Input.source"
              StringEquals: aws.scheduler
              Next: ScaleDown
          Default: ScaleUp
        ScaleUp:
          Type: Task
          Resource: "arn:aws:states:::aws-sdk:rds:startDBCluster"
          Parameters:
            DbClusterIdentifier: !Ref DatabaseCluster
          End: true
        ScaleDown:
          Type: Task
          Resource: "arn:aws:states:::aws-sdk:rds:stopDBCluster"
          Parameters:
            DbClusterIdentifier: !Ref DatabaseCluster
          End: true
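
One thing to watch with this choice: a rule whose Variable path does not resolve can fail the execution at runtime instead of falling through to Default, and a manual execution started with an empty input has no source field. A more defensive sketch of the same state guards the comparison with the IsPresent operator. This guard is not part of the template above; treat it as one possible hardening, and try it with an empty input before relying on it.

```yaml
DetermineDirection:
  Type: Choice
  Choices:
    - And:
        # Guard: only compare source when the field exists in the execution input
        - Variable: "$$.Execution.Input.source"
          IsPresent: true
        - Variable: "$$.Execution.Input.source"
          StringEquals: aws.scheduler
      Next: ScaleDown
  # Anything else (manual runs, other event sources) starts the database
  Default: ScaleUp
```

The ScaleUp and ScaleDown states stay exactly as they are; only the Choice rule changes.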

With the DetermineDirection choice in place, the state machine starts the database whenever it is triggered by something other than the schedule (a manual execution, for example). Let's go even further by adding an event to start the database whenever we deploy a new version of our application:

ScaleDownOffHoursStateMachine:
  Condition: ConfigureScaleDownOffHours
  Type: AWS::Serverless::StateMachine
  Properties:
    ...
    Events:
      ...
      ScaleUp:
        Type: EventBridgeRule
        Properties:
          Pattern:
            source: [ "aws.cloudformation" ]
            account: [ !Ref AWS::AccountId ]
            detail-type: [ "CloudFormation Stack Status Change" ]
            detail:
              stack-id: [ !Ref AWS::StackId ]
              status-details:
                status: [ "UPDATE_IN_PROGRESS" ]

This rule listens for the current stack to enter the "UPDATE_IN_PROGRESS" state and starts the database in response. Starting the database is not a synchronous operation, so it will still take a few moments before the application is usable.
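
As mentioned earlier, additional states can stop other resources too. The sketch below chains an EC2 stop after the database stop. It assumes a hypothetical AWS::EC2::Instance resource named AppInstance in the same template, and it adds the matching IAM permission; neither is part of the original template.

```yaml
ScaleDownOffHoursStateMachine:
  Condition: ConfigureScaleDownOffHours
  Type: AWS::Serverless::StateMachine
  Properties:
    ...
    Definition:
      StartAt: DetermineDirection
      States:
        ...
        ScaleDown:
          Type: Task
          Resource: "arn:aws:states:::aws-sdk:rds:stopDBCluster"
          Parameters:
            DbClusterIdentifier: !Ref DatabaseCluster
          # Continue to the instance instead of ending here
          Next: StopInstance
        StopInstance:
          Type: Task
          Resource: "arn:aws:states:::aws-sdk:ec2:stopInstances"
          Parameters:
            # AppInstance is a hypothetical resource name used for illustration
            InstanceIds:
              - !Ref AppInstance
          End: true
    Policies:
      - Version: '2012-10-17'
        Statement:
          # The existing RDS statement stays here; this one is added alongside it
          - Effect: Allow
            Action:
              - ec2:StopInstances
            Resource:
              - !Sub "arn:${AWS::Partition}:ec2:${AWS::Region}:${AWS::AccountId}:instance/${AppInstance}"
```

The ScaleUp branch would need the mirror image: a state calling ec2:startInstances and the ec2:StartInstances permission.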

This is just a sample of the ways you can manage your non-production infrastructure. State machines are flexible enough to support all sorts of innovative combinations. You could even set up a Wait state to automatically shut things down a certain amount of time after they are deployed. Take a look at the complete template on our GitHub repository.
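
As a rough sketch of that Wait idea, the ScaleUp branch could flow into a Wait state and then into the existing ScaleDown state. The state name WaitBeforeScaleDown and the four-hour window are placeholders we picked for illustration. The sketch also assumes a Standard workflow, since Express executions are capped at five minutes.

```yaml
ScaleUp:
  Type: Task
  Resource: "arn:aws:states:::aws-sdk:rds:startDBCluster"
  Parameters:
    DbClusterIdentifier: !Ref DatabaseCluster
  # Instead of ending, hold the execution open for a while
  Next: WaitBeforeScaleDown
WaitBeforeScaleDown:
  Type: Wait
  # Hypothetical idle window: 4 hours, expressed in seconds
  Seconds: 14400
  Next: ScaleDown
```

A deploy that finds the database already running may fail the ScaleUp task before it ever reaches the wait, so a real version would likely need a Catch on that state.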
