Scaling

This page discusses horizontal scaling to meet increasing demands. Specifically, it covers the horizontal scaling of ESS services and of worker nodes in the Kubernetes cluster.

The page does not cover scaling services such as Kafka, Zookeeper and PostgreSQL Metrics. The reference deployment relies on AWS managed services for these functions. Refer to AWS documentation on how to scale the services.

Horizontal Scaling of ESS

Horizontal scaling of ESS involves adding more worker nodes (VMs) to run additional instances of ESS services; for instance, having 2 instances of the LDP service, each running on a separate server, instead of a single instance running on one server.

In contrast, vertical scaling adds more resources (e.g., more CPU, memory, or storage space) to your existing infrastructure, such as by replacing a server with one that has more resources. Vertical scaling is limited by the available technology, such as the amount of CPU that can be added to a single server.

Although horizontal scaling can be more architecturally complicated than vertical scaling, it generally has fewer limitations on growth.

Scale Services Independently

With ESS, you can scale each service independently of the others. For example, you can have 3 instances of the LDP service and 1 instance of the OIDC Broker.

Stateless Services

Horizontal scaling of ESS requires that all user-facing ESS services (LDP, OIDC Broker) be stateless. That is, these services are only aware of their current request. They do not depend on or maintain information about earlier requests, and a user’s requests to a given service do not need to route to the same instance of that service.

Scale ESS Components

Load Balancers

Elastic Load Balancers (ELBs) sit in front of all ESS services and evenly balance the traffic being sent to the proxy servers.

Select Load Balancer Routing Strategy

To benefit from having multiple instances of a service, requests should route to the available instances in a balanced manner. For the ESS deployment, a round robin strategy is sufficient.

A round robin strategy is where each instance is routed a request in a rotating, sequential manner. For example, with 3 instances of the LDP service, a round robin strategy sends the first request to instance 1, second request to instance 2, third request to instance 3, and then cycles back to send the fourth request to instance 1, etc.
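The rotation described above can be sketched in a few lines of Python. This is only an illustration of the round robin idea, not how the ELB is implemented; the function name round_robin and the instance names are hypothetical.

```python
from itertools import cycle


def round_robin(instances):
    """Return a function that hands out instances in rotating, sequential order."""
    rotation = cycle(instances)
    return lambda: next(rotation)


# Hypothetical instance names, used only for illustration.
next_instance = round_robin(["ldp-1", "ldp-2", "ldp-3"])

# Route four requests: the fourth cycles back to the first instance.
routed = [next_instance() for _ in range(4)]
print(routed)  # ['ldp-1', 'ldp-2', 'ldp-3', 'ldp-1']
```

Because the services are stateless, no request needs to remember which instance served the previous one, so a simple rotation like this is enough.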

Add Multiple Load Balancers

Having stateless services simplifies request routing since a user’s requests do not have to be routed to the same instance of a service. As such, ELBs can generally handle large spikes in traffic without provisioning additional load balancers.

The reference AWS deployment assumes that a single load balancer can handle all of the traffic coming in to the ESS cluster.

If, however, this is insufficient, you can launch multiple load balancers. Multiple load balancers are themselves balanced by having multiple DNS entries.

Worker Nodes

In the reference AWS deployment, the capacity of an ESS service is determined by the EC2 worker nodes. The worker nodes are managed by an AWS Autoscaling Group (ASG). The ASG attempts to keep a desired number of EC2 instances running. If the CPU usage for any worker node becomes too high, the ASG adds more worker nodes. If the CPU usage becomes too low, the ASG terminates some worker nodes.

Adjust the Number of Worker Nodes

In the AWS reference deployment, the AWS worker nodes are controlled by two different AWS Autoscaling Groups (ASGs):

  • an “ess” group for the public facing services, and

  • a backend group for everything else.

These groups are defined with the following entries in the deployment/infrastructure/aws/eks.tf Terraform file:

# Auto-scaling will handle adding more (or taking away) EC2 instances as EKS
# needs more or less worker nodes.
#
# This ASG handles capacity for the ESS containers
#
resource "aws_autoscaling_group" "eks-worker-nodes-ess" {
  name                 = "${local.resource_prefix}-eks-worker-nodes-ess"
  min_size             = local.eks_ess_cluster_worker_node_min_count
  desired_capacity     = local.eks_ess_cluster_worker_node_desired_count
  max_size             = local.eks_ess_cluster_worker_node_max_count
  launch_configuration = aws_launch_configuration.eks-worker-nodes-ess.id
...

# This ASG handles capacity for the backend containers
#
resource "aws_autoscaling_group" "eks-worker-nodes-backend" {
  name                 = "${local.resource_prefix}-eks-worker-nodes-backend"
  min_size             = local.eks_backend_cluster_worker_node_min_count
  desired_capacity     = local.eks_backend_cluster_worker_node_desired_count
  max_size             = local.eks_backend_cluster_worker_node_max_count
  launch_configuration = aws_launch_configuration.eks-worker-nodes-backend.id
...

Each group has the following settings related to the number of nodes:

  • desired_capacity, i.e., the number of nodes the ASG tries to keep running.

  • min_size, i.e., the minimum limit. The ASG will not scale down below this number of nodes.

  • max_size, i.e., the maximum limit. The ASG will not scale up above this number of nodes.
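How the two limits bound the node count can be sketched as follows. This is a minimal illustration, assuming the ASG simply keeps the node count within the limits; the function name bounded_capacity and the sample numbers are hypothetical, and the actual scaling decisions are made by AWS.

```python
def bounded_capacity(requested, min_size, max_size):
    """Keep a requested node count within the group's min_size and max_size."""
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    return max(min_size, min(requested, max_size))


# Illustrative numbers only: a group limited to between 1 and 20 nodes.
print(bounded_capacity(25, min_size=1, max_size=20))  # 20 (capped at max_size)
print(bounded_capacity(0, min_size=1, max_size=20))   # 1 (raised to min_size)
print(bounded_capacity(2, min_size=1, max_size=20))   # 2 (already within limits)
```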

These settings are set to local variables defined in the deployment/infrastructure/aws/configure.tf Terraform file:

# Kubernetes Configuration/Size
eks_ess_cluster_worker_node_size               = "t3.2xlarge"
eks_ess_cluster_worker_node_min_count          = 1
eks_ess_cluster_worker_node_desired_count      = 2
eks_ess_cluster_worker_node_max_count          = 20
eks_backend_cluster_worker_node_size           = "t3.medium"
eks_backend_cluster_worker_node_min_count      = 1
eks_backend_cluster_worker_node_desired_count  = 2
eks_backend_cluster_worker_node_max_count      = 4

To change the ASG settings, modify the variable values in the deployment/infrastructure/aws/configure.tf Terraform file.

In addition to the min/desired/max count of AWS worker nodes, you can also adjust the size of the EC2 instances created (the worker_node_size variables). This can be used to increase the memory or CPU capacity of each node in the cluster.

Note

It is possible to remove all of the Autoscaling Group logic, manually provision each EC2 instance, and have them join the Kubernetes cluster. Doing so would allow the operator complete and exact control over how many worker nodes are in the cluster, but would require manual intervention any time the capacity needed to change (including disaster recovery if a node were to be terminated unexpectedly).

Scale ESS Services

While the AWS Autoscaling Groups (ASGs) adjust the number of AWS worker nodes available to the Kubernetes cluster, the number of service instances launched in the cluster is controlled by Kubernetes configuration.

In the AWS reference deployment, Kubernetes configuration files for the services can be found in the deployment/kubernetes/aws/05_deployments directory. For example, the ldp-deployment.yaml file contains the following configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ess-ldp
  namespace: ess
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ess-ldp
...

The replicas configuration value determines the number of LDP instances in the cluster. For example, replicas: 1 indicates that the deployment should launch a single instance of the LDP service in the cluster.

Manually Adjust Number of Instances

To manually adjust the number of instances, modify the replicas value in the configuration file and re-apply to the cluster.

Dynamically Adjust Number of Instances

To manage the service capacity dynamically, i.e., have the services scale up or down without manual intervention, you can use a K8 autoscaler.

The K8 autoscaler is a deployment that monitors the other deployments in the cluster. Based on its monitoring, the autoscaler asks the control plane to scale the number of replicas in those deployments up or down.

The K8 autoscaler requires a Metrics Server to capture metrics from the other containers.

In addition, the K8 autoscaler requires configuration that instructs when services should be scaled up or down. In the AWS reference deployment, the K8 autoscaler configuration can be found in deployment/kubernetes/aws/06_autoscale/horizontal-autoscaler.yaml:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: ess-ldp
  namespace: ess
spec:
  maxReplicas: 30
  minReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ess-ldp
  # Average CPU usage across all LDP instances in the cluster.
  targetCPUUtilizationPercentage: 40

  • The spec.scaleTargetRef.name tells the autoscaler to monitor the statistics of the LDP deployment.

  • The spec.targetCPUUtilizationPercentage indicates that if the average CPU usage across all of the LDP instances rises above 40%, the autoscaler scales up the number of instances to meet demand. If the average CPU usage falls below 40%, the autoscaler scales down the number of instances. In both cases, the number of instances stays within the spec.minReplicas and spec.maxReplicas limits.
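The scaling decision can be approximated with the formula given in the Kubernetes Horizontal Pod Autoscaler documentation: desired replicas = ceil(current replicas × current metric / target metric). The Python sketch below is a simplified illustration of that formula, not the autoscaler's actual code. The function name desired_replicas is hypothetical, the 0.1 tolerance is assumed to match the Kubernetes default, and details such as pod readiness, missing metrics, and stabilization windows are ignored.

```python
import math


def desired_replicas(current_replicas, current_cpu_percent, target_cpu_percent,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified sketch of the Horizontal Pod Autoscaler replica calculation."""
    ratio = current_cpu_percent / target_cpu_percent
    # Within the tolerance band around the target, leave the count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Never go outside the minReplicas/maxReplicas limits.
    return max(min_replicas, min(desired, max_replicas))


# Using the limits from the configuration above (minReplicas: 2, maxReplicas: 30)
# and the 40% target; the CPU readings are made-up sample values.
print(desired_replicas(2, 80, 40, 2, 30))    # 4: usage is double the target
print(desired_replicas(4, 10, 40, 2, 30))    # 2: scales down, held at minReplicas
print(desired_replicas(20, 100, 40, 2, 30))  # 30: capped at maxReplicas
```

Note that the CPU percentage is measured relative to each container's CPU request, so the resources settings of the deployment affect when scaling is triggered.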

Note

This dynamic capacity is still dependent on the worker nodes available.

A request for additional services may initially be ineffective if there are no worker nodes available to schedule the additional service. However, the control plane will continue to try to schedule the service and should succeed once the AWS Autoscaling Group determines that additional capacity is needed and adds additional AWS worker node(s).

The above strategies outline mechanisms to both manually and automatically provision and maintain compute capacity for an ESS cluster in AWS.

Adjusting Container Scheduling

Generally, the default scheduling mechanisms for your Kubernetes cluster should be sufficient to work with the above autoscaling strategies. However, to help ensure that the scheduler does not place too many containers on the same underlying worker node, you may occasionally need to be explicit about the amount of resources a container needs.

To adjust these settings in the AWS reference deployment, modify the resources settings in the deployment configuration files; e.g., deployment/kubernetes/aws/05_deployments/ldp-deployment.yaml:

resources:
  limits:
    cpu: "1200m"
  requests:
    cpu: "800m"

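where cpu: "1200m" is “1200 millicpu” (1.2 CPU) and cpu: "800m" is “800 millicpu” (0.8 CPU). The scheduler uses the requests value when placing a container on a worker node, and the limits value caps the CPU that the container can use. For details, see Kubernetes Documentation: Managing Resources for Containers.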
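To see how the requests value limits how many containers fit on one worker node, the following Python sketch converts CPU quantities to millicpu and divides a node's CPU by the per-container request. The function names and the 8000 millicpu node are hypothetical; a real node's allocatable CPU is lower than its raw capacity, and the scheduler also considers memory and other constraints.

```python
def parse_cpu_millis(quantity):
    """Convert a Kubernetes CPU quantity such as "800m" or "1.5" to millicpu."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # A plain number is in whole CPUs; 1 CPU equals 1000 millicpu.
    return round(float(quantity) * 1000)


def max_containers_by_cpu(node_allocatable_millis, request_millis):
    """Upper bound on containers per node, considering CPU requests only."""
    return node_allocatable_millis // request_millis


request = parse_cpu_millis("800m")   # 800
limit = parse_cpu_millis("1200m")    # 1200

# Hypothetical node with 8000 millicpu available for scheduling.
print(max_containers_by_cpu(8000, request))  # 10
```

Raising the requests value therefore spreads instances across more worker nodes, at the cost of needing more nodes for the same number of instances.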