Scaling#
This page discusses horizontal scaling to meet increasing demand, specifically the horizontal scaling of ESS services and of worker nodes in the Kubernetes cluster.
The page does not cover scaling services such as Kafka, Zookeeper and PostgreSQL Metrics. The reference deployment relies on AWS managed services for these functions. Refer to AWS documentation on how to scale the services.
Horizontal Scaling of ESS#
Horizontal scaling of ESS involves adding more worker nodes (VMs) to run additional instances of ESS services; for instance, having 2 instances of LDP service, each running on a separate server, instead of a single LDP service running on one server.
In contrast, vertical scaling adds more resources (e.g., more CPU, memory, storage space, etc.) to your infrastructure, such as by replacing an existing server with one that has more resources. Vertical scaling is limited by the available technology, such as the amount of CPU that can be added to a single server.
Although horizontal scaling can be more architecturally complicated than vertical scaling, in general, horizontal scaling has fewer limitations on growth than vertical scaling.
Scale Services Independently#
With ESS, you can scale each service independently of the others. For example, you can have 3 instances of the LDP service and 1 instance of the Solid OIDC Broker.
Stateless Services#
Horizontal scaling of ESS requires that all user-facing ESS services (LDP, Solid OIDC Broker) be stateless. That is, these services are only aware of their current request. They do not depend on or maintain information about earlier requests, and a user’s requests to a given service do not need to route to the same instance of that service.
Scale ESS Components#
Load Balancers#
Elastic Load Balancers (ELBs) sit in front of all ESS services and evenly balance the traffic being sent to the proxy servers.
Select Load Balancer Routing Strategy#
To benefit from having multiple instances of a service, requests should route to the available instances in a balanced manner. For the ESS deployment, a round robin strategy is sufficient.
A round robin strategy is where each instance is routed a request in a rotating, sequential manner. For example, with 3 instances of the LDP service, a round robin strategy sends the first request to instance 1, second request to instance 2, third request to instance 3, and then cycles back to send the fourth request to instance 1, etc.
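The rotation described above can be sketched in a few lines of Python. This is only an illustration of the round robin idea, not how an ELB is implemented; the instance names are hypothetical.

```python
from itertools import cycle


def round_robin(instances):
    """Return a function that hands out instances in rotating, sequential order."""
    rotation = cycle(instances)
    return lambda: next(rotation)


# Hypothetical instance names, for illustration only.
next_instance = round_robin(["ldp-1", "ldp-2", "ldp-3"])

# Route four requests; the fourth cycles back to the first instance.
assignments = [next_instance() for _ in range(4)]
print(assignments)  # ['ldp-1', 'ldp-2', 'ldp-3', 'ldp-1']
```

Because the services are stateless, the router needs no per-user bookkeeping: any instance can serve any request.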
Add Multiple Load Balancers#
Having stateless services simplifies request routing since a user’s requests do not have to be routed to the same instance of a service. As such, ELBs can generally handle large spikes in traffic without provisioning additional load balancers.
The reference AWS deployment assumes that a single load balancer can handle all of the traffic coming in to the ESS cluster.
If, however, this is insufficient, you can launch multiple load balancers. Multiple load balancers are themselves balanced by having multiple DNS entries.
Worker Nodes#
In the reference AWS deployment, the capacity of an ESS service is determined by the EC2 worker nodes. The worker nodes are managed by an AWS Autoscaling Group (ASG). The ASG attempts to keep a desired number of EC2 instances running. If the CPU usage for any worker node becomes too high, the ASG adds more worker nodes. If the CPU usage becomes too low, the ASG terminates some worker nodes.
Adjust the Number of Worker Nodes#
In the AWS reference deployment, the AWS worker nodes are controlled by two different AWS Autoscaling Groups (ASGs):
an “ess” group for the public facing services, and
a backend group for everything else.
These groups are defined with the following entries in the deployment/infrastructure/aws/eks.tf Terraform file:
# Auto-scaling will handle adding more (or taking away) EC2 instances as EKS
# needs more or less worker nodes.
#
# This ASG handles capacity for the ESS containers
#
resource "aws_autoscaling_group" "eks-worker-nodes-ess" {
  name                 = "${local.resource_prefix}-eks-worker-nodes-ess"
  min_size             = local.eks_ess_cluster_worker_node_min_count
  desired_capacity     = local.eks_ess_cluster_worker_node_desired_count
  max_size             = local.eks_ess_cluster_worker_node_max_count
  launch_configuration = aws_launch_configuration.eks-worker-nodes-ess.id
  ...

# This ASG handles capacity for the backend containers
#
resource "aws_autoscaling_group" "eks-worker-nodes-backend" {
  name                 = "${local.resource_prefix}-eks-worker-nodes-backend"
  min_size             = local.eks_backend_cluster_worker_node_min_count
  desired_capacity     = local.eks_backend_cluster_worker_node_desired_count
  max_size             = local.eks_backend_cluster_worker_node_max_count
  launch_configuration = aws_launch_configuration.eks-worker-nodes-backend.id
  ...
Each group has the following settings related to the number of nodes:
desired_capacity, i.e., the desired number of nodes.
min_size, i.e., the minimum limit. The ASG will not decrease below this minimum number of nodes.
max_size, i.e., the maximum limit. The ASG will not increase beyond this maximum number of nodes.
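How the two limits bound the node count can be illustrated with a small Python sketch. The function name and the numbers are hypothetical, and the sketch only models the bounding behaviour described above, not the ASG's actual scaling logic.

```python
def bounded_node_count(requested, min_size, max_size):
    """Keep a requested node count within the [min_size, max_size] limits."""
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    return max(min_size, min(requested, max_size))


# Hypothetical limits: at least 1 node, at most 5 nodes.
print(bounded_node_count(3, min_size=1, max_size=5))  # 3 (within limits)
print(bounded_node_count(9, min_size=1, max_size=5))  # 5 (capped at max_size)
print(bounded_node_count(0, min_size=1, max_size=5))  # 1 (raised to min_size)
```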
These settings are set to variables defined in the deployment/infrastructure/aws/configure.tf Terraform file:
variable "eks_ess_cluster_worker_node_size" {
  description = "EKS ESS cluster worker node size."
  type        = string
  default     = "t3.2xlarge"
}

variable "eks_ess_cluster_worker_node_min_count" {
  description = "EKS ESS cluster worker node min count."
  type        = number
  default     = 1
}
...
To change the ASG settings, override the specific variable values in the variable definition file .variables/<ENV>.tfvars.
Starting in version 1.0.4, you can override the default values for any of the following ASG settings in that file:
eks_ess_cluster_worker_node_size
eks_ess_cluster_worker_node_min_count
eks_ess_cluster_worker_node_desired_count
eks_ess_cluster_worker_node_max_count
eks_backend_cluster_worker_node_size
eks_backend_cluster_worker_node_min_count
eks_backend_cluster_worker_node_desired_count
eks_backend_cluster_worker_node_max_count
In addition to the min/desired/max count of AWS worker nodes, you can also adjust the size of the EC2 instances created. This can be used to increase the memory or CPU capacity of each node in the cluster.
Note
It is possible to remove all of the Autoscaling Group logic, manually provision each EC2 instance, and have them join the Kubernetes cluster. Doing so would allow the operator complete and exact control over how many worker nodes are in the cluster, but would require manual intervention any time the capacity needed to change (including disaster recovery if a node were to be terminated unexpectedly).
Scale ESS Services#
While the AWS Autoscaling Groups (ASGs) adjust the number of AWS worker nodes available to the Kubernetes cluster, the number of actual services launched in the cluster is controlled by Kubernetes configuration.
In the AWS reference deployment, Kubernetes configuration files for the services can be found in the deployment/kubernetes/aws/05_deployments directory. For example, the ldp-deployment.yaml file contains the following configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ess-ldp
  namespace: ess
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ess-ldp
...
The replicas configuration value determines the number of LDP instances in the cluster. For example, replicas: 1 indicates that the deployment should launch a single instance of the LDP service in the cluster.
Manually Adjust Number of Instances#
To manually adjust the number of instances, modify the replicas value in the configuration file and re-apply the file to the cluster.
Dynamically Adjust Number of Instances#
To manage the service capacity dynamically, i.e., have the services scale up or down without manual intervention, you can use a K8 autoscaler.
The K8 autoscaler is a deployment that monitors the other deployments in the cluster. Based on its monitoring, the autoscaler requests that the control plane scale the number of replicas in those deployments up or down.
The K8 autoscaler requires a Metrics Server to capture metrics from the other containers.
In addition, the K8 autoscaler requires configuration that instructs when services should be scaled up or down. In the AWS reference deployment, the K8 autoscaler configuration can be found in deployment/kubernetes/aws/06_autoscale/horizontal-autoscaler.yaml:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: ess-ldp
  namespace: ess
spec:
  maxReplicas: 30
  minReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ess-ldp
  # Average CPU usage across all LDP instances in the cluster.
  targetCPUUtilizationPercentage: 40
The spec.scaleTargetRef.name tells the autoscaler to monitor the statistics of the LDP deployment.
The spec.targetCPUUtilizationPercentage indicates that if the average CPU usage across all of the LDP instances rises above 40%, the autoscaler can scale up the number of instances to meet demand. If the average CPU usage falls below 40%, the autoscaler should scale down the number of instances.
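To make the scaling decision concrete, the following Python sketch follows the replica calculation described in the Kubernetes Horizontal Pod Autoscaler documentation: the desired count is the current count multiplied by the ratio of observed to target utilization, rounded up, and then bounded by minReplicas and maxReplicas. The function name is hypothetical, the 10% tolerance is assumed to be the Kubernetes default, and the real autoscaler applies further rules (such as stabilization windows) that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct=40,
                     min_replicas=2, max_replicas=30, tolerance=0.1):
    """Approximate the autoscaler's replica calculation for a CPU target."""
    ratio = current_cpu_pct / target_cpu_pct
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Never go outside the minReplicas/maxReplicas bounds.
    return max(min_replicas, min(desired, max_replicas))


print(desired_replicas(4, 80))   # 8: usage is double the target, so double the replicas
print(desired_replicas(4, 20))   # 2: usage is half the target, so halve the replicas
print(desired_replicas(20, 90))  # 30: 45 would be needed, but maxReplicas caps it
```

With the values from the configuration above, the number of LDP instances stays between 2 and 30 regardless of load.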
Note
This dynamic capacity is still dependent on the worker nodes available.
A request for additional services may initially be ineffective if there are no worker nodes available to schedule the additional service. However, the control plane will continue to try to schedule the service and should succeed once the AWS Autoscaling Group determines that additional capacity is needed and adds additional AWS worker node(s).
The above strategies outline mechanisms to both manually and automatically provision and maintain compute capacity for an ESS cluster in AWS.
Adjusting Container Scheduling#
Generally, the default scheduling mechanisms for your Kubernetes cluster should be sufficient to work with the above autoscaling strategies. However, to help ensure that the scheduler does not place too many containers on the same underlying worker node, you may occasionally need to be explicit about the amount of resources a container needs.
To adjust these settings in the AWS reference deployment, modify the resources settings in the deployment configuration files; e.g., deployment/kubernetes/aws/05_deployments/ldp-deployment.yaml:
resources:
  limits:
    cpu: "7600m"
  requests:
    cpu: "7000m"
where cpu: "7600m" is “7600 millicpu” and cpu: "7000m" is “7000 millicpu”. For details, see Kubernetes Documentation: Managing Resources for Containers.
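The effect of a CPU request on scheduling can be estimated with a short Python sketch. The scheduler places containers based on their requests (not their limits), so dividing a node's CPU by the request gives a rough upper bound on instances per node. The helper names are hypothetical, and the sketch assumes a node with 8 CPUs (the vCPU count of a t3.2xlarge) while ignoring the CPU that Kubernetes reserves for system components.

```python
def parse_cpu(quantity):
    """Convert a Kubernetes CPU quantity such as "7000m" or "8" to millicpu."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def max_instances_per_node(node_cpu, cpu_request):
    """Rough upper bound on how many containers with this request fit on one node."""
    return parse_cpu(node_cpu) // parse_cpu(cpu_request)


# Assumed node size: 8 CPUs. The real allocatable CPU is somewhat lower.
print(max_instances_per_node("8", "7000m"))  # 1
print(max_instances_per_node("8", "2000m"))  # 4
```

Under these assumptions, a request of 7000 millicpu means at most one LDP instance is scheduled per worker node, so each additional replica needs an additional node.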