Aggregate Cluster
An aggregate cluster allows you to set up failover between multiple upstream clusters that have different configurations. For example, you might switch from an EDS cluster to a STRICT_DNS cluster, or from a cluster using ROUND_ROBIN load balancing to one using MAGLEV. You can also use it to change timeouts, such as moving from a 0.1s connection timeout to a 1s timeout.
To enable this failover, the aggregate cluster references other clusters by their names in the configuration. The ordering of these clusters in the clusters list implicitly defines the fallback priority.
The aggregate cluster uses a tiered approach to load balancing:
At the top level, it decides which cluster and priority to use.
It then hands off the actual load balancing to the selected cluster’s own load balancer.
Internally, this top-level load balancer treats all the priorities across all referenced clusters as a single linear list. By doing so, it reuses the existing load balancing algorithm and makes it possible to seamlessly shift traffic between clusters as needed.
Linearize Priority Set
Upstream hosts are grouped into different priority levels, and each level includes hosts that can be healthy, degraded, or unhealthy. To simplify host selection during load balancing, linearization merges these priority levels across multiple clusters into a single sequence.
For example, if the primary cluster has three priority levels, and the secondary and tertiary clusters each have two, the failover order is:
Primary
Secondary
Tertiary
| Cluster | Priority Level | Priority Level after Linearization |
|---|---|---|
| Primary | 0 | 0 |
| Primary | 1 | 1 |
| Primary | 2 | 2 |
| Secondary | 0 | 3 |
| Secondary | 1 | 4 |
| Tertiary | 0 | 5 |
| Tertiary | 1 | 6 |
This approach ensures a straightforward way to decide which hosts receive traffic based on priority, even when working with multiple clusters.
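The mapping in the table above can be sketched in a few lines of Python. This is an illustration only, not Envoy's implementation: the function name `linearize` and the `(cluster_name, number_of_priority_levels)` input shape are assumptions made for the example.

```python
def linearize(clusters):
    """Flatten per-cluster priority levels into one ordered list.

    `clusters` is an ordered list of (cluster_name, number_of_priority_levels)
    pairs, given in failover order. The index of each entry in the returned
    list is that entry's priority level after linearization.
    """
    linearized = []
    for name, levels in clusters:
        for priority in range(levels):
            linearized.append((name, priority))
    return linearized


# Hypothetical input matching the example: 3, 2, and 2 priority levels.
order = linearize([("primary", 3), ("secondary", 2), ("tertiary", 2)])
for linear_priority, (name, priority) in enumerate(order):
    print(f"{name} P={priority} -> linearized priority {linear_priority}")
```

Because the clusters are walked in failover order, every priority of the primary cluster precedes every priority of the secondary cluster, and so on.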
Example
A sample aggregate cluster configuration could be:
name: aggregate_cluster
connect_timeout: 0.25s
lb_policy: CLUSTER_PROVIDED
cluster_type:
  name: envoy.clusters.aggregate
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.clusters.aggregate.v3.ClusterConfig
    clusters:
    # cluster primary, secondary and tertiary should be defined outside.
    - primary
    - secondary
    - tertiary
Important Considerations for Aggregate Clusters
Some features might not work as expected with aggregate clusters, for example:
PriorityLoad Retry Plugins
PriorityLoad retry plugins will not work with an aggregate cluster. Because the aggregate cluster’s load balancer controls traffic distribution at a higher level, it effectively overrides the PriorityLoad behavior during load balancing.
Stateful Sessions
Stateful Sessions rely on the cluster to directly know the endpoint receiving traffic. With an aggregate cluster, the top-level load balancer selects a cluster first, but does not track specific endpoints inside that cluster.
If we configure Stateful Sessions to override the upstream address, the load balancer bypasses its usual algorithm to send traffic directly to that host. This works only when the cluster itself knows the exact endpoint.
In an aggregate cluster, the final routing decision happens one layer beneath the aggregate load balancer, so the filter cannot locate that specific endpoint at the aggregate level. As a result, Stateful Sessions are incompatible with aggregate clusters: the top-level load balancer picks a cluster without any knowledge of individual endpoints, which exist only in the underlying clusters.
Load Balancing Example
The aggregate cluster in this section references two clusters, primary and secondary, unlike the three-cluster configuration shown above.
The aggregate cluster uses a tiered load balancing algorithm with two main steps:
Top Tier: Distribute traffic across different clusters based on each cluster’s overall health (across all priorities).
Second Tier: Once a cluster is chosen, delegate traffic distribution within that cluster to its own load balancer (e.g., ROUND_ROBIN, MAGLEV, etc.).
| Primary P=0 Healthy Endpoints | Primary P=1 Healthy Endpoints | Primary P=2 Healthy Endpoints | Secondary P=0 Healthy Endpoints | Secondary P=1 Healthy Endpoints | Traffic to Primary | Traffic to Secondary |
|---|---|---|---|---|---|---|
| 100% | 100% | 100% | 100% | 100% | 100% | 0% |
| 72% | 100% | 100% | 100% | 100% | 100% | 0% |
| 71% | 1% | 0% | 100% | 100% | 100% | 0% |
| 71% | 0% | 0% | 100% | 100% | 99% | 1% |
| 50% | 0% | 0% | 50% | 0% | 70% | 30% |
| 20% | 20% | 10% | 25% | 25% | 70% | 30% |
| 20% | 0% | 0% | 20% | 0% | 50% | 50% |
| 0% | 0% | 0% | 100% | 0% | 0% | 100% |
| 0% | 0% | 0% | 72% | 0% | 0% | 100% |
Note
By default, the overprovisioning factor is 1.4. This factor boosts lower health percentages to account for partial availability. For instance, if a priority level is 80% healthy, multiplying by 1.4 results in 112%, which is capped at 100%. In other words, any product above 100% is treated as 100%.
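The capping rule can be written as a small helper. The sketch below is illustrative only: the name `health_score` is hypothetical, and it expresses the factor as 140 percent and truncates the result to a whole number. Truncation is an assumption, chosen because it matches the whole-number percentages in the table above (for example, 71% healthy yielding 99% of traffic).

```python
def health_score(healthy_percent, overprovisioning_percent=140):
    """Health score of one priority level, capped at 100.

    The factor 1.4 is written as 140 so the arithmetic stays in integers.
    Truncating the product is an assumption of this sketch.
    """
    return min(100, healthy_percent * overprovisioning_percent // 100)


print(health_score(80))  # 80 x 1.4 = 112, capped at 100
print(health_score(20))  # 20 x 1.4 = 28
print(health_score(71))  # 71 x 1.4 = 99.4, truncated to 99
```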
The aggregate cluster load balancer first calculates each priority’s health score for every cluster, sums those up, and then assigns traffic based on the overall total. If the total is at least 100, the combined traffic is capped at 100%. If it’s below 100, Envoy scales (normalizes) it so that the final distribution sums to 100%.
Scenario A: Total Health ≥ 100
Suppose we have two clusters:

- Primary with three priority levels: 20%, 20%, 10% healthy.
- Secondary with two priority levels: 25%, 25% healthy.

1. Compute raw health scores using percent_healthy × overprovisioning_factor (1.4), each capped at 100.

   Primary:
   - P=0: 20% × 1.4 = 28
   - P=1: 20% × 1.4 = 28
   - P=2: 10% × 1.4 = 14
   - Sum: 28 + 28 + 14 = 70

   Secondary:
   - P=0: 25% × 1.4 = 35
   - P=1: 25% × 1.4 = 35
   - Sum: 35 + 35 = 70

2. Assign traffic to the first cluster, then the next, etc., without exceeding 100% total.
   - Primary takes its 70% first.
   - Secondary then takes min(100 - 70, 70) = 30.
   - Combined total is 70 + 30 = 100.

3. Distribute that traffic internally by priority.

   Primary’s 70% is split across its priorities in proportion to 28 : 28 : 14, i.e.:
   - P=0 → 28%
   - P=1 → 28%
   - P=2 → 14%

   Secondary’s 30% goes first to P=0, which is 35, but capped at whatever remains from 100 after primary took 70 (i.e., 30). So:
   - P=0 → 30%
   - P=1 → 0%

Hence the final breakdown of traffic is:

- Primary: {28%, 28%, 14%}
- Secondary: {30%, 0%}
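The in-order, capped assignment used in Scenario A can be sketched as follows. This is a minimal illustration that assumes the health scores have already been computed and that their total is at least 100; the function name `allocate_in_order` is hypothetical.

```python
def allocate_in_order(scores):
    """Give each linearized priority its health score until 100% is used up."""
    remaining = 100
    loads = []
    for score in scores:
        load = min(remaining, score)  # never hand out more than what is left
        loads.append(load)
        remaining -= load
    return loads


# Scenario A: primary scores 28, 28, 14 followed by secondary scores 35, 35.
loads = allocate_in_order([28, 28, 14, 35, 35])
primary, secondary = loads[:3], loads[3:]
print(primary, sum(primary))      # [28, 28, 14] 70
print(secondary, sum(secondary))  # [30, 0] 30
```

Earlier priorities are served first, so the secondary cluster's second priority receives nothing once the 100% budget is exhausted.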
Scenario B: Total Health < 100
Sometimes the health scores add up to less than 100. In that case, Envoy ‘normalizes’ them so that each cluster and priority still receives a portion out of 100%.
For instance, consider:

- Primary: 20%, 0%, 0%
- Secondary: 20%, 0%

1. Compute raw health scores (same formula: percent_healthy × 1.4, capped at 100).

   Primary:
   - P=0: 20% × 1.4 = 28
   - P=1: 0 → 0
   - P=2: 0 → 0
   - Sum: 28 + 0 + 0 = 28

   Secondary:
   - P=0: 20% × 1.4 = 28
   - P=1: 0 → 0
   - Sum: 28 + 0 = 28

   Total raw health = 28 + 28 = 56 (below 100).

2. Normalize so that the final total is 100%. Both clusters end up at 28 / 56 = 50%.

Thus each cluster, primary and secondary, receives 50% of the traffic. Because each cluster’s entire score comes from priority 0 (28 points) and its other priorities score 0, the final distribution is:

- Primary: {50%, 0%, 0%}
- Secondary: {50%, 0%}
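The normalization step in Scenario B can be sketched like this. It is an illustration under stated assumptions, not Envoy's code: the name `normalize` is hypothetical, integer division is used so results stay whole numbers, and any rounding remainder is simply ignored.

```python
def normalize(scores):
    """Scale health scores so they add up to 100 when their total is below 100."""
    total = sum(scores)
    if total == 0:
        # No healthy capacity at all; this sketch does not model that case.
        return [0 for _ in scores]
    # Integer division can leave a small remainder; the sketch ignores it.
    return [score * 100 // total for score in scores]


# Scenario B: primary scores 28, 0, 0 followed by secondary scores 28, 0.
loads = normalize([28, 0, 0, 28, 0])
print(loads[:3], loads[3:])  # [50, 0, 0] [50, 0]
```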
These scenarios show how Envoy’s aggregate cluster load balancer decides which cluster (and priority level) gets traffic, depending on the overall health of the endpoints. When the summed health across all clusters and priorities reaches or exceeds 100, Envoy caps the total at 100% and allocates accordingly. If the total is below 100, Envoy scales up proportionally so that all traffic still adds up to 100%.
Within each cluster, priority levels are also respected and allocated traffic based on their computed health scores.
Putting It All Together
To sum up, the aggregate cluster load balancer:

1. Calculates each priority level’s health score using (healthy% × overprovisioning factor), capped at 100%.
2. Sums and optionally normalizes total health across clusters.
3. Computes each cluster’s share of overall traffic, i.e. its “cluster priority load”.
4. Distributes traffic among the priorities within each cluster according to their health scores.
5. Performs final load balancing within each cluster.

In pseudocode:
health(P_X) = min(100, 1.4 * 100 * healthy_P_X_backends / total_P_X_backends), where
    total_P_X_backends is the number of backends for priority P_X after linearization
normalized_total_health = min(100, Σ(health(P_0)...health(P_X)))
cluster_priority_load(C_0) = min(100, Σ(health(P_0)...health(P_k)) * 100 / normalized_total_health),
    where P_0...P_k belong to C_0
cluster_priority_load(C_X) = min(100 - Σ(cluster_priority_load(C_0)...cluster_priority_load(C_X-1)),
                                 Σ(health(P_x)...health(P_X)) * 100 / normalized_total_health),
    where P_x...P_X belong to C_X

map from priorities to clusters:
    P_0 ... P_k ... ... P_x ... P_X
    ^         ^         ^         ^
    cluster C_0         cluster C_X
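The pseudocode can be turned into a short runnable sketch and checked against the load balancing table. This is not Envoy's source code: the function name `cluster_loads` and its input shape are assumptions, the factor is written as 140 percent with truncating integer division, and the case where no endpoint is healthy is deliberately left unmodeled.

```python
def cluster_loads(clusters, overprovisioning_percent=140):
    """Return the percentage of traffic each cluster receives.

    `clusters` is an ordered list of lists. Each inner list holds the healthy
    percentage of every priority level in that cluster, in priority order.
    """
    # health(P_X) for every priority, grouped by cluster.
    health = [
        [min(100, pct * overprovisioning_percent // 100) for pct in cluster]
        for cluster in clusters
    ]
    normalized_total_health = min(100, sum(sum(h) for h in health))
    if normalized_total_health == 0:
        # No healthy endpoints anywhere; not modeled in this sketch.
        return [0] * len(clusters)

    loads = []
    remaining = 100
    for cluster_health in health:
        # cluster_priority_load(C_X): scaled health, capped by what is left.
        scaled = sum(cluster_health) * 100 // normalized_total_health
        load = min(remaining, scaled)
        loads.append(load)
        remaining -= load
    return loads


# Rows taken from the load balancing table: [primary, secondary].
print(cluster_loads([[71, 0, 0], [100, 100]]))  # [99, 1]
print(cluster_loads([[20, 20, 10], [25, 25]]))  # [70, 30]
print(cluster_loads([[20, 0, 0], [20, 0]]))     # [50, 50]
```

Summing health per cluster before scaling mirrors the `cluster_priority_load` formulas; splitting each cluster's share across its own priorities would be a further step that this sketch omits.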
In the second tier of load balancing, Envoy hands off traffic to the cluster selected in the first tier. That cluster can then apply any of the load balancing algorithms described in the load balancer type documentation.
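The hand-off between the two tiers can be illustrated with a toy model. Everything here is hypothetical: the `RoundRobin` class stands in for a cluster's own load balancer, `choose_cluster` assumes it is given a number in the range 0 to 99 (how Envoy derives such a value is not covered in this section), and the host addresses are made up.

```python
import itertools


class RoundRobin:
    """Stand-in for a cluster's own load balancer (second tier)."""

    def __init__(self, hosts):
        self._hosts = itertools.cycle(hosts)

    def choose_host(self):
        return next(self._hosts)


def choose_cluster(cluster_loads, value):
    """First tier: map a value in [0, 100) onto clusters by cumulative load."""
    cumulative = 0
    for name, load in cluster_loads:
        cumulative += load
        if value < cumulative:
            return name
    raise ValueError("cluster loads must add up to 100")


# Hypothetical clusters using the 70% / 30% split from Scenario A.
loads = [("primary", 70), ("secondary", 30)]
balancers = {
    "primary": RoundRobin(["10.0.0.1", "10.0.0.2"]),
    "secondary": RoundRobin(["10.0.1.1"]),
}

for value in (5, 69, 70, 99):
    cluster = choose_cluster(loads, value)           # first tier
    host = balancers[cluster].choose_host()          # second tier
    print(value, cluster, host)
```

Values 0 through 69 land on the primary cluster and 70 through 99 on the secondary, after which each cluster's own balancer picks the host independently.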