.. _arch_overview_aggregate_cluster:

Aggregate Cluster
=================

An aggregate cluster allows you to set up failover between multiple upstream clusters that have different configurations. For example, you might switch from an :ref:`EDS ` cluster to a :ref:`STRICT_DNS ` cluster, or from a cluster using :ref:`ROUND_ROBIN ` load balancing to one using :ref:`MAGLEV `. You can also use it to change timeouts, such as moving from a ``0.1s`` connection timeout to a ``1s`` timeout.

To enable this failover, the aggregate cluster references other clusters by their names in the :ref:`configuration `. The order of these clusters in the :ref:`clusters list ` implicitly defines the fallback priority.

The aggregate cluster uses a tiered approach to load balancing:

* At the top level, it decides which cluster and priority to use.
* It then hands off the actual load balancing to the selected cluster's own load balancer.

Internally, this top-level load balancer treats all the priorities across all referenced clusters as a single linear list. This lets it reuse the existing load balancing algorithm and shift traffic seamlessly between clusters as needed.

Linearize Priority Set
----------------------

Upstream hosts are grouped into different :ref:`priority levels `, and each level includes hosts that can be healthy, degraded, or unhealthy. To simplify host selection during load balancing, linearization merges these priority levels across multiple clusters into a single sequence.
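Linearization can be sketched in Python as a plain concatenation of each cluster's priority levels, in the order the clusters appear in the clusters list. This is a minimal illustration, not Envoy's implementation: the function name ``linearize`` and its ``(cluster_name, priority_count)`` input format are hypothetical.

.. code-block:: python

  from typing import List, Tuple


  def linearize(clusters: List[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
      """Hypothetical helper: map each (cluster, local priority) to a linear priority.

      ``clusters`` lists (cluster_name, number_of_priority_levels) pairs in the
      same order as the aggregate cluster's clusters list.
      """
      mapping = []
      linear_priority = 0
      for cluster_name, priority_count in clusters:
          for local_priority in range(priority_count):
              # Every priority of an earlier cluster precedes every priority of a later one.
              mapping.append((cluster_name, local_priority, linear_priority))
              linear_priority += 1
      return mapping

With the clusters from the example in this section, ``linearize([("primary", 3), ("secondary", 2), ("tertiary", 2)])`` maps the secondary cluster's priority 0 to linear priority 3 and the tertiary cluster's priority 1 to linear priority 6, as in the example table.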
For example, if the primary cluster has three priority levels, and the secondary and tertiary clusters each have two, the failover order is:

* Primary
* Secondary
* Tertiary

+-----------+----------------+-------------------------------------+
| Cluster   | Priority Level | Priority Level after Linearization  |
+===========+================+=====================================+
| Primary   | 0              | 0                                   |
+-----------+----------------+-------------------------------------+
| Primary   | 1              | 1                                   |
+-----------+----------------+-------------------------------------+
| Primary   | 2              | 2                                   |
+-----------+----------------+-------------------------------------+
| Secondary | 0              | 3                                   |
+-----------+----------------+-------------------------------------+
| Secondary | 1              | 4                                   |
+-----------+----------------+-------------------------------------+
| Tertiary  | 0              | 5                                   |
+-----------+----------------+-------------------------------------+
| Tertiary  | 1              | 6                                   |
+-----------+----------------+-------------------------------------+

This approach provides a straightforward way to decide which hosts receive traffic based on priority, even when working with multiple clusters.

Example
-------

A sample aggregate cluster configuration could be:

.. code-block:: yaml

  name: aggregate_cluster
  connect_timeout: 0.25s
  lb_policy: CLUSTER_PROVIDED
  cluster_type:
    name: envoy.clusters.aggregate
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.clusters.aggregate.v3.ClusterConfig
      clusters:
      # Clusters primary, secondary and tertiary should be defined separately.
      - primary
      - secondary
      - tertiary

Important Considerations for Aggregate Clusters
-----------------------------------------------

Some features might not work as expected with aggregate clusters, as described in the following sections.

PriorityLoad Retry Plugins
^^^^^^^^^^^^^^^^^^^^^^^^^^

:ref:`PriorityLoad retry plugins ` will not work with an aggregate cluster.
Because the aggregate cluster's load balancer controls traffic distribution at a higher level, it effectively overrides the PriorityLoad behavior during load balancing.

Stateful Sessions
^^^^^^^^^^^^^^^^^

:ref:`Stateful Sessions ` rely on the cluster to directly know the endpoint receiving traffic. With an aggregate cluster, the top-level load balancer selects a cluster first, but does not track specific endpoints inside that cluster.

When Stateful Sessions are configured to override the upstream address, the load balancer bypasses its usual algorithm and sends traffic directly to that host. This works only when the cluster itself knows the exact endpoint. In an aggregate cluster, the final routing decision happens one layer beneath the aggregate load balancer, so the filter cannot locate that specific endpoint at the aggregate level.

As a result, Stateful Sessions are incompatible with aggregate clusters: the aggregate load balancer chooses a cluster without knowing the specific endpoint, which exists only inside the underlying cluster.

Load Balancing Example
----------------------

The aggregate cluster uses a tiered load balancing algorithm with two main steps:

* **Top Tier:** Distribute traffic across different clusters based on each cluster's overall health (across all :ref:`priorities `).
* **Second Tier:** Once a cluster is chosen, delegate traffic distribution within that cluster to its own load balancer (e.g., :ref:`ROUND_ROBIN `, :ref:`MAGLEV `, etc.).

Unlike the configuration above, the aggregate cluster in this section includes only two clusters: primary and secondary.
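The two tiers can be sketched in Python as follows. This is an illustration under stated assumptions, not Envoy's code: the names ``RoundRobin`` and ``aggregate_choose_host``, the ``priority_loads`` list (the traffic percentage of each linearized priority), and the ``priority_to_cluster`` list are all hypothetical, and round robin merely stands in for whatever load balancer each underlying cluster is configured with.

.. code-block:: python

  import itertools
  import random
  from typing import Dict, List, Sequence


  class RoundRobin:
      """Hypothetical stand-in for an underlying cluster's own load balancer."""

      def __init__(self, hosts: Sequence[str]) -> None:
          self._hosts = itertools.cycle(hosts)

      def choose_host(self) -> str:
          return next(self._hosts)


  def aggregate_choose_host(
      priority_loads: List[int],
      priority_to_cluster: List[str],
      cluster_lbs: Dict[str, RoundRobin],
      rng: random.Random,
  ) -> str:
      # Top tier: pick a linearized priority, weighted by its share of traffic.
      linear_priority = rng.choices(range(len(priority_loads)), weights=priority_loads)[0]
      # Map the linearized priority back to the cluster that owns it.
      cluster_name = priority_to_cluster[linear_priority]
      # Second tier: the chosen cluster's own load balancer picks the host.
      return cluster_lbs[cluster_name].choose_host()

With ``priority_loads=[0, 0, 0, 100, 0]`` and ``priority_to_cluster=["primary", "primary", "primary", "secondary", "secondary"]``, every request goes to the secondary cluster, whose round robin balancer then alternates between its hosts.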
The table below shows how traffic is split between the two clusters as the percentage of healthy endpoints in each priority level changes:

+-------------+-------------+-------------+-------------+-------------+------------+------------+
| Primary     | Primary     | Primary     | Secondary   | Secondary   | Traffic to | Traffic to |
| P=0 Healthy | P=1 Healthy | P=2 Healthy | P=0 Healthy | P=1 Healthy | Primary    | Secondary  |
| Endpoints   | Endpoints   | Endpoints   | Endpoints   | Endpoints   |            |            |
+=============+=============+=============+=============+=============+============+============+
| 100%        | 100%        | 100%        | 100%        | 100%        | 100%       | 0%         |
+-------------+-------------+-------------+-------------+-------------+------------+------------+
| 72%         | 100%        | 100%        | 100%        | 100%        | 100%       | 0%         |
+-------------+-------------+-------------+-------------+-------------+------------+------------+
| 71%         | 1%          | 0%          | 100%        | 100%        | 100%       | 0%         |
+-------------+-------------+-------------+-------------+-------------+------------+------------+
| 71%         | 0%          | 0%          | 100%        | 100%        | 99%        | 1%         |
+-------------+-------------+-------------+-------------+-------------+------------+------------+
| 50%         | 0%          | 0%          | 50%         | 0%          | 70%        | 30%        |
+-------------+-------------+-------------+-------------+-------------+------------+------------+
| 20%         | 20%         | 10%         | 25%         | 25%         | 70%        | 30%        |
+-------------+-------------+-------------+-------------+-------------+------------+------------+
| 20%         | 0%          | 0%          | 20%         | 0%          | 50%        | 50%        |
+-------------+-------------+-------------+-------------+-------------+------------+------------+
| 0%          | 0%          | 0%          | 100%        | 0%          | 0%         | 100%       |
+-------------+-------------+-------------+-------------+-------------+------------+------------+
| 0%          | 0%          | 0%          | 72%         | 0%          | 0%         | 100%       |
+-------------+-------------+-------------+-------------+-------------+------------+------------+

.. note::

   By default, the :ref:`overprovisioning factor ` is **1.4**. This factor boosts lower health percentages to account for partial availability. For instance, if a priority level is **80%** healthy, multiplying by **1.4** results in **112%**, which is capped at **100%**. In other words, any product above **100%** is treated as **100%**.

The aggregate cluster load balancer first calculates each priority's health score for every cluster, sums those scores, and then assigns traffic based on the overall total. If the total is at least **100**, the combined traffic is capped at **100%**. If it is below **100**, Envoy scales (normalizes) it so that the final distribution sums to **100%**.

Scenario A: Total Health ≥ 100
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Suppose we have two clusters:

* Primary with three priority levels: ``20%, 20%, 10%`` healthy.
* Secondary with two priority levels: ``25%, 25%`` healthy.

1. Compute raw health scores using ``percent_healthy × overprovisioning_factor (1.4)``, each capped at **100**.

   * Primary:

     * P=0: 20% × 1.4 = 28
     * P=1: 20% × 1.4 = 28
     * P=2: 10% × 1.4 = 14
     * **Sum:** 28 + 28 + 14 = 70

   * Secondary:

     * P=0: 25% × 1.4 = 35
     * P=1: 25% × 1.4 = 35
     * **Sum:** 35 + 35 = 70

2. Assign traffic to the first cluster, then the next, and so on, without exceeding **100%** in total.

   * Primary takes its 70% first.
   * Secondary then takes min(100 - 70, 70) = 30.
   * The combined total is 70 + 30 = 100.

3. Distribute that traffic internally by priority.

   * Primary's **70%** is split across its priorities according to their health scores of **28**, **28**, and **14**:

     * P=0 → 28%
     * P=1 → 28%
     * P=2 → 14%

   * Secondary's **30%** goes entirely to P=0: its health score is 35, but only 30 remains after primary takes 70. So:

     * P=0 → 30%
     * P=1 → 0%

Hence the final breakdown of traffic is:

* Primary: ``{28%, 28%, 14%}``
* Secondary: ``{30%, 0%}``

Scenario B: Total Health < 100
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sometimes the health scores add up to less than **100**. In that case, Envoy normalizes them so that the clusters and priorities still share the full 100% of traffic. For instance, consider:

* Primary: ``20%, 0%, 0%``
* Secondary: ``20%, 0%``

1. Compute raw health scores (same formula: ``percent_healthy × 1.4``, capped at **100**):

   * Primary:

     * P=0: 20% × 1.4 = 28
     * P=1: 0% × 1.4 = 0
     * P=2: 0% × 1.4 = 0
     * **Sum:** 28 + 0 + 0 = 28

   * Secondary:

     * P=0: 20% × 1.4 = 28
     * P=1: 0% × 1.4 = 0
     * **Sum:** 28 + 0 = 28

2. Total raw health = 28 + 28 = **56** (below 100).

3. Normalize so that the final total is 100%.

   * Both clusters end up at ``28 / 56 = 50%``.

Thus each cluster, primary and secondary, receives 50% of the traffic. Since each cluster's entire health score comes from **Priority 0** (28 points) and its other priorities score 0, the final distribution is:

* Primary: ``{50%, 0%, 0%}``
* Secondary: ``{50%, 0%}``

These scenarios show how Envoy's aggregate cluster load balancer decides which cluster (and priority level) gets traffic, depending on the overall health of the endpoints. When the summed health across all clusters and priorities reaches or exceeds **100**, Envoy caps the total at **100%** and allocates accordingly.
If the total is below **100**, Envoy scales up proportionally so that all traffic still adds up to **100%**. Within each cluster, priority levels are also respected and allocated traffic based on their computed health scores.

Putting It All Together
^^^^^^^^^^^^^^^^^^^^^^^

To summarize, the aggregate cluster load balancer:

* Calculates each priority level's health score as ``healthy% × overprovisioning factor``, capped at **100%**.
* Sums the health scores across all clusters and, if the total is below **100**, normalizes them.
* Computes each cluster's share of overall traffic, i.e., its "cluster priority load".
* Distributes traffic among the priorities within each cluster according to their health scores.
* Performs the final load balancing within the chosen cluster.

The same steps expressed as a pseudo-algorithm:

::

  health(P_X) = min(100, 1.4 * 100 * healthy_P_X_backends / total_P_X_backends),
    where total_P_X_backends is the number of backends for priority P_X after linearization

  normalized_total_health = min(100, Σ(health(P_0)...health(P_X)))

  cluster_priority_load(C_0) = min(100, Σ(health(P_0)...health(P_k)) * 100 / normalized_total_health),
    where P_0...P_k belong to C_0

  cluster_priority_load(C_X) = min(100 - Σ(cluster_priority_load(C_0)...cluster_priority_load(C_X-1)),
                                   Σ(health(P_x)...health(P_X)) * 100 / normalized_total_health),
    where P_x...P_X belong to C_X

  map from priorities to clusters:

    P_0 ... P_k ... ... P_x ... P_X
    ^       ^           ^       ^
    cluster C_0         cluster C_X

In the second tier of load balancing, Envoy hands off traffic to the cluster selected in the first tier. That cluster can then apply any of the load balancing algorithms described in :ref:`load balancer type `.
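The first-tier computation can be sketched in Python. This is a minimal sketch under stated assumptions, not Envoy's source: the names ``priority_health``, ``priority_loads``, ``cluster_priority_load``, and ``OVERPROVISIONING_FACTOR`` are hypothetical, inputs are healthy percentages per priority, degraded hosts are ignored, and the per-priority loads are computed on the linearized list and then summed per cluster (which matches the cluster-level formula up to rounding).

.. code-block:: python

  from typing import List

  # 1.4 expressed as a percentage so that all arithmetic stays in integers.
  OVERPROVISIONING_FACTOR = 140


  def priority_health(healthy_percent: int) -> int:
      """health(P_X): healthy percentage times 1.4, capped at 100 (floor division)."""
      return min(100, OVERPROVISIONING_FACTOR * healthy_percent // 100)


  def priority_loads(clusters: List[List[int]]) -> List[List[int]]:
      """Return the traffic percentage for every priority of every cluster.

      ``clusters`` holds one list per cluster, in clusters-list order, with the
      healthy percentage of each of that cluster's priority levels.
      """
      # Linearize: one flat list of health scores, first cluster's priorities first.
      health = [priority_health(p) for cluster in clusters for p in cluster]
      normalized_total_health = min(100, sum(health))
      loads = [0] * len(health)
      if normalized_total_health > 0:  # no healthy hosts at all (panic mode) is not modeled
          remaining = 100
          for i, h in enumerate(health):
              # Scale up when the total is below 100; never exceed what is left.
              loads[i] = min(remaining, h * 100 // normalized_total_health)
              remaining -= loads[i]
      # Split the flat list back into per-cluster lists.
      result, start = [], 0
      for cluster in clusters:
          result.append(loads[start:start + len(cluster)])
          start += len(cluster)
      return result


  def cluster_priority_load(clusters: List[List[int]]) -> List[int]:
      """cluster_priority_load(C_X): the sum of the loads of C_X's priorities."""
      return [sum(cluster) for cluster in priority_loads(clusters)]

``priority_loads([[20, 20, 10], [25, 25]])`` returns ``[[28, 28, 14], [30, 0]]`` (Scenario A), and ``priority_loads([[20, 0, 0], [20, 0]])`` returns ``[[50, 0, 0], [50, 0]]`` (Scenario B). Floor division is an assumption of this sketch, chosen because it reproduces the 99% / 1% row of the table above, where 71% × 1.4 = 99.4 is truncated to 99. The sketch does not redistribute any load lost to integer rounding.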