Internet-Draft ISIS-DCM May 2026
Psenak, et al. Expires 7 November 2026
Workgroup:
Network Working Group
Published:
Intended Status:
Informational
Expires:
7 November 2026
Authors:
P. Psenak
Cisco Systems
J. Horn
Cisco Systems
B. Decraene
Orange
G. Gryszata
Orange

Distributed Congestion Mitigation

Abstract

This document describes the Distributed Congestion Mitigation (DCM) mechanism, which uses Interior Gateway Protocols (IGPs) such as IS-IS [RFC1195], OSPFv2 [RFC2328], or OSPFv3 [RFC5340]. DCM is a tactical, distributed mechanism designed to mitigate network congestion by offloading traffic to alternate, congestion-free paths. DCM is fully integrated into the IGPs.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 7 November 2026.

1. Introduction

Network capacity planning is a proactive strategy for dealing with network congestion. Even with proper capacity planning, network congestion can arise from oversubscription, link or node failures, and shifting traffic patterns. DCM provides a reactive, distributed mechanism to mitigate local congestion by leveraging interface utilization monitoring, IGP link administrative groups [RFC8919], [RFC8920], and Flex-Algo [RFC9350] path computation and forwarding.

2. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

3. Terminology

ABR:
Area Border Router.
ASBR:
Autonomous System Border Router.
DCM:
Distributed Congestion Mitigation.
FAD:
Flex-Algo Definition [RFC9350].
OFA:
Offloading Flex-Algo, used for traffic offloading.
Congestion Affinity:
Administrative group used to exclude congested links from the OFA topology.
High Utilization Affinity:
Administrative group used to signal high utilization and prevent additional offload traffic.
Congestion Threshold:
The utilization level at which a link is marked with the Congestion affinity and traffic offloading is initiated for the local link.
Non-Congestion Threshold:
The utilization level at which the additional offloading on the link stops. If no traffic is offloaded from the local link, the Congestion affinity is removed.
High Utilization Threshold:
The utilization level at which a link is marked with High Utilization affinity to stabilize the load.
Restore Threshold:
The utilization level at which traffic restoration is initiated.
Low Utilization Threshold:
The utilization level used to remove High Utilization affinity.
LSP:
Link State Packet.
LSA:
Link State Advertisement.

4. DCM Requirements

DCM aims to offload traffic from the locally congested links. Some of the requirements of DCM are listed below:

5. Control Plane

DCM provisions a dedicated OFA. OFA's FAD is set to exclude the Congestion affinity. Any link declared congested MUST be excluded from the OFA topology by setting the Congestion affinity. This allows the OFA to natively route traffic around all congested links.
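
As a non-normative illustration, the exclusion rule can be sketched as a shortest-path computation over a pruned topology. The link tuple layout, the affinity name, and the function name are assumptions made for this sketch; an actual implementation would follow the Flex-Algo computation rules in [RFC9350].

```python
import heapq

CONGESTION = "congestion"  # hypothetical name for the Congestion affinity


def ofa_shortest_paths(links, source, excluded=frozenset({CONGESTION})):
    """Dijkstra over links that carry none of the `excluded` affinities.

    links: iterable of (from_node, to_node, metric, affinities) tuples,
    one tuple per link direction (assumed layout).
    """
    adjacency = {}
    for src, dst, metric, affinities in links:
        if excluded & set(affinities):
            continue  # link is pruned from the OFA topology
        adjacency.setdefault(src, []).append((dst, metric))

    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, metric in adjacency.get(node, []):
            new_cost = cost + metric
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


links = [
    ("A", "B", 10, {CONGESTION}),  # congested link, excluded from the OFA
    ("A", "C", 10, set()),
    ("C", "B", 10, set()),
]
ofa_view = ofa_shortest_paths(links, "A")
algo0_view = ofa_shortest_paths(links, "A", excluded=frozenset())
print(ofa_view["B"], algo0_view["B"])
```

In this example the OFA reaches B through C at cost 20, while the unconstrained computation still uses the direct, congested link at cost 10.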

OFA is only used to carry the offloaded traffic.

DCM can be used to offload Algorithm 0 traffic or the traffic of any Flex-Algo that is not itself used as an OFA. If DCM is used for Flex-Algo traffic, the OFA, in addition to applying the Congestion affinity exclude rule, SHOULD inherit the FAD algorithm-type, constraints, and metric type of the original Flex-Algo whose traffic is being offloaded.
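
The inheritance rule can be pictured as copying the original FAD and adding one exclude rule. The sketch below is illustrative only: the `Fad` class is a hypothetical, simplified view of a FAD, and the field names, affinity names, and algorithm numbers are assumptions, not an encoding defined by this document.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Fad:
    """Hypothetical, simplified view of a Flex-Algo Definition."""
    algorithm: int                         # Flex-Algo number (128-255)
    calc_type: int                         # 0 = SPF
    metric_type: str                       # e.g. "igp", "delay", "te"
    exclude_any: frozenset = frozenset()   # excluded administrative groups
    include_any: frozenset = frozenset()
    include_all: frozenset = frozenset()


def derive_ofa_fad(original: Fad, ofa_algorithm: int,
                   congestion_affinity: str) -> Fad:
    """Copy the original FAD and add the Congestion affinity exclude rule."""
    return replace(
        original,
        algorithm=ofa_algorithm,
        exclude_any=original.exclude_any | {congestion_affinity},
    )


# Illustrative values: Flex-Algo 128 uses the delay metric and excludes "red".
low_delay = Fad(algorithm=128, calc_type=0, metric_type="delay",
                exclude_any=frozenset({"red"}))
ofa = derive_ofa_fad(low_delay, ofa_algorithm=228,
                     congestion_affinity="congestion")
print(ofa)
```

The derived FAD keeps the calculation type, metric type, and existing constraints of the original, so offloaded traffic is still routed according to the intent of the original Flex-Algo.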

Multiple OFAs can coexist inside the IGP area.

6. Local Congestion Monitoring and Detection

DCM utilizes precise congestion monitoring and detection mechanisms for the local interfaces on the router. Some of the characteristics of such monitoring are:

7. Traffic Offloading and Restoring

Traffic offloading is performed to divert the traffic onto the shortest path that avoids any congested links. The offloading process adheres to the following principles:

8. Offloaded Traffic Forwarding

Offloaded traffic is routed via OFA paths, requiring the switching of the traffic's algorithm to the OFA. The forwarding behavior is defined as follows:

9. Oscillation Avoidance

To limit the likelihood of oscillations, DCM uses two affinity-based signals based on link utilization thresholds as illustrated below:


Utilization (%)
  100 +-------+
      |       |
      |       |
   90 |-------| Congestion Threshold
      |       |     if (>=) Set Congestion Affinity,
      |       |             Start offloading
   80 |-------| Non-Congestion Threshold
      |       |     if (<=) Unset Congestion Affinity,
      |       |             Stop additional offloading
   70 |-------| High Utilization Threshold
      |       |     if (>=) Mark High Utilization
      |       |
   60 |-------| Restore Threshold
      |       |     if (<=) Start restoring traffic
      |       |
   50 |-------| Low Utilization Threshold
      |       |     if (<=) Unmark High Utilization
      |       |
      +-------+
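
The thresholds in the figure can be read as a small per-link state machine. The following non-normative sketch uses the example percentages from the figure; the class names, the returned action strings, and the choice to keep offloading while utilization stays between the Non-Congestion and Congestion thresholds are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Example values taken from the figure; real values are operator-tuned.
    congestion: float = 90.0
    non_congestion: float = 80.0
    high_utilization: float = 70.0
    restore: float = 60.0
    low_utilization: float = 50.0


@dataclass
class LinkState:
    congestion_affinity: bool = False
    high_utilization_affinity: bool = False


def update(state, utilization, offloaded_traffic, t=Thresholds()):
    """Apply one utilization sample and return the action for the local link.

    offloaded_traffic: True if traffic is currently offloaded from this link.
    """
    if utilization >= t.high_utilization:
        state.high_utilization_affinity = True
    elif utilization <= t.low_utilization:
        state.high_utilization_affinity = False

    if utilization >= t.congestion:
        state.congestion_affinity = True
        return "offload"
    if utilization <= t.non_congestion:
        if not offloaded_traffic:
            state.congestion_affinity = False
        if utilization <= t.restore and offloaded_traffic:
            return "restore"
        return "hold"
    # Between the two upper thresholds: keep offloading only if already congested.
    return "offload" if state.congestion_affinity else "hold"


state = LinkState()
for sample, offloaded in [(92, False), (85, True), (78, True), (58, True), (45, False)]:
    print(sample, update(state, sample, offloaded), state)
```

The gaps between the thresholds provide the hysteresis: each affinity is set at one utilization level and removed only at a clearly lower one.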

The logic of the two affinities is as follows:

10. Oscillation Mitigation

While the affinity-based signaling described in Section 9 effectively mitigates large-scale oscillations, localized instabilities may still occur due to the following:

To address these scenarios, implementations MUST employ a damping mechanism for prefix-specific offloading:

11. Implementation Scope and Discretion

While this document defines the mandatory protocol behaviors required to ensure interoperability and network stability, certain aspects of the DCM mechanism are left to the implementation.

11.1. Mandatory Requirements

Implementations MUST adhere to the following rules to ensure the integrity of the DCM mechanism:

  • OFA Topology Exclusion: The OFA FAD MUST exclude Congestion Affinity.
  • High Utilization Affinity Signaling: Routers MUST interpret the High Utilization Affinity as a signal to cease sending new offload traffic to the link, while allowing existing offloaded traffic to persist to avoid unnecessary churn.
  • Routers participating in the OFA MUST monitor local link utilization and advertise the Congestion Affinity when the Congestion Threshold is exceeded. They MUST stop advertising the Congestion Affinity if the utilization drops below the Non-Congestion Threshold and no traffic is locally offloaded from the link.
  • Routers participating in the OFA MUST monitor local link utilization and advertise the High Utilization Affinity when the High Utilization Threshold is exceeded. They MUST stop advertising the High Utilization Affinity if the utilization drops below the Low Utilization Threshold.
  • Advertising the Congestion and High Utilization Affinities is subject to the standard throttling that the implementation applies when generating LSP or LSA updates.
  • The setting of the Congestion and High Utilization affinities SHOULD be performed more aggressively than the unsetting. Setting these affinities is a protective measure against imminent performance degradation; therefore, it SHOULD be prioritized to prevent congestion. Implementations MAY use immediate (unsmoothed) utilization samples or more aggressive statistical adjustments for setting these affinities. Conversely, unsetting these affinities is a restoration measure to return to optimal routing. A more cautious approach to unsetting is recommended to ensure that the link has stabilized; in particular, use of the smoothed, statistically adjusted utilization value is recommended.
  • Algorithm Constraints: Offloading MUST NOT be performed for any Flex-Algo that is currently designated as an OFA.
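
The asymmetry between setting and unsetting an affinity can be sketched as follows. This is a minimal illustration, assuming an exponentially weighted moving average (EWMA) as the smoothing method and a weight of 0.2; both are examples, not requirements, and the class and attribute names are hypothetical.

```python
class UtilizationTracker:
    """Raw samples drive setting; a smoothed (EWMA) value drives unsetting."""

    def __init__(self, set_threshold, unset_threshold, alpha=0.2):
        self.set_threshold = set_threshold
        self.unset_threshold = unset_threshold
        self.alpha = alpha          # weight of the newest sample (assumed value)
        self.smoothed = None
        self.affinity_set = False

    def sample(self, utilization):
        """Feed one utilization sample; return whether the affinity is set."""
        if self.smoothed is None:
            self.smoothed = utilization
        else:
            self.smoothed = (self.alpha * utilization
                             + (1 - self.alpha) * self.smoothed)

        if utilization > self.set_threshold:
            # Aggressive: a single raw sample above the threshold sets it.
            self.affinity_set = True
        elif self.affinity_set and self.smoothed < self.unset_threshold:
            # Cautious: unset only once the smoothed value has dropped.
            self.affinity_set = False
        return self.affinity_set


# High Utilization Affinity with the example thresholds of 70% and 50%.
tracker = UtilizationTracker(set_threshold=70, unset_threshold=50)
history = [tracker.sample(u) for u in [75, 40, 40, 40, 40, 40, 40]]
print(history)
```

With these values, one sample at 75% sets the affinity immediately, while several consecutive samples at 40% are needed before the smoothed value falls below 50% and the affinity is removed.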

11.2. Implementation-Specific Decisions

Implementations have the discretion to define the operational heuristics that trigger the protocol mechanisms, including:

  • Congestion Detection Logic: The specific statistical methods used for link utilization adjustment (e.g., EWMA, trend analysis, or noise filtering) are implementation-dependent.
  • Threshold Tuning: The specific values for the Congestion, Non-Congestion, High Utilization, Restore, and Low Utilization thresholds are operational parameters that should be tuned based on network-specific capacity, stability, and performance requirements.
  • Offload Granularity: The iterative process, including the size of traffic increments, is left to the implementer to optimize for network stability.
  • Offload Interval: The offload interval SHOULD be set to a value larger than the sum of the time required for the nodes in the network to detect high utilization or congestion, the time required to advertise the link affinities (including the LSP/LSA throttling), and the time required for those affinities to propagate across the network, plus some additional time to ensure network stability.
  • Filtering Policies: While the protocol provides the mechanism for offloading, the policy governing which prefixes or traffic classes are eligible for offloading is a local configuration choice.
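
The Offload Interval guidance above amounts to a simple lower bound. The sketch below only restates that sum; the function name, parameter names, and the timer values in the example are hypothetical and would need to be replaced with values measured in the target network.

```python
def min_offload_interval(detection_s, advertisement_s, propagation_s, margin_s):
    """Lower bound, in seconds, for the offload interval.

    detection_s:      time for nodes to detect high utilization or congestion
    advertisement_s:  time to advertise the affinities, including LSP/LSA throttling
    propagation_s:    time for the affinities to propagate across the network
    margin_s:         additional time allowed for the network to stabilize
    """
    return detection_s + advertisement_s + propagation_s + margin_s


# Illustrative values only.
interval = min_offload_interval(detection_s=10, advertisement_s=5,
                                propagation_s=1, margin_s=4)
print(interval)  # 20
```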

11.3. Deployment Considerations

DCM can be deployed incrementally in the network. Legacy routers that do not support DCM MUST NOT participate in the OFA.

DCM does not need to be enabled on all routers in the network. However, it must be enabled on all routers along the specific path towards the egress node, starting from the point where DCM is being used. To successfully offload traffic, the offload path must be contiguous. If any router along the path towards the egress node lacks DCM enablement, the OFA path may not be available.
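
The contiguity requirement can be checked offline with a simple reachability test restricted to routers that participate in the OFA. This is a rough planning aid under assumed inputs (an adjacency map and a set of participating routers, both hypothetical names); it does not model metrics, affinities, or the actual Flex-Algo computation.

```python
from collections import deque


def ofa_path_exists(adjacency, participants, ingress, egress):
    """Breadth-first search restricted to routers participating in the OFA."""
    if ingress not in participants or egress not in participants:
        return False
    seen = {ingress}
    queue = deque([ingress])
    while queue:
        node = queue.popleft()
        if node == egress:
            return True
        for neighbor in adjacency.get(node, ()):
            if neighbor in participants and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


# Two candidate paths from A to D: via B and via C.
adjacency = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
print(ofa_path_exists(adjacency, {"A", "B", "D"}, "A", "D"))
print(ofa_path_exists(adjacency, {"A", "D"}, "A", "D"))
```

In the first call a contiguous OFA path exists through B; in the second, neither transit router participates, so no OFA path to D is available.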

12. IANA Considerations

This document makes no requests of IANA.

13. Security Considerations

DCM relies on the standard IS-IS, OSPFv2, and OSPFv3 flooding mechanisms. Implementations MUST ensure that the affinity configuration is consistent among the routers participating in DCM within the area and is protected against unauthorized modification, as malicious manipulation could lead to traffic drops or suboptimal routing.

14. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.

15. Informative References

[RFC1195]
Callon, R., "Use of OSI IS-IS for routing in TCP/IP and dual environments", RFC 1195, DOI 10.17487/RFC1195, <https://www.rfc-editor.org/info/rfc1195>.
[RFC2328]
Moy, J., "OSPF Version 2", STD 54, RFC 2328, DOI 10.17487/RFC2328, <https://www.rfc-editor.org/info/rfc2328>.
[RFC5340]
Coltun, R., Ferguson, D., Moy, J., and A. Lindem, "OSPF for IPv6", RFC 5340, DOI 10.17487/RFC5340, <https://www.rfc-editor.org/info/rfc5340>.
[RFC5286]
Atlas, A., Ed. and A. Zinin, Ed., "Basic Specification for IP Fast Reroute: Loop-Free Alternates", RFC 5286, DOI 10.17487/RFC5286, <https://www.rfc-editor.org/info/rfc5286>.
[RFC9855]
Bashandy, A., Litkowski, S., Filsfils, C., Francois, P., Decraene, B., and D. Voyer, "Topology Independent Fast Reroute Using Segment Routing", RFC 9855, DOI 10.17487/RFC9855, <https://www.rfc-editor.org/info/rfc9855>.
[RFC8919]
Ginsberg, L., Psenak, P., Previdi, S., Henderickx, W., and J. Drake, "IS-IS Application-Specific Link Attributes", RFC 8919, DOI 10.17487/RFC8919, <https://www.rfc-editor.org/info/rfc8919>.
[RFC8920]
Psenak, P., Ed., Ginsberg, L., Henderickx, W., Tantsura, J., and J. Drake, "OSPF Application-Specific Link Attributes", RFC 8920, DOI 10.17487/RFC8920, <https://www.rfc-editor.org/info/rfc8920>.
[RFC9350]
Psenak, P., Ed., Hegde, S., Filsfils, C., Talaulikar, K., and A. Gulko, "IGP Flexible Algorithm", RFC 9350, DOI 10.17487/RFC9350, <https://www.rfc-editor.org/info/rfc9350>.

Contributors

The following people contributed to the content of this document and should be considered coauthors:

Francois Clad

Authors' Addresses

Peter Psenak
Cisco Systems
Apollo Business Center
Mlynske nivy 43
82109 Bratislava
Slovakia
Jakub Horn
Cisco Systems
Milpitas, CA 95035
United States of America
Bruno Decraene
Orange
France
Guillaume Gryszata
Orange
France