Network Working Group                                            T. Zhou
Internet-Draft                                                    Huawei
Intended status: Experimental                                      D. Li
Expires: 21 April 2025                               Tsinghua University
                                                                 X. Geng
                                                                  Huawei
                                                         18 October 2024


                  Perceptive Routing Information Model
           draft-zhou-rtgwg-perceptive-routing-information-00

Abstract

   This document defines the information model for perceptive routing,
   which could serve as a foundational component in the implementation
   of perceptive routing.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 21 April 2025.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Zhou, et al.              Expires 21 April 2025                 [Page 1]

Internet-Draft  draft-zhou-rtgwg-perceptive-routing-info    October 2024

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Revised BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Terminologies
   3.  Perceptive Routing General Process
   4.  Perceptive Routing Information Model
     4.1.  Local information model of PR Sensing Node
       4.1.1.  Port Failure
       4.1.2.  Congestion
       4.1.3.  Queue Length
       4.1.4.  Link SLA
     4.2.  Network information model of PR Sensing Node
       4.2.1.  On Path Information
       4.2.2.  Bottleneck Information
       4.2.3.  Topology Information
     4.3.  Routing decision information model of PR routing node
       4.3.1.  Reroute
       4.3.2.  Congestion Control
       4.3.3.  ECMP (Equal-Cost Multi-Path) Mode
       4.3.4.  Hierarchical Routing
       4.3.5.  Service Routing
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgements
   8.  Normative References
   Authors' Addresses

1.  Introduction

   In many scenarios, especially in data centers (DCs), adaptive routing
   has emerged as a crucial technique for enhancing network performance
   and resilience.
   Traditional routing methods, which rely on static or pre-defined
   paths, often struggle to cope with rapidly changing network
   conditions, such as link failures, congestion, and varying traffic
   demands.  Adaptive routing addresses these challenges by allowing
   routing decisions to be adjusted in real time, based on the current
   state of the network.

   Adaptive routing systems like Perceptive Routing (PR) continuously
   monitor network parameters, such as port status, congestion levels,
   and link SLAs, to make informed decisions that improve traffic
   distribution and fault tolerance.

   A standardized information model could abstract the essential
   properties and relationships within the system, allowing different
   implementations to interact seamlessly.  Such a model offers a common
   representation of the state of the network, allowing devices to
   communicate critical information such as failures, congestion, and
   optimal paths, and facilitating dynamic and automated decision-making.

   This document defines the information model for perceptive routing,
   which could serve as a foundational component in the implementation
   of perceptive routing.

2.  Terminologies

   PR-SN:  Perceptive Routing Sensing Node.  A node that perceives local
      and network information for routing decisions.

   PR-RN:  Perceptive Routing Routing Node.  A node that uses
      multi-dimensional sensed information to make routing decisions,
      such as rerouting, rate adjustment, and load balancing.

   PR-N:  Perceptive Routing Notification.  The message sent from a
      PR-SN to a PR-RN.

3.  Perceptive Routing General Process

   The perceptive routing (PR) mechanism, akin to the adaptive routing
   network (ARN), aims to ensure efficient and resilient routing in
   dynamic network environments.
   PR involves real-time monitoring and decision-making based on
   multi-dimensional network status information, which enables the
   network to adapt to changes, such as congestion or link failures,
   with minimal disruption.  The general process is summarized as
   follows:

   1.  Detection of Network Status Changes

       Perceptive Routing Sensing Nodes (PR-SNs) continuously monitor
       both local and network-level conditions to detect anomalies or
       changes in network performance, such as congestion or link/node
       failure.  When such a condition is detected, the PR-SN assesses
       whether it can be resolved locally or requires further action.

   2.  Impact Assessment and Notification

       If the PR-SN determines that local measures (e.g., congestion
       mitigation strategies) are insufficient to address the problem,
       it generates a Perceptive Routing Notification (PR-N).  The PR-N
       message contains detailed information about the change in
       network status (e.g., the type of failure, affected links/nodes,
       etc.) and is sent to the Perceptive Routing Routing Node (PR-RN)
       or other designated nodes.  These messages inform the PR-RN about
       issues that could affect network performance, allowing it to take
       proactive steps.

   3.  Routing Decision and Mitigation

       Upon receiving the PR-N message, the PR-RN analyzes the specific
       information provided to make appropriate routing decisions.
       These decisions include:

       *  Rerouting: Selecting an alternative path to avoid the impacted
          link or node.

       *  Traffic load adjustment: Rebalancing traffic flows to prevent
          further congestion or link overload.

       *  Congestion control and ECMP: Adjusting traffic flows across
          multiple paths if available, using mechanisms like Equal-Cost
          Multi-Path (ECMP).

       *  Hierarchical routing decisions: In cases of large-scale
          network changes, the PR-RN may use hierarchical routing
          strategies to route traffic across different layers of the
          network efficiently.
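   The three steps above can be sketched in Python as follows.  This is
   a minimal, non-normative illustration.  The names PRNotification,
   sense, and decide, the fields carried in the PR-N, and the use of a
   simple queue threshold to stand in for "local measures are
   insufficient" are all assumptions made for this sketch; none of them
   is defined by this document.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventType(Enum):
    """Hypothetical status changes a PR-SN might report."""
    PORT_FAILURE = "port-failure"
    CONGESTION = "congestion"


class Decision(Enum):
    """Two of the PR-RN decisions listed in step 3."""
    REROUTE = "reroute"
    LOAD_ADJUSTMENT = "traffic-load-adjustment"


@dataclass
class PRNotification:
    """Hypothetical PR-N content: the event type and the affected link."""
    event: EventType
    affected_link: str


def sense(port_up: bool, queue_depth: int, queue_threshold: int,
          link: str) -> Optional[PRNotification]:
    """Steps 1 and 2 (PR-SN): detect a change and decide whether to notify.

    Returns None when no notification is needed in this sketch.
    """
    if not port_up:
        # Assumption: a failed port cannot be handled locally.
        return PRNotification(EventType.PORT_FAILURE, link)
    if queue_depth > queue_threshold:
        # Assumption: exceeding the threshold means local mitigation
        # is insufficient, so a PR-N is generated.
        return PRNotification(EventType.CONGESTION, link)
    return None


def decide(notification: PRNotification) -> Decision:
    """Step 3 (PR-RN): map a received PR-N to a routing decision."""
    if notification.event is EventType.PORT_FAILURE:
        return Decision.REROUTE
    return Decision.LOAD_ADJUSTMENT


if __name__ == "__main__":
    pr_n = sense(port_up=False, queue_depth=0, queue_threshold=100,
                 link="leaf1-spine2")
    if pr_n is not None:
        print(decide(pr_n).value)  # prints "reroute"
```

   A real PR-RN would weigh more inputs than a single event type (for
   example, path and topology information); the one-to-one mapping here
   only shows where each step of the process sits.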
   By leveraging real-time data provided by the PR-SN and using advanced
   decision-making algorithms, the PR-RN ensures that traffic is
   rerouted or adjusted dynamically, reducing latency, avoiding
   congested paths, and enhancing overall network efficiency.

   The following sections define a standardized information model for
   this general process.

4.  Perceptive Routing Information Model

4.1.  Local information model of PR Sensing Node

   This section focuses on the attributes collected by a Perceptive
   Routing (PR) sensing node that monitors and gathers real-time data
   about local conditions.

4.1.1.  Port Failure

   This type of attribute represents the status of ports on a node.  It
   indicates whether a port has failed and can no longer transmit or
   receive traffic.  Monitoring port failures allows the network to
   quickly reroute traffic or trigger failover mechanisms.  The possible
   attributes could include:

   *  Port Status: Indicates if the port is active, down, or in a failed
      state.

   *  Failure Cause: Specifies reasons for failure, such as hardware
      issues, misconfigurations, or timeouts.

4.1.2.  Congestion

   This type of attribute represents the level of congestion at the
   node, typically measured by monitoring packet delay, packet loss, and
   throughput.  It informs the system of where congestion points are
   forming, helping to reroute traffic or apply congestion control
   techniques.  The possible attributes could include:

   *  Traffic Load: Measures current traffic levels on the link.

   *  Congestion Thresholds: Defines limits for congestion states.

   *  Packet Drop Rate: The rate at which packets are dropped due to
      congestion.

4.1.3.  Queue Length

   This type of attribute represents the length of queues in the node.
   High queue lengths indicate potential bottlenecks and delays, while
   short queues suggest fast packet forwarding.
   This attribute is vital for assessing node performance and avoiding
   network congestion.  The possible attributes could include:

   *  Queue Depth: Real-time data about the number of packets in the
      queue.

   *  Queue Thresholds: Defines the conditions under which the queue is
      considered to have overflowed, possibly leading to packet loss.

4.1.4.  Link SLA

   This type of attribute represents the Service Level Agreement (SLA)
   associated with the link, including metrics like bandwidth, latency,
   jitter, and availability.  The node monitors whether the link's
   performance is within the agreed SLA parameters and flags any
   violations for corrective actions.  The possible attributes could
   include:

   *  Link Latency: Measures the round-trip delay across the link.

   *  Bandwidth Utilization: Tracks the percentage of available
      bandwidth being used.

4.2.  Network information model of PR Sensing Node

   This section covers the attributes about network conditions beyond
   the local node, providing insights about paths, bottlenecks, and
   topology to assist in making routing decisions.

4.2.1.  On Path Information

   This type of attribute represents detailed information about the
   current paths in use for traffic forwarding, including path metrics
   such as latency, jitter, and hop count.  It allows the node to assess
   the quality of the existing paths and their suitability for ongoing
   traffic demands.  The possible attributes could include:

   *  Hop Count: Number of hops the data takes between source and
      destination.

   *  Latency Per Hop: The time it takes to traverse each node.

4.2.2.  Bottleneck Information

   This type of attribute identifies and describes network bottlenecks
   where traffic is delayed or congested.  This can include points where
   the capacity of a link is exceeded or where high latency is
   introduced due to excessive queuing.
   The possible attributes could include:

   *  Link Utilization: Monitors bandwidth use on specific bottleneck
      links.

   *  Queue Status: Alerts when queues at a bottleneck link are nearing
      full capacity.

4.2.3.  Topology Information

   This type of attribute represents the structure of the network from
   the node's perspective.  It includes details such as connected
   neighbors, available paths, link states, and node status, providing a
   global view of the network for optimizing routing decisions.  The
   possible attributes could include:

   *  Neighboring Nodes: A list of adjacent nodes and their statuses.

   *  Link Metrics: Performance and quality of links connecting nodes in
      the topology.

4.3.  Routing decision information model of PR routing node

   This section covers the key attributes that influence the
   decision-making processes within a routing node.  These attributes
   determine how traffic is routed, how congestion is managed, and how
   network resources are allocated.

4.3.1.  Reroute

   This type of attribute describes the mechanisms and criteria used to
   reroute traffic in response to changes in the network, such as link
   failures or congestion events.  It ensures that traffic is
   dynamically redirected to optimal paths.  The possible attributes
   could include:

   *  Reroute Path: The alternative path selected during rerouting.

   *  Failover Time: Time taken to switch to an alternate path.

4.3.2.  Congestion Control

   This type of attribute details the strategies and protocols used to
   manage congestion at the routing node.  It includes techniques like
   rate limiting, traffic shaping, or prioritizing certain flows to
   alleviate network congestion.  The possible attributes could include:

   *  Congestion Avoidance Policies: Mechanisms to prevent congestion
      before it occurs.

   *  Rate Limiting: Controls the traffic rate to avoid overwhelming the
      network.

4.3.3.  ECMP (Equal-Cost Multi-Path) Mode

   This type of attribute refers to Equal-Cost Multi-Path (ECMP)
   routing, where multiple paths with equal cost are used to distribute
   traffic evenly across the network.  It describes how ECMP is
   implemented and the criteria for path selection.  The possible
   attributes could include:

   *  Hash Algorithm: Determines how ECMP chooses paths.

   *  Traffic Distribution: Shows how traffic is split across multiple
      paths.

4.3.4.  Hierarchical Routing

   This type of attribute covers the use of hierarchical routing
   techniques to manage larger networks efficiently.  It provides
   information about how the network is divided into tiers or areas,
   with routing decisions optimized within each layer.  The possible
   attributes could include:

   *  Routing Layers: Defines the layers of routing, such as access,
      aggregation, and core.

   *  Aggregated Traffic Metrics: Summarizes traffic data for groups of
      lower-layer nodes.

4.3.5.  Service Routing

   This type of attribute describes how the routing node handles
   service-specific routing requirements, such as directing traffic
   based on application needs (e.g., video streaming, voice, or data).
   It ensures that service-level routing objectives are met, such as
   prioritizing latency-sensitive traffic.  The possible attributes
   could include:

   *  Service Path: The path chosen for traffic according to a specific
      service type.

   *  Service-Specific SLAs: Monitors SLA adherence based on
      service-level routing.

5.  Security Considerations

6.  IANA Considerations

   This document makes no request of IANA.
   Note to RFC Editor: this section may be removed on publication as an
   RFC.

7.  Acknowledgements

8.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

Authors' Addresses

   Tianran Zhou
   Huawei
   Email: zhoutianran@huawei.com


   Dan Li
   Tsinghua University
   Email: tolidan@tsinghua.edu.cn


   Xuesong Geng
   Huawei
   Email: gengxuesong@huawei.com