<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc []>
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" category="info" docName="draft-zhu-sketch-int-codesign-00" ipr="trust200902" submissionType="IETF" consensus="true" version="3">

  <front>
    <title abbrev="Sketch-INT Co-Design">A Framework for Co-Designing Sketch and In-Band Network Telemetry for Accurate Network Measurement</title>
    <seriesInfo name="Internet-Draft" value="draft-zhu-sketch-int-codesign-00"/>

    <author fullname="Longlong Zhu" initials="L." surname="Zhu">
      <organization>Zhejiang University</organization>
      <address>
        <postal>
          <street>College of Computer Science and Technology</street>
          <city>Hangzhou</city>
          <region>Zhejiang</region>
          <country>China</country>
        </postal>
        <email>pocofo@foxmail.com</email>
      </address>
    </author>

    <author fullname="Xiang Chen" initials="X." surname="Chen">
      <organization>Zhejiang University</organization>
      <address>
        <postal>
          <street>College of Computer Science and Technology</street>
          <city>Hangzhou</city>
          <region>Zhejiang</region>
          <country>China</country>
        </postal>
        <email>wasdnsxchen@gmail.com</email>
      </address>
    </author>

    <date year="2026" month="June" day="3"/>
    <area>Transport</area>
    <keyword>network measurement</keyword>
    <keyword>sketch</keyword>
    <keyword>in-band network telemetry</keyword>
    <keyword>programmable data plane</keyword>

    <abstract>
      <t>Network measurement is a fundamental building block for network management applications. Existing measurement techniques face a trade-off between accuracy and resource efficiency: sketch-based techniques achieve high accuracy for large flows but degrade for small flows, while In-band Network Telemetry (INT) measures every flow accurately but at the cost of significant bandwidth and control plane resources.</t>
      <t>This document describes a framework for co-designing sketches and INT to measure large and small flows respectively, achieving both high accuracy and resource efficiency. It addresses two key challenges: (1) where to deploy measurement functions when routing information is incomplete, and (2) how to collect measurement data without causing network congestion.</t>
    </abstract>
  </front>

  <middle>

    <section anchor="introduction">
      <name>Introduction</name>
      <t>Network measurement collects traffic statistics such as per-flow packet counts from switches and periodically reports them to the control plane. The control plane provides these data to network management applications that identify events of interest and make corresponding decisions, including heavy hitter detection, DDoS detection, congestion control, and flow size distribution estimation.</t>
      <t>Accurate measurement of both large flows and small flows is critical. Large flows (i.e., flows comprising many packets) are important for volumetric applications such as heavy hitter and superspreader detection. Small flows are essential for applications such as flow size distribution estimation and congestion control, which require visibility into the long tail of the flow size distribution.</t>
      <t>However, existing measurement techniques face a fundamental trade-off between accuracy and resource efficiency:</t>
      <ul>
        <li>Sketch-based techniques <xref target="CM-Sketch"/> <xref target="Count-Sketch"/> <xref target="Elastic-Sketch"/> use compact probabilistic data structures to achieve accurate measurement of large flows with low resource consumption. However, due to hash collisions in memory-constrained switch environments, sketches exhibit significant errors when measuring small flows.</li>
        <li>In-band Network Telemetry (INT) <xref target="INT-Spec"/> records per-flow statistics within packet headers and extracts them at network egress. INT preserves full accuracy for every flow but generates high volumes of telemetry data that consume significant bandwidth and control plane resources.</li>
      </ul>
      <t>Recent hybrid approaches <xref target="SketchINT"/> <xref target="LightGuardian"/> attempt to combine sketches and INT but still inherit limitations from one or both techniques.</t>
      <t>This document describes a measurement framework that co-designs sketches and INT by assigning each technique to the flow type it handles best: sketches for large flows and INT for small flows. The framework addresses two optimization challenges: measurement point selection under incomplete routing knowledge, and congestion-free collection of measurement data.</t>
    </section>

    <section anchor="terminology">
      <name>Terminology</name>
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they appear in all capitals, as shown here.</t>
      <dl>
        <dt>Sketch:</dt>
        <dd>A probabilistic data structure that maintains approximate traffic statistics using compact memory. Examples include Count-Min Sketch and Count Sketch.</dd>
        <dt>In-band Network Telemetry (INT):</dt>
        <dd>A technique where each switch along a packet's path appends measurement metadata to the packet header. At the egress switch, the accumulated metadata is extracted and reported to the control plane.</dd>
        <dt>Large Flow:</dt>
        <dd>A flow whose packet count exceeds a predefined threshold, indicating it contributes a significant portion of total traffic volume.</dd>
        <dt>Small Flow:</dt>
        <dd>A flow whose packet count does not exceed the large flow threshold.</dd>
        <dt>Measurement Point:</dt>
        <dd>A programmable switch that executes sketch and/or INT functions to collect traffic statistics.</dd>
        <dt>Control Plane Node:</dt>
        <dd>A server or controller that receives measurement data from switches and runs network management applications.</dd>
        <dt>OD Pair:</dt>
        <dd>An Origin-Destination pair representing the ingress and egress switches of a specific flow.</dd>
        <dt>Flow Coverage:</dt>
        <dd>The fraction of flows in the network that are measured by at least one measurement point.</dd>
      </dl>
    </section>

    <section anchor="problem-statement">
      <name>Problem Statement</name>

      <section anchor="sketch-limitations">
        <name>Limitations of Sketch-Only Measurement</name>
        <t>Sketches achieve high accuracy for large flows because large flow counters dominate over noise from hash collisions. However, when measuring small flows, the few packets in each small flow are easily overwhelmed by collisions with large flow data.</t>
        <t>In resource-constrained switch environments, sketch memory is typically limited to a few megabytes per switch. Under such constraints, experimental evaluation shows that fewer than 50% of flows achieve measurement errors below 10% when using state-of-the-art sketches with 10 MB memory per switch.</t>
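        <t>The effect of hash collisions on small flows can be illustrated with a minimal Count-Min Sketch in Python. This is an illustrative sketch only: the class and function names, the use of salted BLAKE2b as the per-row hash, and any dimensions passed to the constructor are assumptions chosen for readability, not something prescribed by this document or by any switch implementation.</t>
        <sourcecode type="python"><![CDATA[
import hashlib


class CountMinSketch:
    """Toy Count-Min Sketch: `depth` rows of `width` counters."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One hash function per row (BLAKE2b salted by row id).
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=bytes([row])
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.rows[row][self._index(key, row)] += count

    def estimate(self, key):
        # Minimum over rows: never below the true count.
        return min(
            self.rows[row][self._index(key, row)]
            for row in range(self.depth)
        )


def relative_error(sketch, key, true_count):
    """Over-estimation of `key` relative to its true size."""
    return (sketch.estimate(key) - true_count) / true_count
]]></sourcecode>
        <t>Because counters only ever over-count, a collision adds the same absolute error to every flow sharing a counter. That error is negligible relative to a flow of thousands of packets but can be many times the true size of a flow with only a few packets.</t>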
        <t>Recent techniques such as compressive sensing-based sketches and learning-based sketches attempt to mitigate this issue but require specific sketch designs (limiting generality), involve complex data structures that hinder hardware implementation, or require recovery times on the order of tens of seconds (unsuitable for latency-sensitive applications).</t>
      </section>

      <section anchor="int-limitations">
        <name>Limitations of INT-Only Measurement</name>
        <t>INT achieves full accuracy for every flow by piggybacking metadata on each packet. However, this per-packet monitoring generates significant overhead.</t>
        <t>According to the INT protocol specification <xref target="INT-Spec"/>, each switch adds a 12-byte INT header to each packet. In modern networks transferring Tbps-level traffic, the total number of packets per second exceeds 10^9, producing a corresponding volume of INT headers. This accumulation creates non-trivial pressure on both network bandwidth (for transferring INT headers) and control plane resources (for processing them).</t>
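        <t>As a back-of-the-envelope check of these figures, the following Python snippet computes the aggregate volume of INT headers. The 12-byte header size and the packet rate of 10^9 packets per second are taken from the text above; the path length of 5 hops and the function name are assumptions for illustration.</t>
        <sourcecode type="python"><![CDATA[
def int_overhead_bps(pps, hops, header_bytes=12):
    """Aggregate INT header volume in bits per second."""
    return pps * hops * header_bytes * 8


# 10^9 packets/s, each crossing an assumed 5 INT-capable hops.
overhead = int_overhead_bps(pps=10**9, hops=5)
print(f"{overhead / 1e9:.0f} Gbps of INT headers")
]]></sourcecode>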
        <t>Existing optimizations such as sampling-based INT reduce overhead but degrade accuracy, particularly for small flows with few packets. Path-based INT optimizations reduce redundancy but still suffer from per-packet overhead for large flows.</t>
      </section>

      <section anchor="hybrid-limitations">
        <name>Limitations of Existing Hybrid Approaches</name>
        <t>Two categories of hybrid approaches have been proposed:</t>
        <ul>
          <li>Control-plane sketch aggregation: These approaches activate INT at data plane switches and build sketches at the control plane to aggregate collected INT data. However, they inherit the bandwidth overhead of full INT and additionally lose small flow accuracy through sketch aggregation.</li>
          <li>INT-embedded sketch data: These approaches encode sketch data into INT headers for efficient collection. While this reduces bandwidth overhead, it retains the fundamental accuracy limitations of sketches for small flows.</li>
        </ul>
        <t>Neither approach effectively leverages the complementary strengths of sketches and INT.</t>
      </section>
    </section>

    <section anchor="framework-overview">
      <name>Framework Overview</name>

      <section anchor="design-goals">
        <name>Design Goals</name>
        <t>The framework aims to achieve two goals:</t>
        <dl>
          <dt>G1: High Accuracy.</dt>
          <dd>Measure both large flows and small flows with high accuracy.</dd>
          <dt>G2: Resource Efficiency.</dt>
          <dd>Avoid excessive bandwidth and control plane resource consumption during measurement data collection.</dd>
        </dl>
      </section>

      <section anchor="architecture">
        <name>Architecture</name>
        <t>The core observation is that sketches and INT are complementary:</t>
        <ul>
          <li>Sketches offer high accuracy and resource efficiency for large flows, but fall short for small flows.</li>
          <li>INT provides full accuracy for all flows, but at high resource cost that scales with total packet volume.</li>
        </ul>
        <t>In modern networks, traffic is typically skewed: most packets come from a small number of large flows <xref target="Traffic-Skew"/>. This skewness enables the following assignment:</t>
        <ul>
          <li>Large flows are measured by sketches, which provide high accuracy and resource efficiency for these flows.</li>
          <li>Small flows are measured by INT, which preserves full accuracy. Because small flows collectively contribute few packets, INT's resource consumption remains bounded.</li>
        </ul>
        <t>This co-design achieves both G1 and G2 simultaneously.</t>
        <t>The framework operates in a general architecture comprising two planes:</t>
        <dl>
          <dt>Data Plane:</dt>
          <dd>Programmable switches execute both sketch and INT functions. Each incoming flow is initially measured by both sketches and INT. Once a flow is identified as a large flow by the sketch, subsequent packets of that flow are recorded only by the sketch, and INT processing for that flow is deactivated.</dd>
          <dt>Control Plane:</dt>
          <dd>A cluster of servers receives measurement data from switches, runs network management applications, and provides query interfaces for traffic statistics.</dd>
        </dl>
      </section>

      <section anchor="workflow">
        <name>Workflow</name>
        <t>The framework operates in four steps:</t>
        <dl>
          <dt>Step 1: Configuration.</dt>
          <dd>The administrator specifies which sketch and INT techniques to deploy. The framework supports arbitrary combinations of sketch types (e.g., Count-Min, Count Sketch, Elastic Sketch) and INT variants.</dd>
          <dt>Step 2: Measurement Point Selection.</dt>
          <dd>Given the network topology and a set of OD pairs, the framework selects which programmable switches to deploy sketch and INT functions on. The selection maximizes flow coverage while minimizing the distance between measurement points and control plane nodes. This step handles the case where precise routing information is unknown (<xref target="point-selection"/>).</dd>
          <dt>Step 3: Data Plane Execution.</dt>
          <dd><t>On each selected switch, every incoming packet is processed as follows:</t>
            <ol type="a">
              <li>The packet is inserted into the sketch data structure.</li>
              <li>The sketch checks whether the flow is a large flow (i.e., exceeds the threshold).</li>
              <li>If the flow is not a large flow, INT metadata is appended to the packet header.</li>
              <li>If the flow is a large flow, the packet is forwarded without INT headers. The sketch maintains approximate statistics for this flow.</li>
              <li>At the egress switch, INT headers (if present) are extracted and queued for transmission to the control plane.</li>
            </ol>
          </dd>
          <dt>Step 4: Data Collection.</dt>
          <dd>The framework selects network paths to transfer measurement data (both periodic sketch dumps and INT reports) from switches to control plane nodes. Path selection ensures that measurement data traffic does not congest normal data plane traffic (<xref target="data-collection"/>).</dd>
        </dl>
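        <t>The per-packet logic of Step 3 can be sketched in Python as follows. This is a behavioral model only, not data plane code. The names (Packet, MeasurementPoint, process, egress_extract), the exact counter standing in for a real sketch, and the metadata fields are all hypothetical. Treating a flow as large only when its count is strictly greater than the threshold is likewise an assumption, following the definition of a large flow in the Terminology section.</t>
        <sourcecode type="python"><![CDATA[
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Packet:
    flow_id: str
    int_stack: list = field(default_factory=list)


class MeasurementPoint:
    def __init__(self, switch_id, threshold):
        self.switch_id = switch_id
        self.threshold = threshold
        # Exact counter used as a stand-in for any sketch type.
        self.sketch = Counter()

    def process(self, pkt):
        # (a) Insert the packet into the sketch.
        self.sketch[pkt.flow_id] += 1
        # (b) Ask the sketch whether the flow is now large.
        is_large = self.sketch[pkt.flow_id] > self.threshold
        # (c) Small flow: append INT metadata for this hop.
        if not is_large:
            pkt.int_stack.append(
                {"switch": self.switch_id,
                 "count": self.sketch[pkt.flow_id]}
            )
        # (d) Large flow: forward without INT; sketch has stats.
        return pkt


def egress_extract(pkt):
    # (e) Strip INT metadata (if any) for the control plane.
    report, pkt.int_stack = pkt.int_stack, []
    return report
]]></sourcecode>
        <t>In this model, the first packets of every flow carry INT metadata. Once the flow's count crosses the threshold, later packets are forwarded without it, so the INT volume generated per flow is bounded by the threshold.</t>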
      </section>
    </section>

    <section anchor="point-selection">
      <name>Measurement Point Selection</name>

      <section anchor="formulation">
        <name>Problem Formulation</name>
        <t>The measurement point selection problem determines which programmable switches in the network should deploy sketch and INT functions.</t>
        <t>Input:</t>
        <ul>
          <li>Network topology G = (V, E), where V is the set of switches and E is the set of links.</li>
          <li>The set P of programmable switches (P is a subset of V) capable of running sketches and INT.</li>
          <li>The set C of control plane nodes (C is a subset of V).</li>
          <li>A set F of flows, where each flow f is characterized by an OD pair (o_f, d_f), representing the ingress and egress switches.</li>
          <li>For each flow f, the set P_f of programmable switches on any shortest path between o_f and d_f.</li>
          <li>A distance metric delta(p, c) between switch p and control plane node c (e.g., hop count).</li>
        </ul>
        <t>Objectives:</t>
        <ul>
          <li>Maximize flow coverage: measure as many flows as possible.</li>
          <li>Minimize collection distance: reduce the distance between measurement points and control plane nodes to enable timely data collection.</li>
        </ul>
        <t>The two objectives are balanced by a user-configurable parameter alpha in [0, 1].</t>
        <t>Constraints:</t>
        <ul>
          <li>A flow is covered if at least one measurement point exists on any shortest path connecting its OD pair.</li>
          <li>Each selected measurement point <bcp14>MUST</bcp14> be assigned to exactly one control plane node for data reporting.</li>
          <li>Decision variables are binary: each switch is either selected or not.</li>
        </ul>
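        <t>To make the trade-off governed by alpha concrete, the Python sketch below scores a candidate selection as alpha times flow coverage minus (1 - alpha) times a normalized average collection distance. This scoring form is an assumption for illustration only: this document does not prescribe the exact objective, and the function name, the averaging, and the normalization by max_dist are hypothetical.</t>
        <sourcecode type="python"><![CDATA[
def score(selected, flows, dist, alpha, max_dist):
    """Score a candidate set of measurement points.

    selected: set of chosen switches (subset of P)
    flows:    dict flow -> set P_f of candidate switches
    dist:     dict switch -> distance to its nearest control
              plane node (min over c of delta(p, c))
    """
    covered = sum(1 for p_f in flows.values() if p_f & selected)
    coverage = covered / len(flows)
    avg_dist = (
        sum(dist[p] for p in selected) / len(selected)
        if selected else 0.0
    )
    return alpha * coverage - (1 - alpha) * avg_dist / max_dist
]]></sourcecode>
        <t>Under this assumed form, alpha close to 1 favors selections that cover more flows, while alpha close to 0 favors selections near the control plane nodes.</t>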
        <t>This problem is NP-hard: the set cover problem (for coverage maximization) and the uncapacitated facility location problem (for distance minimization), both of which are NP-hard, reduce to it.</t>
      </section>

      <section anchor="lagrangian">
        <name>Optimization via Lagrangian Relaxation</name>
        <t>Given the NP-hardness, the framework employs Lagrangian relaxation to obtain near-optimal solutions in polynomial time.</t>
        <t>The coverage constraints are relaxed using Lagrange multipliers. The relaxed problem decomposes into independent per-switch decisions:</t>
        <ul>
          <li>For each switch p, a "switch penalty" is computed as the sum of Lagrange multipliers over all flows that could be measured at p.</li>
          <li>A "collection cost" is computed as the minimum distance from p to any control plane node.</li>
          <li>Switch p is selected if and only if the switch penalty exceeds the collection cost.</li>
        </ul>
        <t>The Lagrange multipliers are iteratively updated using subgradient optimization. At each iteration:</t>
        <ol>
          <li>The Lagrangian subproblem is solved to obtain primal variables and the dual bound.</li>
          <li>Subgradients are computed based on constraint violations.</li>
          <li>Multipliers are updated with a diminishing step size.</li>
          <li>The best solution across iterations is recorded.</li>
        </ol>
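        <t>The iteration above can be sketched in Python for a simplified variant of the problem. The simplifications are assumptions made for illustration: every flow is required to be covered and has at least one candidate switch, the cost of selecting switch p is its distance to the nearest control plane node, subproblem solutions that leave flows uncovered are repaired greedily and then pruned, and the step size at iteration k is 1/k. None of the function or parameter names come from this document.</t>
        <sourcecode type="python"><![CDATA[
def covers_all(flows, chosen):
    return all(p_f & chosen for p_f in flows.values())


def lagrangian_select(flows, cost, iters=100):
    """Pick switches covering every flow at low total cost.

    flows: dict flow -> set of candidate switches (P_f)
    cost:  dict switch -> collection cost (min distance)
    Returns (best selection, its cost, best dual bound).
    """
    lam = {f: 0.0 for f in flows}
    best_sel, best_cost, best_bound = None, float("inf"), 0.0
    for k in range(1, iters + 1):
        # 1. Subproblem: select p iff its penalty exceeds cost.
        penalty = {p: 0.0 for p in cost}
        for f, p_f in flows.items():
            for p in p_f:
                penalty[p] += lam[f]
        sel = {p for p in cost if penalty[p] > cost[p]}
        bound = sum(lam.values())
        bound += sum(cost[p] - penalty[p] for p in sel)
        best_bound = max(best_bound, bound)
        # 2. Subgradient: positive where a flow is uncovered.
        grad = {f: 1 - len(p_f & sel)
                for f, p_f in flows.items()}
        # 3. Update multipliers with diminishing step 1/k.
        lam = {f: max(0.0, lam[f] + grad[f] / k) for f in flows}
        # 4. Repair to a feasible cover, prune, keep the best.
        chosen = set(sel)
        for p_f in flows.values():
            if not p_f & chosen:
                chosen.add(min(p_f, key=lambda p: (cost[p], p)))
        for p in sorted(chosen, key=lambda p: (-cost[p], p)):
            if covers_all(flows, chosen - {p}):
                chosen.discard(p)
        total = sum(cost[p] for p in chosen)
        if total < best_cost:
            best_sel, best_cost = chosen, total
    return best_sel, best_cost, best_bound
]]></sourcecode>
        <t>In this variant the returned dual bound never exceeds the cost of the returned selection, so the difference between the two values indicates how far that selection can be from optimal.</t>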
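        <t>This procedure yields solutions with a computable optimality gap. By weak duality, with the problem written as a minimization, every value of the Lagrangian dual is a lower bound on the optimal primal value. The difference between the best feasible solution found and the best dual bound therefore limits how far that solution can be from optimal.</t>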
      </section>
    </section>

    <section anchor="data-collection">
      <name>Congestion-Free Data Collection</name>
      <t>After selecting measurement points, the framework determines how to transfer measurement data from switches to control plane nodes without causing network congestion.</t>

      <section anchor="rate-estimation">
        <name>Worst-Case Rate Estimation</name>
        <t>The framework estimates the maximum possible sending rate of measurement data at each switch.</t>
        <t>For sketch data, the worst-case sending rate is the sketch memory size divided by the collection interval: gamma_sketch = S / T, where S is the sketch size in bytes and T is the collection window in seconds, giving a rate in bytes per second. For example, a 10 MB sketch with a 1 ms window produces a worst-case rate of 10 GB/s, i.e., 80 Gbps.</t>
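        <t>For INT data, the worst-case rate at switch p is: gamma_INT = (C_p * phi / mu) * B_INT, where C_p is the link bandwidth capacity, phi is the maximum fraction of bandwidth consumed by small flows (obtainable from historical traffic analysis), mu is the average packet size of small flows, and B_INT is the INT header size per packet. With C_p and mu expressed in consistent units (e.g., bytes per second and bytes), C_p * phi / mu is the worst-case packet rate of small flows, and multiplying it by B_INT gives the resulting INT data rate.</t>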
        <t>The total worst-case rate at switch p is the sum of sketch and INT rates across all deployed measurement functions.</t>
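        <t>The two formulas can be checked numerically with the short Python sketch below. The function names and the sample INT parameters (a 100 Gbps link, phi = 0.1, 200-byte average small-flow packets, a 12-byte INT header) are assumptions for illustration. Rates are returned in bits per second, so byte quantities are multiplied by 8.</t>
        <sourcecode type="python"><![CDATA[
def sketch_rate_bps(sketch_bytes, window_s):
    """gamma_sketch = S / T, converted from bytes/s to bits/s."""
    return sketch_bytes * 8 / window_s


def int_rate_bps(link_bps, phi, avg_pkt_bytes, int_hdr_bytes):
    """gamma_INT = (C_p * phi / mu) * B_INT, in bits/s."""
    # Worst-case packet rate of small flows on this link.
    small_flow_pps = link_bps * phi / (avg_pkt_bytes * 8)
    return small_flow_pps * int_hdr_bytes * 8


# 10 MB sketch collected every 1 ms.
print(sketch_rate_bps(10e6, 1e-3) / 1e9, "Gbps")
# 100 Gbps link, 10% small-flow share, 200 B packets, 12 B header.
print(int_rate_bps(100e9, 0.1, 200, 12) / 1e9, "Gbps")
]]></sourcecode>
        <t>The total worst-case rate for a switch is then the sum of the values returned for each sketch and INT function deployed on it.</t>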
      </section>

      <section anchor="path-selection">
        <name>Dynamic Path Selection</name>
        <t>Given the worst-case rate estimates, the framework selects network paths for measurement data transfer with the goal of avoiding congestion.</t>
        <t>The path selection problem is formulated as: minimize the total queue depth across all links, subject to the constraint that measurement data traffic on each link, combined with normal data traffic, <bcp14>MUST NOT</bcp14> exceed a safety threshold (e.g., 80% of link capacity).</t>
        <t>The framework computes candidate paths (e.g., K-shortest paths) between each measurement point and each control plane node. It then determines splitting ratios that distribute measurement data across these paths.</t>
        <t>At each time step, the following inputs are collected:</t>
        <ul>
          <li>Current data plane traffic utilization on each link.</li>
          <li>Current queue depth at each switch.</li>
          <li>Worst-case measurement data rates.</li>
        </ul>
        <t>Based on these inputs, the path selection algorithm outputs:</t>
        <ul>
          <li>Selected paths from each measurement point to the assigned control plane node.</li>
          <li>Splitting ratios for distributing measurement data.</li>
        </ul>
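        <t>If the measurement data rate on any link would exceed the safety threshold, the splitting ratios of the paths traversing that link are scaled down proportionally, and all ratios are then renormalized to sum to one, which shifts the excess onto the remaining paths and ensures compliance.</t>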
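        <t>One simple way to obtain compliant splitting ratios, shown below as a Python sketch, is to split the measurement data in proportion to each candidate path's residual headroom under the safety threshold. This heuristic is an assumption for illustration and is not the framework's path selection algorithm: it presumes link-disjoint candidate paths, ignores queue depth, and all names and units (Gbps) are hypothetical.</t>
        <sourcecode type="python"><![CDATA[
def split_by_headroom(paths, capacity, background, rate,
                      threshold=0.8):
    """Return one splitting ratio per candidate path.

    paths:      list of paths, each a list of link names
    capacity:   dict link -> capacity (Gbps)
    background: dict link -> current data traffic (Gbps)
    rate:       worst-case measurement data rate (Gbps)
    """
    # Headroom of a path is that of its tightest link.
    headroom = [
        max(0.0, min(threshold * capacity[link]
                     - background[link] for link in path))
        for path in paths
    ]
    total = sum(headroom)
    if total <= 0 or total < rate:
        raise ValueError("no congestion-free split exists")
    # Ratios sum to 1; path i carries rate * ratio <= headroom.
    return [h / total for h in headroom]
]]></sourcecode>
        <t>With disjoint paths, a split proportional to headroom stays under the threshold on every link whenever the combined headroom is at least the measurement data rate; otherwise the function reports that no compliant split exists.</t>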
        <t>The path selection process operates at sub-second timescales to adapt to changing traffic conditions.</t>
      </section>
    </section>

    <section anchor="applicability">
      <name>Applicability</name>
      <t>The framework is applicable to the following network management scenarios:</t>
      <dl>
        <dt>Volumetric Applications:</dt>
        <dd>Heavy hitter detection, superspreader detection, DDoS flow detection, and per-flow counting benefit from accurate large flow measurement (provided by sketches) combined with accurate small flow visibility (provided by INT).</dd>
        <dt>Aggregated Applications:</dt>
        <dd>Entropy estimation and flow size distribution estimation require accurate statistics across all flow sizes. The framework provides near-ideal flow size distributions by preserving small flow accuracy.</dd>
        <dt>Troubleshooting Applications:</dt>
        <dd>Microburst detection and congestion control require per-flow, per-hop metadata. INT provides this metadata for small flows while sketches efficiently summarize large flow behavior.</dd>
      </dl>
      <t>The framework is designed to be general-purpose:</t>
      <ul>
        <li>It supports arbitrary sketch types as pluggable components.</li>
        <li>It supports standard INT as well as variants such as probabilistic INT and delta-based INT.</li>
        <li>It operates on programmable switches (e.g., those based on the Protocol-Independent Switch Architecture) without requiring modifications to the forwarding pipeline.</li>
      </ul>
      <t>Implementation experience on 12.8 Tbps programmable switches demonstrates that the framework is feasible on production-grade hardware.</t>
    </section>

    <section anchor="security">
      <name>Security Considerations</name>
      <t>The framework inherits the security properties and risks of the underlying sketch and INT mechanisms.</t>
      <t>Measurement data transmitted from switches to control plane nodes <bcp14>SHOULD</bcp14> be integrity-protected to prevent tampering. In environments where measurement data traverses untrusted network segments, encryption <bcp14>SHOULD</bcp14> be applied.</t>
      <t>The large flow identification mechanism at each switch could be targeted by adversaries who craft traffic to evade classification (e.g., splitting a large flow into many small flows to force excessive INT processing). Implementations <bcp14>SHOULD</bcp14> incorporate rate limiting on INT data generation to mitigate such attacks.</t>
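      <t>As one possible form of such rate limiting, the Python sketch below applies a token bucket to INT metadata generation. The class and parameter names are hypothetical, and the current time is passed in explicitly to keep the example deterministic. A switch implementation would likely use the metering primitives of its target rather than this logic.</t>
      <sourcecode type="python"><![CDATA[
class IntRateLimiter:
    """Token bucket: `rate` INT bytes/s, bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def allow(self, now, nbytes):
        # Refill for the elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.tokens = min(self.burst,
                          self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True   # attach INT metadata to this packet
        return False      # skip INT; the sketch still counts it
]]></sourcecode>
      <t>When the limiter refuses a packet, the packet is still recorded by the sketch, so a flood of crafted small flows degrades small flow accuracy instead of overloading the collection path.</t>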
      <t>The measurement point selection algorithm takes OD pairs as input. In deployments where OD pair information is sensitive, access to this information <bcp14>SHOULD</bcp14> be restricted to authorized control plane components.</t>
    </section>

    <section anchor="iana">
      <name>IANA Considerations</name>
      <t>This document has no IANA actions.</t>
    </section>

  </middle>

  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
      </references>
      <references>
        <name>Informative References</name>

        <reference anchor="INT-Spec" target="https://p4.org/p4-spec/docs/INT_v2_1.pdf">
          <front>
            <title>In-band Network Telemetry (INT) Dataplane Specification, Version 2.1</title>
            <author><organization>The P4.org Applications Working Group</organization></author>
            <date year="2020"/>
          </front>
        </reference>

        <reference anchor="CM-Sketch">
          <front>
            <title>An Improved Data Stream Summary: the Count-Min Sketch and its Applications</title>
            <author initials="G." surname="Cormode"/>
            <author initials="S." surname="Muthukrishnan"/>
            <date year="2005"/>
          </front>
          <seriesInfo name="Journal of Algorithms" value="Vol. 55, No. 1, pp. 58-75"/>
        </reference>

        <reference anchor="Count-Sketch">
          <front>
            <title>Finding Frequent Items in Data Streams</title>
            <author initials="M." surname="Charikar"/>
            <author initials="K." surname="Chen"/>
            <author initials="M." surname="Farach-Colton"/>
            <date year="2004"/>
          </front>
          <seriesInfo name="Theoretical Computer Science" value="Vol. 312, No. 1, pp. 3-15"/>
        </reference>

        <reference anchor="Elastic-Sketch">
          <front>
            <title>Elastic Sketch: Adaptive and Fast Network-wide Measurements</title>
            <author initials="T." surname="Yang"/>
            <date year="2018"/>
          </front>
          <seriesInfo name="Proceedings of ACM SIGCOMM" value="pp. 561-575"/>
        </reference>

        <reference anchor="SketchINT">
          <front>
            <title>SketchINT: Empowering INT with TowerSketch for Per-flow Per-switch Measurement</title>
            <author initials="K." surname="Yang"/>
            <date year="2023"/>
          </front>
          <seriesInfo name="IEEE TPDS" value="Vol. 34, No. 11"/>
        </reference>

        <reference anchor="LightGuardian">
          <front>
            <title>LightGuardian: A Full-Visibility, Lightweight, In-Band Telemetry System Using Sketchlets</title>
            <author initials="Y." surname="Zhao"/>
            <date year="2021"/>
          </front>
          <seriesInfo name="Proceedings of USENIX NSDI" value="pp. 991-1010"/>
        </reference>

        <reference anchor="Traffic-Skew">
          <front>
            <title>Inside the Social Network's (Datacenter) Network</title>
            <author initials="A." surname="Roy"/>
            <date year="2015"/>
          </front>
          <seriesInfo name="Proceedings of ACM SIGCOMM" value="pp. 123-137"/>
        </reference>


      </references>
    </references>
  </back>

</rfc>
