<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>

<rfc xmlns:xi="http://www.w3.org/2001/XInclude"
     ipr="trust200902"
     docName="draft-hood-agtp-communication-00"
     category="info"
     submissionType="independent"
     xml:lang="en"
     tocInclude="true"
     sortRefs="true"
     symRefs="true"
     version="3">

  <front>
    <title abbrev="AGTP-COMMUNICATION">AGTP Communication Protocol</title>
    <seriesInfo name="Internet-Draft" value="draft-hood-agtp-communication-00"/>

    <author fullname="Chris Hood" initials="C." surname="Hood">
      <organization>Nomotic, Inc.</organization>
      <address>
        <email>chris@nomotic.ai</email>
        <uri>https://nomotic.ai</uri>
      </address>
    </author>

    <date year="2026"/>

    <area>Applications and Real-Time</area>

    <keyword>AI agents</keyword>
    <keyword>real-time communication</keyword>
    <keyword>voice</keyword>
    <keyword>video</keyword>
    <keyword>multi-modal</keyword>

    <abstract>
      <t>This document specifies the AGTP Communication Protocol
      (AGTP-COMMUNICATION): the companion specification for real-time
      multi-modal communication between agents over the Agent Transfer
      Protocol (AGTP). AGTP-COMMUNICATION defines how voice, video, and
      other real-time media streams are exchanged between agents on the
      agent-native substrate, with native support for the wire-level
      identity, authority scope, and attribution that AGTP provides.</t>

      <t>This is an early specification covering bilateral (two-agent)
      real-time communication. Multi-party conversations and conferencing
      patterns are out of scope for this revision and are deferred to
      future companion work.</t>
    </abstract>
  </front>

  <middle>

    <section anchor="introduction">
      <name>Introduction</name>

      <t>The Agent Transfer Protocol (AGTP) <xref target="AGTP"/> defines
      a dedicated protocol substrate for agent-to-agent and agent-to-API
      communication. AGTP carries agent identity, authority scope,
      attribution records, and intent-aligned methods at the wire level,
      with traffic structurally identified as agent traffic by the
      protocol itself.</t>

      <t>Agent communication is increasingly multi-modal. Agents communicate
      through voice when speaking to humans or to other voice-capable
      agents. Agents communicate through video when participating in
      visual interactions, screen sharing, or visual data exchange. Agents
      communicate through structured data streams for sensor data,
      telemetry, and continuous information flows. These real-time
      communication patterns require protocol-level support distinct from
      the request/response patterns AGTP's base methods address.</t>

      <t>This document specifies how real-time multi-modal communication
      runs on AGTP. The design reuses established real-time media
      patterns where appropriate (drawing on the architectural
      principles of <xref target="RFC3550"/> and <xref target="RFC7656"/>)
      and defines only what is specific to agent-native communication on
      the AGTP substrate.</t>

      <section anchor="relationship-to-agtp-session">
        <name>Relationship to AGTP-SESSION</name>

        <t>AGTP-SESSION <xref target="AGTP-SESSION"/> defines session
        establishment, lifecycle, and basic message exchange semantics on
        AGTP. AGTP-COMMUNICATION builds on AGTP-SESSION: real-time
        communication sessions are established through AGTP-SESSION's
        ESTABLISH method, with media-specific parameters negotiated as
        part of session setup.</t>
      </section>

      <section anchor="scope-of-this-document">
        <name>Scope of This Document</name>

        <t>In scope:</t>
        <ul spacing="normal">
          <li>Bilateral real-time audio communication between agents</li>
          <li>Bilateral real-time video communication between agents</li>
          <li>Multi-modal exchange (audio plus video, structured data
          alongside media)</li>
          <li>Codec negotiation and media format selection</li>
          <li>Real-time media framing on AGTP transport</li>
          <li>Quality of service handling at the AGTP layer</li>
          <li>Integration with AGTP-SESSION for session lifecycle</li>
        </ul>

        <t>Out of scope for this revision:</t>
        <ul spacing="normal">
          <li>Multi-party conversations (three or more agents)</li>
          <li>Conferencing patterns (mixers, SFUs, broadcast)</li>
          <li>Recording and replay protocols</li>
          <li>Voice-specific applications (telephony, IVR patterns)</li>
          <li>Domain-specific conversational AI patterns</li>
        </ul>
      </section>

      <section anchor="conventions-and-terminology">
        <name>Requirements Language</name>

        <t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>",
        "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>",
        "<bcp14>SHALL NOT</bcp14>", "<bcp14>SHOULD</bcp14>",
        "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>",
        "<bcp14>NOT RECOMMENDED</bcp14>", "<bcp14>MAY</bcp14>", and
        "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted
        as described in BCP 14 <xref target="RFC2119"/> <xref
        target="RFC8174"/> when, and only when, they appear in all
        capitals, as shown here.</t>
      </section>
    </section>

    <section anchor="terminology">
      <name>Terminology</name>

      <dl>
        <dt>Communication Session:</dt>
        <dd>An AGTP-SESSION established for real-time multi-modal
        communication between two agents, with media parameters
        negotiated during session establishment.</dd>

        <dt>Media Stream:</dt>
        <dd>A unidirectional flow of real-time media data within a
        Communication Session. A bilateral Communication Session typically
        carries two media streams (one in each direction) per modality.</dd>

        <dt>Modality:</dt>
        <dd>A category of real-time media. This specification addresses
        audio, video, and structured data modalities. Future revisions may
        address additional modalities.</dd>

        <dt>Codec:</dt>
        <dd>An encoding format for media data, negotiated between
        communicating agents during session establishment.</dd>

        <dt>Communication Endpoint:</dt>
        <dd>An AGTP-aware agent participating in a Communication Session.
        An endpoint is identified by its canonical Agent-ID and carries
        authority scope appropriate to the communication being
        undertaken.</dd>
      </dl>
    </section>

    <section anchor="architectural-model">
      <name>Architectural Model</name>

      <t>AGTP-COMMUNICATION extends AGTP's request/response model with
      real-time streaming semantics. The architectural model has three
      layers: session, media, and control.</t>

      <section anchor="session-layer">
        <name>Session Layer</name>

        <t>Communication Sessions are established using AGTP-SESSION's
        ESTABLISH method with communication-specific parameters. The
        session carries the agent identity, authority scope, and
        attribution chain that apply throughout the communication.</t>

        <t>Session establishment for communication is more involved than
        session establishment for request/response: media parameters
        must be negotiated, codecs agreed, and stream characteristics
        established before media can flow.</t>
      </section>

      <section anchor="media-layer">
        <name>Media Layer</name>

        <t>Media streams carry real-time data between Communication
        Endpoints. Each stream has a defined modality (audio, video, or
        structured data), a negotiated codec, and timing characteristics
        appropriate to its modality.</t>

        <t>Media streams are framed for transport over AGTP. The framing
        preserves the timing and sequencing properties that real-time
        media requires while carrying the AGTP wire-level facts
        (identity, attribution) on each frame.</t>
      </section>

      <section anchor="control-layer">
        <name>Control Layer</name>

        <t>Control messages within a Communication Session manage stream
        lifecycle: opening streams, modifying parameters, handling
        quality degradation, and closing streams. Control messages use
        AGTP methods within the established session context.</t>
      </section>
    </section>

    <section anchor="communication-session-establishment">
      <name>Communication Session Establishment</name>

      <t>Communication Sessions are established through AGTP-SESSION's
      ESTABLISH method with the <tt>communication</tt> capability
      declared.</t>

      <section anchor="establish-request">
        <name>ESTABLISH Request</name>

        <t>A Communication Endpoint initiates a session by issuing
        ESTABLISH with a communication intent declaration:</t>

        <sourcecode type="http"><![CDATA[
ESTABLISH /sessions HTTP/AGTP/1.0
Agent-ID: <canonical Agent-ID>
Authority-Scope: communication:bilateral
Session-Intent: communication
Communication-Modalities: audio, video
Audio-Codecs: opus, g722
Video-Codecs: vp9, av1
Content-Type: application/agtp+json
]]></sourcecode>

        <t>The <tt>Communication-Modalities</tt> header declares which
        modalities the initiator wishes to use. The <tt>Audio-Codecs</tt>
        and <tt>Video-Codecs</tt> headers declare codecs the initiator
        supports, in order of preference.</t>
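
        <t>As a non-normative illustration, the following Python sketch
        assembles the header block shown in the example above. The
        helper name <tt>build_establish_request</tt>, its parameters,
        and the use of a bare newline as the line separator are
        assumptions made for this example; only the request line and the
        header names are taken from this section.</t>

        <sourcecode type="python"><![CDATA[
# Non-normative sketch. build_establish_request is a hypothetical helper;
# only the header names and the request line come from the example above.

def build_establish_request(agent_id, modalities, audio_codecs, video_codecs):
    """Return the ESTABLISH header block as text, codecs in preference order."""
    headers = [
        ("Agent-ID", agent_id),
        ("Authority-Scope", "communication:bilateral"),
        ("Session-Intent", "communication"),
        ("Communication-Modalities", ", ".join(modalities)),
    ]
    # Offer a codec list only for modalities that are actually requested.
    if "audio" in modalities:
        headers.append(("Audio-Codecs", ", ".join(audio_codecs)))
    if "video" in modalities:
        headers.append(("Video-Codecs", ", ".join(video_codecs)))
    headers.append(("Content-Type", "application/agtp+json"))
    lines = ["ESTABLISH /sessions HTTP/AGTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers]
    return "\n".join(lines)
]]></sourcecode>

        <t>Calling the helper with the modalities <tt>audio</tt> and
        <tt>video</tt> and the codec lists from the example reproduces
        the header block above; an audio-only call omits
        <tt>Video-Codecs</tt>.</t>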
      </section>

      <section anchor="establish-response">
        <name>ESTABLISH Response</name>

        <t>The receiving Communication Endpoint responds with the
        negotiated parameters or rejects the session:</t>

        <sourcecode type="http"><![CDATA[
HTTP/AGTP/1.0 200 OK
Agent-ID: <canonical Agent-ID>
Session-ID: <session identifier>
Communication-Modalities: audio, video
Audio-Codec: opus
Video-Codec: vp9
Stream-Parameters: <negotiated stream parameters>
]]></sourcecode>

        <t>Successful establishment returns 200 with the negotiated
        parameters. A rejection returns the AGTP status code that matches
        its cause: 451 Scope Violation for authority-scope issues, 463
        Proposal Rejected for a parameter mismatch, or 503 Service
        Unavailable for capacity limitations.</t>
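
        <t>One possible way for a responder to choose codecs is sketched
        below in Python. It is non-normative. The function names, the
        dictionary shapes, and the rule that the responder honors the
        initiator's preference order are assumptions; this document
        states only that the initiator lists codecs in order of
        preference and that a parameter mismatch yields 463.</t>

        <sourcecode type="python"><![CDATA[
# Non-normative sketch. negotiate and its return shape are hypothetical;
# the 200 and 463 status codes are the ones named in the text above.

def pick_codec(offered, supported):
    """Return the first offered codec the responder supports, else None."""
    supported_set = {c.lower() for c in supported}
    for codec in offered:  # offered is in the initiator's preference order
        if codec.lower() in supported_set:
            return codec.lower()
    return None

def negotiate(offer, local_codecs):
    """offer and local_codecs map a modality name to a list of codec names."""
    chosen = {}
    for modality, offered in offer.items():
        codec = pick_codec(offered, local_codecs.get(modality, []))
        if codec is None:
            return 463, {}  # Proposal Rejected: no common codec
        chosen[modality] = codec
    return 200, chosen
]]></sourcecode>

        <t>With the offer from the request example and a responder that
        supports <tt>g722</tt> and <tt>opus</tt> for audio and
        <tt>vp9</tt> for video, this sketch selects <tt>opus</tt> and
        <tt>vp9</tt>, matching the response example.</t>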
      </section>

      <section anchor="authority-scope-considerations">
        <name>Authority Scope Considerations</name>

        <t>Communication Sessions carry significant authority implications.
        A session that includes audio capture and transmission grants the
        initiating agent the ability to capture and transmit audio for the
        session duration. Authority-Scope <bcp14>MUST</bcp14> include
        appropriate permissions for each modality:</t>

        <ul spacing="normal">
          <li><tt>communication:audio:capture</tt> for capturing audio</li>
          <li><tt>communication:audio:transmit</tt> for transmitting audio</li>
          <li><tt>communication:video:capture</tt> for capturing video</li>
          <li><tt>communication:video:transmit</tt> for transmitting video</li>
          <li><tt>communication:bilateral</tt> as a shorthand combining
          standard bilateral capture and transmission</li>
        </ul>

        <t>Receivers <bcp14>MUST</bcp14> validate that the initiator's
        Authority-Scope includes appropriate permissions for the
        requested modalities.</t>
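
        <t>The receiver-side check can be sketched in Python as follows.
        The sketch is non-normative and rests on three assumptions that
        this document does not state: that the Authority-Scope header
        carries comma-separated tokens, that
        <tt>communication:bilateral</tt> expands to the four audio and
        video tokens listed above, and that each requested modality
        needs both its capture and its transmit token.</t>

        <sourcecode type="python"><![CDATA[
# Non-normative sketch. The expansion of communication:bilateral and the
# comma-separated header format are assumptions, not defined by this document.

BILATERAL = "communication:bilateral"
ASSUMED_BILATERAL_EXPANSION = {
    "communication:audio:capture",
    "communication:audio:transmit",
    "communication:video:capture",
    "communication:video:transmit",
}

def granted_scopes(authority_scope_header):
    """Split the header value into tokens and expand the shorthand."""
    tokens = {t.strip() for t in authority_scope_header.split(",") if t.strip()}
    if BILATERAL in tokens:
        tokens |= ASSUMED_BILATERAL_EXPANSION
    return tokens

def missing_scopes(authority_scope_header, modalities):
    """Return the scope tokens required by modalities but not granted."""
    granted = granted_scopes(authority_scope_header)
    required = set()
    for modality in modalities:
        required.add(f"communication:{modality}:capture")
        required.add(f"communication:{modality}:transmit")
    return sorted(required - granted)
]]></sourcecode>

        <t>A receiver built this way would answer with 451 Scope
        Violation whenever <tt>missing_scopes</tt> returns a non-empty
        list.</t>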
      </section>
    </section>

    <section anchor="media-stream-semantics">
      <name>Media Stream Semantics</name>

      <t>Media streams within a Communication Session carry real-time data
      with timing, sequencing, and quality requirements appropriate to
      their modality.</t>

      <section anchor="audio-streams">
        <name>Audio Streams</name>

        <t>Audio streams carry audio media between Communication Endpoints.
        Audio framing follows established real-time audio practice with
        adaptation for AGTP transport:</t>

        <ul spacing="normal">
          <li>Frames carry timestamp information for synchronization</li>
          <li>Sequence numbers detect loss and reordering</li>
          <li>Frame size is negotiated during session establishment</li>
          <li>Codec-specific parameters (sample rate, channels) are
          negotiated</li>
        </ul>

        <t>AGTP-COMMUNICATION reuses RTP timestamp and sequence semantics
        <xref target="RFC3550"/> where compatible, adapted for transport
        on AGTP rather than UDP. This preserves established real-time
        audio handling while gaining AGTP's wire-level identity and
        attribution properties.</t>
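
        <t>To illustrate how sequence numbers detect loss and
        reordering, the following non-normative Python sketch compares
        sequence numbers with wraparound in the style of
        <xref target="RFC3550"/>. The 16-bit width is taken from RTP;
        whether AGTP-COMMUNICATION framing keeps that width is an
        assumption, and the class and function names are
        hypothetical.</t>

        <sourcecode type="python"><![CDATA[
# Non-normative sketch. The 16-bit sequence width follows RTP (RFC 3550);
# whether AGTP-COMMUNICATION keeps that width is an assumption.

SEQ_MOD = 1 << 16

def seq_delta(new, last):
    """Signed distance from last to new, allowing for 16-bit wraparound."""
    diff = (new - last) % SEQ_MOD
    return diff - SEQ_MOD if diff >= SEQ_MOD // 2 else diff

class LossDetector:
    def __init__(self):
        self.highest = None
        self.lost = 0
        self.reordered = 0

    def observe(self, seq):
        """Update counters for one arriving frame."""
        if self.highest is None:
            self.highest = seq
            return
        delta = seq_delta(seq, self.highest)
        if delta > 0:
            self.lost += delta - 1   # frames skipped between highest and seq
            self.highest = seq
        elif delta < 0:
            self.reordered += 1      # late arrival of an earlier frame
            self.lost = max(0, self.lost - 1)
        # delta == 0 is a duplicate and is ignored
]]></sourcecode>

        <t>The sketch treats every late frame as filling a previously
        counted gap. A production receiver would also track which
        sequence numbers it has already seen so that late duplicates are
        not miscounted.</t>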
      </section>

      <section anchor="video-streams">
        <name>Video Streams</name>

        <t>Video streams carry video media between Communication
        Endpoints. Video framing addresses the additional complexity of
        variable frame sizes, key frame management, and bandwidth
        adaptation:</t>

        <ul spacing="normal">
          <li>Frames carry timestamp and sequence information</li>
          <li>Frame type (key/delta) is indicated</li>
          <li>Codec-specific parameters (resolution, frame rate) are
          negotiated</li>
          <li>Bandwidth adaptation signals are exchanged through control
          messages</li>
        </ul>
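
        <t>The role of the key/delta indication can be sketched in
        Python as follows. The sketch is non-normative. The
        <tt>Frame</tt> fields and the <tt>VideoGate</tt> class are
        hypothetical names, and sequence wraparound is ignored for
        brevity. The idea shown is that a receiver stops decoding delta
        frames after a gap and resumes at the next key frame.</t>

        <sourcecode type="python"><![CDATA[
# Non-normative sketch. Frame and its fields are hypothetical names; the
# document only states that frames carry sequence, timestamp and key/delta type.
from dataclasses import dataclass

@dataclass
class Frame:
    seq: int
    timestamp: int
    is_key: bool

class VideoGate:
    """Pass frames to the decoder only while the reference chain is intact."""

    def __init__(self):
        self.expected = None      # next sequence number, None until first key
        self.need_key = True

    def accept(self, frame):
        """Return True if the frame is decodable, False if it must be dropped."""
        if frame.is_key:
            self.need_key = False              # a key frame restarts decoding
        elif self.need_key or frame.seq != self.expected:
            self.need_key = True               # gap: wait for the next key
            return False
        self.expected = frame.seq + 1
        return True
]]></sourcecode>

        <t>How a receiver asks the sender for a new key frame is not
        defined in this revision; the sketch only shows the local
        decision to drop or decode.</t>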
      </section>

      <section anchor="structured-data-streams">
        <name>Structured Data Streams</name>

        <t>Structured data streams carry continuous data flows that are
        not audio or video: sensor telemetry, conversational state
        updates, real-time analytics, contextual data alongside other
        media.</t>

        <t>Structured data streams have different real-time
        characteristics than audio or video. Timing may matter (sensor
        sampling rates) or may not (state updates). Loss tolerance varies
        by use case. Structured data stream parameters are negotiated
        during session establishment.</t>
      </section>
    </section>

    <section anchor="quality-of-service">
      <name>Quality of Service</name>

      <t>Real-time communication has quality requirements that AGTP must
      support at the transport layer. AGTP-COMMUNICATION specifies
      quality of service handling appropriate to each modality.</t>

      <section anchor="latency-requirements">
        <name>Latency Requirements</name>

        <t>Audio communication typically requires one-way latency under
        150 ms for natural conversational flow. Video communication
        tolerates higher latency, but audio and video must stay
        synchronized. Structured data streams have application-specific
        latency requirements.</t>

        <t>When AGTP runs over QUIC <xref target="RFC9000"/>, the
        underlying transport supports multiple streams with independent
        flow control, which enables appropriate handling of different
        modality requirements within a single Communication Session.</t>
      </section>

      <section anchor="bandwidth-adaptation">
        <name>Bandwidth Adaptation</name>

        <t>Communication Endpoints <bcp14>MUST</bcp14> be capable of
        adapting media parameters in response to bandwidth constraints.
        Control messages within a Communication Session signal:</t>

        <ul spacing="normal">
          <li>Bandwidth estimates from the receiving endpoint</li>
          <li>Requested adaptations from the sending endpoint</li>
          <li>Confirmation of parameter changes</li>
        </ul>

        <t>Bandwidth adaptation is negotiated; both endpoints participate
        in the decision to adapt.</t>
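
        <t>The three signals listed above can be sketched as a small
        exchange in Python. Everything concrete in the sketch is an
        assumption made for illustration: the bitrate ladder, the
        headroom factor, the message field names, and the rule that the
        receiver accepts any proposal within its own limit. This
        document does not define these values or message formats.</t>

        <sourcecode type="python"><![CDATA[
# Non-normative sketch. The bitrate ladder, the 0.85 headroom factor and the
# message fields are illustrative assumptions, not values from this document.

VIDEO_LADDER_KBPS = [2500, 1200, 600, 300]   # highest quality first
HEADROOM = 0.85                               # leave room for audio and control

def propose_adaptation(estimate_kbps, current_kbps):
    """Sender side: pick the highest rung that fits the receiver's estimate."""
    budget = estimate_kbps * HEADROOM
    for rung in VIDEO_LADDER_KBPS:
        if rung <= budget:
            target = rung
            break
    else:
        target = VIDEO_LADDER_KBPS[-1]        # nothing fits: use the floor
    if target == current_kbps:
        return None                           # no change needed
    return {"type": "adapt-request", "video_kbps": target}

def confirm_adaptation(request, max_acceptable_kbps):
    """Receiver side: accept the proposal only if it is within its own limit."""
    accepted = request["video_kbps"] <= max_acceptable_kbps
    return {"type": "adapt-confirm", "accepted": accepted,
            "video_kbps": request["video_kbps"]}
]]></sourcecode>

        <t>For a receiver estimate of 1000 kbit/s and a current video
        rate of 2500 kbit/s, the sender in this sketch proposes 600
        kbit/s, and the change takes effect only once the receiver
        confirms it.</t>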
      </section>

      <section anchor="priority-within-agtp">
        <name>Priority Within AGTP</name>

        <t>AGTP traffic on port 4480 <bcp14>SHOULD</bcp14> be treated
        with priority appropriate to its modality at the transport
        layer. Real-time audio and video streams require lower latency
        than request/response traffic; structured data streams may have
        varying requirements.</t>

        <t>Network operators carrying AGTP traffic <bcp14>SHOULD</bcp14>
        consider that AGTP-COMMUNICATION sessions are likely to include
        latency-sensitive real-time media and apply appropriate QoS
        handling.</t>
      </section>
    </section>

    <section anchor="attribution-and-recording">
      <name>Attribution and Recording</name>

      <t>AGTP's attribution model applies to Communication Sessions:
      every session establishes attribution chains, and attribution
      records are produced for session lifecycle events.</t>

      <t>Media content within streams is not, by default, recorded by
      the protocol. Recording is an application-layer decision made by
      governance frameworks or specific deployments. AGTP-COMMUNICATION
      provides the session-level attribution that recording systems can
      build on; it does not itself perform recording.</t>

      <t>When recording is performed at the application layer, the
      attribution records produced by AGTP-COMMUNICATION provide
      verifiable evidence of session participants, authority scope, and
      session lifecycle that supports compliance with recording-relevant
      regulations.</t>
    </section>

    <section anchor="security-considerations">
      <name>Security Considerations</name>

      <t>Real-time communication on AGTP inherits AGTP's security
      properties: transport encryption (TLS 1.3, used directly or as
      integrated into QUIC), agent identity verification, and authority
      scope enforcement at the protocol layer.</t>

      <t>Additional security considerations specific to communication:</t>

      <section anchor="media-capture-authorization">
        <name>Media Capture Authorization</name>

        <t>Agents that capture audio or video <bcp14>MUST</bcp14> have
        the appropriate Authority-Scope. This requirement is enforced at
        session establishment: a request to capture without the
        corresponding scope is rejected with 451 Scope Violation.</t>
      </section>

      <section anchor="replay-and-tampering">
        <name>Replay and Tampering</name>

        <t>Audio and video streams <bcp14>MUST NOT</bcp14> be replayable
        across sessions unless they carry the cryptographic markers that
        identify them as recordings. The session identifiers,
        timestamps, and attribution records carried with each stream let
        a recipient verify that the media was captured in the context
        the recipient believes it was.</t>
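
        <t>A recipient-side check along these lines might look like the
        following non-normative Python sketch. The frame field names,
        the five-second freshness window, and the
        <tt>recording_marker_verified</tt> flag are assumptions. In
        particular, the flag stands in for a cryptographic verification
        step that this document does not yet define.</t>

        <sourcecode type="python"><![CDATA[
# Non-normative sketch. The frame fields and the 5-second freshness window
# are assumptions; the document does not define how these checks are made.

MAX_SKEW_SECONDS = 5.0

def check_frame_context(frame, session_id, now):
    """Return None if the frame is acceptable, else a rejection reason."""
    # Assumed to be set by a separate cryptographic check of the marker.
    if frame.get("recording_marker_verified", False):
        return None                        # verified recording may be replayed
    if frame["session_id"] != session_id:
        return "session mismatch"          # captured in a different session
    if abs(now - frame["captured_at"]) > MAX_SKEW_SECONDS:
        return "stale timestamp"           # live media must be fresh
    return None
]]></sourcecode>

        <t>Checking the recording marker first lets marked recordings
        pass while unmarked media from another session, or with an old
        timestamp, is rejected.</t>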
      </section>

      <section anchor="privacy-considerations">
        <name>Privacy Considerations</name>

        <t>Communication Sessions may involve sensitive content (private
        conversations, confidential video, sensor data with privacy
        implications). AGTP's wire-level identity verification and
        attribution provide the structural facts that privacy frameworks
        require. Application-layer privacy controls build on these
        foundations.</t>
      </section>

      <section anchor="denial-of-service">
        <name>Denial of Service</name>

        <t>Real-time communication can be used to consume substantial
        bandwidth and processing resources. Communication Endpoints
        <bcp14>SHOULD</bcp14> implement appropriate rate limits and
        resource controls. Authority-Scope can include resource
        limitations that the protocol enforces at session
        establishment.</t>
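
        <t>As one example of such a rate limit, the following
        non-normative Python sketch implements a token bucket for
        inbound media bytes. A token bucket is a common technique that
        this document does not mandate; the class name and the
        parameter values are illustrative assumptions.</t>

        <sourcecode type="python"><![CDATA[
# Non-normative sketch. A token bucket is one common rate-limiting technique;
# this document does not mandate it, and the parameter values are illustrative.

class TokenBucket:
    def __init__(self, rate_bytes_per_s, burst_bytes, now=0.0):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = now

    def allow(self, size_bytes, now):
        """Return True and spend tokens if a frame of size_bytes may pass."""
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False
]]></sourcecode>

        <t>An endpoint could keep one bucket per Communication Session
        and drop or defer frames for which <tt>allow</tt> returns
        false.</t>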
      </section>
    </section>

    <section anchor="iana-considerations">
      <name>IANA Considerations</name>

      <t>This document defines several new headers and parameters that
      require IANA registration:</t>

      <ul spacing="normal">
        <li>Session-Intent header (registered under the AGTP header
        registry)</li>
        <li>Communication-Modalities header</li>
        <li>Audio-Codecs, Video-Codecs headers (codec negotiation)</li>
        <li>Audio-Codec, Video-Codec response headers</li>
        <li>Authority-Scope tokens for communication
        (<tt>communication:audio:*</tt>, <tt>communication:video:*</tt>,
        <tt>communication:bilateral</tt>)</li>
      </ul>

      <t>Specific registry assignments will be detailed in a future
      revision once the AGTP header and scope token registries are
      established.</t>
    </section>

    <section anchor="open-questions">
      <name>Open Questions</name>

      <t>Several design decisions remain open for this revision:</t>

      <ul spacing="normal">
        <li>Whether to define an AGTP-specific real-time media framing
        or to reuse RTP framing carried over AGTP transport</li>
        <li>The relationship to WebRTC <xref target="RFC8825"/> for
        browser-based agents communicating over AGTP</li>
        <li>Whether to define agent-specific codecs (e.g., for
        low-bandwidth agent-to-agent voice that doesn't need to sound
        human) or to rely entirely on existing codec registries</li>
        <li>How AGTP-COMMUNICATION sessions interact with AGTP's intent
        methods for non-real-time exchanges within the same agent
        pair</li>
        <li>Multi-party conversation patterns and whether they belong in
        a later revision of this document or in a separate companion
        specification</li>
      </ul>

      <t>These will be addressed in future revisions of this draft based
      on community feedback and implementation experience.</t>
    </section>

  </middle>

  <back>
    <references>
      <name>References</name>

      <references>
        <name>Normative References</name>

        <reference anchor="RFC2119" target="https://www.rfc-editor.org/info/rfc2119">
          <front>
            <title>Key words for use in RFCs to Indicate Requirement Levels</title>
            <author fullname="S. Bradner" initials="S." surname="Bradner"/>
            <date month="March" year="1997"/>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="2119"/>
          <seriesInfo name="DOI" value="10.17487/RFC2119"/>
        </reference>

        <reference anchor="RFC3550" target="https://www.rfc-editor.org/info/rfc3550">
          <front>
            <title>RTP: A Transport Protocol for Real-Time Applications</title>
            <author fullname="H. Schulzrinne" initials="H." surname="Schulzrinne"/>
            <author fullname="S. Casner" initials="S." surname="Casner"/>
            <author fullname="R. Frederick" initials="R." surname="Frederick"/>
            <author fullname="V. Jacobson" initials="V." surname="Jacobson"/>
            <date month="July" year="2003"/>
          </front>
          <seriesInfo name="STD" value="64"/>
          <seriesInfo name="RFC" value="3550"/>
          <seriesInfo name="DOI" value="10.17487/RFC3550"/>
        </reference>

        <reference anchor="RFC8174" target="https://www.rfc-editor.org/info/rfc8174">
          <front>
            <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
            <author fullname="B. Leiba" initials="B." surname="Leiba"/>
            <date month="May" year="2017"/>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="8174"/>
          <seriesInfo name="DOI" value="10.17487/RFC8174"/>
        </reference>

        <reference anchor="AGTP">
          <front>
            <title>Agent Transfer Protocol (AGTP)</title>
            <author fullname="Chris Hood" initials="C." surname="Hood">
              <organization>Nomotic, Inc.</organization>
            </author>
            <date year="2026"/>
          </front>
          <seriesInfo name="Internet-Draft" value="draft-hood-independent-agtp-07"/>
        </reference>

        <reference anchor="AGTP-SESSION">
          <front>
            <title>AGTP Session Protocol</title>
            <author fullname="Chris Hood" initials="C." surname="Hood">
              <organization>Nomotic, Inc.</organization>
            </author>
            <date year="2026"/>
          </front>
          <seriesInfo name="Internet-Draft" value="draft-hood-agtp-session-00"/>
        </reference>
      </references>

      <references>
        <name>Informative References</name>

        <reference anchor="RFC8825" target="https://www.rfc-editor.org/info/rfc8825">
          <front>
            <title>Overview: Real-Time Protocols for Browser-Based Applications</title>
            <author fullname="H. Alvestrand" initials="H." surname="Alvestrand"/>
            <date month="January" year="2021"/>
          </front>
          <seriesInfo name="RFC" value="8825"/>
          <seriesInfo name="DOI" value="10.17487/RFC8825"/>
        </reference>

        <reference anchor="RFC7656" target="https://www.rfc-editor.org/info/rfc7656">
          <front>
            <title>A Taxonomy of Semantics and Mechanisms for Real-Time Transport Protocol (RTP) Sources</title>
            <author fullname="J. Lennox" initials="J." surname="Lennox"/>
            <author fullname="K. Gross" initials="K." surname="Gross"/>
            <author fullname="S. Nandakumar" initials="S." surname="Nandakumar"/>
            <author fullname="G. Salgueiro" initials="G." surname="Salgueiro"/>
            <author fullname="B. Burman" initials="B." surname="Burman"/>
            <date month="November" year="2015"/>
          </front>
          <seriesInfo name="RFC" value="7656"/>
          <seriesInfo name="DOI" value="10.17487/RFC7656"/>
        </reference>

        <reference anchor="RFC9000" target="https://www.rfc-editor.org/info/rfc9000">
          <front>
            <title>QUIC: A UDP-Based Multiplexed and Secure Transport</title>
            <author fullname="J. Iyengar" initials="J." surname="Iyengar" role="editor"/>
            <author fullname="M. Thomson" initials="M." surname="Thomson" role="editor"/>
            <date month="May" year="2021"/>
          </front>
          <seriesInfo name="RFC" value="9000"/>
          <seriesInfo name="DOI" value="10.17487/RFC9000"/>
        </reference>
      </references>
    </references>

    <section anchor="acknowledgments" numbered="false">
      <name>Acknowledgments</name>

      <t>This document builds on the broader AGTP family and incorporates
      architectural principles from established real-time media work
      including RTP/RTCP <xref target="RFC3550"/> and WebRTC
      <xref target="RFC8825"/>.</t>
    </section>

    <section anchor="contributors" numbered="false">
      <name>Contributors</name>

      <t>Contributors will be acknowledged in future revisions as
      community participation develops.</t>
    </section>
  </back>
</rfc>
