| Internet-Draft | AGTP-COMMUNICATION | May 2026 |
| Hood | Expires 19 November 2026 | [Page] |
This document specifies the AGTP Communication Protocol (AGTP-COMMUNICATION): the companion specification for real-time multi-modal communication between agents over the Agent Transfer Protocol (AGTP). AGTP-COMMUNICATION defines how voice, video, and other real-time media streams are exchanged between agents on the agent-native substrate, with native support for the wire-level identity, authority scope, and attribution that AGTP provides.¶
This is an early specification covering bilateral (two-agent) real-time communication. Multi-party conversations and conferencing patterns are out of scope for this revision and are deferred to future companion work.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 19 November 2026.¶
Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.¶
The Agent Transfer Protocol (AGTP) [AGTP] defines a dedicated protocol substrate for agent-to-agent and agent-to-API communication. AGTP carries agent identity, authority scope, attribution records, and intent-aligned methods at the wire level, with traffic structurally identified as agent traffic by the protocol itself.¶
Agent communication is increasingly multi-modal. Agents communicate through voice when speaking to humans or to other voice-capable agents. Agents communicate through video when participating in visual interactions, screen sharing, or visual data exchange. Agents communicate through structured data streams for sensor data, telemetry, and continuous information flows. These real-time communication patterns require protocol-level support distinct from the request/response patterns AGTP's base methods address.¶
This document specifies how real-time multi-modal communication runs on AGTP. The design reuses established real-time media patterns where appropriate (drawing on the architectural principles of [RFC3550] and [RFC7656]) and defines only what is specific to agent-native communication on the AGTP substrate.¶
AGTP-SESSION [AGTP-SESSION] defines session establishment, lifecycle, and basic message exchange semantics on AGTP. AGTP-COMMUNICATION builds on AGTP-SESSION: real-time communication sessions are established through AGTP-SESSION's ESTABLISH method, with media-specific parameters negotiated as part of session setup.¶
In scope:¶
Out of scope for this revision:¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
AGTP-COMMUNICATION extends AGTP's request/response model with real-time streaming semantics. The architectural model has three components.¶
Communication Sessions are established using AGTP-SESSION's ESTABLISH method with communication-specific parameters. The session carries the agent identity, authority scope, and attribution chain that apply throughout the communication.¶
Session establishment for communication is more involved than session establishment for request/response: media parameters must be negotiated, codecs agreed, and stream characteristics established before media can flow.¶
Media streams carry real-time data between Communication Endpoints. Each stream has a defined modality (audio, video, or structured data), a negotiated codec, and timing characteristics appropriate to its modality.¶
Media streams are framed for transport over AGTP. The framing preserves the timing and sequencing properties that real-time media requires while carrying the AGTP wire-level facts (identity, attribution) on each frame.¶
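As an illustration of what per-frame framing could look like, the following is a minimal sketch only. This document does not define a wire layout: the field order, the field widths (borrowed from the 16-bit sequence number and 32-bit timestamp of RTP), the modality codes, and the `MediaFrame` name are all hypothetical assumptions chosen for the example.

```python
import struct
from dataclasses import dataclass

# Hypothetical header layout (NOT defined by this document):
# modality code (1 byte), sequence (2 bytes), timestamp (4 bytes),
# Agent-ID length (2 bytes), all in network byte order.
_HEADER = struct.Struct("!BHIH")

# Hypothetical modality codes for the three modalities named in the text.
MODALITIES = {"audio": 1, "video": 2, "data": 3}
_MODALITY_NAMES = {code: name for name, code in MODALITIES.items()}


@dataclass(frozen=True)
class MediaFrame:
    modality: str
    sequence: int
    timestamp: int
    agent_id: str   # identity carried on every frame
    payload: bytes

    def pack(self) -> bytes:
        """Serialize header, Agent-ID, then the media payload."""
        aid = self.agent_id.encode("utf-8")
        header = _HEADER.pack(
            MODALITIES[self.modality], self.sequence, self.timestamp, len(aid)
        )
        return header + aid + self.payload

    @classmethod
    def unpack(cls, data: bytes) -> "MediaFrame":
        """Parse a frame produced by pack()."""
        code, seq, ts, aid_len = _HEADER.unpack_from(data)
        start = _HEADER.size
        agent_id = data[start:start + aid_len].decode("utf-8")
        return cls(_MODALITY_NAMES[code], seq, ts, agent_id, data[start + aid_len:])
```

A frame round-trips through `MediaFrame.unpack(frame.pack())`; a real encoding would also need to carry whatever attribution material the base AGTP specification requires.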
Control messages within a Communication Session manage stream lifecycle: opening streams, modifying parameters, handling quality degradation, and closing streams. Control messages use AGTP methods within the established session context.¶
Communication Sessions are established through AGTP-SESSION's ESTABLISH method with the communication capability declared.¶
A Communication Endpoint initiates a session by issuing ESTABLISH with a communication intent declaration:¶
    ESTABLISH /sessions HTTP/AGTP/1.0
    Agent-ID: <canonical Agent-ID>
    Authority-Scope: communication:bilateral
    Session-Intent: communication
    Communication-Modalities: audio, video
    Audio-Codecs: opus, g722
    Video-Codecs: vp9, av1
    Content-Type: application/agtp+json¶
The Communication-Modalities header declares which modalities the initiator wishes to use. The Audio-Codecs and Video-Codecs headers declare codecs the initiator supports, in order of preference.¶
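The request above can be assembled programmatically. The following is a minimal sketch, assuming CRLF line endings and a blank line terminating the header block (neither is stated in this document); the function names are hypothetical and are not part of any AGTP library.

```python
def build_establish_request(agent_id, modalities, audio_codecs, video_codecs):
    """Serialize an ESTABLISH request. Codec lists are most-preferred first."""
    lines = [
        "ESTABLISH /sessions HTTP/AGTP/1.0",
        f"Agent-ID: {agent_id}",
        "Authority-Scope: communication:bilateral",
        "Session-Intent: communication",
        f"Communication-Modalities: {', '.join(modalities)}",
    ]
    # Only advertise codec headers for modalities that have codecs to offer.
    if audio_codecs:
        lines.append(f"Audio-Codecs: {', '.join(audio_codecs)}")
    if video_codecs:
        lines.append(f"Video-Codecs: {', '.join(video_codecs)}")
    lines.append("Content-Type: application/agtp+json")
    # Assumption: CRLF-delimited headers ending with an empty line.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_list_header(value):
    """Split a comma-separated header value, preserving preference order."""
    return [item.strip() for item in value.split(",") if item.strip()]
```

For example, `build_establish_request("agent-123", ["audio", "video"], ["opus", "g722"], ["vp9", "av1"])` reproduces the header set shown above, and `parse_list_header` recovers the ordered codec list on the receiving side.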
The receiving Communication Endpoint responds with the negotiated parameters or rejects the session:¶
    HTTP/AGTP/1.0 200 OK
    Agent-ID: <canonical Agent-ID>
    Session-ID: <session identifier>
    Communication-Modalities: audio, video
    Audio-Codec: opus
    Video-Codec: vp9
    Stream-Parameters: <negotiated stream parameters>¶
Successful establishment returns 200 with the negotiated parameters. Rejection returns appropriate AGTP status codes (451 Scope Violation for authority-scope issues, 463 Proposal Rejected for parameter mismatch, 503 Service Unavailable for capacity limitations).¶
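One plausible responder-side selection rule is sketched below. It assumes the responder picks the first codec in the initiator's preference order that it also supports, and rejects the whole proposal with 463 if any requested modality has no common codec. This document states only that codecs are listed in preference order; the selection rule, the all-or-nothing rejection, and the function names are assumptions.

```python
def select_codec(offered, supported):
    """Return the first offered codec the responder supports, else None."""
    supported_set = {c.lower() for c in supported}
    for codec in offered:  # offered is in the initiator's preference order
        if codec.lower() in supported_set:
            return codec.lower()
    return None


def negotiate(offer, local):
    """Map each offered modality to one codec.

    offer: modality -> codecs offered by the initiator (preference order)
    local: modality -> codecs the responder supports
    Returns (status, chosen), where status is 200 or 463.
    """
    chosen = {}
    for modality, codecs in offer.items():
        codec = select_codec(codecs, local.get(modality, []))
        if codec is None:
            # Assumption: one unmatched modality rejects the whole proposal.
            return 463, {}
        chosen[modality] = codec
    return 200, chosen
```

With the offer from the example request and a responder supporting `opus`, `g722`, and `vp9`, `negotiate` yields status 200 with `opus` and `vp9`, matching the example response.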
Media streams within a Communication Session carry real-time data with timing, sequencing, and quality requirements appropriate to their modality.¶
Audio streams carry audio media between Communication Endpoints. Audio framing follows established real-time audio practice with adaptation for AGTP transport:¶
AGTP-COMMUNICATION reuses RTP timestamp and sequence semantics [RFC3550] where compatible, adapted for transport on AGTP rather than UDP. This preserves established real-time audio handling while gaining AGTP's wire-level identity and attribution properties.¶
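The RTP semantics referred to here can be sketched as follows, assuming the RFC 3550 field widths (16-bit sequence number, 32-bit timestamp) carry over unchanged, which this document does not state. The helper names are hypothetical. The 48 kHz clock and 20 ms frame used in the usage note are common Opus settings, not requirements of this specification.

```python
SEQ_MOD = 1 << 16   # RTP sequence numbers are 16 bits wide
TS_MOD = 1 << 32    # RTP timestamps are 32 bits wide


def next_sequence(seq):
    """Advance the sequence number by one frame, wrapping at 2**16."""
    return (seq + 1) % SEQ_MOD


def seq_newer(a, b):
    """True if a is later than b, tolerating 16-bit wraparound."""
    return a != b and ((a - b) % SEQ_MOD) < SEQ_MOD // 2


def timestamp_step(clock_rate_hz, frame_ms):
    """Timestamp units covered by one frame of the given duration."""
    return clock_rate_hz * frame_ms // 1000


def next_timestamp(ts, clock_rate_hz, frame_ms):
    """Advance the media timestamp by one frame, wrapping at 2**32."""
    return (ts + timestamp_step(clock_rate_hz, frame_ms)) % TS_MOD
```

For a 48 kHz clock and 20 ms frames, `timestamp_step(48000, 20)` is 960, so consecutive frames carry timestamps 960 units apart while the sequence number increases by one per frame.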
Video streams carry video media between Communication Endpoints. Video framing addresses the additional complexity of variable frame sizes, key frame management, and bandwidth adaptation:¶
Structured data streams carry continuous data flows that are not audio or video: sensor telemetry, conversational state updates, real-time analytics, contextual data alongside other media.¶
Structured data streams have different real-time characteristics than audio or video. Timing may matter (sensor sampling rates) or may not (state updates). Loss tolerance varies by use case. Structured data stream parameters are negotiated during session establishment.¶
Real-time communication has quality requirements that AGTP must support at the transport layer. AGTP-COMMUNICATION specifies quality of service handling appropriate to each modality.¶
Audio communication typically requires latency under 150 ms for natural conversational flow. Video communication tolerates higher latency, but synchronization between audio and video is critical. Structured data streams have application-specific latency requirements.¶
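The audio latency figure can be treated as a budget to be divided among processing stages. The sketch below is illustrative only: the stage names and the example values are assumptions, and this document does not prescribe how latency is measured or apportioned.

```python
AUDIO_LATENCY_BUDGET_MS = 150  # conversational threshold cited in the text


def within_audio_budget(stages_ms):
    """Sum per-stage latencies and report whether the total is under budget.

    stages_ms: mapping of stage name -> latency in milliseconds.
    Returns (total_ms, ok).
    """
    total = sum(stages_ms.values())
    return total, total < AUDIO_LATENCY_BUDGET_MS
```

For example, stages of 20 ms encode, 60 ms network, 40 ms jitter buffer, and 10 ms decode total 130 ms and fit the budget; raising the network stage to 100 ms gives 170 ms and does not.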
When AGTP runs over QUIC [RFC9000], the underlying transport supports multiple streams with independent flow control, which enables appropriate handling of different modality requirements within a single Communication Session.¶
Communication Endpoints MUST be capable of adapting media parameters in response to bandwidth constraints. Control messages within a Communication Session signal:¶
Bandwidth adaptation is negotiated; both endpoints participate in the decision to adapt.¶
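One way to read "negotiated" is that neither endpoint can unilaterally impose a rate. The sketch below assumes a simple rule: the agreed bitrate is the lower of the proposer's target and the peer's limit, and adaptation fails if that value falls below the codec's usable floor. The rule, the parameter names, and the function name are hypothetical; this document does not define the control message exchange.

```python
def agree_bitrate(proposed_bps, peer_limit_bps, floor_bps):
    """Return the bitrate both endpoints can accept, or None if none exists.

    proposed_bps:   target bitrate requested by the adapting endpoint
    peer_limit_bps: highest bitrate the other endpoint is willing to handle
    floor_bps:      lowest bitrate at which the codec remains usable
    """
    agreed = min(proposed_bps, peer_limit_bps)
    return agreed if agreed >= floor_bps else None
```

A `None` result would leave the endpoints to choose another response, such as dropping a modality or closing the stream.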
AGTP traffic on port 4480 SHOULD be treated with priority appropriate to its modality at the transport layer. Real-time audio and video streams require lower latency than request/response traffic; structured data streams may have varying requirements.¶
Network operators carrying AGTP traffic SHOULD consider that AGTP-COMMUNICATION sessions are likely to include latency-sensitive real-time media and apply appropriate QoS handling.¶
AGTP's attribution model applies to Communication Sessions: every session establishes attribution chains, and attribution records are produced for session lifecycle events.¶
Media content within streams is not, by default, recorded by the protocol. Recording is an application-layer decision made by governance frameworks or specific deployments. AGTP-COMMUNICATION provides the session-level attribution that recording systems can build on; it does not itself perform recording.¶
When recording is performed at the application layer, the attribution records produced by AGTP-COMMUNICATION provide verifiable evidence of session participants, authority scope, and session lifecycle that supports compliance with recording-relevant regulations.¶
Real-time communication on AGTP inherits AGTP's security properties: transport encryption (TLS 1.3 or QUIC), agent identity verification, and authority scope enforcement at the protocol layer.¶
Additional security considerations specific to communication:¶
Audio and video streams MUST NOT be replayable across sessions without the cryptographic markers that identify them as recordings. Session identifiers, timestamps, and attribution records carried with streams enable verification that media was captured in the context the recipient believes.¶
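The binding of media to its session context can be illustrated with a keyed tag over the session identifier, timestamp, and payload. This is a sketch under stated assumptions, not the mechanism defined by AGTP: the per-session key, the tag layout, and the function names are hypothetical, and HMAC-SHA-256 is swapped in as a generic stand-in for the unspecified "cryptographic markers".

```python
import hashlib
import hmac
import struct


def frame_tag(key, session_id, timestamp, payload):
    """Compute a tag binding a payload to one session and one timestamp."""
    sid = session_id.encode("utf-8")
    # Length-prefix the session ID so field boundaries are unambiguous.
    msg = struct.pack("!H", len(sid)) + sid + struct.pack("!Q", timestamp) + payload
    return hmac.new(key, msg, hashlib.sha256).digest()


def verify_frame(key, session_id, timestamp, payload, tag):
    """Check the tag in constant time against the claimed context."""
    expected = frame_tag(key, session_id, timestamp, payload)
    return hmac.compare_digest(expected, tag)
```

A frame tagged for one session fails verification when presented under a different Session-ID or timestamp, which is the property the replay requirement depends on.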
Communication Sessions may involve sensitive content (private conversations, confidential video, sensor data with privacy implications). AGTP's wire-level identity verification and attribution provide the structural facts that privacy frameworks require. Application-layer privacy controls build on these foundations.¶
Real-time communication can consume substantial bandwidth and processing resources. Communication Endpoints SHOULD implement appropriate rate limits and resource controls. Authority-Scope can include resource limitations that the protocol enforces at session establishment.¶
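A token bucket is one common way to implement such a rate limit. The sketch below is a generic technique, not something this document mandates: the class name, the choice of what a "token" represents (for example, one session establishment or one frame), and the explicit `now` argument used in place of a wall clock are all assumptions made for the example.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate_per_s` tokens/second."""

    def __init__(self, rate_per_s, capacity, now=0.0):
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = capacity   # start full so an initial burst is allowed
        self.last = now

    def allow(self, now, cost=1.0):
        """Return True and spend `cost` tokens if enough are available."""
        elapsed = max(0.0, now - self.last)
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In a deployment, `now` would typically come from a monotonic clock such as `time.monotonic()`, with one bucket kept per remote Agent-ID.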
This document defines several new headers and parameters that require IANA registration:¶
communication:audio:*, communication:video:*, communication:bilateral)¶
Specific registry assignments will be detailed in a future revision once the AGTP header and scope token registries are established.¶
Several design decisions remain open for this revision:¶
These will be addressed in future revisions of this draft based on community feedback and implementation experience.¶
This document builds on the broader AGTP family and incorporates architectural principles from established real-time media work including RTP/RTCP [RFC3550] and WebRTC [RFC8825].¶
Contributors will be acknowledged in future revisions as community participation develops.¶