Internet-Draft AGTP-COMMUNICATION May 2026
Workgroup: Network Working Group
Internet-Draft: draft-hood-agtp-communication-00
Published: May 2026
Intended Status: Informational
Expires: 19 November 2026
Author: C. Hood, Nomotic, Inc.

AGTP Communication Protocol

Abstract

This document specifies the AGTP Communication Protocol (AGTP-COMMUNICATION): the companion specification for real-time multi-modal communication between agents over the Agent Transfer Protocol (AGTP). AGTP-COMMUNICATION defines how voice, video, and other real-time media streams are exchanged between agents on the agent-native substrate, with native support for the wire-level identity, authority scope, and attribution that AGTP provides.

This is an early specification covering bilateral (two-agent) real-time communication. Multi-party conversations and conferencing patterns are out of scope for this revision and are deferred to future companion work.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 19 November 2026.

1. Introduction

The Agent Transfer Protocol (AGTP) [AGTP] defines a dedicated protocol substrate for agent-to-agent and agent-to-API communication. AGTP carries agent identity, authority scope, attribution records, and intent-aligned methods at the wire level, with traffic structurally identified as agent traffic by the protocol itself.

Agent communication is increasingly multi-modal. Agents use voice when speaking with humans or with other voice-capable agents; video for visual interactions, screen sharing, and visual data exchange; and structured data streams for sensor data, telemetry, and other continuous information flows. These real-time communication patterns require protocol-level support distinct from the request/response patterns that AGTP's base methods address.

This document specifies how real-time multi-modal communication runs on AGTP. The design reuses established real-time media patterns where appropriate (drawing on the architectural principles of [RFC3550] and [RFC7656]) and defines only what is specific to agent-native communication on the AGTP substrate.

1.1. Relationship to AGTP-SESSION

AGTP-SESSION [AGTP-SESSION] defines session establishment, lifecycle, and basic message exchange semantics on AGTP. AGTP-COMMUNICATION builds on AGTP-SESSION: real-time communication sessions are established through AGTP-SESSION's ESTABLISH method, with media-specific parameters negotiated as part of session setup.

1.2. Scope of This Document

In scope:

  • Bilateral real-time audio communication between agents
  • Bilateral real-time video communication between agents
  • Multi-modal exchange (audio plus video, structured data alongside media)
  • Codec negotiation and media format selection
  • Real-time media framing on AGTP transport
  • Quality of service handling at the AGTP layer
  • Integration with AGTP-SESSION for session lifecycle

Out of scope for this revision:

  • Multi-party conversations (three or more agents)
  • Conferencing patterns (mixers, SFUs, broadcast)
  • Recording and replay protocols
  • Voice-specific applications (telephony, IVR patterns)
  • Domain-specific conversational AI patterns

1.3. Conventions and Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Terminology

Communication Session:
An AGTP-SESSION established for real-time multi-modal communication between two agents, with media parameters negotiated during session establishment.
Media Stream:
A unidirectional flow of real-time media data within a Communication Session. A bilateral Communication Session typically carries two media streams (one in each direction) per modality.
Modality:
A category of real-time media. This specification addresses audio, video, and structured data modalities. Future revisions may address additional modalities.
Codec:
An encoding format for media data, negotiated between communicating agents during session establishment.
Communication Endpoint:
An AGTP-aware agent participating in a Communication Session. Identified by its canonical Agent-ID and carrying authority scope appropriate to the communication being undertaken.

3. Architectural Model

AGTP-COMMUNICATION extends AGTP's request/response model with real-time streaming semantics. The architectural model has three components.

3.1. Session Layer

Communication Sessions are established using AGTP-SESSION's ESTABLISH method with communication-specific parameters. The session carries the agent identity, authority scope, and attribution chain that apply throughout the communication.

Session establishment for communication is more involved than session establishment for request/response: media parameters must be negotiated, codecs agreed, and stream characteristics established before media can flow.

3.2. Media Layer

Media streams carry real-time data between Communication Endpoints. Each stream has a defined modality (audio, video, or structured data), a negotiated codec, and timing characteristics appropriate to its modality.

Media streams are framed for transport over AGTP. The framing preserves the timing and sequencing properties that real-time media requires while carrying the AGTP wire-level facts (identity, attribution) on each frame.

3.3. Control Layer

Control messages within a Communication Session manage stream lifecycle: opening streams, modifying parameters, handling quality degradation, and closing streams. Control messages use AGTP methods within the established session context.

4. Communication Session Establishment

Communication Sessions are established through AGTP-SESSION's ESTABLISH method with the communication capability declared.

4.1. ESTABLISH Request

A Communication Endpoint initiates a session by issuing ESTABLISH with a communication intent declaration:

ESTABLISH /sessions HTTP/AGTP/1.0
Agent-ID: <canonical Agent-ID>
Authority-Scope: communication:bilateral
Session-Intent: communication
Communication-Modalities: audio, video
Audio-Codecs: opus, g722
Video-Codecs: vp9, av1
Content-Type: application/agtp+json

The Communication-Modalities header declares which modalities the initiator wishes to use. The Audio-Codecs and Video-Codecs headers declare codecs the initiator supports, in order of preference.
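
As a rough illustration of the request above, the following Python sketch serializes the same headers into the textual form shown. The helper name build_establish_request, the CRLF line endings, and the blank-line terminator are assumptions made for this example; this document does not define a serialization API. The request line is copied verbatim from the example above.

```python
from typing import Sequence


def build_establish_request(
    agent_id: str,
    modalities: Sequence[str],
    audio_codecs: Sequence[str] = (),
    video_codecs: Sequence[str] = (),
    authority_scope: str = "communication:bilateral",
) -> str:
    """Serialize an ESTABLISH request like the example in Section 4.1.

    Hypothetical helper: the header names come from this document, but the
    CRLF line endings and blank-line terminator are assumptions.
    """
    headers = [
        ("Agent-ID", agent_id),
        ("Authority-Scope", authority_scope),
        ("Session-Intent", "communication"),
        ("Communication-Modalities", ", ".join(modalities)),
    ]
    # Codec lists are emitted in the caller's order, which the receiver
    # reads as the initiator's order of preference.
    if audio_codecs:
        headers.append(("Audio-Codecs", ", ".join(audio_codecs)))
    if video_codecs:
        headers.append(("Video-Codecs", ", ".join(video_codecs)))
    headers.append(("Content-Type", "application/agtp+json"))

    lines = ["ESTABLISH /sessions HTTP/AGTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers]
    return "\r\n".join(lines) + "\r\n\r\n"
```

Calling build_establish_request("agent-123", ["audio", "video"], ["opus", "g722"], ["vp9", "av1"]) yields the header set shown in the example, with "agent-123" standing in for a canonical Agent-ID.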

4.2. ESTABLISH Response

The receiving Communication Endpoint responds with the negotiated parameters or rejects the session:

HTTP/AGTP/1.0 200 OK
Agent-ID: <canonical Agent-ID>
Session-ID: <session identifier>
Communication-Modalities: audio, video
Audio-Codec: opus
Video-Codec: vp9
Stream-Parameters: <negotiated stream parameters>

Successful establishment returns 200 with the negotiated parameters. Rejection returns the AGTP status code that matches the cause: 451 Scope Violation for authority-scope issues, 463 Proposal Rejected for parameter mismatch, or 503 Service Unavailable for capacity limitations.
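
One way a receiver might arrive at the negotiated codecs is sketched below in Python. The selection rule (take the first codec in the initiator's preference order that the receiver also supports) is an assumption; this document states only that the initiator lists codecs in order of preference. The function names select_codec and negotiate are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

STATUS_OK = 200
STATUS_PROPOSAL_REJECTED = 463


def select_codec(offered: Sequence[str], supported: Sequence[str]) -> Optional[str]:
    """Return the first offered codec the receiver supports, or None.

    Assumption: the receiver honors the initiator's preference order.
    """
    supported_set = {codec.lower() for codec in supported}
    for codec in offered:
        if codec.lower() in supported_set:
            return codec.lower()
    return None


def negotiate(
    offer: Dict[str, Sequence[str]],
    local: Dict[str, Sequence[str]],
) -> Tuple[int, Dict[str, str]]:
    """Pick one codec per offered modality.

    Returns (200, chosen codecs) on success, or (463, {}) as soon as one
    modality has no codec in common.
    """
    chosen: Dict[str, str] = {}
    for modality, offered in offer.items():
        codec = select_codec(offered, local.get(modality, ()))
        if codec is None:
            return STATUS_PROPOSAL_REJECTED, {}
        chosen[modality] = codec
    return STATUS_OK, chosen
```

With the offer from Section 4.1 and a receiver that supports the same codecs in a different local order, this sketch selects opus and vp9, matching the example response. A receiver that prefers its own ordering would need a different rule.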

4.3. Authority Scope Considerations

Communication Sessions carry significant authority implications. A session that includes audio capture and transmission grants the initiating agent the ability to capture and transmit audio for the session duration. Authority-Scope MUST include appropriate permissions for each modality:

  • communication:audio:capture for capturing audio
  • communication:audio:transmit for transmitting audio
  • communication:video:capture for capturing video
  • communication:video:transmit for transmitting video
  • communication:bilateral as a shorthand combining standard bilateral capture and transmission

Receivers MUST validate that the initiator's Authority-Scope includes appropriate permissions for the requested modalities.
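
The validation step can be sketched in Python as follows. Two points are assumptions rather than requirements of this document: that communication:bilateral expands to all four audio and video permissions listed above, and that each requested modality requires both its capture and its transmit permission. The function names are hypothetical.

```python
from typing import FrozenSet, Iterable, Set

STATUS_OK = 200
STATUS_SCOPE_VIOLATION = 451

# Per-modality permissions listed in Section 4.3.
REQUIRED_BY_MODALITY = {
    "audio": frozenset({"communication:audio:capture", "communication:audio:transmit"}),
    "video": frozenset({"communication:video:capture", "communication:video:transmit"}),
}

# Assumption: the shorthand expands to every audio and video permission above.
BILATERAL_EXPANSION: FrozenSet[str] = frozenset().union(*REQUIRED_BY_MODALITY.values())


def expand_scopes(declared: Iterable[str]) -> Set[str]:
    """Expand the communication:bilateral shorthand into explicit permissions."""
    scopes = set(declared)
    if "communication:bilateral" in scopes:
        scopes |= BILATERAL_EXPANSION
    return scopes


def missing_scopes(declared: Iterable[str], modalities: Iterable[str]) -> Set[str]:
    """Return the permissions the requested modalities need but the scope lacks."""
    granted = expand_scopes(declared)
    required: Set[str] = set()
    for modality in modalities:
        # Modalities with no listed permissions (for example structured data)
        # add no requirement in this sketch.
        required |= REQUIRED_BY_MODALITY.get(modality, frozenset())
    return required - granted


def validate_scope(declared: Iterable[str], modalities: Iterable[str]) -> int:
    """Return 200 when the scope covers the request, otherwise 451."""
    return STATUS_SCOPE_VIOLATION if missing_scopes(declared, modalities) else STATUS_OK
```

Returning the set of missing permissions, rather than only a status code, lets a receiver report exactly which scope tokens the initiator would need to add.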

5. Media Stream Semantics

Media streams within a Communication Session carry real-time data with timing, sequencing, and quality requirements appropriate to their modality.

5.1. Audio Streams

Audio streams carry audio media between Communication Endpoints. Audio framing follows established real-time audio practice with adaptation for AGTP transport:

  • Frames carry timestamp information for synchronization
  • Sequence numbers detect loss and reordering
  • Frame size is negotiated during session establishment
  • Codec-specific parameters (sample rate, channels) are negotiated

AGTP-COMMUNICATION reuses RTP timestamp and sequence semantics [RFC3550] where compatible, adapted for transport on AGTP rather than UDP. This preserves established real-time audio handling while gaining AGTP's wire-level identity and attribution properties.
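
To make the sequence semantics concrete, the Python sketch below detects loss and reordering using 16-bit sequence numbers with wraparound, as in RTP [RFC3550]. That AGTP frames carry a 16-bit sequence number is an assumption made for this example, since the frame layout is not defined in this revision; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide (RFC 3550)


def seq_delta(new: int, old: int) -> int:
    """Signed distance from old to new, accounting for 16-bit wraparound."""
    diff = (new - old) % SEQ_MOD
    return diff - SEQ_MOD if diff >= SEQ_MOD // 2 else diff


@dataclass
class SequenceTracker:
    """Tracks provisional losses and late arrivals for one media stream."""

    highest: int = -1  # highest sequence number seen; -1 before the first frame
    lost: List[int] = field(default_factory=list)
    reordered: int = 0

    def observe(self, seq: int) -> None:
        if self.highest < 0:
            self.highest = seq
            return
        delta = seq_delta(seq, self.highest)
        if delta > 0:
            # Numbers skipped between the previous highest and seq are
            # provisionally treated as lost.
            for step in range(1, delta):
                self.lost.append((self.highest + step) % SEQ_MOD)
            self.highest = seq
        elif delta < 0:
            # A late arrival counts as reordering and is no longer lost.
            self.reordered += 1
            if seq in self.lost:
                self.lost.remove(seq)
        # delta == 0 is a duplicate and is ignored.
```

Feeding the tracker the sequence 65534, 65535, 1, 0, 3 crosses the wraparound boundary: frame 0 arrives late and is counted as reordered, and frame 2 remains in the lost list.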

5.2. Video Streams

Video streams carry video media between Communication Endpoints. Video framing addresses the additional complexity of variable frame sizes, key frame management, and bandwidth adaptation:

  • Frames carry timestamp and sequence information
  • Frame type (key/delta) is indicated
  • Codec-specific parameters (resolution, frame rate) are negotiated
  • Bandwidth adaptation signals are exchanged through control messages
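
The key/delta indication matters because delta frames depend on earlier frames. The Python sketch below shows one possible receiver policy: after a gap, discard delta frames until the next key frame arrives. The VideoFrame fields, the one-sequence-number-per-frame model, and the KeyFrameGate class are assumptions made for illustration; sequence wraparound is ignored for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FrameType(Enum):
    KEY = "key"
    DELTA = "delta"


@dataclass(frozen=True)
class VideoFrame:
    sequence: int   # assumed: one sequence number per frame
    timestamp: int
    frame_type: FrameType


class KeyFrameGate:
    """Decides whether a received frame can be decoded.

    Decoding starts at a key frame and continues while frames arrive in
    order. Any gap forces a wait for the next key frame.
    """

    def __init__(self) -> None:
        self._expected: Optional[int] = None  # next in-order sequence number
        self.needs_key_frame = True

    def accept(self, frame: VideoFrame) -> bool:
        in_order = self._expected is not None and frame.sequence == self._expected
        if frame.frame_type is FrameType.KEY:
            # A key frame is self-contained, so it always restarts decoding.
            self.needs_key_frame = False
        elif self.needs_key_frame or not in_order:
            # A delta frame without its reference frames cannot be decoded.
            self.needs_key_frame = True
            return False
        self._expected = frame.sequence + 1
        return True
```

A receiver in the needs_key_frame state would typically also ask the sender for a new key frame through a control message; that request is left out of the sketch because its format is not defined here.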

5.3. Structured Data Streams

Structured data streams carry continuous data flows that are not audio or video: sensor telemetry, conversational state updates, real-time analytics, contextual data alongside other media.

Structured data streams have different real-time characteristics than audio or video. Timing may matter (sensor sampling rates) or may not (state updates). Loss tolerance varies by use case. Structured data stream parameters are negotiated during session establishment.

6. Quality of Service

Real-time communication has quality requirements that AGTP must support at the transport layer. AGTP-COMMUNICATION specifies quality of service handling appropriate to each modality.

6.1. Latency Requirements

Audio communication typically requires latency under 150 ms for natural conversational flow. Video communication tolerates higher latency, but synchronization between audio and video is critical. Structured data streams have application-specific latency requirements.

When AGTP runs over QUIC [RFC9000], the underlying transport supports multiple streams with independent flow control, which enables appropriate handling of different modality requirements within a single Communication Session.
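
The audio/video synchronization requirement in this section can be illustrated by mapping each stream's media timestamps onto a shared clock and measuring the skew. The Python sketch below assumes RTP-style media clocks (48000 ticks per second for Opus audio and 90000 for video) and assumes that each stream has a known anchor pairing one media timestamp with a wall-clock time. How AGTP conveys such anchors is not specified in this document, and timestamp wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockAnchor:
    """Pairs one media timestamp with a wall-clock instant for a stream."""

    media_ts: int        # media timestamp at the anchor point, in ticks
    wall_seconds: float  # wall-clock time of the same instant, in seconds
    clock_rate: int      # ticks per second (e.g. 48000 audio, 90000 video)

    def to_wall(self, media_ts: int) -> float:
        """Convert a media timestamp on this stream to wall-clock seconds."""
        return self.wall_seconds + (media_ts - self.media_ts) / self.clock_rate


def av_skew_ms(
    audio_anchor: ClockAnchor,
    audio_ts: int,
    video_anchor: ClockAnchor,
    video_ts: int,
) -> float:
    """Return video time minus audio time in milliseconds.

    A positive value means the video frame was captured later than the
    audio frame it is being compared with.
    """
    video_wall = video_anchor.to_wall(video_ts)
    audio_wall = audio_anchor.to_wall(audio_ts)
    return (video_wall - audio_wall) * 1000.0
```

A playout component could delay whichever stream is ahead until the skew falls within its tolerance; the tolerance itself is an application choice.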

6.2. Bandwidth Adaptation

Communication Endpoints MUST be capable of adapting media parameters in response to bandwidth constraints. Control messages within a Communication Session signal:

  • Bandwidth estimates from the receiving endpoint
  • Requested adaptations from the sending endpoint
  • Confirmation of parameter changes

Bandwidth adaptation is negotiated; both endpoints participate in the decision to adapt.
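
The three kinds of control signal listed above can be sketched as a small exchange in Python. The message classes, field names, bitrate ladder, and 85 percent headroom factor are all assumptions made for illustration; the control message format is not defined in this revision.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class BandwidthEstimate:
    """Receiver's estimate of available bandwidth for one stream."""
    stream_id: str
    estimated_bps: int


@dataclass(frozen=True)
class AdaptationRequest:
    """Sender's proposed new target bitrate."""
    stream_id: str
    target_bps: int


@dataclass(frozen=True)
class AdaptationConfirm:
    """Receiver's answer to a proposed change."""
    stream_id: str
    target_bps: int
    accepted: bool


def propose_adaptation(
    estimate: BandwidthEstimate,
    current_bps: int,
    ladder: Sequence[int],
    headroom: float = 0.85,
) -> Optional[AdaptationRequest]:
    """Pick the highest ladder bitrate that fits under the estimate.

    Returns None when the chosen bitrate equals the current one.
    """
    budget = estimate.estimated_bps * headroom
    candidates = [rate for rate in ladder if rate <= budget]
    # Fall back to the lowest rung when nothing fits the budget.
    target = max(candidates) if candidates else min(ladder)
    if target == current_bps:
        return None
    return AdaptationRequest(estimate.stream_id, target)


def confirm_adaptation(request: AdaptationRequest, min_acceptable_bps: int) -> AdaptationConfirm:
    """Receiver side: accept the change unless it falls below its own floor."""
    accepted = request.target_bps >= min_acceptable_bps
    return AdaptationConfirm(request.stream_id, request.target_bps, accepted)
```

Both endpoints take part in the decision: the sender proposes a target derived from the receiver's estimate, and the receiver may still decline a target it considers unusable.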

6.3. Priority Within AGTP

AGTP traffic on port 4480 SHOULD be treated with priority appropriate to its modality at the transport layer. Real-time audio and video streams require lower latency than request/response traffic; structured data streams may have varying requirements.

Network operators carrying AGTP traffic SHOULD consider that AGTP-COMMUNICATION sessions are likely to include latency-sensitive real-time media and apply appropriate QoS handling.

7. Attribution and Recording

AGTP's attribution model applies to Communication Sessions: every session establishes attribution chains, and attribution records are produced for session lifecycle events.

Media content within streams is not, by default, recorded by the protocol. Recording is an application-layer decision made by governance frameworks or specific deployments. AGTP-COMMUNICATION provides the session-level attribution that recording systems can build on; it does not itself perform recording.

When recording is performed at the application layer, the attribution records produced by AGTP-COMMUNICATION provide verifiable evidence of session participants, authority scope, and session lifecycle that supports compliance with recording-relevant regulations.

8. Security Considerations

Real-time communication on AGTP inherits AGTP's security properties: transport encryption (TLS 1.3 or QUIC), agent identity verification, and authority scope enforcement at the protocol layer.

Additional security considerations specific to communication:

8.1. Media Capture Authorization

Agents that capture audio or video MUST have appropriate Authority-Scope. This is enforced at session establishment: a request to capture without the required scope is rejected with 451 Scope Violation.

8.2. Replay and Tampering

Audio and video captured in one session MUST NOT be replayable in another session unless the media carries the cryptographic markers that identify it as a recording. Session identifiers, timestamps, and attribution records carried with streams enable verification that media was captured in the context the recipient believes.
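
A receiver-side context check along these lines might look like the Python sketch below. It covers only the context binding (session identifier, freshness, and duplicate sequence numbers) and does not verify any cryptographic marker. The per-frame metadata fields, the ReplayGuard class, and the five-second freshness window are assumptions; the frame layout is not defined in this revision.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ReplayGuard:
    """Rejects frames that do not belong to the live session context."""

    session_id: str
    max_age_seconds: float = 5.0  # assumed freshness window
    seen: Set[int] = field(default_factory=set, init=False)

    def accept(
        self,
        frame_session_id: str,
        sequence: int,
        captured_at: float,
        now: float,
    ) -> bool:
        # Media from another session is rejected outright.
        if frame_session_id != self.session_id:
            return False
        # Frames far from the current time are treated as stale or replayed.
        if abs(now - captured_at) > self.max_age_seconds:
            return False
        # A sequence number is accepted at most once per session.
        if sequence in self.seen:
            return False
        self.seen.add(sequence)
        return True
```

The seen set grows without bound here; a practical receiver would keep a sliding window of recent sequence numbers instead.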

8.3. Privacy Considerations

Communication Sessions may involve sensitive content (private conversations, confidential video, sensor data with privacy implications). AGTP's wire-level identity verification and attribution provide the structural facts that privacy frameworks require. Application-layer privacy controls build on these foundations.

8.4. Denial of Service

Real-time communication can be used to consume substantial bandwidth and processing resources. Communication Endpoints SHOULD implement appropriate rate limits and resource controls. Authority-Scope can include resource limitations that the protocol enforces at session establishment.
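
One common form of rate limit is a token bucket applied to inbound media bytes per session. The Python sketch below is a generic illustration of that technique, not a mechanism defined by AGTP; the class name and parameters are hypothetical. Time is passed in explicitly so the behavior is deterministic; a deployment would typically supply time.monotonic().

```python
class TokenBucket:
    """Byte-based token bucket: sustained rate plus a bounded burst."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float, now: float = 0.0) -> None:
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = now

    def allow(self, size: int, now: float) -> bool:
        """Return True and consume tokens if a frame of `size` bytes fits."""
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False
```

A Communication Endpoint could keep one bucket per session and drop or defer frames for which allow returns False.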

9. IANA Considerations

This document defines several new headers and parameters that require IANA registration. Specific registry assignments will be detailed in a future revision once the AGTP header and scope token registries are established.

10. Open Questions

Several design decisions remain open for this revision. These will be addressed in future revisions of this draft based on community feedback and implementation experience.

11. References

11.1. Normative References

[AGTP]
Hood, C., "Agent Transfer Protocol (AGTP)", Work in Progress, Internet-Draft, draft-hood-independent-agtp-07, <https://datatracker.ietf.org/doc/html/draft-hood-independent-agtp-07>.
[AGTP-SESSION]
Hood, C., "AGTP Session Protocol", Work in Progress, Internet-Draft, draft-hood-agtp-session-00, <https://datatracker.ietf.org/doc/html/draft-hood-agtp-session-00>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC3550]
Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, <https://www.rfc-editor.org/info/rfc3550>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8825]
Alvestrand, H., "Overview: Real-Time Protocols for Browser-Based Applications", RFC 8825, DOI 10.17487/RFC8825, <https://www.rfc-editor.org/info/rfc8825>.

11.2. Informative References

[RFC7656]
Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and B. Burman, "A Taxonomy of Semantics and Mechanisms for Real-Time Transport Protocol (RTP) Sources", RFC 7656, DOI 10.17487/RFC7656, <https://www.rfc-editor.org/info/rfc7656>.
[RFC9000]
Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based Multiplexed and Secure Transport", RFC 9000, DOI 10.17487/RFC9000, <https://www.rfc-editor.org/info/rfc9000>.

Acknowledgments

This document builds on the broader AGTP family and incorporates architectural principles from established real-time media work including RTP/RTCP [RFC3550] and WebRTC [RFC8825].

Contributors

Contributors will be acknowledged in future revisions as community participation develops.

Author's Address

Chris Hood
Nomotic, Inc.