Bounded Capabilities for Agent Tool Interfaces: Problem Statement

Internet-Draft	Bounded Capabilities PS	July 2026
Pelov	Expires 4 January 2027

Abstract

Deployed agent tool-interface protocols carry JSON Schema type declarations for tools, but a type declaration is not a contract: nothing signals that a tool is fully schema-bounded, conformance to declared schemas is self-certified by the declaring party, declarations are not pinned between discovery time and invocation time, and error channels are untyped by design. As a consequence, any decision made about a tool call — authorization, discovery, audit, or composition — requires a language model to interpret what the call means, even for the large class of tools that are not intrinsically open-ended. This document states that problem and poses questions for the community. It deliberately proposes no mechanism.¶

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶

This Internet-Draft will expire on 4 January 2027.¶

1. Introduction

Software agents driven by language models invoke external capabilities — tools — through interface protocols such as the Model Context Protocol [MCP]. These protocols were designed for model-mediated use: a model reads a tool's name, prose description, and type declarations, decides whether and how to call it, and interprets the result. For that purpose they work well, and their flexibility is a design goal, not a defect.¶

A different set of actors, however, must also make decisions about tool calls: authorization policies, discovery and matching systems, audit pipelines, and composition frameworks that connect one capability's output to another's input. For these actors the question is not "what does this call probably mean?" but "is this call permitted, well-formed, and consistent with what was reviewed?" — a question that should have a decidable answer wherever the underlying tool's interface boundary is complete.¶

Today it does not have one. This document examines why (Section 3), distinguishes two classes of capabilities with fundamentally different properties (Section 4), traces the consequences (Section 5), and poses open questions (Section 6). It makes no protocol proposal. The author believes that at this stage the question, not any particular design, is the contribution.¶

2. Conventions and Terminology

This document is informational and makes no normative statements. The key words "MUST" and "SHOULD" appear only inside direct quotations from other documents.¶

Capability:: A discrete operation an agent can invoke over a protocol: a tool, function, or API endpoint, together with its interface description.¶
Declaration:: Machine-readable interface metadata published by the party offering a capability (for example, a JSON Schema [I-D.bhutton-json-schema] for the capability's input).¶
Contract:: A declaration plus commitments: a stable identity for the declared interface, a defined conformance relationship between declaration and observed behavior, and defined consequences when behavior diverges.¶
Bounded capability:: A capability whose input, output, and error surfaces are fully described by schemas under a contract, such that whether a given invocation or result conforms is decidable without interpretation. Boundedness is a property of the declared interface, not of the external system: it does not imply that the operation or its side effects are deterministic, only that the declared input, success output, error outcomes, and relevant operational properties form a complete, versioned boundary that can be checked at runtime.¶
Dynamic capability:: A capability whose interface is intentionally open-ended (for example, natural-language input or output), such that interpreting an invocation inherently requires a model or a human.¶
Drift:: Divergence between a capability's observed behavior and its declared interface, or a change to the declaration itself after a relying party has made decisions based on it.¶

3. What Deployed Protocols Declare Today

This section uses MCP [MCP] as the concrete example because it is the most widely deployed agent tool-interface protocol at the time of writing. The observations are not criticisms of MCP relative to its own goals; they identify what a relying party can and cannot conclude from a conformant implementation.¶

3.1. Type Declarations Exist

Every MCP tool carries a required inputSchema (a JSON Schema object), and since the 2025-06-18 revision a tool may carry an optional outputSchema describing structured results. When an output schema is present, the specification states that "Servers MUST provide structured results that conform to this schema" and that "Clients SHOULD validate structured results against this schema" [MCP]. The type-declaration slot therefore exists. The gaps are in everything a contract would add on top of a declaration.¶

3.2. Gap 1: Boundedness Is Unsignaled

A tool whose inputSchema is {"type": "object"} — a form the specification explicitly sanctions — with no outputSchema and prose-only semantics is fully conformant. So is a tool with strict, closed schemas on both sides. Nothing in the protocol distinguishes them: there is no way for a provider to declare, or a client to require, that a capability is fully schema-bounded. A client can inspect what is present, but it cannot tell whether missing information means "not applicable" or simply "unspecified". A policy engine inspecting a tool list must therefore assume the lax case for every tool, which forces a model into the judgment loop even where the underlying operation is predictable.¶
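
The ambiguity can be illustrated with a short Python sketch of the heuristic a client is left with today. The helper names looks_closed and classify_tool, the example tools, and the criteria applied (object type, declared properties, additionalProperties set to false) are assumptions made for this illustration; none of them is defined by MCP. The sketch reports what a schema contains, which is not the same as a provider commitment that the schema is complete.

```python
def looks_closed(schema: dict) -> bool:
    """Heuristic: True if an object schema forbids undeclared properties.

    Hypothetical helper, not part of any protocol. It inspects only the
    top level and ignores composition keywords such as allOf or $ref.
    """
    return (
        schema.get("type") == "object"
        and isinstance(schema.get("properties"), dict)
        and schema.get("additionalProperties") is False
    )


def classify_tool(tool: dict) -> str:
    """Report what a relying party can infer from one tool listing entry."""
    input_closed = looks_closed(tool.get("inputSchema", {}))
    output_schema = tool.get("outputSchema")
    output_closed = output_schema is not None and looks_closed(output_schema)
    if input_closed and output_closed:
        # Still only an inference: the provider has committed to nothing.
        return "appears-closed"
    return "unknown"


# Two conformant tool entries (names and fields invented for the example).
lax = {"name": "run", "inputSchema": {"type": "object"}}
strict = {
    "name": "get_weather",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
        "additionalProperties": False,
    },
    "outputSchema": {
        "type": "object",
        "properties": {"temp_c": {"type": "number"}},
        "required": ["temp_c"],
        "additionalProperties": False,
    },
}
```

Both entries are conformant. The best label the sketch can give the strict one is "appears-closed", because the listing carries no field in which the provider states that the boundary is complete.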

3.3. Gap 2: Conformance Is Self-Certified

The party that declares the schemas is the party whose behavior the schemas describe. Validation of results against the declared output schema is an optional, client-side, runtime activity ("Clients SHOULD validate"). No third party attests that a capability behaves as declared, no evidence of conformance is accumulated or carried anywhere, and a relying party has no basis for trusting a declaration beyond trusting the declarer.¶
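
As an illustration of where validation happens today, the following Python sketch performs the client-side check that the specification recommends. It is a hand-written validator for a very small subset of JSON Schema (type, properties, required, additionalProperties, with "type" given as a single string); the function name conforms and the example schema are assumptions, and a real client would more plausibly use a complete JSON Schema implementation.

```python
# Mapping from JSON Schema type names to Python types (subset only).
_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
    "null": type(None),
}


def conforms(value, schema: dict) -> bool:
    """Check a value against a tiny, illustrative subset of JSON Schema."""
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python; keep it out of numeric types.
        if isinstance(value, bool) and expected in ("number", "integer"):
            return False
        if not isinstance(value, _TYPES[expected]):
            return False
    if isinstance(value, dict):
        props = schema.get("properties", {})
        if any(key not in value for key in schema.get("required", [])):
            return False
        if schema.get("additionalProperties") is False:
            if any(key not in props for key in value):
                return False
        # Recurse into the declared properties that are actually present.
        return all(conforms(value[k], sub) for k, sub in props.items() if k in value)
    return True


# Hypothetical output schema for the example.
output_schema = {
    "type": "object",
    "properties": {"temp_c": {"type": "number"}},
    "required": ["temp_c"],
    "additionalProperties": False,
}
```

The check runs inside one client, for one call, and its outcome is recorded nowhere that another relying party could consult. That is the sense in which conformance remains self-certified.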

3.4. Gap 3: Declarations Are Not Pinned

The tool list — schemas included — may change at any time during a session; servers that declare the listChanged capability "SHOULD" notify clients when it does [MCP]. Declarations carry no version identifier or digest. There is consequently no artifact linking the schema a client (or a human reviewer, or a policy engine) authorized against at discovery time to the schema in force at call time. This is a time-of-check to time-of-use gap at the protocol level: an authorization decision cannot be bound to the interface it was made about.¶
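
The missing artifact can be made concrete with a short Python sketch. It computes a digest over a deterministic serialization of a tool declaration at review time and compares it with the declaration in force at call time. The function names and example declarations are assumptions, and the serialization (sorted keys, compact separators) only approximates a canonical form: it is not an implementation of JCS [RFC8785]. The sketch illustrates the gap and is not a proposed mechanism.

```python
import hashlib
import json


def declaration_digest(tool: dict) -> str:
    """Return a SHA-256 hex digest over a deterministic serialization.

    Assumption: sorted keys and compact separators stand in for a canonical
    form. This is NOT RFC 8785; number formatting and key ordering differ
    from JCS in edge cases.
    """
    canonical = json.dumps(
        tool, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def unchanged_since_review(pinned_digest: str, tool_at_call_time: dict) -> bool:
    """True only if the declaration in force matches the one reviewed."""
    return declaration_digest(tool_at_call_time) == pinned_digest


# Declaration as seen at discovery time (invented for the example).
reviewed = {
    "name": "get_weather",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}
pinned = declaration_digest(reviewed)

# Same declaration, keys in a different order: the digest is unaffected.
reordered = {
    "inputSchema": {
        "required": ["city"],
        "properties": {"city": {"type": "string"}},
        "type": "object",
    },
    "name": "get_weather",
}

# Declaration after a silent change: a new input property has appeared.
changed = {
    "name": "get_weather",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}, "units": {"type": "string"}},
        "required": ["city"],
    },
}
```

A client could compute such a digest locally today, but because declarations carry no identifier of their own, two parties have no shared name for "the schema that was reviewed".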

3.5. Gap 4: Errors Are Untyped by Design

Tool execution errors are reported as a result flagged isError: true whose content is text — in the specification's words, "actionable feedback that language models can use to self-correct" [MCP]. There is no error schema field, and no place to declare stable error codes, retryability, partial success, or side effects already performed — the operational properties that failure-handling policy actually needs. The failure path — where authorization, retry policy, and audit matter most — is thus exactly the path where machine interpretation is mandatory for every tool, with no opt-out even for capabilities whose failure modes are enumerable.¶
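
The effect on failure-handling policy can be sketched in Python. The TypedError class, its fields, and the retry rule below are hypothetical: no such structure exists in MCP, and the rule (retry only when the error is declared retryable and no side effects were performed) is one assumed policy among many. The sketch shows that with text-only errors the policy has no input to decide on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TypedError:
    """Hypothetical structured error; nothing like it exists in MCP today."""
    code: str
    retryable: bool
    side_effects_performed: bool


def retry_decision(result: dict, typed: Optional[TypedError] = None) -> str:
    """Decide how to handle a tool result without reading error prose."""
    if not result.get("isError", False):
        return "no-retry-needed"
    if typed is None:
        # Only free text is available: a model or a human must interpret it.
        return "needs-interpretation"
    if typed.retryable and not typed.side_effects_performed:
        return "retry"
    return "do-not-retry"


# An error result in the shape used today: a flag plus text content.
text_only = {
    "isError": True,
    "content": [{"type": "text", "text": "Upstream timed out, try again later"}],
}
```

With only the text result, the function can return nothing better than "needs-interpretation". The other branches become reachable only if the operational properties are declared somewhere outside the prose.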

4. Two Classes of Capabilities

A large fraction of the tools agents call in practice are not intrinsically open-ended: they accept structured input, perform a predictable operation (an HTTP API call, a database query, a computation), and return structured output. Their descriptions are translations of interfaces that were fully typed in their original setting. For these, the open-endedness of the agent tool interface is an artifact of the transport, not a property of the capability: the dynamism lives in the calling agent, not in the tool.¶

To be precise about the claim: calling such a capability bounded does not mean the external system or its side effects become deterministic. It means the declared input, success output, error outcomes, and relevant operational properties form a complete, versioned boundary that can be checked at runtime (Section 2). The effort then splits naturally into design-time bindings (which capability, under which contract revision, reviewed and authorized by whom) and runtime bindings (whether this call and this result conform to what was authorized) — a split for which current protocols provide no vocabulary, since nothing carries a decision from the first phase into the second.¶
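
The split between the two phases can be sketched as a record produced at design time and a check performed at run time. All names in the following Python sketch (DesignTimeBinding, its fields, runtime_check, and the example values) are hypothetical, and the contract revision is treated as an opaque string whose construction is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignTimeBinding:
    """Record of a review decision. All field names are hypothetical."""
    capability: str
    contract_revision: str  # opaque identifier of the reviewed contract
    approved_by: str


def runtime_check(
    binding: DesignTimeBinding,
    capability: str,
    revision_in_force: str,
    call_conforms: bool,
) -> str:
    """Carry the design-time decision into a per-call decision."""
    if capability != binding.capability:
        return "deny: no binding for this capability"
    if revision_in_force != binding.contract_revision:
        return "deny: contract changed since review"
    if not call_conforms:
        return "deny: call does not conform"
    return "allow"


# Example binding with invented values.
binding = DesignTimeBinding("get_weather", "rev-1", "security-review")
```

The run-time function needs two inputs that current protocols do not supply: an identifier for the revision in force and a conformance result for the call.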

Other capabilities are genuinely dynamic — search over unstructured corpora, natural-language question answering, generation — and for these, prose semantics and model interpretation are essential, not incidental.¶

Current protocols conflate the two classes. The cost of the conflation is asymmetric: dynamic capabilities lose nothing, while capabilities with completable boundaries lose the decidability they would otherwise support. Every relying party inherits the worst case.¶

The question this document poses is whether the classes should be distinguishable: whether there is value in a declared bounded capability class — capabilities that commit to complete, machine-checkable contracts covering input, output, and error surfaces — coexisting with the dynamic class rather than replacing it.¶

5. Consequences of the Conflation

Authorization:: A policy cannot be written over opaque strings. Where the interface carries no enforceable structure, an authorization decision requires a model to interpret the call, making the decision probabilistic and non-reproducible: two independent policy enforcement points can legitimately reach different conclusions about the same call. This runs opposite to the direction of authorization work elsewhere, such as Rich Authorization Requests [RFC9396], which moves authorization data toward typed, structured objects precisely so that decisions are explicit and reviewable.¶
Discovery:: Matching a need to a capability by prose description requires semantic interpretation, with the associated failure modes (including adversarial ones: a description is an untrusted input to the model that reads it). Typed contracts would allow a compatibility relation between capabilities to be computed rather than guessed.¶
Audit:: An audit trail of prose requests and prose results can only be re-interpreted, not re-validated. Schema-valid records with stable contract identities can be checked mechanically, long after the fact, against the contract that was in force.¶
Composition:: Connecting one capability's output to another's input is the basic operation of multi-step agent workflows. Without typed edges, every handoff is mediated by a model; with them, a large subclass of handoffs becomes a verifiable transformation.¶
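
A computed compatibility relation, as mentioned under Discovery and Composition, could look like the following Python sketch. It makes strong simplifying assumptions: only top-level object properties are compared, types must match exactly, and the glue between capabilities is assumed to forward only the fields the consumer names. The function name and the example schemas are invented for the illustration.

```python
def output_feeds_input(producer_output: dict, consumer_input: dict) -> bool:
    """True if the producer guarantees every field the consumer requires.

    Sketch only: compares top-level properties and exact 'type' values,
    and ignores nesting, formats, enums, and composition keywords.
    """
    produced = producer_output.get("properties", {})
    guaranteed = set(producer_output.get("required", []))
    wanted = consumer_input.get("properties", {})
    for name in consumer_input.get("required", []):
        if name not in guaranteed:
            # The producer may omit this field, so the handoff can fail.
            return False
        wanted_type = wanted.get(name, {}).get("type")
        if wanted_type is not None and produced.get(name, {}).get("type") != wanted_type:
            return False
    return True


# Hypothetical schemas: a geocoder's output, a search tool's output,
# and a weather lookup's input.
geocode_output = {
    "type": "object",
    "properties": {"lat": {"type": "number"}, "lon": {"type": "number"}},
    "required": ["lat", "lon"],
}
search_output = {
    "type": "object",
    "properties": {"text": {"type": "string"}},
    "required": ["text"],
}
weather_input = {
    "type": "object",
    "properties": {"lat": {"type": "number"}, "lon": {"type": "number"}},
    "required": ["lat", "lon"],
}
```

The relation is computable only because both sides declare structure. For a tool whose input schema is the bare object form, the function returns True vacuously, which says nothing about whether the handoff is meaningful.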

6. Open Questions

This document poses the following questions without proposing answers.¶

Class declaration: Is there community interest in a declared bounded-capability class — schemas for input, output, and errors, with a defined conformance relationship — as an opt-in stratum within existing agent tool-interface protocols?¶
Contract identity: Should capability declarations carry stable identities (for example, digests over a canonical form such as JCS [RFC8785] applied to I-JSON [RFC7493] documents), so that a decision made at discovery time can be bound to the declaration it was made about and remain checkable at call time?¶
Attestation: Who attests that a capability conforms to its declared contract, on what evidence, and in what format? Are transparency mechanisms such as SCITT [RFC9943] an appropriate carriage for such statements?¶
Drift: How is post-deployment divergence between declared and observed behavior detected, reported, and acted upon — and should fail-closed handling be the default for capabilities that claimed the bounded class?¶
Composition boundary: What happens at the seam where bounded and dynamic components compose? Which guarantees survive the crossing, and how is the crossing itself made visible to policy?¶
Venue: Are these questions in scope for prospective agent-protocol work in the IETF, for existing authorization and identity work (OAuth, WIMSE), for a research group, or do they fall between venues — which would itself be a finding worth establishing?¶

7. Relationship to Existing Work

The Semantic Definition Format [RFC9880] addresses an adjacent problem for the Internet of Things: giving Things typed, reusable interaction descriptions. It demonstrates that the IETF has found typed capability description worth standardizing in a neighboring domain, but it does not address the agent-specific gaps described in Section 3: class signaling, attestation, pinning between discovery and invocation, and typed error surfaces in a model-mediated call path.¶

Rich Authorization Requests [RFC9396] and the SCITT architecture [RFC9943] are cited above as evidence that structured authorization data and supply-chain-style attestation, respectively, are live directions in the IETF that a bounded-capability class could build on.¶

The author has prototyped a governance layer along the lines sketched by the questions above — digest-pinned capability contracts over [RFC7493]/[RFC8785] canonical records, schema validation on both sides of an invocation, a restricted deterministic transformation language between capabilities, and fail-closed handling of contract drift — and can report that the bounded class is implementable with existing building blocks. Details are available from the author; the prototype is mentioned here only as an existence argument, not as a proposal.¶

8. Security Considerations

This entire document describes a security problem, summarized here.¶

Authorization decisions that depend on model interpretation of unstructured interface data are probabilistic and non-reproducible; they cannot serve as the basis of an auditable policy regime. The absence of pinned declarations (Gap 3, Section 3.4) creates a time-of-check to time-of-use exposure in every current deployment: the interface a human or policy engine approved is not verifiably the interface later invoked. Untyped error channels (Gap 4, Section 3.5) route failure handling, where the highest-stakes control decisions are made, through mandatory interpretation of attacker-influenceable text.¶

A bounded-capability class would narrow, not eliminate, the attack surface: it addresses interface-level decidability only. Content-level attacks (for example, injection through data a capability legitimately returns) and the behavior of the models themselves are out of scope for the questions posed here.¶

10. Informative References

[I-D.bhutton-json-schema]: Wright, A., Andrews, H., Hutton, B., and G. Dennis, "JSON Schema: A Media Type for Describing JSON Documents", Work in Progress, Internet-Draft, draft-bhutton-json-schema-01, 10 June 2022, <https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-01>.
[MCP]: Model Context Protocol contributors, "Model Context Protocol Specification, revision 2025-11-25", November 2025, <https://modelcontextprotocol.io/specification/2025-11-25>.
[RFC7493]: Bray, T., Ed., "The I-JSON Message Format", RFC 7493, DOI 10.17487/RFC7493, March 2015, <https://www.rfc-editor.org/rfc/rfc7493>.
[RFC8785]: Rundgren, A., Jordan, B., and S. Erdtman, "JSON Canonicalization Scheme (JCS)", RFC 8785, DOI 10.17487/RFC8785, June 2020, <https://www.rfc-editor.org/rfc/rfc8785>.
[RFC9396]: Lodderstedt, T., Richer, J., and B. Campbell, "OAuth 2.0 Rich Authorization Requests", RFC 9396, DOI 10.17487/RFC9396, May 2023, <https://www.rfc-editor.org/rfc/rfc9396>.
[RFC9880]: Koster, M., Ed., Bormann, C., Ed., and A. Keränen, "Semantic Definition Format (SDF) for Data and Interactions of Things", RFC 9880, DOI 10.17487/RFC9880, January 2026, <https://www.rfc-editor.org/rfc/rfc9880>.
[RFC9943]: Birkholz, H., Delignat-Lavaud, A., Fournet, C., Deshpande, Y., and S. Lasker, "An Architecture for Trustworthy and Transparent Digital Supply Chains", RFC 9943, DOI 10.17487/RFC9943, June 2026, <https://www.rfc-editor.org/rfc/rfc9943>.