AFiR: Post-Quantum Signed Inference Receipts as a TEE-Free Profile for IETF SPICE Inference Chain

Introduction

The Deployment Gap in the SPICE Inference Chain

The SPICE Inference Chain defines two proof types for computational provenance:

ZKML proofs: mathematically certain, but proof generation takes minutes to hours per inference and is currently limited to models of approximately 100 million parameters or fewer.
TEE attestation: production-scale and real-time, but requires specific hardware (Intel TDX, AMD SEV-SNP, NVIDIA H100 Confidential Computing) and manufacturer PKI dependencies. Most serverless inference environments do not expose TEE primitives to the application layer.

The practical effect is that the SPICE Inference Chain, as currently defined, cannot be adopted in commodity cloud environments (serverless functions, container-based inference runtimes, shared GPU pools) without either accepting ZKML latency incompatible with real-time serving, or deploying specialized hardware unavailable in most production inference clouds. This leaves the majority of production AI inference volume outside the scope of any SPICE-conformant inference attestation.

AFiR Approach

AFiR addresses this gap by defining a third proof type: post-quantum digital signature attestation using ML-DSA-65 (NIST FIPS 204). A post-quantum signature attestation makes the following proof statement:

"Agent A, at timestamp T, signed a commitment over (input_hash, output_hash, model_id, tool_name, session_id) using ML-DSA-65 with key K. Key K is registered and publicly verifiable. The signature is unforgeable under standard lattice hardness assumptions (Module Learning With Errors, MLWE). A cryptographic receipt anchored on Base Mainnet via USDC provides a tamper-evident timestamp independent of any single party's infrastructure."

This proof type does not require:

Specialized hardware (no TEE, no GPU confidential compute)
Proof generation delay (signing is 0.785ms per fragment)
Trust in a hardware manufacturer's PKI
Any changes to the inference runtime or model serving stack

AFiR is in production as of June 2026, operating on serverless infrastructure. All five primitives defined in this document are deployed, smoke-tested, and serving live traffic.

Relationship to Existing SPICE Drafts

This document is a companion to, not a replacement of:

The SPICE Inference Chain draft: defines the inference chain Merkle structure and ZKML/TEE proof types. AFiR adds a third proof type to this framework.
The SPICE actor chain draft: AFiR's P1 (Signed Tool Calls) extends the actor chain by adding per-tool-invocation receipts at the tool execution layer.
The SPICE intent chain draft: AFiR's P3 (KV Cache Signing) addresses a gap not covered by the intent chain: provenance of cached token prefixes served from distributed KV stores.

AFiR receipt entries are structurally compatible with the SPICE inference chain Merkle tree and MAY coexist with ZKML and TEE entries in the same session's inference chain.

Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 when, and only when, they appear in all capitals, as shown here.

AFiR Receipt:: A signed record produced by the AFiR signing layer before an inference output propagates to the next stage. Contains input commitment, output commitment, model identity, timestamp, nullifier, and a post-quantum digital signature.
Nullifier:: A unique, non-reusable identifier bound to each AFiR receipt, preventing replay of a valid receipt against a different output.
On-Chain Anchor:: A transaction on Base Mainnet containing the Merkle root of a session's inference chain, providing a tamper-evident timestamp independent of any single operator's infrastructure.
ML-DSA-65:: Module Lattice-based Digital Signature Algorithm, parameter set ML-DSA-65, as defined in NIST FIPS 204. Post-quantum secure under MLWE hardness assumptions.
Fragment:: The smallest unit of inference output for which an AFiR receipt is produced. In streaming inference, a fragment is a single generation step. In non-streaming inference, a fragment is the complete response.
KV Cache Prefix:: The cached key-value state from prior turns in a multi-turn conversation or agentic session, reused by the inference engine to avoid recomputing attention over prior tokens.

The AFiR Proof Type

Algorithm: ML-DSA-65 (NIST FIPS 204)

AFiR uses ML-DSA-65 as its primary signature algorithm. ML-DSA-65 is a parameter set of ML-DSA, the lattice-based post-quantum digital signature algorithm standardized by NIST in FIPS 204 (August 2024). It provides:

Security level: NIST security category 3 (at least as hard to break as key search on AES-192; quantum-secure under MLWE)
Signature size: 3309 bytes
Public key size: 1952 bytes
Signing time: under 1ms on commodity hardware
Verification time: under 1ms on commodity hardware

The signed message for each AFiR receipt is the SHA-256 hash of the canonical JSON serialization of the receipt payload fields: input_hash, output_hash, model_id, model_fingerprint, tool_name (if applicable), session_id, iat, nullifier.
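The digest computation described above can be sketched as follows. This is a minimal illustration, not the AFiR implementation: `json.dumps` with sorted keys and compact separators is used as a rough stand-in for a canonical serialization and is not a conforming canonicalizer, and omitting `tool_name` when it does not apply is an assumption about how "if applicable" is handled. The function names are hypothetical.

```python
import hashlib
import json

# Payload fields named in the text, in no particular order.
PAYLOAD_FIELDS = (
    "input_hash", "output_hash", "model_id", "model_fingerprint",
    "tool_name", "session_id", "iat", "nullifier",
)


def canonical_json(payload: dict) -> bytes:
    # Approximation only: sorted keys, no insignificant whitespace, UTF-8.
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def receipt_digest(payload: dict) -> bytes:
    # Keep only the defined payload fields; absent or None fields
    # (for example tool_name outside a tool call) are left out (assumption).
    selected = {k: payload[k] for k in PAYLOAD_FIELDS if payload.get(k) is not None}
    return hashlib.sha256(canonical_json(selected)).digest()
```

The resulting 32-byte digest is the message that would be passed to the ML-DSA-65 signing operation. Because keys are sorted before hashing, two producers that assemble the payload in a different field order obtain the same digest.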

Performance Characteristics

Measured AFiR performance on commodity serverless infrastructure (2026):

Signing overhead per fragment: 0.785ms
End-to-end median wall latency: 241ms
On-chain receipt anchoring: approximately 7ms (Base Mainnet via USDC)
Throughput cost vs. baseline: 98.5% cheaper (tiered routing)
Speed vs. prior signing approach: 6.1x faster (223ms vs 1,369ms P50 wall-clock)

These measurements are from production traffic and represent the overhead of the complete AFiR signing pipeline including on-chain anchoring.

On-Chain Anchoring

AFiR anchors the Merkle root of each session's inference chain on Base Mainnet via a USDC transfer carrying the root hash as calldata. This provides:

Tamper-evident timestamp from a public, decentralized ledger
Independence from any single operator's infrastructure
Permanent, publicly auditable record of the session root
Approximately 7ms latency from signing to on-chain confirmation

The on-chain anchor does not contain individual receipt payloads. Per-entry proof retrieval uses the inference registry URI, following the same architecture as defined in Section 5.

AFiR Entry Structure

Common Fields (SPICE-Compatible)

AFiR entries include all REQUIRED common fields from Section 4.1. The entry type value is afir_pq_signature.

AFiR-Specific Fields

input_hash:: SHA-256 hash of the inference input (prompt or tool call parameters).
nullifier:: Unique non-reusable identifier for this receipt. Format: 32-byte value encoded as a hex string (64 hex characters).
algorithm:: Signature algorithm used. One of: "ML-DSA-65" (primary, post-quantum), "ML-DSA-44" (compact, post-quantum), "Ed25519" (classical, transition support), "SLH-DSA" (reserved, FIPS 205), "FN-DSA" (reserved, FIPS 206).
public_key_hint:: First 16 bytes (hex) of the signing public key, for key disambiguation without transmitting the full key inline.
receipt_chain:: URI of the AFiR inference registry partition for this session.
on_chain_anchor:: Base Mainnet transaction hash containing the session Merkle root. OPTIONAL at entry level; REQUIRED in the token's inference_registry response for completed sessions.
phase:: For P1 (Signed Tool Calls): "before" or "after", indicating whether the receipt was produced before or after tool execution.

Full Entry Example

The following is an example AFiR inference chain entry for a signed tool call (P1, before phase):
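A hypothetical sketch of such an entry, assembled from the AFiR-specific fields listed above, is given here in Python. It is illustrative only: the SPICE common fields and the signature value are omitted because their encoding is not given in this text, and the registry URI, session identifier, tool parameters, and all-zero public key are placeholders.

```python
import hashlib
import json
import secrets


def build_before_entry(tool_params: dict, public_key: bytes, session_id: str) -> dict:
    # Hash a deterministic serialization of the tool call parameters.
    input_bytes = json.dumps(
        tool_params, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    return {
        "type": "afir_pq_signature",
        "input_hash": hashlib.sha256(input_bytes).hexdigest(),
        "nullifier": secrets.token_hex(32),         # 32 random bytes, hex-encoded
        "algorithm": "ML-DSA-65",
        "public_key_hint": public_key[:16].hex(),   # first 16 bytes of the key
        # Placeholder registry URI (hypothetical host and path).
        "receipt_chain": f"https://registry.example/afir/{session_id}",
        "phase": "before",
        # on_chain_anchor is OPTIONAL at entry level and is omitted here.
    }


# bytes(1952) is an all-zero stand-in with the ML-DSA-65 public key length.
entry = build_before_entry({"query": "weather in Paris"}, bytes(1952), "sess-123")
print(json.dumps(entry, indent=2))
```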

Five Signing Primitives

AFiR ships five production primitives, each corresponding to a distinct layer of the AI inference stack.

P1 -- Signed Tool Calls

Endpoints: POST /v1/afir/tool/sign and POST /v1/afir/tool/verify

P1 produces a before-and-after receipt for every MCP or Agent-to-Agent (A2A) tool invocation. The "before" receipt is produced before the tool executes, binding: tool_name, tool_version, input_hash, model_id, session_id, parent_receipt_nullifier, iat. The "after" receipt is produced after the tool returns, binding: output_hash, tool_exit_status, latency_ms, parent_receipt_nullifier (the nullifier of the "before" receipt), iat.

The nullifier chain from before to after ensures that a tool call receipt cannot be detached from its corresponding response receipt, and that replay of a valid before-receipt against a different tool response is detectable.

P1 directly addresses the unsigned tool invocation vulnerability class present in MCP deployments. The AFiR signing sidecar intercepts the call before the MCP transport layer, requiring no changes to MCP server implementations.
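The before/after linkage can be sketched as below. This is a simplified illustration under stated assumptions: signing is omitted, the function names are hypothetical, and each receipt is given its own random `nullifier` so that the "after" receipt can point at the "before" receipt through `parent_receipt_nullifier`.

```python
import hashlib
import secrets
import time
from typing import Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def before_receipt(tool_name: str, tool_version: str, tool_input: bytes,
                   model_id: str, session_id: str,
                   parent_nullifier: Optional[str]) -> dict:
    # Produced before the tool executes.
    return {
        "phase": "before",
        "tool_name": tool_name,
        "tool_version": tool_version,
        "input_hash": sha256_hex(tool_input),
        "model_id": model_id,
        "session_id": session_id,
        "parent_receipt_nullifier": parent_nullifier,
        "iat": int(time.time()),
        "nullifier": secrets.token_hex(32),
    }


def after_receipt(before: dict, tool_output: bytes,
                  exit_status: int, latency_ms: int) -> dict:
    # Produced after the tool returns; the parent is the "before" receipt.
    return {
        "phase": "after",
        "output_hash": sha256_hex(tool_output),
        "tool_exit_status": exit_status,
        "latency_ms": latency_ms,
        "parent_receipt_nullifier": before["nullifier"],
        "iat": int(time.time()),
        "nullifier": secrets.token_hex(32),
    }


def is_linked(before: dict, after: dict) -> bool:
    # An "after" receipt is only valid against the exact "before" it names.
    return (before["phase"] == "before"
            and after["phase"] == "after"
            and after["parent_receipt_nullifier"] == before["nullifier"])
```

A verifier that sees an "after" receipt whose `parent_receipt_nullifier` does not match the presented "before" receipt can reject the pair, which is the detachment and replay check described above.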

P2 -- Cross-Agent Receipt Trees

Endpoint: POST /v1/afir/tree/build

P2 implements the inference chain Merkle tree architecture defined in the SPICE Inference Chain draft, using AFiR receipt entries as leaf nodes. When Agent A calls Agent B, which in turn calls Agent C, P2 builds a Merkle tree across all receipts produced in the session. The root hash is the inference_root included in the OAuth token.

P2 is the AFiR reference implementation of the inference_root claim defined in Section 5.3. It is deployed and serving production traffic as of June 2026.

P3 -- KV Cache Signing

Endpoint: POST /v1/afir/cache/sign

P3 addresses a provenance gap not covered by the intent chain or the existing inference chain draft: the attestation of cached token prefixes served from distributed KV stores. In production agentic deployments using disaggregated prefill architectures, KV cache hit rates exceeding 90% have been measured. This means the majority of tokens served to the model in high-cache-hit deployments have no provenance attestation.

P3 signs each KV cache entry at write time and validates the signature at read time before cached tokens are injected into the model's context. If a cached prefix does not match its receipt on retrieval, the request MUST fail before the prefix is injected into the model's context.
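The sign-at-write, verify-at-read behavior can be illustrated with the sketch below. Note the assumptions: HMAC-SHA256 from the Python standard library stands in for the ML-DSA-65 signature purely so the example is self-contained, an in-memory dict stands in for the distributed KV store, and the class and exception names are hypothetical.

```python
import hashlib
import hmac


class CacheIntegrityError(Exception):
    """Raised when a cached prefix does not match its receipt."""


class SignedKVCache:
    def __init__(self, key: bytes) -> None:
        self._key = key
        self._store = {}  # prefix_id -> (blob, tag)

    def _tag(self, prefix_id: str, blob: bytes) -> bytes:
        # Stand-in for a signature over (prefix_id, SHA-256 of the KV bytes).
        digest = hashlib.sha256(blob).digest()
        message = prefix_id.encode("utf-8") + b"\x00" + digest
        return hmac.new(self._key, message, hashlib.sha256).digest()

    def write(self, prefix_id: str, blob: bytes) -> None:
        # Sign at write time.
        self._store[prefix_id] = (blob, self._tag(prefix_id, blob))

    def read(self, prefix_id: str) -> bytes:
        # Validate at read time; fail before the prefix reaches the model.
        blob, tag = self._store[prefix_id]
        if not hmac.compare_digest(tag, self._tag(prefix_id, blob)):
            raise CacheIntegrityError(f"receipt mismatch for prefix {prefix_id!r}")
        return blob
```

The caller only injects the returned bytes into the model's context; a `CacheIntegrityError` aborts the request before that point.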

P4 -- Model Manifest

Endpoints: POST /v1/afir/manifest/publish and GET /v1/afir/manifest/{nullifier}

P4 provides TEE-free attestation of which model, which weights, and which quantization configuration served a given request. A Model Manifest is a signed document binding: model_id, model_fingerprint (SHA-256 of model weights plus architecture), quantization, serving_runtime, infrastructure, iat, and nullifier. The Model Manifest nullifier is included in all subsequent AFiR receipt entries produced during a session, creating a binding between every inference receipt and the specific model configuration that produced it.

P4 addresses the Model Masquerading attack class identified in Section 1.1 without requiring TEE hardware. The trust basis is the operator's key management rather than hardware isolation. P4 is therefore appropriate for environments where TEE is unavailable, with this distinction explicitly understood.
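A minimal sketch of building a manifest and binding it to a receipt follows. Several details are assumptions made for illustration: the text does not say how weights and architecture are combined into `model_fingerprint`, so a length-prefixed architecture descriptor followed by the weight bytes is hashed here; the `manifest_nullifier` field name and the function names are hypothetical; signing is omitted.

```python
import hashlib
import json
import secrets
import time


def model_fingerprint(weights: bytes, architecture: dict) -> str:
    # Assumed encoding: 8-byte length prefix, architecture JSON, then weights.
    arch_bytes = json.dumps(
        architecture, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    h = hashlib.sha256()
    h.update(len(arch_bytes).to_bytes(8, "big"))
    h.update(arch_bytes)
    h.update(weights)
    return h.hexdigest()


def build_manifest(model_id: str, weights: bytes, architecture: dict,
                   quantization: str, serving_runtime: str,
                   infrastructure: str) -> dict:
    return {
        "model_id": model_id,
        "model_fingerprint": model_fingerprint(weights, architecture),
        "quantization": quantization,
        "serving_runtime": serving_runtime,
        "infrastructure": infrastructure,
        "iat": int(time.time()),
        "nullifier": secrets.token_hex(32),
    }


def bind_to_manifest(receipt: dict, manifest: dict) -> dict:
    # Hypothetical field name carrying the manifest nullifier in a receipt.
    return {**receipt, "manifest_nullifier": manifest["nullifier"]}
```

Any change to the weight bytes or the architecture descriptor yields a different fingerprint, so a receipt bound to one manifest cannot be presented as evidence for a different model configuration without detection.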

P5 -- Crypto-Agile Signature Layer

Endpoints: POST /v1/afir/sign and GET /v1/afir/algorithms

P5 implements a crypto-agile signing endpoint supporting multiple post-quantum and classical signature algorithms under a single API surface. The algorithm is specified per-request and recorded in the receipt entry, making receipts from different algorithm generations cross-verifiable via the Merkle structure.

P5 Supported Algorithms

Algorithm	Status	Standard	Notes
ML-DSA-65	Active	NIST FIPS 204	Primary, post-quantum
ML-DSA-44	Active	NIST FIPS 204	Compact, post-quantum
Ed25519	Active	RFC 8032	Classical, transition support
SLH-DSA	Reserved	NIST FIPS 205	Planned
FN-DSA	Reserved	NIST FIPS 206	Planned

Algorithm negotiation follows the same model as TLS cipher suite negotiation. When a customer needs to upgrade from ML-DSA-65 to a future algorithm, they change a single configuration field. Prior receipts remain verifiable under their original algorithm.
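The per-request selection and per-receipt verification described above might look like the following sketch. The status table mirrors the P5 Supported Algorithms table; the function names, the verifier callable signature, and the choice to reject reserved algorithms at signing time are assumptions, and no real signature library is invoked.

```python
from typing import Callable, Dict, Optional

# Mirrors the P5 Supported Algorithms table.
ALGORITHM_STATUS = {
    "ML-DSA-65": "active",
    "ML-DSA-44": "active",
    "Ed25519": "active",
    "SLH-DSA": "reserved",
    "FN-DSA": "reserved",
}

# Hypothetical verifier shape: (public_key, message, signature) -> bool
Verifier = Callable[[bytes, bytes, bytes], bool]


def select_algorithm(requested: Optional[str] = None,
                     default: str = "ML-DSA-65") -> str:
    # The default is the single configuration field a deployment would change.
    name = requested or default
    status = ALGORITHM_STATUS.get(name)
    if status is None:
        raise ValueError(f"unknown algorithm: {name}")
    if status != "active":
        raise ValueError(f"algorithm not yet active: {name}")
    return name


def verify_receipt(entry: dict, message: bytes, signature: bytes,
                   public_key: bytes, verifiers: Dict[str, Verifier]) -> bool:
    # Always verify under the algorithm recorded in the receipt itself,
    # so older receipts stay verifiable after the default changes.
    verifier = verifiers.get(entry["algorithm"])
    if verifier is None:
        return False
    return verifier(public_key, message, signature)
```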

Merkle Tree Compatibility

AFiR receipt entries are structurally compatible with the SPICE inference chain Merkle tree defined in Section 5.2. Leaf nodes are SHA-256 hashes of canonically serialized AFiR receipt entries (JSON Canonicalization Scheme). The Merkle tree construction algorithm is identical to that defined in Section 5.3. The resulting inference_root is included in the OAuth token using the claim structure defined in Section 5.3, with inference_proof_type set to afir_ml_dsa_65 (see IANA Considerations).
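For orientation, a generic Merkle root computation over AFiR leaf hashes is sketched below. The normative construction is the one in the referenced Section 5.3, which is not reproduced in this document; the pairing rule used here (a plain binary SHA-256 tree that duplicates the last node on odd-sized levels, with no domain separation) is a common simple variant chosen as an assumption, and `json.dumps` with sorted keys only approximates canonical serialization.

```python
import hashlib
import json
from typing import List


def leaf_hash(entry: dict) -> bytes:
    # Approximate canonical serialization (not a conforming canonicalizer).
    data = json.dumps(
        entry, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed rule: duplicate the last node
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

Because each leaf is the hash of a full entry, changing any field of any receipt in the session changes the root that ends up in the token.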

Token Structure

A token carrying an AFiR inference chain follows the full Truth Stack structure defined in Section 6, with inference_proof_type set to an AFiR algorithm identifier:
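As a partial illustration, the sketch below assembles only the three inference-related claims named in this document. The remainder of the token structure from the referenced Section 6 is not reproduced here, and the hex encoding of the root, the registry URI, and the function name are assumptions.

```python
# Proof type identifiers requested in IANA Considerations.
AFIR_PROOF_TYPES = {"afir_ml_dsa_65", "afir_ml_dsa_44", "afir_ed25519"}


def afir_inference_claims(inference_root: bytes, proof_type: str,
                          registry_uri: str) -> dict:
    if proof_type not in AFIR_PROOF_TYPES:
        raise ValueError(f"not an AFiR proof type: {proof_type}")
    if len(inference_root) != 32:
        raise ValueError("inference_root must be a 32-byte SHA-256 value")
    return {
        "inference_root": inference_root.hex(),   # assumed hex encoding
        "inference_proof_type": proof_type,
        "inference_registry": registry_uri,       # placeholder URI in the usage below
    }


claims = afir_inference_claims(
    bytes(32), "afir_ml_dsa_65", "https://registry.example/afir/sess-123"
)
```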

Tiered Verification with AFiR

AFiR extends the tiered verification strategy from Section 7.4:

AFiR Tiered Verification

Risk Level	Actor Chain	Intent Chain	Inference Chain
Low	Sync	Skip	Skip
Medium	Sync	Cached proof	AFiR signature check (<1ms)
High	Sync	Full	AFiR + on-chain anchor (~7ms)
Critical	Sync	Full	AFiR + on-chain + ZKML/TEE
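The table above could be encoded as a lookup that a verifier consults per request. The sketch below is illustrative; the key names and check identifiers are hypothetical labels for the table cells, not defined values.

```python
# One row per risk level, mirroring the AFiR Tiered Verification table.
TIERS = {
    "low": {"actor_chain": "sync", "intent_chain": "skip",
            "inference_chain": []},
    "medium": {"actor_chain": "sync", "intent_chain": "cached_proof",
               "inference_chain": ["afir_signature"]},
    "high": {"actor_chain": "sync", "intent_chain": "full",
             "inference_chain": ["afir_signature", "on_chain_anchor"]},
    "critical": {"actor_chain": "sync", "intent_chain": "full",
                 "inference_chain": ["afir_signature", "on_chain_anchor",
                                     "zkml_or_tee"]},
}


def inference_checks(risk_level: str) -> list:
    # Returns the inference chain checks to run, cheapest first.
    try:
        return list(TIERS[risk_level.lower()]["inference_chain"])
    except KeyError:
        raise ValueError(f"unknown risk level: {risk_level}") from None
```

For example, `inference_checks("High")` yields the signature check followed by the on-chain anchor check, while a low-risk request skips inference chain verification entirely.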

Coexistence with ZKML and TEE Entries

AFiR entries and ZKML/TEE entries MAY coexist in the same inference chain. The SPICE Inference Chain Merkle tree is agnostic to the proof type of individual entries; the root hash covers all entries regardless of type. Verifiers MUST check the "type" field of each entry and apply the verification procedure appropriate to that type.

This is useful for deployments that use AFiR for real-time signing during inference and generate ZKML proofs asynchronously for high-value operations, or that run some agents on TEE-equipped hardware and others on commodity infrastructure.

Security Considerations

Post-Quantum Security Basis

ML-DSA-65 is secure under the hardness of the Module Learning With Errors (MLWE) problem, which is believed to be hard for both classical and quantum computers. NIST standardized ML-DSA-65 in FIPS 204 (August 2024) following an eight-year public evaluation process. The security basis of AFiR signatures is mathematical (lattice hardness), whereas the security basis of TEE attestation is hardware-rooted. Both trust bases are valid; they are appropriate for different deployment contexts and threat models.

On-Chain Anchoring and Tamper Evidence

The Base Mainnet on-chain anchor provides tamper evidence independent of AFiR operator infrastructure. An adversary wishing to forge an AFiR receipt for a past session must either forge an ML-DSA-65 signature (computationally infeasible under MLWE hardness) or rewrite finalized Base Mainnet history (infeasible under proof-of-stake consensus assumptions, which are economic rather than computational). Neither is feasible under standard assumptions.

Threat Coverage Compared to ZKML and TEE

Threat Coverage by Proof Type

Threat	ZKML	TEE	AFiR
Model substitution	Yes	Yes	P4
Weight tampering	Yes	Yes	P4
Environment spoofing	No	Yes	No*
Replay of stale proofs	Yes	Yes	Yes
Tool call repudiation	No	No	P1
Cache poisoning	No	No	P3
Cross-agent chain break	No	No	P2
Output repudiation	Yes	Yes	Yes

* AFiR does not provide hardware-rooted proof that inference ran inside an isolated enclave. For deployments requiring environment isolation proof, TEE entries SHOULD be used for the relevant chain segments, potentially coexisting with AFiR entries as described in "Coexistence with ZKML and TEE Entries".

Key Management

AFiR signing keys MUST be generated as ML-DSA-65 key pairs per FIPS 204, stored in a key management system with access logging, rotated on a configurable schedule (90 days RECOMMENDED), and bound to a single operator identity per key pair. Public keys SHOULD be published in a discoverable registry to allow verifiers to retrieve the full public key given the public_key_hint in an AFiR receipt entry.
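A small sketch of hint-based key lookup and a rotation check is given below, under stated assumptions: the registry is an in-memory stand-in for a discoverable registry, the class and function names are hypothetical, and because a 16-byte hint is not guaranteed to be unique, lookup returns every candidate key for the verifier to try.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

ROTATION_PERIOD = timedelta(days=90)  # the RECOMMENDED schedule


def public_key_hint(public_key: bytes) -> str:
    # First 16 bytes of the public key, hex-encoded.
    return public_key[:16].hex()


class KeyRegistry:
    def __init__(self) -> None:
        self._by_hint: Dict[str, List[Tuple[str, bytes]]] = {}

    def publish(self, operator_id: str, public_key: bytes) -> str:
        # Each key pair is bound to exactly one operator identity.
        hint = public_key_hint(public_key)
        self._by_hint.setdefault(hint, []).append((operator_id, public_key))
        return hint

    def resolve(self, hint: str) -> List[Tuple[str, bytes]]:
        # Returns all (operator_id, public_key) candidates for the hint.
        return list(self._by_hint.get(hint, []))


def rotation_due(created_at: datetime, now: datetime) -> bool:
    return now - created_at >= ROTATION_PERIOD
```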

IANA Considerations

This document requests registration of the following inference_proof_type values for use with the inference_root claim defined in the SPICE Inference Chain draft:

"afir_ml_dsa_65": AFiR post-quantum signature profile (ML-DSA-65, NIST FIPS 204)
"afir_ml_dsa_44": AFiR post-quantum signature profile (ML-DSA-44, NIST FIPS 204, compact)
"afir_ed25519": AFiR classical signature profile (Ed25519, transition)

No new JWT claims are defined by this document. The existing inference_root, inference_proof_type, and inference_registry claims defined in the SPICE Inference Chain draft are used without modification.