Identity Anchor for Domain-Root AI Discovery (Semantic Anchor)

Internet-Draft	Semantic Anchor	June 2026
Popov	Expires 7 December 2026

Abstract

Automated clients, including Large Language Model (LLM) crawlers and Retrieval-Augmented Generation (RAG) systems, currently lack a deterministic mechanism to verify the canonical identity of a web domain's operator. This "Identity Gap" results in attribution loss and prevents the automated verification of authority and expertise signals. This document defines the Semantic Anchor: a protocol-level mechanism that binds a domain to a machine-readable JSON-LD identity node hosted at the domain root and discoverable via predictable endpoints. It establishes a stable identity layer and a "Root of Trust" for AI-to-site interactions.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 7 December 2026.

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

1. Introduction

Current AI discovery protocols, such as llms.txt, provide human-readable summaries of site content but function as "unverifiable text surfaces." They describe what is on a site but fail to prove who is making the declaration.

This document addresses the structural "Identity Gap" first identified on April 7, 2026. It proposes a "Semantic Handshake" to move from probabilistic interpretation to deterministic verification of publisher identity.

1.1. Real-World Proof-of-Concept

The mechanism described herein was proven functional on April 20, 2026, when a major LLM retrieval system (Gemini) autonomously discovered, fetched, and parsed a Semantic Anchor implementation at 1Euroseo.com. The system incorporated the verified identity node into its reasoning without human prompting, demonstrating backward compatibility with existing retrieval architectures.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

Origin/Publisher: The legal entity or individual responsible for the operation of the domain.
Semantic Anchor: The binding mechanism (header or directive) linking a protocol file to an Identity Node.
Identity Node: A machine-readable JSON-LD document providing verifiable credentials of the Origin.
Triangular Authority Chain: A structural pattern linking Organization, Person, and Service entities to provide programmatic signals of Experience, Expertise, Authoritativeness, and Trustworthiness (E-E-A-T).

3. The Identity Anchor Document

The Identity Node MUST be expressed in JSON-LD using the Schema.org vocabulary to ensure interoperability with the global Knowledge Graph.

3.1. Core Identity Requirements

A conforming Identity Anchor MUST include:

@context: https://schema.org
@id: A persistent URI (Canonical URL) for the Origin.
@type: Typically Organization or Person.
name: The legal or canonical name of the Publisher.
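
As an illustration of the four required properties, the following Python sketch builds a minimal Identity Node and reports any required property that is missing. The domain example.com, the "#organization" fragment, the name "Example Ltd", and the helper missing_required are placeholders chosen for this example and are not defined by this specification.

```python
import json

# The four properties listed as mandatory in Section 3.1.
REQUIRED_KEYS = ("@context", "@id", "@type", "name")


def missing_required(node: dict) -> list:
    """Return the required properties that are absent or empty in the node."""
    return [key for key in REQUIRED_KEYS if not node.get(key)]


# Hypothetical publisher: every value below is a placeholder.
node = {
    "@context": "https://schema.org",
    "@id": "https://example.com/#organization",
    "@type": "Organization",
    "name": "Example Ltd",
}

print(json.dumps(node, indent=2))
print(missing_required(node))  # prints []
```

A client could run such a check before trusting any other property of the node, and discard documents for which the returned list is not empty.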

3.2. Programmatic E-E-A-T Support

To enable autonomous trust-scoring, the node SHOULD include:

Legal Provenance: additionalProperty containing official registration numbers (e.g., an Irish company registration number).
Human Expertise: A nested founder or author node of type Person linking to verifiable EducationalOccupationalCredential nodes.
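
One possible shape for these optional signals is sketched below in Python. This specification names only additionalProperty, founder or author, and EducationalOccupationalCredential; the use of the Schema.org terms PropertyValue, hasCredential, and credentialCategory to connect them is an assumption of this example, as are all values and the helper trust_signals.

```python
# Hypothetical Identity Node; all values are placeholders.
node = {
    "@context": "https://schema.org",
    "@id": "https://example.com/#organization",
    "@type": "Organization",
    "name": "Example Ltd",
    # Legal provenance, expressed here as a PropertyValue (assumed layout).
    "additionalProperty": {
        "@type": "PropertyValue",
        "name": "Company registration number",
        "value": "000000",
    },
    # Human expertise: a nested Person with a credential (assumed linkage).
    "founder": {
        "@type": "Person",
        "@id": "https://example.com/#founder",
        "name": "Jane Doe",
        "hasCredential": {
            "@type": "EducationalOccupationalCredential",
            "name": "MSc in Computer Science",
            "credentialCategory": "degree",
        },
    },
}


def trust_signals(n: dict) -> dict:
    """Report which of the two recommended signals a node carries."""
    person = n.get("founder") or n.get("author") or {}
    if not isinstance(person, dict):
        person = {}  # a bare IRI string carries no nested credential
    return {
        "legal_provenance": "additionalProperty" in n,
        "human_expertise": person.get("@type") == "Person"
        and "hasCredential" in person,
    }


print(trust_signals(node))
```

The helper only tests for the presence of the properties; whether a registration number or credential is genuine would need to be checked against an external source.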

4. Discovery and Orchestration

Discovery MUST be predictable for automated clients. This specification defines three orchestration layers:

4.1. Protocol Header (llms.txt)

The llms.txt file MUST include an Identity header in the first three lines of the document:
Identity: https://<domain>/identity.jsonld
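
A client-side reading of this rule might look like the following Python sketch, which scans only the first three lines of an llms.txt body for the Identity header. Matching the header name case-insensitively and trimming surrounding whitespace are assumptions of this example; the function name find_identity_header and the sample file are hypothetical.

```python
from typing import Optional


def find_identity_header(llms_txt: str) -> Optional[str]:
    """Return the Identity URL if the header is in the first three lines."""
    for line in llms_txt.splitlines()[:3]:
        # Split on the first colon only, so the URL's own colons survive.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "identity":
            return value.strip()
    return None


# Hypothetical llms.txt content for a placeholder domain.
sample = "# Example Site\nIdentity: https://example.com/identity.jsonld\n\n> Summary"
print(find_identity_header(sample))  # prints https://example.com/identity.jsonld
```

A header that appears on the fourth line or later is ignored by this sketch, which mirrors the three-line requirement above.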

4.2. Well-Known URI

For clients not using llms.txt, the Identity Node SHOULD be accessible at:
https://<domain>/.well-known/identity.jsonld
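
The fallback location can be derived from the domain alone. The Python sketch below builds it with the standard library; the function name well_known_identity_url is hypothetical, and lowercasing the domain is an assumption of this example.

```python
from urllib.parse import urlunsplit

WELL_KNOWN_PATH = "/.well-known/identity.jsonld"


def well_known_identity_url(domain: str) -> str:
    """Build the well-known Identity Node URL for a bare domain name."""
    return urlunsplit(("https", domain.strip().lower(), WELL_KNOWN_PATH, "", ""))


print(well_known_identity_url("Example.com"))
# prints https://example.com/.well-known/identity.jsonld
```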

4.3. HTTP Response Header

Servers MAY advertise the anchor via an HTTP response header field to facilitate discovery during the initial crawl:
Origin-Identity-Anchor: https://<domain>/identity.jsonld
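
On the client side, the header can be read from any response without fetching additional files. The Python sketch below looks the field up case-insensitively, since HTTP field names are case-insensitive; the function name anchor_from_headers and the sample headers are hypothetical.

```python
from typing import Mapping, Optional

HEADER_NAME = "Origin-Identity-Anchor"


def anchor_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return the advertised anchor URL, or None if the header is absent."""
    wanted = HEADER_NAME.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value.strip()
    return None


# Hypothetical response headers from a placeholder domain.
response_headers = {
    "content-type": "text/html; charset=utf-8",
    "origin-identity-anchor": "https://example.com/identity.jsonld",
}
print(anchor_from_headers(response_headers))
# prints https://example.com/identity.jsonld
```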

5. The Authority Model: Triangular Chains (Non-Normative)

To move beyond simple entity mapping, the Semantic Anchor supports a three-node authority pattern:

Organization Node: Establishes corporate identity.
Person Node: Links content to credentialed human expertise (e.g., MSc, Professional Certifications).
Service/Offer Node: Explicitly connects site knowledge/tools to the qualified Person and Organization.

This orchestration prevents "Schema Islands" and provides the AI with a closed-loop graph of authority.
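
To make the closed-loop idea concrete, the following Python sketch models the three nodes as a small graph and checks that every reference resolves and that no node is isolated. The choice of the Schema.org properties founder, worksFor, and provider as the linking edges is an assumption of this example, as are the placeholder values and the helper names references and is_closed_chain.

```python
# Hypothetical three-node graph; all identifiers are placeholders.
ORG = "https://example.com/#org"
PERSON = "https://example.com/#founder"
SERVICE = "https://example.com/#service"

graph = [
    {"@id": ORG, "@type": "Organization", "name": "Example Ltd",
     "founder": {"@id": PERSON}},
    {"@id": PERSON, "@type": "Person", "name": "Jane Doe",
     "worksFor": {"@id": ORG}},
    {"@id": SERVICE, "@type": "Service", "name": "Example Audit",
     "provider": [{"@id": ORG}, {"@id": PERSON}]},
]


def references(node: dict):
    """Yield the @id values a node points to through {"@id": ...} objects."""
    for key, value in node.items():
        if key == "@id":
            continue
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict) and "@id" in item:
                yield item["@id"]


def is_closed_chain(nodes: list) -> bool:
    """True if all references resolve and the nodes form one connected graph."""
    ids = {n["@id"] for n in nodes}
    if not ids:
        return False
    neighbours = {i: set() for i in ids}
    for n in nodes:
        for target in references(n):
            if target not in ids:
                return False  # dangling reference
            neighbours[n["@id"]].add(target)
            neighbours[target].add(n["@id"])
    # Depth-first walk over undirected links; an island is never reached.
    start = next(iter(ids))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == ids


print(is_closed_chain(graph))  # prints True
```

A node that neither references nor is referenced by the others, which is what the text above calls a "Schema Island", makes the check return False.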

6. Security Considerations

Hosting the Identity Anchor at the domain root provides implicit proof of Origin control, because only a party able to publish at that origin can serve the document. Clients MUST verify that the host of the Anchor URI matches the domain being crawled. Future revisions are expected to add support for cryptographic signing of the JSON-LD node to prevent identity spoofing and ensure non-repudiation.
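
The domain-match requirement could be implemented along the lines of the Python sketch below. Requiring the https scheme and an exact host match (no subdomain or parent-domain allowance) are assumptions of this example rather than rules stated in this document; the function name anchor_matches_origin is hypothetical.

```python
from urllib.parse import urlsplit


def anchor_matches_origin(anchor_url: str, crawled_url: str) -> bool:
    """True if the anchor is served over https from the crawled host."""
    anchor = urlsplit(anchor_url)
    crawled = urlsplit(crawled_url)
    # urlsplit().hostname is already lowercased, so the comparison
    # is case-insensitive as host names require.
    return (
        anchor.scheme == "https"
        and anchor.hostname is not None
        and anchor.hostname == crawled.hostname
    )


print(anchor_matches_origin("https://example.com/identity.jsonld",
                            "https://EXAMPLE.com/about"))   # prints True
print(anchor_matches_origin("https://evil.example/identity.jsonld",
                            "https://example.com/about"))   # prints False
```

A client would apply this check to an anchor URL obtained from any of the discovery layers before fetching or trusting the Identity Node.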

7. Provenance and Prior Art (Historical Record)

The architectural pattern and the identification of the "Identity Gap" were first publicly disclosed on April 7, 2026 (https://www.linkedin.com/posts/marin-popov_ai-llms-mco-activity-7447224077381042176-haQk/). A detailed technical rationale was published on LinkedIn Pulse on April 8, 2026 (https://www.linkedin.com/pulse/real-reason-llmstxt-adoption-stalling-what-our-tool-found-day-easfe/). The formal technical specification, Semantic Anchor v1.0, was released on April 9, 2026 (https://github.com/marin-popov/semantic-anchor).

This document serves as the authoritative chronological record of the architectural lineage for domain-root discovery patterns in AI identity.

8. References

[RFC2119]: Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[WICG-295]: W3C WICG, "W3C WICG Issue #295: Semantic Anchor Proposal", 2026, <https://github.com/WICG/proposals/issues/295>.
[IETF-Archive]: IETF, "IETF WebBotAuth Archive: Proposal Submission", 2026, <https://mailarchive.ietf.org/arch/msg/web-bot-auth/u5Ae0T0owgAo2HBPSnZSGQpiMMM/>.
[Timeline]: GitHub, "Forensic Timeline", 2026, <https://github.com/marin-popov/semantic-anchor/blob/main/related-work.md>.