<?xml version="1.0" encoding="utf-8"?>
<?xml-model href="rfc7991bis.rnc"?>
<rfc
  category="info"
  docName="draft-popov-webbotauth-semantic-anchor-00"
  ipr="trust200902"
  submissionType="IETF"
  version="3">

  <front>
    <title abbrev="Semantic Anchor">Identity Anchor for Domain-Root AI Discovery (Semantic Anchor)</title>
    <seriesInfo name="Internet-Draft" value="draft-popov-webbotauth-semantic-anchor-00"/>
    <author fullname="Marin Ivanov Popov" initials="M.I." surname="Popov">
      <organization>1 Euro SEO</organization>
      <address>
        <postal>
          <city>Galway</city>
          <country>Ireland</country>
        </postal>
        <email>hello@1euroseo.com</email>
      </address>
    </author>
    <date year="2026" month="6" day="5"/>
    <abstract>
      <t>Automated clients, including Large Language Model (LLM) crawlers and Retrieval-Augmented Generation (RAG) systems, currently lack a deterministic mechanism to verify the canonical identity of a web domain's operator. This "Identity Gap" results in attribution loss and prevents the automated verification of authority and expertise signals. This document defines the Semantic Anchor: a domain-root, machine-readable JSON-LD identity node that automated clients discover through predictable endpoints. It establishes a stable identity layer and a "Root of Trust" for AI-to-site interactions.</t>
    </abstract>
  </front>

  <middle>
    <section>
      <name>Introduction</name>
      <t>Current AI discovery protocols, such as llms.txt, provide human-readable summaries of site content but function as "unverifiable text surfaces." They describe what is on a site but fail to prove who is making the declaration.</t>
      <t>This document addresses the structural "Identity Gap" first identified on April 7, 2026. It proposes a "Semantic Handshake" to move from probabilistic interpretation to deterministic verification of publisher identity.</t>
      <section>
        <name>Real-World Proof-of-Concept</name>
        <t>The mechanism described herein was observed working on April 20, 2026, when a major LLM retrieval system (Gemini) autonomously discovered, fetched, and parsed a Semantic Anchor implementation at 1euroseo.com. The system incorporated the identity node into its reasoning without human prompting, which suggests backward compatibility with existing retrieval architectures.</t>
      </section>
    </section>

    <section>
      <name>Terminology</name>
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in <xref target="RFC2119"/>.</t>
      <ul>
        <li><t>Origin/Publisher: The legal entity or individual responsible for the operation of the domain.</t></li>
        <li><t>Semantic Anchor: The binding mechanism (header or directive) linking a protocol file to an Identity Node.</t></li>
        <li><t>Identity Node: A machine-readable JSON-LD document providing verifiable credentials of the Origin.</t></li>
        <li><t>Triangular Authority Chain: A structural pattern linking Organization, Person, and Service entities to provide programmatic E-E-A-T (Experience, Expertise, Authoritativeness, and Trustworthiness) signals.</t></li>
      </ul>
    </section>

    <section>
      <name>The Identity Anchor Document</name>
      <t>The Identity Node MUST be expressed in JSON-LD using the Schema.org vocabulary to ensure interoperability with existing consumers of Schema.org structured data.</t>
      <section>
        <name>Core Identity Requirements</name>
        <t>A conforming Identity Anchor MUST include:</t>
        <ul>
          <li><t>@context: https://schema.org</t></li>
          <li><t>@id: A persistent URI (Canonical URL) for the Origin.</t></li>
          <li><t>@type: Typically Organization or Person.</t></li>
          <li><t>name: The legal or canonical name of the Publisher.</t></li>
        </ul>
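        <t>The following is a minimal sketch of how a client might check a fetched node against the four members listed above. The helper name missing_core_keys, the example.com identifier, and the organization name are hypothetical placeholders, not part of this specification.</t>
        <sourcecode type="python"><![CDATA[
```python
import json

# Members that the list above marks as required.
REQUIRED_KEYS = ("@context", "@id", "@type", "name")


def missing_core_keys(node):
    """Return the required keys that are absent or empty in a node."""
    return [key for key in REQUIRED_KEYS if not node.get(key)]


# Hypothetical minimal node; example.com and the name are placeholders.
EXAMPLE_NODE = json.loads("""
{
  "@context": "https://schema.org",
  "@id": "https://example.com/#organization",
  "@type": "Organization",
  "name": "Example Widgets Ltd"
}
""")

print(missing_core_keys(EXAMPLE_NODE))  # prints []
```
]]></sourcecode>
        <t>A node that lacks any of the four members yields a non-empty list, which a client could treat as non-conforming.</t>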
      </section>
      <section>
        <name>Programmatic E-E-A-T Support</name>
        <t>To enable autonomous trust-scoring, the node SHOULD include:</t>
        <ul>
          <li><t>Legal Provenance: additionalProperty containing official registration numbers (e.g., an Irish company registration number).</t></li>
          <li><t>Human Expertise: A nested founder or author node of type Person linking to verifiable EducationalOccupationalCredential nodes.</t></li>
        </ul>
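        <t>As an illustration only, the two optional signals above could be attached to a core node as sketched below. The function name with_trust_signals is hypothetical, all values are placeholders, and the use of the Schema.org hasCredential property to link the Person to its credential is an assumption, since this document does not name the linking property.</t>
        <sourcecode type="python"><![CDATA[
```python
import json


def with_trust_signals(node, registration_number, founder_name, credential):
    """Return a copy of node extended with the two optional signals."""
    extended = dict(node)
    # Legal provenance, carried in additionalProperty as described above.
    extended["additionalProperty"] = {
        "@type": "PropertyValue",
        "name": "Company registration number",
        "value": registration_number,
    }
    # Human expertise: a nested Person linked to a credential node.
    # hasCredential is an assumed choice of linking property.
    extended["founder"] = {
        "@type": "Person",
        "name": founder_name,
        "hasCredential": {
            "@type": "EducationalOccupationalCredential",
            "name": credential,
        },
    }
    return extended


# Placeholder values for illustration.
core = {
    "@context": "https://schema.org",
    "@id": "https://example.com/#organization",
    "@type": "Organization",
    "name": "Example Widgets Ltd",
}
node = with_trust_signals(core, "000000", "Jane Doe", "MSc Example Studies")
print(json.dumps(node, indent=2))
```
]]></sourcecode>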
      </section>
    </section>

    <section>
      <name>Discovery and Orchestration</name>
      <t>Discovery MUST be predictable for automated clients. This specification defines three orchestration layers:</t>
      <section>
        <name>Protocol Header (llms.txt)</name>
        <t>The llms.txt file MUST include an Identity header within the first three lines of the document:</t>
        <artwork>Identity: https://&lt;domain&gt;/identity.jsonld</artwork>
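        <t>A client-side reading of this rule might look like the sketch below. The function name find_identity_header is hypothetical, and matching the header name case-insensitively is an assumption that this document does not state.</t>
        <sourcecode type="python"><![CDATA[
```python
def find_identity_header(llms_txt, max_lines=3):
    """Return the URL from an Identity line in the first lines, or None."""
    for line in llms_txt.splitlines()[:max_lines]:
        key, sep, value = line.partition(":")
        # Assumption: the header name is compared case-insensitively.
        if sep and key.strip().lower() == "identity":
            return value.strip() or None
    return None


# Placeholder llms.txt content for illustration.
SAMPLE = """# Example Site
Identity: https://example.com/identity.jsonld

> A short summary of the site.
"""

print(find_identity_header(SAMPLE))
# prints https://example.com/identity.jsonld
```
]]></sourcecode>
        <t>An Identity line that appears after the third line is ignored by this sketch, in keeping with the placement rule above.</t>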
      </section>
      <section>
        <name>Well-Known URI</name>
        <t>For clients not using llms.txt, the Identity Node SHOULD be accessible at:</t>
        <artwork>https://&lt;domain&gt;/.well-known/identity.jsonld</artwork>
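        <t>For illustration, a client could derive the well-known location from any URL it is already crawling. The function name well_known_identity_url is hypothetical, and forcing the https scheme is an assumption based on the pattern shown above.</t>
        <sourcecode type="python"><![CDATA[
```python
from urllib.parse import urlsplit, urlunsplit

WELL_KNOWN_PATH = "/.well-known/identity.jsonld"


def well_known_identity_url(page_url):
    """Map any absolute URL on a site to its well-known Identity Node URL."""
    parts = urlsplit(page_url)
    if not parts.netloc:
        raise ValueError("an absolute URL is required: %r" % page_url)
    # The pattern in the text uses https, so the scheme is fixed here.
    return urlunsplit(("https", parts.netloc, WELL_KNOWN_PATH, "", ""))


print(well_known_identity_url("https://example.com/blog/post?x=1"))
# prints https://example.com/.well-known/identity.jsonld
```
]]></sourcecode>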
      </section>
      <section>
        <name>HTTP Response Header</name>
        <t>Servers MAY advertise the anchor via an HTTP response header field to facilitate discovery during the initial crawl:</t>
        <artwork>Origin-Identity-Anchor: https://&lt;domain&gt;/identity.jsonld</artwork>
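        <t>A crawler could pick this field out of a response as sketched below. The function name anchor_from_headers is hypothetical, and the headers are modeled as a plain mapping of field names to values rather than any particular HTTP library's type.</t>
        <sourcecode type="python"><![CDATA[
```python
HEADER_NAME = "Origin-Identity-Anchor"


def anchor_from_headers(headers):
    """Return the Origin-Identity-Anchor value from a mapping, or None."""
    # HTTP field names are case-insensitive, so compare in lowercase.
    wanted = HEADER_NAME.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value.strip() or None
    return None


# Placeholder response headers for illustration.
RESPONSE_HEADERS = {
    "content-type": "text/html; charset=utf-8",
    "origin-identity-anchor": " https://example.com/identity.jsonld ",
}

print(anchor_from_headers(RESPONSE_HEADERS))
# prints https://example.com/identity.jsonld
```
]]></sourcecode>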
      </section>
    </section>

    <section>
      <name>The Authority Model: Triangular Chains (Non-Normative)</name>
      <t>To move beyond simple entity mapping, the Semantic Anchor supports a three-node authority pattern:</t>
      <ol>
        <li><t>Organization Node: Establishes corporate identity.</t></li>
        <li><t>Person Node: Links content to credentialed human expertise (e.g., an MSc or professional certifications).</t></li>
        <li><t>Service/Offer Node: Explicitly connects site knowledge/tools to the qualified Person and Organization.</t></li>
      </ol>
      <t>This pattern prevents "Schema Islands", that is, entity descriptions with no links between them, and gives automated clients a closed-loop graph of authority.</t>
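      <t>One way to picture the closed loop is as three nodes that reference each other by @id. The sketch below is illustrative only: the function names are hypothetical, and the choice of the Schema.org properties founder, worksFor, and provider as the linking edges is an assumption, since this document does not prescribe them.</t>
      <sourcecode type="python"><![CDATA[
```python
def ref_ids(value):
    """Collect @id values from a reference object or a list of them."""
    items = value if isinstance(value, list) else [value]
    return {
        item["@id"]
        for item in items
        if isinstance(item, dict) and "@id" in item
    }


def is_closed_triangle(organization, person, service):
    """Check that the three nodes link to each other by @id."""
    org_id = organization.get("@id")
    person_id = person.get("@id")
    if not org_id or not person_id:
        return False
    return (
        person_id in ref_ids(organization.get("founder"))
        and org_id in ref_ids(person.get("worksFor"))
        and {org_id, person_id}.issubset(ref_ids(service.get("provider")))
    )


# Placeholder nodes for illustration.
ORG = {"@id": "https://example.com/#org", "@type": "Organization",
       "founder": {"@id": "https://example.com/#founder"}}
PERSON = {"@id": "https://example.com/#founder", "@type": "Person",
          "worksFor": {"@id": "https://example.com/#org"}}
SERVICE = {"@id": "https://example.com/#audit", "@type": "Service",
           "provider": [{"@id": "https://example.com/#org"},
                        {"@id": "https://example.com/#founder"}]}

print(is_closed_triangle(ORG, PERSON, SERVICE))  # prints True
```
]]></sourcecode>
      <t>A Service node whose provider list omits either the Organization or the Person would fail this check, which corresponds to the disconnected case the pattern is meant to avoid.</t>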
    </section>

    <section>
      <name>Security Considerations</name>
      <t>Hosting the Identity Anchor at the domain root provides implicit proof of Origin control. Clients MUST verify that the host of the Anchor URI matches the domain being crawled. Future revisions are expected to add cryptographic signing of the JSON-LD node to prevent identity spoofing and to support non-repudiation.</t>
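      <t>The domain-matching check described above might be implemented along the following lines. This is a sketch under stated assumptions: the function name anchor_matches_origin is hypothetical, and both the exact-host comparison (no subdomain matching) and the https requirement are assumptions that this document does not spell out.</t>
      <sourcecode type="python"><![CDATA[
```python
from urllib.parse import urlsplit


def anchor_matches_origin(anchor_url, crawled_url):
    """True when the anchor is served over https from the crawled host."""
    anchor = urlsplit(anchor_url)
    crawled = urlsplit(crawled_url)
    if anchor.scheme != "https" or not anchor.hostname:
        return False
    # urlsplit lowercases hostname; only an exact match is accepted,
    # so subdomains and look-alike suffixes are rejected.
    return anchor.hostname == crawled.hostname


print(anchor_matches_origin(
    "https://example.com/identity.jsonld",
    "https://EXAMPLE.com/page",
))  # prints True
print(anchor_matches_origin(
    "https://example.com.evil.test/identity.jsonld",
    "https://example.com/",
))  # prints False
```
]]></sourcecode>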
    </section>

    <section>
      <name>Provenance and Prior Art (Historical Record)</name>
      <t>The architectural pattern and identification of the "Identity Gap" were first publicly disclosed (<eref target="https://www.linkedin.com/posts/marin-popov_ai-llms-mco-activity-7447224077381042176-haQk/"/>) on April 7, 2026. A detailed technical rationale was published on April 8, 2026 (LinkedIn Pulse <eref target="https://www.linkedin.com/pulse/real-reason-llmstxt-adoption-stalling-what-our-tool-found-day-easfe/"/>). The formal technical specification was released on April 9, 2026 (Semantic Anchor v1.0 <eref target="https://github.com/marin-popov/semantic-anchor"/>).</t>
      <t>This document serves as the authoritative chronological record of the architectural lineage for domain-root discovery patterns in AI identity.</t>
    </section>

    <section>
      <name>IANA Considerations</name>
      <t>This document makes no IANA requests at this time.</t>
    </section>
  </middle>

  <back>
    <references>
      <name>References</name>
      <reference anchor="RFC2119" target="https://www.rfc-editor.org/info/rfc2119">
        <front>
          <title>Key words for use in RFCs to Indicate Requirement Levels</title>
          <author fullname="S. Bradner" initials="S." surname="Bradner"/>
          <date month="March" year="1997"/>
        </front>
        <seriesInfo name="BCP" value="14"/>
        <seriesInfo name="RFC" value="2119"/>
      </reference>
      <reference anchor="WICG-295" target="https://github.com/WICG/proposals/issues/295">
        <front>
          <title>W3C WICG Issue #295: Semantic Anchor Proposal</title>
          <author><organization>W3C WICG</organization></author>
          <date year="2026"/>
        </front>
      </reference>
      <reference anchor="IETF-Archive" target="https://mailarchive.ietf.org/arch/msg/web-bot-auth/u5Ae0T0owgAo2HBPSnZSGQpiMMM/">
        <front>
          <title>IETF WebBotAuth Archive: Proposal Submission</title>
          <author><organization>IETF</organization></author>
          <date year="2026"/>
        </front>
      </reference>
      <reference anchor="Timeline" target="https://github.com/marin-popov/semantic-anchor/blob/main/related-work.md">
        <front>
          <title>Forensic Timeline</title>
          <author><organization>GitHub</organization></author>
          <date year="2026"/>
        </front>
      </reference>
    </references>
    <section>
      <name>Author Credentials</name>
      <t>MSc Telecommunications</t>
    </section>
  </back>
</rfc>