Network Working Group                                          Y. Collet
Internet-Draft                                                      Meta
Intended status: Informational                         S. Josefsson, Ed.
Expires: 14 April 2026                                   11 October 2025


                      xxHash fast digest algorithm
                       draft-josefsson-xxhash-00

Abstract

   Description of the xxHash fast digest algorithm.

About This Document

   This note is to be removed before publishing as an RFC.

   Status information for this document may be found at
   https://datatracker.ietf.org/doc/draft-josefsson-xxhash/.

   Source for this draft and an issue tracker can be found at
   https://gitlab.com/jas/ietf-xxhash.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 14 April 2026.

Copyright Notice

   Copyright (c) 2025 IETF Trust and the persons identified as the
   document authors.  All rights reserved.


Collet & Josefsson        Expires 14 April 2026                 [Page 1]

Internet-Draft                   xxHash                     October 2025


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

   This document may not be modified, and derivative works of it may not
   be created, and it may not be published except as an Internet-Draft.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
     1.1.  Operation notations . . . . . . . . . . . . . . . . . . .   3
   2.  XXH32 Algorithm Description . . . . . . . . . . . . . . . . .   4
     2.1.  Overview  . . . . . . . . . . . . . . . . . . . . . . . .   4
       2.1.1.  Step 1.  Initialize internal accumulators . . . . . .   5
       2.1.2.  Step 2.  Process stripes  . . . . . . . . . . . . . .   5
       2.1.3.  Step 3.  Accumulator convergence  . . . . . . . . . .   6
       2.1.4.  Step 4.  Add input length . . . . . . . . . . . . . .   6
       2.1.5.  Step 5.  Consume remaining input  . . . . . . . . . .   6
       2.1.6.  Step 6.  Final mix (avalanche)  . . . . . . . . . . .   7
       2.1.7.  Step 7.  Output . . . . . . . . . . . . . . . . . . .   7
   3.  XXH64 Algorithm Description . . . . . . . . . . . . . . . . .   7
     3.1.  Overview  . . . . . . . . . . . . . . . . . . . . . . . .   7
       3.1.1.  Step 1.  Initialize internal accumulators . . . . . .   8
       3.1.2.  Step 2.  Process stripes  . . . . . . . . . . . . . .   8
       3.1.3.  Step 3.  Accumulator convergence  . . . . . . . . . .   9
       3.1.4.  Step 4.  Add input length . . . . . . . . . . . . . .   9
       3.1.5.  Step 5.  Consume remaining input  . . . . . . . . . .  10
       3.1.6.  Step 6.  Final mix (avalanche)  . . . . . . . . . . .  10
       3.1.7.  Step 7.  Output . . . . . . . . . . . . . . . . . . .  10
   4.  XXH3 Algorithm Overview . . . . . . . . . . . . . . . . . . .  11
     4.1.  Seed and Secret . . . . . . . . . . . . . . . . . . . . .  12
       4.1.1.  Final Mixing Step (avalanche) . . . . . . . . . . . .  13
   5.  XXH3 Algorithm Description (for small inputs) . . . . . . . .  13
     5.1.  Empty input . . . . . . . . . . . . . . . . . . . . . . .  14
       5.1.1.  1-3 bytes of input  . . . . . . . . . . . . . . . . .  14
       5.1.2.  4-8 bytes of input  . . . . . . . . . . . . . . . . .  14
       5.1.3.  9-16 bytes of input . . . . . . . . . . . . . . . . .  15
   6.  XXH3 Algorithm Description (for medium inputs)  . . . . . . .  16
     6.1.  Step 1.  Initialize internal accumulators . . . . . . . .  16
     6.2.  Step 2.  Process the input  . . . . . . . . . . . . . . .  16
       6.2.1.  Mixing operation  . . . . . . . . . . . . . . . . . .  17
       6.2.2.  17-128 bytes of input . . . . . . . . . . . . . . . .  17
       6.2.3.  129-240 bytes of input  . . . . . . . . . . . . . . .  18
     6.3.  Step 3.  Finalization . . . . . . . . . . . . . . . . . .  19
   7.  XXH3 Algorithm Description (for large inputs) . . . . . . . .  19
     7.1.  Step 1.  Initialize internal accumulators . . . . . . . .  19


Collet & Josefsson        Expires 14 April 2026                 [Page 2]

Internet-Draft                   xxHash                     October 2025


     7.2.  Step 2.  Process blocks . . . . . . . . . . . . . . . . .  20
       7.2.1.  Step 2-1.  Process stripes in the block . . . . . . .  20
       7.2.2.  Step 2-2.  Scramble accumulators  . . . . . . . . . .  21
       7.2.3.  Step 3.  Process the last block and the last 64
               bytes . . . . . . . . . . . . . . . . . . . . . . . .  21
     7.3.  Step 4.  Finalization . . . . . . . . . . . . . . . . . .  21
   8.  Performance considerations  . . . . . . . . . . . . . . . . .  22
   9.  Reference Implementation  . . . . . . . . . . . . . . . . . .  23
   10. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  23
   11. Security Considerations . . . . . . . . . . . . . . . . . . .  23
   12. Notices . . . . . . . . . . . . . . . . . . . . . . . . . . .  23
   13. Informative References  . . . . . . . . . . . . . . . . . . .  24
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  24

1.  Introduction

   This document describes the xxHash digest algorithm for both 32-bit
   and 64-bit variants, named XXH32 and XXH64.  The algorithm takes an
   input a message of arbitrary length and an optional seed value, then
   produces an output of 32 or 64-bit as "fingerprint" or "digest".

   xxHash is primarily designed for speed.  It is labeled non-
   cryptographic, and is not meant to avoid intentional collisions (same
   digest for 2 different messages), or to prevent producing a message
   with a predefined digest.

   XXH32 is designed to be fast on 32-bit machines.  XXH64 is designed
   to be fast on 64-bit machines.  Both variants produce different
   output.  However, a given variant shall produce exactly the same
   output, irrespective of the cpu / os used.  In particular, the result
   remains identical whatever the endianness and width of the cpu is.

1.1.  Operation notations

   All operations are performed modulo {32,64} bits.  Arithmetic
   overflows are expected.  XXH32 uses 32-bit modular operations.  XXH64
   and XXH3 use 64-bit modular operations.  When an operation ingests
   input or secret as multi-bytes values, it reads it using little-
   endian convention.

   *  +: denotes modular addition

   *  -: denotes modular subtraction

   *  *: denotes modular multiplication


Collet & Josefsson        Expires 14 April 2026                 [Page 3]

Internet-Draft                   xxHash                     October 2025


      -  *Exception:* In XXH3, if it is in the form (u128)x * (u128)y,
         it denotes 64-bit by 64-bit normal multiplication into a full
         128-bit result.

   *  X <<< s: denotes the value obtained by circularly shifting
      (rotating) X left by s bit positions.

   *  X >> s: denotes the value obtained by shifting X right by s bit
      positions.  Upper s bits become 0.

   *  X << s: denotes the value obtained by shifting X left by s bit
      positions.  Lower s bits become 0.

   *  X xor Y: denotes the bit-wise XOR of X and Y (same width).

   *  X | Y: denotes the bit-wise OR of X and Y (same width).

   *  ~X: denotes the bit-wise negation of X.

2.  XXH32 Algorithm Description

2.1.  Overview

   We begin by supposing that we have a message of any length L as
   input, and that we wish to find its digest.  Here L is an arbitrary
   nonnegative integer; L may be zero.  The following steps are
   performed to compute the digest of the message.

   The algorithm collect and transform input in _stripes_ of 16 bytes.
   The transforms are stored inside 4 "accumulators", each one storing
   an unsigned 32-bit value.  Each accumulator can be processed
   independently in parallel, speeding up processing for cpu with
   multiple execution units.

   The algorithm uses 32-bits addition, multiplication, rotate, shift
   and xor operations.  Many operations require some 32-bits prime
   number constants, all defined below:

     // 0b10011110001101110111100110110001
     static const u32 PRIME32_1 = 0x9E3779B1U;
     // 0b10000101111010111100101001110111
     static const u32 PRIME32_2 = 0x85EBCA77U;
     // 0b11000010101100101010111000111101
     static const u32 PRIME32_3 = 0xC2B2AE3DU;
     // 0b00100111110101001110101100101111
     static const u32 PRIME32_4 = 0x27D4EB2FU;
     // 0b00010110010101100110011110110001
     static const u32 PRIME32_5 = 0x165667B1U;


Collet & Josefsson        Expires 14 April 2026                 [Page 4]

Internet-Draft                   xxHash                     October 2025


   These constants are prime numbers, and feature a good mix of bits 1
   and 0, neither too regular, nor too dissymmetric.  These properties
   help dispersion capabilities.

2.1.1.  Step 1.  Initialize internal accumulators

   Each accumulator gets an initial value based on optional seed input.
   Since the seed is optional, it can be 0.

     u32 acc1 = seed + PRIME32_1 + PRIME32_2;
     u32 acc2 = seed + PRIME32_2;
     u32 acc3 = seed + 0;
     u32 acc4 = seed - PRIME32_1;

2.1.1.1.  Special case: input is less than 16 bytes

   When the input is too small (< 16 bytes), the algorithm will not
   process any stripes.  Consequently, it will not make use of parallel
   accumulators.

   In this case, a simplified initialization is performed, using a
   single accumulator:

     u32 acc  = seed + PRIME32_5;

   The algorithm then proceeds directly to step 4.

2.1.2.  Step 2.  Process stripes

   A stripe is a contiguous segment of 16 bytes.  It is evenly divided
   into 4 _lanes_, of 4 bytes each.  The first lane is used to update
   accumulator 1, the second lane is used to update accumulator 2, and
   so on.

   Each lane read its associated 32-bit value using *little-endian*
   convention.

   For each {lane, accumulator}, the update process is called a _round_,
   and applies the following formula:

     accN = accN + (laneN * PRIME32_2);
     accN = accN <<< 13;
     accN = accN * PRIME32_1;

   This shuffles the bits so that any bit from input _lane_ impacts
   several bits in output _accumulator_. All operations are performed
   modulo 2^32.


Collet & Josefsson        Expires 14 April 2026                 [Page 5]

Internet-Draft                   xxHash                     October 2025


   Input is consumed one full stripe at a time.  Step 2 is looped as
   many times as necessary to consume the whole input, except for the
   last remaining bytes which cannot form a stripe (< 16 bytes).  When
   that happens, move to step 3.

2.1.3.  Step 3.  Accumulator convergence

   All 4 lane accumulators from the previous steps are merged to produce
   a single remaining accumulator of the same width (32-bit).  The
   associated formula is as follows:

     acc = (acc1 <<< 1) + (acc2 <<< 7) + (acc3 <<< 12) + (acc4 <<< 18);

2.1.4.  Step 4.  Add input length

   The input total length is presumed known at this stage.  This step is
   just about adding the length to accumulator, so that it participates
   to final mixing.

     acc = acc + (u32)inputLength;

   Note that, if input length is so large that it requires more than
   32-bits, only the lower 32-bits are added to the accumulator.

2.1.5.  Step 5.  Consume remaining input

   There may be up to 15 bytes remaining to consume from the input.  The
   final stage will digest them according to following pseudo-code:

     while (remainingLength >= 4) {
         lane = read_32bit_little_endian(input_ptr);
         acc = acc + lane * PRIME32_3;
         acc = (acc <<< 17) * PRIME32_4;
         input_ptr += 4; remainingLength -= 4;
     }

     while (remainingLength >= 1) {
         lane = read_byte(input_ptr);
         acc = acc + lane * PRIME32_5;
         acc = (acc <<< 11) * PRIME32_1;
         input_ptr += 1; remainingLength -= 1;
     }

   This process ensures that all input bytes are present in the final
   mix.


Collet & Josefsson        Expires 14 April 2026                 [Page 6]

Internet-Draft                   xxHash                     October 2025


2.1.6.  Step 6.  Final mix (avalanche)

   The final mix ensures that all input bits have a chance to impact any
   bit in the output digest, resulting in an unbiased distribution.
   This is also called avalanche effect.

     acc = acc xor (acc >> 15);
     acc = acc * PRIME32_2;
     acc = acc xor (acc >> 13);
     acc = acc * PRIME32_3;
     acc = acc xor (acc >> 16);

2.1.7.  Step 7.  Output

   The XXH32() function produces an unsigned 32-bit value as output.

   For systems which require to store and/or display the result in
   binary or hexadecimal format, the canonical format is defined to
   reproduce the same value as the natural decimal format, hence follows
   *big-endian* convention (most significant byte first).

3.  XXH64 Algorithm Description

3.1.  Overview

   XXH64's algorithm structure is very similar to XXH32 one.  The major
   difference is that XXH64 uses 64-bit arithmetic, speeding up memory
   transfer for 64-bit compliant systems, but also relying on cpu
   capability to efficiently perform 64-bit operations.

   The algorithm collects and transforms input in _stripes_ of 32 bytes.
   The transforms are stored inside 4 "accumulators", each one storing
   an unsigned 64-bit value.  Each accumulator can be processed
   independently in parallel, speeding up processing for cpu with
   multiple execution units.

   The algorithm uses 64-bit addition, multiplication, rotate, shift and
   xor operations.  Many operations require some 64-bit prime number
   constants, all defined below:


Collet & Josefsson        Expires 14 April 2026                 [Page 7]

Internet-Draft                   xxHash                     October 2025


     // 0b1001111000110111011110011011000110000101111010111100101010000111
     static const u64 PRIME64_1 = 0x9E3779B185EBCA87ULL;
     // 0b1100001010110010101011100011110100100111110101001110101101001111
     static const u64 PRIME64_2 = 0xC2B2AE3D27D4EB4FULL;
     // 0b0001011001010110011001111011000110011110001101110111100111111001
     static const u64 PRIME64_3 = 0x165667B19E3779F9ULL;
     // 0b1000010111101011110010100111011111000010101100101010111001100011
     static const u64 PRIME64_4 = 0x85EBCA77C2B2AE63ULL;
     // 0b0010011111010100111010110010111100010110010101100110011111000101
     static const u64 PRIME64_5 = 0x27D4EB2F165667C5ULL;

   These constants are prime numbers, and feature a good mix of bits 1
   and 0, neither too regular, nor too dissymmetric.  These properties
   help dispersion capabilities.

3.1.1.  Step 1.  Initialize internal accumulators

   Each accumulator gets an initial value based on optional seed input.
   Since the seed is optional, it can be 0.

     u64 acc1 = seed + PRIME64_1 + PRIME64_2;
     u64 acc2 = seed + PRIME64_2;
     u64 acc3 = seed + 0;
     u64 acc4 = seed - PRIME64_1;

3.1.1.1.  Special case: input is less than 32 bytes

   When the input is too small (< 32 bytes), the algorithm will not
   process any stripes.  Consequently, it will not make use of parallel
   accumulators.

   In this case, a simplified initialization is performed, using a
   single accumulator:

     u64 acc  = seed + PRIME64_5;

   The algorithm then proceeds directly to step 4.

3.1.2.  Step 2.  Process stripes

   A stripe is a contiguous segment of 32 bytes.  It is evenly divided
   into 4 _lanes_, of 8 bytes each.  The first lane is used to update
   accumulator 1, the second lane is used to update accumulator 2, and
   so on.

   Each lane read its associated 64-bit value using *little-endian*
   convention.


Collet & Josefsson        Expires 14 April 2026                 [Page 8]

Internet-Draft                   xxHash                     October 2025


   For each {lane, accumulator}, the update process is called a _round_,
   and applies the following formula:

   round(accN,laneN):
     accN = accN + (laneN * PRIME64_2);
     accN = accN <<< 31;
     return accN * PRIME64_1;

   This shuffles the bits so that any bit from input _lane_ impacts
   several bits in output _accumulator_. All operations are performed
   modulo 2^64.

   Input is consumed one full stripe at a time.  Step 2 is looped as
   many times as necessary to consume the whole input, except for the
   last remaining bytes which cannot form a stripe (< 32 bytes).  When
   that happens, move to step 3.

3.1.3.  Step 3.  Accumulator convergence

   All 4 lane accumulators from previous steps are merged to produce a
   single remaining accumulator of same width (64-bit).  The associated
   formula is as follows.

   Note that accumulator convergence is more complex than 32-bit
   variant, and requires to define another function called
   _mergeAccumulator()_:

   mergeAccumulator(acc,accN):
     acc  = acc xor round(0, accN);
     acc  = acc * PRIME64_1;
     return acc + PRIME64_4;

   which is then used in the convergence formula:

     acc = (acc1 <<< 1) + (acc2 <<< 7) + (acc3 <<< 12) + (acc4 <<< 18);
     acc = mergeAccumulator(acc, acc1);
     acc = mergeAccumulator(acc, acc2);
     acc = mergeAccumulator(acc, acc3);
     acc = mergeAccumulator(acc, acc4);

3.1.4.  Step 4.  Add input length

   The input total length is presumed known at this stage.  This step is
   just about adding the length to accumulator, so that it participates
   to final mixing.

     acc = acc + inputLength;


Collet & Josefsson        Expires 14 April 2026                 [Page 9]

Internet-Draft                   xxHash                     October 2025


3.1.5.  Step 5.  Consume remaining input

   There may be up to 31 bytes remaining to consume from the input.  The
   final stage will digest them according to following pseudo-code:

     while (remainingLength >= 8) {
         lane = read_64bit_little_endian(input_ptr);
         acc = acc xor round(0, lane);
         acc = (acc <<< 27) * PRIME64_1;
         acc = acc + PRIME64_4;
         input_ptr += 8; remainingLength -= 8;
     }

     if (remainingLength >= 4) {
         lane = read_32bit_little_endian(input_ptr);
         acc = acc xor (lane * PRIME64_1);
         acc = (acc <<< 23) * PRIME64_2;
         acc = acc + PRIME64_3;
         input_ptr += 4; remainingLength -= 4;
     }

     while (remainingLength >= 1) {
         lane = read_byte(input_ptr);
         acc = acc xor (lane * PRIME64_5);
         acc = (acc <<< 11) * PRIME64_1;
         input_ptr += 1; remainingLength -= 1;
     }

   This process ensures that all input bytes are present in the final
   mix.

3.1.6.  Step 6.  Final mix (avalanche)

   The final mix ensures that all input bits have a chance to impact any
   bit in the output digest, resulting in an unbiased distribution.
   This is also called avalanche effect.

     acc = acc xor (acc >> 33);
     acc = acc * PRIME64_2;
     acc = acc xor (acc >> 29);
     acc = acc * PRIME64_3;
     acc = acc xor (acc >> 32);

3.1.7.  Step 7.  Output

   The XXH64() function produces an unsigned 64-bit value as output.


Collet & Josefsson        Expires 14 April 2026                [Page 10]

Internet-Draft                   xxHash                     October 2025


   For systems which require to store and/or display the result in
   binary or hexadecimal format, the canonical format is defined to
   reproduce the same value as the natural decimal format, hence follows
   *big-endian* convention (most significant byte first).

4.  XXH3 Algorithm Overview

   XXH3 comes in two different versions: XXH3-64 and XXH3-128 (or
   XXH128), producing 64 and 128 bits of output, respectively.

   XXH3 uses different algorithms for small (0-16 bytes), medium (17-240
   bytes), and large (241+ bytes) inputs.  The algorithms for small and
   medium inputs are optimized for performance.  The three algorithms
   are described in the following sections.

   Many operations require some 64-bit prime number constants, which are
   mostly the same constants used in XXH32 and XXH64, all defined below:

   // 0b10011110001101110111100110110001
   static const u64 PRIME32_1 = 0x9E3779B1U;
   // 0b10000101111010111100101001110111
   static const u64 PRIME32_2 = 0x85EBCA77U;
   // 0b11000010101100101010111000111101
   static const u64 PRIME32_3 = 0xC2B2AE3DU;
   // 0b1001111000110111011110011011000110000101111010111100101010000111
   static const u64 PRIME64_1 = 0x9E3779B185EBCA87ULL;
   // 0b1100001010110010101011100011110100100111110101001110101101001111
   static const u64 PRIME64_2 = 0xC2B2AE3D27D4EB4FULL;
   // 0b0001011001010110011001111011000110011110001101110111100111111001
   static const u64 PRIME64_3 = 0x165667B19E3779F9ULL;
   // 0b1000010111101011110010100111011111000010101100101010111001100011
   static const u64 PRIME64_4 = 0x85EBCA77C2B2AE63ULL;
   // 0b0010011111010100111010110010111100010110010101100110011111000101
   static const u64 PRIME64_5 = 0x27D4EB2F165667C5ULL;
   // 0b0001011001010110011001111001000110011110001101110111100111111001
   static const u64 PRIME_MX1 = 0x165667919E3779F9ULL;
   // 0b1001111110110010000111000110010100011110100110001101111100100101
   static const u64 PRIME_MX2 = 0x9FB21C651E98DF25ULL;

   The XXH3_64bits() function produces an unsigned 64-bit value.  The
   XXH3_128bits() function produces a XXH128_hash_t struct containing
   low64 and high64 - the lower and higher 64-bit half values of the
   result, respectively.

   For systems requiring storing and/or displaying the result in binary
   or hexadecimal format, the canonical format is defined to reproduce
   the same value as the natural decimal format, hence following *big-
   endian* convention (most significant byte first).


Collet & Josefsson        Expires 14 April 2026                [Page 11]

Internet-Draft                   xxHash                     October 2025


4.1.  Seed and Secret

   XXH3 provides seeded hashing by introducing two configurable
   constants used in the hashing process: the seed and the secret.  The
   seed is an unsigned 64-bit value, and the secret is an array of bytes
   that is at least 136 bytes in size.  The default seed is 0, and the
   default secret is the following 192-byte value:

   static const u8 defaultSecret[192] = {
     0xb8, 0xfe, 0x6c, 0x39, 0x23, 0xa4, 0x4b, 0xbe,
     0x7c, 0x01, 0x81, 0x2c, 0xf7, 0x21, 0xad, 0x1c,
     0xde, 0xd4, 0x6d, 0xe9, 0x83, 0x90, 0x97, 0xdb,
     0x72, 0x40, 0xa4, 0xa4, 0xb7, 0xb3, 0x67, 0x1f,
     0xcb, 0x79, 0xe6, 0x4e, 0xcc, 0xc0, 0xe5, 0x78,
     0x82, 0x5a, 0xd0, 0x7d, 0xcc, 0xff, 0x72, 0x21,
     0xb8, 0x08, 0x46, 0x74, 0xf7, 0x43, 0x24, 0x8e,
     0xe0, 0x35, 0x90, 0xe6, 0x81, 0x3a, 0x26, 0x4c,
     0x3c, 0x28, 0x52, 0xbb, 0x91, 0xc3, 0x00, 0xcb,
     0x88, 0xd0, 0x65, 0x8b, 0x1b, 0x53, 0x2e, 0xa3,
     0x71, 0x64, 0x48, 0x97, 0xa2, 0x0d, 0xf9, 0x4e,
     0x38, 0x19, 0xef, 0x46, 0xa9, 0xde, 0xac, 0xd8,
     0xa8, 0xfa, 0x76, 0x3f, 0xe3, 0x9c, 0x34, 0x3f,
     0xf9, 0xdc, 0xbb, 0xc7, 0xc7, 0x0b, 0x4f, 0x1d,
     0x8a, 0x51, 0xe0, 0x4b, 0xcd, 0xb4, 0x59, 0x31,
     0xc8, 0x9f, 0x7e, 0xc9, 0xd9, 0x78, 0x73, 0x64,
     0xea, 0xc5, 0xac, 0x83, 0x34, 0xd3, 0xeb, 0xc3,
     0xc5, 0x81, 0xa0, 0xff, 0xfa, 0x13, 0x63, 0xeb,
     0x17, 0x0d, 0xdd, 0x51, 0xb7, 0xf0, 0xda, 0x49,
     0xd3, 0x16, 0x55, 0x26, 0x29, 0xd4, 0x68, 0x9e,
     0x2b, 0x16, 0xbe, 0x58, 0x7d, 0x47, 0xa1, 0xfc,
     0x8f, 0xf8, 0xb8, 0xd1, 0x7a, 0xd0, 0x31, 0xce,
     0x45, 0xcb, 0x3a, 0x8f, 0x95, 0x16, 0x04, 0x28,
     0xaf, 0xd7, 0xfb, 0xca, 0xbb, 0x4b, 0x40, 0x7e,
   };

   The seed and the secret can be optionally specified using the
   *_withSecret and *_withSeed versions of the hash function.

   The seed and the secret cannot be specified simultaneously
   (*_withSecretAndSeed is actually *_withSeed for short and medium
   inputs <= 240 bytes, and *_withSecret for large inputs).  When one is
   specified, the other one uses the default value.  There is one
   exception though: when input is large (> 240 bytes) and a seed is
   given, a secret is derived from the seed value and the default secret
   using the following procedure:


Collet & Josefsson        Expires 14 April 2026                [Page 12]

Internet-Draft                   xxHash                     October 2025


   deriveSecret(u64 seed):
     u64 derivedSecret[24] = defaultSecret[0:192];
     for (i = 0; i < 12; i++) {
       derivedSecret[i*2] += seed;
       derivedSecret[i*2+1] -= seed;
     }
     return derivedSecret; // convert to u8[192] (little-endian)

   The derivation treats the secrets as 24 64-bit values.  In XXH3
   algorithms, the secret is always read similarly by treating a
   contiguous segment of the array as one or more 32-bit or 64-bit
   values. *The secret values are always read using little-endian
   convention*.

4.1.1.  Final Mixing Step (avalanche)

   To make sure that all input bits have a chance to impact any bit in
   the output digest (avalanche effect), the final step of the XXH3
   algorithm is usually one of the two fixed operations that mix the
   bits in a 64-bit value.  These operations are denoted avalanche() and
   avalanche_XXH64() in the following XXH3 description.

   avalanche(u64 x):
     x = x xor (x >> 37);
     x = x * PRIME_MX1;
     x = x xor (x >> 32);
     return x;

   avalanche_XXH64(u64 x):
     x = x xor (x >> 33);
     x = x * PRIME64_2;
     x = x xor (x >> 29);
     x = x * PRIME64_3;
     x = x xor (x >> 32);
     return x;

5.  XXH3 Algorithm Description (for small inputs)

   The algorithm for small inputs (0-16 bytes of input) is further
   divided into 4 cases: empty, 1-3 bytes, 4-8 bytes, and 9-16 bytes of
   input.

   The algorithm uses byte-swap operations.  The byte-swap operation
   reverses the byte order in a 32-bit or 64-bit value.  It is denoted
   bswap32 and bswap64 for its 32-bit and 64-bit versions, respectively.


Collet & Josefsson        Expires 14 April 2026                [Page 13]

Internet-Draft                   xxHash                     October 2025


5.1.  Empty input

   The hash of empty input is calculated from the seed and a segment of
   the secret:

   XXH3_64_empty():
     u64 secretWords[2] = secret[56:72];
     return avalanche_XXH64(seed xor secretWords[0] xor secretWords[1]);

   XXH3_128_empty():
     u64 secretWords[4] = secret[64:96];
     return {avalanche_XXH64(seed xor secretWords[0] xor secretWords[1]), // lower half
             avalanche_XXH64(seed xor secretWords[2] xor secretWords[3])}; // higher half

5.1.1.  1-3 bytes of input

   The algorithm starts from a single 32-bit value combining the input
   bytes and its length:

   u32 combined = (u32)input[inputLength-1] | ((u32)inputLength << 8) |
                  ((u32)input[0] << 16) | ((u32)input[inputLength>>1] << 24);
   // LSB          8       16           24                    MSB
   //  | last byte | length | first byte | middle-or-last byte |

   Then the final output is calculated from the value and the first 8
   bytes (XXH3-64) or 16 bytes (XXH3-128) of the secret to produce the
   final result.  The secret here is read as 32-bit values instead of
   the usual 64-bit values.

   XXH3_64_1to3():
     u32 secretWords[2] = secret[0:8];
     u64 value = ((u64)(secretWords[0] xor secretWords[1]) + seed) xor (u64)combined;
     return avalanche_XXH64(value);

   XXH3_128_1to3():
     u32 secretWords[4] = secret[0:16];
     u64 low = ((u64)(secretWords[0] xor secretWords[1]) + seed) xor (u64)combined;
     u64 high = ((u64)(secretWords[2] xor secretWords[3]) - seed) xor (u64)(bswap32(combined) <<< 13);
     // note that the bswap32(combined) <<< 13 above is 32-bit rotate
     return {avalanche_XXH64(low), // lower half
             avalanche_XXH64(high)}; // higher half

   Note that the XXH3-64 result is the lower half of XXH3-128 result.

5.1.2.  4-8 bytes of input

   The algorithm starts from reading the first and last 4 bytes of the
   input as little-endian 32-bit values, and a modified seed:


Collet & Josefsson        Expires 14 April 2026                [Page 14]

Internet-Draft                   xxHash                     October 2025


   u32 inputFirst = input[0:4];
   u32 inputLast = input[inputLength-4:inputLength];
   u64 modifiedSeed = seed xor ((u64)bswap32((u32)lowerHalf(seed)) << 32);

   Again, these values are combined with a segment of the secret to
   produce the final value.

   XXH3_64_4to8():
     u64 secretWords[2] = secret[8:24];
     u64 combined = (u64)inputLast | ((u64)inputFirst << 32);
     u64 value = ((secretWords[0] xor secretWords[1]) - modifiedSeed) xor combined;
     value = value xor (value <<< 49) xor (value <<< 24);
     value = value * PRIME_MX2;
     value = value xor ((value >> 35) + inputLength);
     value = value * PRIME_MX2;
     value = value xor (value >> 28);
     return value;

   XXH3_128_4to8():
     u64 secretWords[2] = secret[16:32];
     u64 combined = (u64)inputFirst | ((u64)inputLast << 32);
     u64 value = ((secretWords[0] xor secretWords[1]) + modifiedSeed) xor combined;
     u128 mulResult = (u128)value * (u128)(PRIME64_1 + (inputLength << 2));
     u64 high = higherHalf(mulResult); // mulResult >> 64
     u64 low = lowerHalf(mulResult); // mulResult & 0xFFFFFFFFFFFFFFFF
     high = high + (low << 1);
     low = low xor (high >> 3);
     low = low xor (low >> 35);
     low = low * PRIME_MX2;
     low = low xor (low >> 28);
     high = avalanche(high);
     return {low, high};

5.1.3.  9-16 bytes of input

   The algorithm starts from reading the first and last 8 bytes of the
   input as little-endian 64-bit values:

   u64 inputFirst = input[0:8];
   u64 inputLast = input[inputLength-8:inputLength];

   Once again, these values are combined with a segment of the secret to
   produce the final value.


Collet & Josefsson        Expires 14 April 2026                [Page 15]

Internet-Draft                   xxHash                     October 2025


   XXH3_64_9to16():
     u64 secretWords[4] = secret[24:56];
     u64 low = ((secretWords[0] xor secretWords[1]) + seed) xor inputFirst;
     u64 high = ((secretWords[2] xor secretWords[3]) - seed) xor inputLast;
     u128 mulResult = (u128)low * (u128)high;
     u64 value = inputLength + bswap64(low) + high + (u64)(lowerHalf(mulResult) xor higherHalf(mulResult));
     return avalanche(value);

   XXH3_128_9to16():
     u64 secretWords[4] = secret[32:64];
     u64 val1 = ((secretWords[0] xor secretWords[1]) - seed) xor inputFirst xor inputLast;
     u64 val2 = ((secretWords[2] xor secretWords[3]) + seed) xor inputLast;
     u128 mulResult = (u128)val1 * (u128)PRIME64_1;
     u64 low = lowerHalf(mulResult) + ((u64)(inputLength - 1) << 54);
     u64 high = higherHalf(mulResult) + ((u64)higherHalf(val2) << 32) + (u64)lowerHalf(val2) * PRIME32_2;
     // the above line can also be simplified to higherHalf(mulResult) + val2 + (u64)lowerHalf(val2) * (PRIME32_2 - 1);
     low = low xor bswap64(high);
     // the following three lines are in fact a 128x64 -> 128 multiplication ({low,high} = (u128){low,high} * PRIME64_2)
     u128 mulResult2 = (u128)low * (u128)PRIME64_2;
     low = lowerHalf(mulResult2);
     high = higherHalf(mulResult2) + high * PRIME64_2;
     return {avalanche(low), // lower half
             avalanche(high)}; // higher half

6.  XXH3 Algorithm Description (for medium inputs)

   This algorithm is used for medium inputs (17-240 bytes of input).
   Its internal hash state is stored inside 1 (XXH3-64) or 2 (XXH3-128)
   "accumulators", each storing an unsigned 64-bit value.

6.1.  Step 1.  Initialize internal accumulators

   The accumulator(s) are initialized based on the input length.

   // For XXH3-64
   u64 acc = inputLength * PRIME64_1;

   // For XXH3-128
   u64 acc[2] = {inputLength * PRIME64_1, 0};

6.2.  Step 2.  Process the input

   This step is further divided into two cases: one for 17-128 bytes of
   input, and one for 129-240 bytes of input.


Collet & Josefsson        Expires 14 April 2026                [Page 16]

Internet-Draft                   xxHash                     October 2025


6.2.1.  Mixing operation

   This step uses a mixing operation that mixes a 16-byte segment of
   data, a 16-byte segment of secret and the seed into a 64-bit value as
   a building block.  This operation treat the segment of data and
   secret as little-endian 64-bit values.

   mixStep(u8 data[16], size secretOffset, u64 seed):
     u64 dataWords[2] = data[0:16];
     u64 secretWords[2] = secret[secretOffset:secretOffset+16];
     u128 mulResult = (u128)(dataWords[0] xor (secretWords[0] + seed)) *
                      (u128)(dataWords[1] xor (secretWords[1] - seed));
     return lowerHalf(mulResult) xor higherHalf(mulResult);

   The mixing operation is always invoked in groups of two in XXH3-128,
   where two 16-byte segments of data are mixed with a 32-byte segment
   of secret, and the accumulators are updated accordingly.

   mixTwoChunks(u8 data1[16], u8 data2[16], size secretOffset, u64 seed):
     u64 dataWords1[2] = data1[0:16]; // again, little-endian conversion
     u64 dataWords2[2] = data2[0:16];
     acc[0] = acc[0] + mixStep(data1, secretOffset, seed);
     acc[1] = acc[1] + mixStep(data2, secretOffset + 16, seed);
     acc[0] = acc[0] xor (dataWords2[0] + dataWords2[1]);
     acc[1] = acc[1] xor (dataWords1[0] + dataWords1[1]);

   The input is split into several 16-byte chunks and mixed, and the
   result is added to the accumulator(s).

6.2.2.  17-128 bytes of input

   The input is read as _N_ 16-byte chunks starting from the beginning
   and _N_ chunks starting from the end, where _N_ is the smallest
   number that these 2_N_ chunks cover the whole input.  These chunks
   are paired up and mixed, and the results are accumulated to the
   accumulator(s).


Collet & Josefsson        Expires 14 April 2026                [Page 17]

Internet-Draft                   xxHash                     October 2025


   // the loop variable `i` should be signed to avoid underflow in implementation
   processInput_XXH3_64_17to128():
     u64 numRounds = ((inputLength - 1) >> 5) + 1;
     for (i = numRounds - 1; i >= 0; i--) {
       size offsetStart = i*16;
       size offsetEnd = inputLength - i*16 - 16;
       acc += mixStep(input[offsetStart:offsetStart+16], i*32, seed);
       acc += mixStep(input[offsetEnd:offsetEnd+16], i*32+16, seed);
     }

   processInput_XXH3_128_17to128():
     u64 numRounds = ((inputLength - 1) >> 5) + 1;
     for (i = numRounds - 1; i >= 0; i--) {
       size offsetStart = i*16;
       size offsetEnd = inputLength - i*16 - 16;
       mixTwoChunks(input[offsetStart:offsetStart+16], input[offsetEnd:offsetEnd+16], i*32, seed);
     }

6.2.3.  129-240 bytes of input

   The input is split into 16-byte (XXH3-64) or 32-byte (XXH3-128)
   chunks.  The first 128 bytes are first mixed chunk by chunk, followed
   by an intermediate avalanche operation.  Then the remaining full
   chunks are processed, and finally the last 16/32 bytes are treated as
   a chunk to process.


Collet & Josefsson        Expires 14 April 2026                [Page 18]

Internet-Draft                   xxHash                     October 2025


   processInput_XXH3_64_129to240():
     u64 numChunks = inputLength >> 4;
     for (i = 0; i < 8; i++) {
       acc += mixStep(input[i*16:i*16+16], i*16, seed);
     }
     acc = avalanche(acc);
     for (i = 8; i < numChunks; i++) {
       acc += mixStep(input[i*16:i*16+16], (i-8)*16 + 3, seed);
     }
     acc += mixStep(input[inputLength-16:inputLength], 119, seed);

   processInput_XXH3_128_129to240():
     u64 numChunks = inputLength >> 5;
     for (i = 0; i < 4; i++) {
       mixTwoChunks(input[i*32:i*32+16], input[i*32+16:i*32+32], i*32, seed);
     }
     acc[0] = avalanche(acc[0]);
     acc[1] = avalanche(acc[1]);
     for (i = 4; i < numChunks; i++) {
       mixTwoChunks(input[i*32:i*32+16], input[i*32+16:i*32+32], (i-4)*32 + 3, seed);
     }
     // note that the half-chunk order and the seed is different here
     mixTwoChunks(input[inputLength-16:inputLength], input[inputLength-32:inputLength-16], 103, (u64)0 - seed);

6.3.  Step 3.  Finalization

   The final result is extracted from the accumulator(s).

   XXH3_64_17to240():
     return avalanche(acc);

   XXH3_128_17to240():
     u64 low = acc[0] + acc[1];
     u64 high = (acc[0] * PRIME64_1) + (acc[1] * PRIME64_4) + (((u64)inputLength - seed) * PRIME64_2);
     return {avalanche(low), // lower half
             (u64)0 - avalanche(high)}; // higher half

7.  XXH3 Algorithm Description (for large inputs)

   This algorithm is used for inputs larger than 240 bytes.  The
   internal hash state is stored inside 8 "accumulators", each one
   storing an unsigned 64-bit value.

7.1.  Step 1.  Initialize internal accumulators

   The accumulators are initialized to fixed constants:


Collet & Josefsson        Expires 14 April 2026                [Page 19]

Internet-Draft                   xxHash                     October 2025


   u64 acc[8] = {
     PRIME32_3, PRIME64_1, PRIME64_2, PRIME64_3,
     PRIME64_4, PRIME32_2, PRIME64_5, PRIME32_1};

7.2.  Step 2.  Process blocks

   The input is consumed and processed one full block at a time.  The
   size of the block depends on the length of the secret.  Specifically,
   a block consists of several 64-byte stripes.  The number of stripes
   per block is floor((secretLength-64)/8) . For the default 192-byte
   secret, there are 16 stripes in a block, and thus the block size is
   1024 bytes.

   secretLength = lengthInBytes(secret);    // default 192; at least 136
   stripesPerBlock = (secretLength-64) / 8; // default 16; at least 9
   blockSize = 64 * stripesPerBlock;        // default 1024; at least 576

   The process of processing a full block is called a _round_. It
   consists of the following two sub-steps:

7.2.1.  Step 2-1.  Process stripes in the block

   A stripe is evenly divided into 8 lanes, of 8 bytes each.  In an
   accumulation step, one stripe and a 64-byte contiguous segment of the
   secret are used to update the accumulators.  Each lane reads its
   associated 64-bit value using little-endian convention.

   The accumulation step applies the following procedure:

   accumulate(u64 stripe[8], size secretOffset):
     u64 secretWords[8] = secret[secretOffset:secretOffset+64];
     for (i = 0; i < 8; i++) {
       u64 value = stripe[i] xor secretWords[i];
       acc[i xor 1] = acc[i xor 1] + stripe[i];
       acc[i] = acc[i] + (u64)lowerHalf(value) * (u64)higherHalf(value);
                         // (value and 0xFFFFFFFF) * (value >> 32)
     }

   The accumulation step is repeated for all stripes in a block, using
   different segments of the secret, starting from the first 64 bytes
   for the first stripe, and offset by 8 bytes for each following round:

   round_accumulate(u8 block[blockSize]):
     for (n = 0; n < stripesPerBlock; n++) {
       u64 stripe[8] = block[n*64:n*64+64]; // 64 bytes = 8 u64s
       accumulate(stripe, n*8);
     }


Collet & Josefsson        Expires 14 April 2026                [Page 20]

Internet-Draft                   xxHash                     October 2025


7.2.2.  Step 2-2.  Scramble accumulators

   After the accumulation steps are finished for all stripes in the
   block, the accumulators are scrambled using the last 64 bytes of the
   secret.

   round_scramble():
     u64 secretWords[8] = secret[secretLength-64:secretLength];
     for (i = 0; i < 8; i++) {
       acc[i] = acc[i] xor (acc[i] >> 47);
       acc[i] = acc[i] xor secretWords[i];
       acc[i] = acc[i] * PRIME32_1;
     }

   A round is thus a round_accumulate followed by a round_scramble:

   round(u8 block[blockSize]):
     round_accumulate(block);
     round_scramble();

   Step 2 is looped to consume the input until there are less than or
   equal to blockSize bytes of input left.  Note that we leave the last
   block to the next step even if it is a full block.

7.2.3.  Step 3.  Process the last block and the last 64 bytes

   Accumulation steps are run for the stripes in the last block, except
   for the last stripe (whether it is full or not).  After that, run a
   final accumulation step by treating the last 64 bytes as a stripe.
   Note that the last 64 bytes might overlap with the second-to-last
   block.

   // len is the size of the last block (1 <= len <= blockSize)
   lastRound(u8 block[], size len, u64 lastStripe[8]):
     size nFullStripes = (len-1)/64;
     for (n = 0; n < nFullStripes; n++) {
       u64 stripe[8] = block[n*64:n*64+64];
       accumulate(stripe, n * 8);
     }
     accumulate(lastStripe, secretLength - 71);

7.3.  Step 4.  Finalization

   In the finalization step, a merging procedure is used to extract a
   single 64-bit value from the accumulators, using an initial seed
   value and a 64-byte segment of the secret.


Collet & Josefsson        Expires 14 April 2026                [Page 21]

Internet-Draft                   xxHash                     October 2025


   finalMerge(u64 initValue, size secretOffset):
     u64 secretWords[8] = secret[secretOffset:secretOffset+64];
     u64 result = initValue;
     for (i = 0; i < 4; i++) {
       // 64-bit by 64-bit multiplication to 128-bit full result
       u128 mulResult = (u128)(acc[i*2] xor secretWords[i*2]) *
                        (u128)(acc[i*2+1] xor secretWords[i*2+1]);
       result = result + (lowerHalf(mulResult) xor higherHalf(mulResult));
                         // (mulResult and 0xFFFFFFFFFFFFFFFF) xor (mulResult >> 64)
     }
     return avalanche(result);

   XXH3-128 runs the merging procedure twice for the two halves of the
   result, using different secret segments and different initial values
   derived from the total input length.  The XXH3-64 result is just the
   lower half of the XXH3-128 result.

   XXH3_64_large():
     return finalMerge((u64)inputLength * PRIME64_1, 11);

   XXH3_128_large():
     return {finalMerge((u64)inputLength * PRIME64_1, 11), // lower half
             finalMerge(~((u64)inputLength * PRIME64_2), secretLength - 75)}; // higher half

8.  Performance considerations

   The xxHash algorithms are simple and compact to implement.  They
   provide a system independent "fingerprint" or digest of a message of
   arbitrary length.

   The algorithm allows input to be streamed and processed in multiple
   steps.  In such case, an internal buffer is needed to ensure data is
   presented to the algorithm in full stripes.

   On 64-bit systems, the 64-bit variant XXH64 is generally faster to
   compute, so it is a recommended variant, even when only 32-bit are
   needed.

   On 32-bit systems though, positions are reversed: XXH64 performance
   is reduced, due to its usage of 64-bit arithmetic.  XXH32 becomes a
   faster variant.

   Finally, when vector operations are possible, XXH3 is likely the
   faster variant.


Collet & Josefsson        Expires 14 April 2026                [Page 22]

Internet-Draft                   xxHash                     October 2025


9.  Reference Implementation

   A reference library written in C is available at
   https://www.xxhash.com [XXHASH].  The web page also links to multiple
   other implementations written in many different languages.  It links
   to the github project page (https://github.com/Cyan4973/xxHash) where
   an issue board (https://github.com/Cyan4973/xxHash/issues) can be
   used for further public discussions on the topic.

10.  IANA Considerations

   None

11.  Security Considerations

   xxHash is not a cryptographic hash function, and has not been
   designed for use in any cryptographic setting.

   Implementations has to follow best practices to avoid security
   concerns, and users needs to continously re-evaulate implementations
   for security vulnerabilities.

12.  Notices

   This document is derived from
   https://github.com/Cyan4973/xxHash/blob/dev/doc/xxhash_spec.md commit
   d66a9cb6f223b1f15cceebee39bc07ed0bd4ad3d with notice below, with all
   changes are tracked at https://gitlab.com/jas/ietf-xxhash using Git.

   Version 0.2.0 (29/06/23)

   Version changes

   v0.2.0: added XXH3 specification, by Adrien Wu
   v0.1.1: added a note on rationale for selection of constants
   v0.1.0: initial release

   Copyright (c) Yann Collet

   Permission is granted to copy and distribute this document
   for any purpose and without charge,
   including translations into other languages
   and incorporation into compilations,
   provided that the copyright notice and this notice are preserved,
   and that any substantive changes or deletions from the original
   are clearly marked.
   Distribution of this document is unlimited.


Collet & Josefsson        Expires 14 April 2026                [Page 23]

Internet-Draft                   xxHash                     October 2025


13.  Informative References

   [XXHASH]   "xxHash", n.d., <https://xxhash.com/>.

Authors' Addresses

   Yann Collet
   Meta
   Email: cyan@meta.com


   Simon Josefsson (editor)
   Email: simon@josefsson.org


Collet & Josefsson        Expires 14 April 2026                [Page 24]