Internet-Draft EESP Stateless Encryption July 2025
Xia & Jiang Expires 8 January 2026
Workgroup:
IPSECME Working Group
Internet-Draft:
draft-xia-ipsecme-eesp-stateless-encryption-01
Published:
July 2025
Intended Status:
Standards Track
Expires:
8 January 2026
Authors:
L. Xia
Huawei Technologies
W. Jiang
Huawei Technologies

Stateless Encryption Scheme of Enhanced Encapsulating Security Payload (EESP)

Abstract

This draft introduces several use cases for stateless encryption, analyzes and compares existing stateless encryption schemes in the industry, and then proposes a general and flexible stateless encryption scheme based on the summarized requirements.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 8 January 2026.

1. Introduction

Emerging scenarios such as high-performance cloud services, large-model AI computing, and 5G mobile backhaul networks place higher requirements on the hardware friendliness, performance, and flexibility of the IPsec ESP protocol. A new protocol design, EESP [I-D.ietf-ipsecme-eesp] [I-D.ietf-ipsecme-eesp-ikev2], is under discussion. EESP addresses issues such as introducing finer-grained sub-child-SAs, adapting the ESP header and trailer format, allowing parts of the transport layer header to remain unencrypted, and extending EESP flexibly with new features through options.

Beyond the issues listed above, stateless encryption is another important topic. Its basic idea is to compute data keys dynamically from a small number of master keys (for AES-GCM, the encryption key and authentication key are combined). In large-scale IPsec session scenarios, this eases hardware resource constraints, improves performance, and reduces key negotiation complexity. This draft introduces several use cases for stateless encryption, analyzes and compares existing stateless encryption schemes in the industry, and then proposes a general and flexible stateless encryption scheme based on the summarized requirements.
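
The basic idea can be illustrated with a minimal Python sketch. It is not the scheme this draft will define: the function name derive_data_key, the byte-string context, and the choice of HMAC-SHA256 as the key derivation function are all assumptions made only for illustration.

```python
import hashlib
import hmac
import secrets


def derive_data_key(master_key: bytes, context: bytes, length: int = 32) -> bytes:
    """Derive a per-session data key from a master key and a session context.

    HMAC-SHA256 is a stand-in KDF (an assumption, not specified by the draft).
    """
    if not 1 <= length <= hashlib.sha256().digest_size:
        raise ValueError("length must be between 1 and 32 bytes")
    return hmac.new(master_key, context, hashlib.sha256).digest()[:length]


# The only long-lived secret that has to be stored.
master_key = secrets.token_bytes(32)

# Hypothetical session contexts; each one yields its own data key.
key_a = derive_data_key(master_key, b"session-A")
key_b = derive_data_key(master_key, b"session-B")

assert key_a != key_b                                      # sessions are isolated
assert key_a == derive_data_key(master_key, b"session-A")  # recomputable on demand
```

Because a data key can be recomputed whenever it is needed, the device stores one master key instead of one key per session.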

2. Use Cases

2.1. General Computing of Cloud Service

Public cloud services provide IPsec VPN access to a massive number of users, so the servers in their infrastructure must terminate a massive number of IPsec sessions. If the hardware supports IPsec, it should encrypt and decrypt on a per-session basis, with the data keys of different sessions isolated from one another. The server must maintain a security context for its connection with each of a large number of clients, and hardware with limited memory cannot store such a large amount of context. Note that in this case the client and server do not belong to the same trusted domain.

The stateless encryption scheme in the [PSP] solution proposed by Google addresses this hardware memory overhead problem. Its main principle is that the server derives a data key from its master key, and the client obtains the data key through an out-of-band method. It has:

  • Pros: It saves half of the total session contexts. Furthermore, since the master key is owned by the server and not shared, a key leak affects only one server;

  • Cons: When a large number of new sessions are created, each data key is negotiated in real time over the out-of-band slow path, so transmission of the first packet is delayed and performance degrades.
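
The asymmetry described above can be sketched as follows. This is an illustration of the principle only, not the [PSP] key derivation: the Server and Client classes, the 4-byte session_id, and the use of HMAC-SHA256 are assumptions.

```python
import hashlib
import hmac
import secrets


class Server:
    """Holds one master key and no per-session key table."""

    def __init__(self) -> None:
        self._master_key = secrets.token_bytes(32)

    def key_for(self, session_id: int) -> bytes:
        # Recomputed on demand from the session identifier (hypothetical field).
        context = session_id.to_bytes(4, "big")
        return hmac.new(self._master_key, context, hashlib.sha256).digest()


class Client:
    """Stores the data key it was handed through the out-of-band channel."""

    def __init__(self, session_id: int, data_key: bytes) -> None:
        self.session_id = session_id
        self.data_key = data_key


server = Server()

# Out-of-band slow path: one exchange per new session before the first packet.
clients = [Client(sid, server.key_for(sid)) for sid in range(1, 4)]

# The server re-derives the key each client holds without having stored it.
for client in clients:
    assert server.key_for(client.session_id) == client.data_key
```

Only the client side keeps per-session state, which is where the saving of half of the session contexts comes from. The list comprehension stands in for the out-of-band exchange that delays the first packet of each new session.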

2.2. Cluster Communication in HPC Network

As shown in the figure below, encrypted communication is required between the different instances of large-scale HPC jobs, and the number of security sessions is on the order of O(M * N * N) for M jobs with N instances each. An efficient security context management mechanism is therefore required to handle security sessions at this scale. Note that all communication instances of an HPC job belong to the same trusted domain.

                           M Jobs
        +------------------------------------------+
        | +----------------------------------------+-+
        | | +--------------------------------------+-+-+
        | | |               Job 0                  | | |
        | | |  +---------+ +---------+ +---------+ | | |
        | | |  |Instance1| |Instance2| |Instance3| | | |
        | | |  +---------+ +---------+ +---------+ | | |
        +-+-+--------------------------------------+ | |
          +-+----------------------------------------+ |
            +------------------+-----------------------+
                               |
                               |Deploy Jobs
                               |to Server Cluster
                               |
+------------------------------V--------------------------------------+
|                        Server Cluster                               |
|                                                                     |
| +-----------+             +-----------+             +-----------+   |
| |+----------++            |+----------++            |+----------++  |
| ||+---------+++           ||+---------+++           ||+---------+++ |
| |||Instancei||| Ciphertext|||Instancej||| Ciphertext|||Instancek||| |
| |||  Keyi   ||<----------->||  Keyj   ||<----------->||  Keyk   ||| |
| +++---------+||           +++---------+||           +++---------+|| |
|  ++----------+|            ++----------+|            ++----------+| |
|   +----+------+             +-----------+             +-------+---+ |
|        |                    Ciphertext                        |     |
|        +------------------------------------------------------+     |
|                                                                     |
+---------------------------------------------------------------------+



Figure 1: Encrypted Communication for Large Scale HPC Networks
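
To make the O(M * N * N) growth concrete, the short Python sketch below counts sessions under two assumptions that are not stated in this draft: every instance communicates with every other instance of its own job, and each direction needs its own security context. The job and instance counts are hypothetical.

```python
def pairwise_sessions(jobs: int, instances_per_job: int) -> int:
    """Count directed instance-to-instance sessions across all jobs.

    Assumes full-mesh communication inside each job and one security
    context per direction.
    """
    return jobs * instances_per_job * (instances_per_job - 1)


# Hypothetical sizes chosen only to show the growth rate.
for jobs, instances in [(10, 100), (100, 1000)]:
    print(jobs, instances, pairwise_sessions(jobs, instances))
```

With 100 jobs of 1000 instances each, this gives 99,900,000 contexts, far more than hardware with limited memory can hold.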

The stateless encryption scheme defined by [UEC_TSS] can be used to solve the above problem. The main principle is that all communication instances of an HPC job belong to the same trust domain and share the same master key for both the receiving and sending directions. It has:

  • Pros:

    • Unlike Google PSP, it saves all security session contexts;

    • The communicating parties do not need to store data keys, and growth in the number of instances and connections of the HPC job does not affect the number of security contexts;

    • With no out-of-band slow-path data key negotiation, the first-packet delay is small;

    • Data keys can be updated through the TSC.epoch.

  • Cons:

    • Master key leakage affects the entire trusted domain;

    • The context content can be generated from the SSI / Source IP / Destination IP fields. Although this makes the context content flexible, it increases the calculation overhead.
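
A minimal Python sketch of this shared-master-key model follows. The field widths, the context encoding, and the use of HMAC-SHA256 are assumptions for illustration and are not taken from [UEC_TSS]; only the inputs (SSI, source IP, destination IP, epoch) come from the description above.

```python
import hashlib
import hmac
import ipaddress
import secrets


def encode_ip(address: str) -> bytes:
    """Length-prefix the packed address so IPv4 and IPv6 cannot be confused."""
    packed = ipaddress.ip_address(address).packed
    return len(packed).to_bytes(1, "big") + packed


def derive_key(master_key: bytes, ssi: int, src_ip: str, dst_ip: str, epoch: int) -> bytes:
    """Derive a data key from the shared master key and per-packet context."""
    context = b"".join([
        ssi.to_bytes(4, "big"),    # hypothetical 32-bit SSI
        encode_ip(src_ip),
        encode_ip(dst_ip),
        epoch.to_bytes(4, "big"),  # hypothetical 32-bit epoch
    ])
    return hmac.new(master_key, context, hashlib.sha256).digest()


# Every instance of the job is provisioned with the same master key.
master_key = secrets.token_bytes(32)

# Sender and receiver each compute the key locally; nothing is negotiated.
sender_key = derive_key(master_key, 7, "10.0.0.1", "10.0.0.2", epoch=1)
receiver_key = derive_key(master_key, 7, "10.0.0.1", "10.0.0.2", epoch=1)
assert sender_key == receiver_key

# Advancing the epoch rolls the data key without touching the master key.
assert derive_key(master_key, 7, "10.0.0.1", "10.0.0.2", epoch=2) != sender_key
```

The sketch also shows both drawbacks: every data key of the domain is computable from the one master key, and each additional context field adds work to every derivation.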

2.3. NIC/DPU Pool for General Computing

To cope with large-scale traffic access (e.g., computing servers accessing storage networks) and to use network card resources efficiently, NIC resource pooling is an effective solution. For north-south traffic from clients to servers, the NIC resource pool must be transparent to the application, so that a client can reach resources behind any NIC in the pool. When encrypted connections are used, all NICs must therefore share the same key for a given client's access. Because the NICs in a resource pool belong to the same trust domain, stateless encryption with a shared master key is applicable. It avoids data key synchronization between NICs and, in scenarios with a large number of secure client connections, reduces the security session and data key storage needed on each NIC. The data key for an encrypted connection is derived from the master key and the context, and the client obtains it through an out-of-band method. Encrypted connections and contexts can be isolated per flow or per VM instance, as shown in the figure below:

                      VM Pool
+--------------------------------------------------+
|                                                  |
|       +----+  +----+  +----+  +----+             |
|       | VM |  | VM |  | VM |  | VM |             |
|       +----+  +----+  +----+  +----+             |
|                                                  |
|    +----------------------------------+          |
|    |                                  |          |
|    |  NIC pool with shared master key |          |
|    |       and security context       |          |
|    |   +-----+  +-----+     +-----+   |          |
|    |   | NIC |  | NIC | ... | NIC |   |          |
|    |   +---X\*  +-/-*-+     +---/++   |          |
|    |      / \ \\ /  |\       --/ |    |          |
|    +------/--\-/X\--+-\\-----//--+----+          |
+----------/---\/---\\+---\---/----+---------------+
           /   /\     \\-  \ /     |
          /   /  Ciphertext X\     |
          /  /    \-  |   \X  \    |
         / //  --- \  |  // \\ \   |
         // ---    \  | /     \\\\ |
        //--        \ |/        \\\|
   +--------+   +----\*--+   +----\|--+
   | client |   | client |   | client |
   +--------+   +--------+   +--------+


Figure 2: Encrypted Communication for NIC Pool

Similarly, the NIC resource pool can also be used for east-west traffic between VMs. In this case, all NICs are in the same security domain and can share a master key, and different data keys can be generated dynamically from the contexts of different encrypted connections.

2.4. AI Computing

As shown in the figure below, in an AI computing network a computing task is executed collaboratively by a group of CPUs and XPUs located in the same trust domain or across trust domains (in the cross-domain case, they are interconnected through DPUs acting as proxies). For CPUs and XPUs within the same trust domain, stateless encryption with a shared master key eliminates the complexity and latency of key negotiation between chips. For interconnection across trust domains, the DPU acts as an encrypted connection proxy between two trust domains (the local trusted domain and the global trusted domain). The DPU holds the master keys of both trust domains, calculates the data key for intra-domain communication in each domain from that domain's context, and then uses the two calculated data keys to proxy the secure connection across trust domains.

                +-----------------------------+
                |         Trusted Domain 1    |
                | +-----+ +-----+     +-----+ |
                | | CPU | | CPU | ... | CPU | |
                | +-----+ +-----+     +-----+ |
                | +-----+ +-----+     +-----+ |
                | | XPU | | XPU | ... | XPU | |
                | +-----+ +-----+     +-----+ |
                | +-----+ +-----+     +-----+ |
                | | XPU | | XPU | ... | XPU | |
                | +-----+ +-----+     +-----+ |
                ++----------+-----+----------++
                 |DPU/Switch|     |DPU/Switch|
                 +-----+----+     +------+---+
                       |   Global Trusted|Domain
       +---------------+-----------------+------------------+
 +-----+----+     +----+-----+       +---+------+    +------+---+
 |DPU/Switch|     |DPU/Switch|       |DPU/Switch|    |DPU/Switch|
++----------+-----+----------++     ++----------+----+----------+-+
| +-----+ +-----+     +-----+ |     | +-----+ +-----+     +-----+ |
| | CPU | | CPU | ... | CPU | |     | | CPU | | CPU | ... | CPU | |
| +-----+ +-----+     +-----+ |     | +-----+ +-----+     +-----+ |
| +-----+ +-----+     +-----+ |     | +-----+ +-----+     +-----+ |
| | XPU | | XPU | ... | XPU | |     | | XPU | | XPU | ... | XPU | |
| +-----+ +-----+     +-----+ |     | +-----+ +-----+     +-----+ |
| +-----+ +-----+     +-----+ |     | +-----+ +-----+     +-----+ |
| | XPU | | XPU | ... | XPU | |     | | XPU | | XPU | ... | XPU | |
| +-----+ +-----+     +-----+ |     | +-----+ +-----+     +-----+ |
|         Trusted Domain 2    |     |         Trusted Domain 3    |
+-----------------------------+     +-----------------------------+


Figure 3: Encrypted Communication for AI Computing Network
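
The key handling of a DPU on the domain boundary can be sketched in Python as follows. Only the derivation of the two data keys is shown; the decryption and re-encryption of traffic are omitted. The ProxyDPU class, the context strings, and the use of HMAC-SHA256 are assumptions for illustration.

```python
import hashlib
import hmac
import secrets


def derive_key(master_key: bytes, context: bytes) -> bytes:
    # HMAC-SHA256 is a stand-in KDF (an assumption, not specified by the draft).
    return hmac.new(master_key, context, hashlib.sha256).digest()


class ProxyDPU:
    """Sits on the domain boundary and holds the master keys of both domains."""

    def __init__(self, local_master: bytes, global_master: bytes) -> None:
        self._local_master = local_master
        self._global_master = global_master

    def keys_for(self, local_context: bytes, global_context: bytes) -> tuple[bytes, bytes]:
        """Return (local-domain key, global-domain key) for one proxied connection."""
        return (derive_key(self._local_master, local_context),
                derive_key(self._global_master, global_context))


local_master = secrets.token_bytes(32)
global_master = secrets.token_bytes(32)
dpu = ProxyDPU(local_master, global_master)

# Hypothetical contexts for the two legs of one cross-domain connection.
local_context = b"xpu-3 -> dpu-1"
global_context = b"dpu-1 -> dpu-2"
local_key, global_key = dpu.keys_for(local_context, global_context)

# An XPU in the local domain derives the same local key on its own.
assert local_key == derive_key(local_master, local_context)
# The two legs are protected by different keys.
assert local_key != global_key
```

A chip inside one domain holds only that domain's master key, so it cannot compute the data keys of the global domain; only the proxy DPU spans both.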

3. Requirement Summary

Based on the above use cases, the requirements for a general and flexible stateless encryption scheme are as follows:

4. EESP Stateless Encryption Scheme

TBD.

5. Security Considerations

TBD.

6. IANA Considerations

TBD.

7. Informative References

[PSP]
"PSP Architecture Specification", n.d., <https://github.com/google/psp/blob/main/doc/PSP_Arch_Spec.pdf>.
[UEC_TSS]
"Ultra Ethernet Specification v1.0", n.d., <https://ultraethernet.org/wp-content/uploads/sites/20/2025/06/UE-Specification-6.11.25.pdf>.
[I-D.ietf-ipsecme-eesp]
Klassert, S., Antony, A., and C. Hopps, "Enhanced Encapsulating Security Payload (EESP)", Work in Progress, Internet-Draft, draft-ietf-ipsecme-eesp-01, <https://datatracker.ietf.org/doc/html/draft-ietf-ipsecme-eesp-01>.
[I-D.ietf-ipsecme-eesp-ikev2]
Klassert, S., Antony, A., Brunner, T., and V. Smyslov, "IKEv2 negotiation for Enhanced Encapsulating Security Payload (EESP)", Work in Progress, Internet-Draft, draft-ietf-ipsecme-eesp-ikev2-00, <https://datatracker.ietf.org/doc/html/draft-ietf-ipsecme-eesp-ikev2-00>.

Authors' Addresses

Liang Xia
Huawei Technologies
Weiyu Jiang
Huawei Technologies