---
eip: 7885
title: Precompile for NTT operations
description: Proposal to add a precompiled contract that performs number theoretical transformation (NTT) and inverse (InvNTT).
author: Renaud Dubois (@rdubois-crypto), Simon Masson (@simonmasson), Yoon Hyoung Lee (@yhl125)
discussions-to: https://ethereum-magicians.org/t/eip-7885-precompile-for-ntt-operations/22895
status: Draft
type: Standards Track
category: Core
created: 2025-02-12
---

## Abstract

This proposal creates a precompiled contract that performs NTT and Inverse NTT transformations. This provides a way to efficiently perform fast polynomial multiplication for post-quantum and STARK cryptography.

## Motivation

With the recent advances in quantum computing, there are increased concerns for the quantum threat against Ethereum. Today ECDSA is the EOA signature algorithms, which is vulnerable to attacks by quantum computers. Efficient replacement algorithms use polynomial multiplication as the core operation. Once NTT and Inverse NTT are available, the remaining of the verification algorithm is trivial. Choosing to integrate NTT and InvNTT instead of a specific algorithm provides agility, as DILITHIUM or FALCON or any equivalent can be implemented with a modest cost from those operators. NTT is also of interest to speed-up STARK verifiers. This single operator would thus benefit to both the Ethereum scaling and post-quantum threat mitigation.

## Specification

### Constants

<!-- TODO -->

| Name                | Value | Comment            |
|---------------------|-------|--------------------|
| NTT_FW              | TBD   | precompile address |
| NTT_INV             | TBD   | precompile address |
| NTT_VECMULMOD       | TBD   | precompile address |
| NTT_VECADDMOD       | TBD   | precompile address |

We introduce *four* separate precompiles to perform the following operations:

- NTT_FW - to perform the forward NTT transformation (Negative wrap convolution) with a gas cost of `600` gas,
- NTT_INV - to perform the inverse NTT transformation (Negative wrap convolution) with a gas cost of `600` gas,
- NTT_VECMULMOD - to perform vectorized modular multiplication with a gas cost formula defined in the corresponding section,
- NTT_VECADDMOD - to perform vectorized modular addition with a gas cost formula defined in the corresponding section.

### Field parameters

The NTT_FW and NTT_INV are fully defined by the following set of parameters:
Let $R$ be a cyclotomic ring of the form $R=\mathbb F_q[X]/(X^n+1)$. In these notations,

- $n$ is the degree and is a power of 2,
- $\mathbb F_q$ is the prime field where $q \equiv 1 \mod 2n$,
- $\omega$ is a $n$-th root of unity in $\mathbb F_q$,
- $\psi$ is a $2n$-th root of unity in $\mathbb F_q$.

Any element $a \in R$ is a polynomial of degree at most $n-1$ with integer coefficients, written
as $a=\sum_{i=0}^{n-1} a_iX^i$

### NTT_FW

The NTT transformation is described by the following algorithm.

**Input:** A vector $a = (a[0], a[1], \dots, a[n-1]) \in \mathbb F_q^n$ in standard order, where $q$ is a prime such that $q \equiv 1 \mod 2n$ and $n$ is a power of two, and a precomputed table $\Psi_\text{rev} \in \mathbb{Z}_q^n$ storing powers of $\psi$ in bit-reversed order.

**Output:** $a \leftarrow \text{NTT\_FW}(a)$ in bit-reversed order.

```plaintext
t ← n
for m = 1 to n-1 by 2m do
    t ← t / 2
    for i = 0 to m-1 do
        j1 ← 2 ⋅ i ⋅ t
        j2 ← j1 + t - 1
        S ← Ψrev[m + i]
        for j = j1 to j2 do
            U ← a[j]
            V ← a[j + t] ⋅ S
            a[j] ← (U + V) mod q
            a[j + t] ← (U - V) mod q
        end for
    end for
end for
return a
```

### NTT_INV

The Inverse NTT is described by the following algorithm.

**Input:** A vector $a = (a[0], a[1], \dots, a[n-1]) \in \mathbb F_q^n$ in bit-reversed order, where $q$ is a prime such that $q \equiv 1 \mod 2n$ and $n$ is a power of two, and a precomputed table $\Psi^{-1}_\text{rev} \in \mathbb F_q^n$ storing powers of $\psi^{-1}$ in bit-reversed order.

**Output:** $a \leftarrow \text{NTT\_INV}(a)$ in standard order.

```plaintext
t ← 1
for m = n to 1 by m/2 do
    j1 ← 0
    h ← m / 2
    for i = 0 to h-1 do
        j2 ← j1 + t - 1
        S ← Ψ⁻¹rev[h + i]
        for j = j1 to j2 do
            U ← a[j]
            V ← a[j + t]
            a[j] ← (U + V) mod q
            a[j + t] ← (U - V) ⋅ S mod q
        end for
        j1 ← j1 + 2t
    end for
    t ← 2t
end for
for j = 0 to n-1 do
    a[j] ← a[j] ⋅ n⁻¹ mod q
end for
return a
```

### NTT_VECMULMOD

The NTT_VECMULMOD is functions similarly to SIMD, but operates with larger input and output sizes.

**Input:** Two vectors $a = (a[0], a[1], \dots, a[n-1]), b=(b[0], b[1], \dots, b[n-1]) \in \mathbb F_q^n$ where $n$ and $q$ are defined above.

**Output:** The element-wise product $(a[0]\cdot b[0] \mod q, a[1]\cdot b[1]\mod q, \dots, a[n-1]\cdot b[n-1] \mod q)$.

**Gas cost:** Denoting $k$ to be the smallest power of $2$ larger than $\log_2(q)$, the gas cost of this operation is $k\log_2(n) / 8$.

### NTT_VECADDMOD

The NTT_VECMULMOD is similar to SIMD in the functioning, but operates with larger sizes in input and output.

**Input:** Two vectors $a = (a[0], a[1], \dots, a[n-1]), b=(b[0], b[1], \dots, b[n-1]) \in \mathbb F_q^n$ where $n$ and $q$ are defined above.

**Output:** The element-wise addition $(a[0]+ b[0] \mod q, a[1]+ b[1]\mod q, \dots, a[n-1]+ b[n-1] \mod q)$.

**Gas cost:** Denoting $k$ to be the smallest power of $2$ larger than $\log_2(q)$, the gas cost of this operation is $k\log_2(n) /32$.

## Rationale

If $f$ and $g$ are two polynomials of $R$, then
$f\times g= \text{NTT\_INV}(\text{NTT\_VECMULMOD}(
\text{NTT\_FW}(a), \text{NTT\_FW}(b)))$ is equal to the product of $f$ and $g$ in $R$. The algorithm has a complexity of $n \log_2n$ rather than $n^2$ with the classical schoolbook multiplication algorithm.

### Gas Cost Analysis

The gas cost model for EIP-7885 precompiles is designed to target approximately 50-55 mgas/s performance, consistent with existing precompiled contracts like ECRECOVER. Two implementations with different gas costs are available:

#### NTT Precompiles (0x12, 0x13) - Scheme-Specific Cost Model

**Gas Costs by Implementation**:

| Scheme             | Ring Degree | nocgo NTT_FW | nocgo NTT_INV | cgo NTT_FW | cgo NTT_INV |
| ------------------ | ----------- | ------------ | ------------- | ---------- | ----------- |
| Falcon-512         | 512         | 790 gas      | 790 gas       | 500 gas    | 500 gas     |
| Falcon-1024        | 1024        | 1,750 gas    | 1,750 gas     | 1,080 gas  | 1,080 gas   |
| ML-DSA (Dilithium) | 256         | 220 gas      | 270 gas       | 256 gas    | 340 gas     |

**Pure Go (nocgo)**: Zero-dependency implementation.

**liboqs (cgo)**: Offers 36-38% lower gas costs for NTT operations.

**Rationale**: Gas costs are calibrated per cryptographic scheme based on actual execution performance. NTT computation complexity varies by ring degree and modulus size, with costs optimized to maintain consistent throughput across different post-quantum schemes.

#### Vector Operations (0x14, 0x15) - Dynamic Cost Model

**VECMULMOD Gas Cost**: `ceil(0.32 × N)`

**VECADDMOD Gas Cost**: `ceil(0.3 × N)`

**Cost Components**:

- Linear scaling with vector length N (ring degree)
- VECMULMOD: 0.32 gas per element
- VECADDMOD: 0.3 gas per element

**Benchmark Validation**: The dynamic cost model achieves target performance:

- Falcon-512 VECMULMOD (164 gas): 56.17 mgas/s (~2.9μs)
- Falcon-512 VECADDMOD (154 gas): 54.45 mgas/s (~2.8μs)
- Falcon-1024 VECMULMOD (328 gas): 54.67 mgas/s (~6.0μs)
- Falcon-1024 VECADDMOD (308 gas): 52.84 mgas/s (~5.8μs)
- ML-DSA VECMULMOD (82 gas): 41.89 mgas/s (~2.0μs)
- ML-DSA VECADDMOD (77 gas): 46.81 mgas/s (~1.6μs)

The gas cost formulas accurately reflect actual execution costs while maintaining ~50-55 mgas/s performance target across all tested cryptographic standards.

### Fields of interest

The implementation applies for many fields of interest for cryptography. In particular, the design applies for:

- FALCON: $q=3 \cdot 2^{12}+1$ (one of the NIST winners for post-quantum signature scheme),
- DILITHIUM: $q=2^{23}-2^{13}+1$ (one of the NIST winners for post-quantum signature scheme),
- KYBER: $q=13 \cdot 2^8+1$ (one of the NIST winners for post-quantum key encapsulation mechanism),
- Babybear: $q=15 \cdot 2^{27}+1$ (Risc0),
- Goldilocks: $q=2^{64}-2^{32}+1$ (Polygon's Plonky2),
- M31: $q=2^{31}-1$ (Circle STARKS, STwo, Plonky3),
- StarkCurve: $q=2^{251}+17 \cdot 2^{192}+1$

### Benchmarks

#### Pure solidity

To illustrate the interest of the precompile, the assets provide the measured gas const for a single NTT and extrapolates the minimal gas cost taking into account the required number of NTT_FW and NTT_INV. The provided assets use pure Yul optimizations, with memory access hacks. It is unlikely that more than one order of magnitude could be spared on such a minimal code. 

|Use case| Parameters                   | single NTT gas cost         |  Required NTT(FW/INV)    | Estimated NTT/Full cost |
|--|------------------------|---------------------|---------------------|---|
|Falcon| $q=12289, n=512$       | 1.8 M | 1 NTTFW+1 NTTINV |3.6 M| 
|Dilithium| $q=2^{23}-2^{13}+1, n=256$| 460K | 4 NTTFW +1 NTTINV|2.3M|

Falcon cost has been measured over a full implementation and is compliant to the estimation. Dilithium cost is evaluated assuming

This demonstrates that using pure solidity enables L2s with low gas fees to experiment with FALCON in the short term, whereas it is too expensive to do so on L1.
Adopting this EIP, the signature verification of Falcon can be reduced to **1500** gas, and a similar result is expected for Dilithium.
Adopting the hash function as a separate EIP would enable a gas verification cost of 2000 gas.
This is in line with the ratio looking at SUPERCOP implementations.

#### Native Client Implementation

Two implementations have been developed for op-geth with four precompiled contracts at addresses `0x12`, `0x13`, `0x14`, and `0x15`:

**Pure Go Implementation (nocgo)** - Default:

- Zero external dependencies
- Gas costs: Falcon-512 (790 gas), Falcon-1024 (1,750 gas), ML-DSA NTT_FW (220 gas), NTT_INV (270 gas)
- VECMULMOD: `ceil(0.32 × N)` gas, VECADDMOD: `ceil(0.3 × N)` gas

**liboqs Implementation (cgo)** - High Performance:

- Uses liboqs library via CGO
- 36-38% lower gas costs for NTT operations
- Gas costs: Falcon-512 (500 gas), Falcon-1024 (1,080 gas), ML-DSA NTT_FW (256 gas), NTT_INV (340 gas)
- Same VECMULMOD/VECADDMOD costs as nocgo variant

Both implementations support:

- NTT_FW (0x12): Forward NTT transformation
- NTT_INV (0x13): Inverse NTT transformation
- VECMULMOD (0x14): Element-wise modular multiplication
- VECADDMOD (0x15): Element-wise modular addition
- Input format: ring_degree (4 bytes) + modulus (8 bytes) + coefficients (Falcon: uint16×N, ML-DSA: int32×N)

Benchmark results on Intel(R) Xeon(R) CPU @ 2.20GHz (liboqs implementation):

```
BenchmarkPrecompiledNTT_FW/NTT_FW-Falcon-512-Gas=500-4
    377011      9411 ns/op      500 gas/op        53.12 mgas/s
BenchmarkPrecompiledNTT_FW/NTT_FW-Falcon-1024-Gas=1080-4
    176936     20419 ns/op     1080 gas/op        52.89 mgas/s
BenchmarkPrecompiledNTT_FW/NTT_FW-ML-DSA-256-Gas=256-4
    740838      4785 ns/op      256 gas/op        53.49 mgas/s

BenchmarkPrecompiledNTT_INV/NTT_INV-Falcon-512-Gas=500-4
    372236      9374 ns/op      500 gas/op        53.33 mgas/s
BenchmarkPrecompiledNTT_INV/NTT_INV-Falcon-1024-Gas=1080-4
    176300     20316 ns/op     1080 gas/op        53.15 mgas/s
BenchmarkPrecompiledNTT_INV/NTT_INV-ML-DSA-256-Gas=340-4
    558224      6396 ns/op      340 gas/op        53.15 mgas/s

BenchmarkPrecompiledNTT_VECMULMOD/VECMULMOD-Falcon-512-Gas=164-4
   1237585      2919 ns/op      164 gas/op        56.17 mgas/s
BenchmarkPrecompiledNTT_VECMULMOD/VECMULMOD-Falcon-1024-Gas=328-4
    592540      5999 ns/op      328 gas/op        54.67 mgas/s
BenchmarkPrecompiledNTT_VECMULMOD/VECMULMOD-ML-DSA-256-Gas=82-4
   1726333      1957 ns/op       82 gas/op        41.89 mgas/s

BenchmarkPrecompiledNTT_VECADDMOD/VECADDMOD-Falcon-512-Gas=154-4
   1273167      2828 ns/op      154 gas/op        54.45 mgas/s
BenchmarkPrecompiledNTT_VECADDMOD/VECADDMOD-Falcon-1024-Gas=308-4
    606345      5829 ns/op      308 gas/op        52.84 mgas/s
BenchmarkPrecompiledNTT_VECADDMOD/VECADDMOD-ML-DSA-256-Gas=77-4
   2178388      1645 ns/op       77 gas/op        46.81 mgas/s

BenchmarkPrecompiledEcrecover/-Gas=3000-4
   65388      57182 ns/op     3000 gas/op        52.46 mgas/s
```

The benchmarks demonstrate that all NTT precompiles maintain competitive or better throughput compared to existing precompiled contracts like ECRECOVER, processing approximately 52-56 million gas per second. Both implementations (nocgo and cgo) provide efficient support for multiple post-quantum cryptographic schemes including Falcon-512/1024 and ML-DSA/Dilithium, with the liboqs variant offering ~36-38% lower gas costs for NTT operations.

#### End-to-End Signature Verification with Precompiles

Integration testing demonstrates the practical impact of NTT precompiles on full signature verification. Tests were conducted using modified ZKNOX implementations (ETHFALCON and ETHDILITHIUM) against an op-geth client with NTT precompile support.

**Test Environment**: OP-Stack testnet with NTT precompiles enabled.

| Algorithm | Verification Method | Pure Solidity | With Precompile | Gas Saved |
|-----------|-------------------|---------------|-----------------|-----------|
| ETHFALCON (Falcon-512) | Direct | 1,800,000 | 479,341 | 1,320,659 (73.4%) |
| ETHDILITHIUM (ML-DSA) | PKContract-based | 9,746,410 | 7,618,412 | 2,127,998 |
| ETHDILITHIUM (ML-DSA) | Direct (struct) | 7,874,354 | 5,732,354 | 2,142,000 |

**Key Findings**:

- **ETHFALCON**: Achieves 73.4% gas reduction with NTT precompiles, reducing signature verification cost from 1.8M gas to under 480K gas.
- **ETHDILITHIUM**: NTT precompiles save over 2.1M gas per signature verification.

These results validate that NTT precompiles significantly reduce gas costs for post-quantum signature verification, making lattice-based cryptography more practical for Ethereum applications.

## Backwards Compatibility

There are no backward compatibility questions.

## Test Cases

The reference implementation includes comprehensive test coverage across multiple layers:

### NTT Precompile Tests (0x12)

**Malformed Input Tests** - 8 test cases covering:

- Invalid operation codes (values other than 0 or 1)
- Invalid ring degrees (non-power-of-2, < 16)
- Zero or non-NTT-friendly moduli (not congruent to 1 mod 2N)
- Coefficients exceeding modulus bounds
- Input length mismatches

**Functional Tests**:

- Forward NTT transformation with ring degree 16, modulus 97
- Inverse NTT transformation ensuring `INTT(NTT(x)) = x`
- Round-trip validation for data integrity

**Crypto Standards Benchmarks**:

- Falcon-512: n=512, q=12289
- Kyber-128: n=256, q=3329
- Dilithium-256: n=256, q=8380417

### Vector Operations Tests (0x13, 0x14)

**Unified Malformed Input Tests** - 7 test cases covering:

- Invalid ring degrees for both VECMULMOD and VECADDMOD
- Zero or non-NTT-friendly moduli
- Input length mismatches (expecting 2\*N vectors)
- Coefficient bounds validation

**Functional Tests**:

- Element-wise multiplication: `result[i] = (a[i] * b[i]) mod q`
- Element-wise addition: `result[i] = (a[i] + b[i]) mod q`
- Validation with small test vectors (ring degree 16, modulus 97)

**Crypto Standards Benchmarks** - Performance validation with:

- Falcon-512 parameters (VECMULMOD: 164 gas, VECADDMOD: 154 gas)
- Falcon-1024 parameters (VECMULMOD: 328 gas, VECADDMOD: 308 gas)
- ML-DSA-256 parameters (VECMULMOD: 82 gas, VECADDMOD: 77 gas)

Benchmark data are available in the EIP assets:

**Pure Go Implementation (nocgo):**

- [NTT Precompile Benchmark Results](../assets/eip-7885/op-geth/nocgo/benchmark_results/BenchmarkPrecompiledNTT)
- [ECRECOVER Benchmark Results](../assets/eip-7885/op-geth/nocgo/benchmark_results/BenchmarkPrecompiledEcrecover)

**liboqs Implementation (cgo):**

- [NTT Precompile Benchmark Results](../assets/eip-7885/op-geth/cgo/benchmark_results/BenchmarkPrecompiledNTT)
- [ECRECOVER Benchmark Results](../assets/eip-7885/op-geth/cgo/benchmark_results/BenchmarkPrecompiledEcrecover)

## Reference Implementation

- [Python reference implementation](../assets/eip-7885/pythonref)
- [Solidity reference implementation](../assets/eip-7885/solidity/src/ZKNOX_NTT.sol)
- OP-Geth implementations in the assets of this EIP:
  - [Pure Go (nocgo)](../assets/eip-7885/op-geth/nocgo)
  - [liboqs (cgo)](../assets/eip-7885/op-geth/cgo)

All implementations have been validated over a large base of reference vectors, and implementing both FALCON and DILITHIUM algorithms as demonstration of the usefulness of the precompile.

## Security Considerations

<!-- TODO -->

Needs discussion.

## Copyright

Copyright and related rights waived via [CC0](../LICENSE.md).
