Notebook 04 — Polynomial rings and NTT
Goal: understand why ML-KEM uses polynomials instead of matrices, and what makes the NTT possible.
So far, each LWE sample was a length-n vector a together with one scalar b. ML-KEM instead uses polynomials in R_q = Z_q[x]/(x^n + 1). A single polynomial stores n coefficients, and one polynomial multiplication is equivalent to a structured (negacyclic) matrix-vector product. That ring structure is what enables the Number-Theoretic Transform.
import numpy as np
import time
import matplotlib.pyplot as plt
from pqc_edu.polyring import Poly
from pqc_edu.ntt import ntt, intt, poly_mul_ntt, Q, N
Hand-built example — multiply two polynomials in Z_17[x]/(x^4 + 1). The anti-cyclic rule x^4 = -1 means that any term with degree >= 4 wraps around with a sign flip.
# (1 + 2x) * (3 + x^3) in Z_17[x]/(x^4 + 1)
a = Poly(np.array([1, 2, 0, 0]), 17)
b = Poly(np.array([3, 0, 0, 1]), 17)
c = a * b
print('a =', a.coeffs)
print('b =', b.coeffs)
print('a*b =', c.coeffs)
# Verify by hand: (1+2x)*(3+x^3) = 3 + 6x + x^3 + 2x^4
# = 3 + 6x + x^3 + 2*(-1) (since x^4 = -1)
# = 1 + 6x + 0x^2 + x^3 in Z_17
print('expected: [1, 6, 0, 1]')
a = [1 2 0 0]
b = [3 0 0 1]
a*b = [1 6 0 1]
expected: [1, 6, 0, 1]
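The introduction's claim that one ring multiplication equals a structured matrix-vector product can be checked on the same operands. This is a standalone numpy sketch (the negacyclic-matrix construction here is ours for illustration, not part of pqc_edu):

```python
import numpy as np

q, n = 17, 4
a = np.array([1, 2, 0, 0])   # 1 + 2x
b = np.array([3, 0, 0, 1])   # 3 + x^3

# Negacyclic matrix of a: entry (i, j) is a[(i - j) mod n], negated whenever
# the index wraps past x^n (because x^4 = -1 in this ring).
M = np.array([[a[i - j] if i >= j else -a[n + i - j] for j in range(n)]
              for i in range(n)])

c = M @ b % q
print('M(a) @ b mod q =', c)   # [1 6 0 1], same as a * b in the ring
```

The same n x n matrix works for any b, which is exactly the "one polynomial replaces a structured matrix" compression the introduction describes.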
The NTT is a Fast Fourier Transform over Z_q. It turns polynomial multiplication (O(n^2) schoolbook) into pointwise multiplication (O(n)) plus two transforms (O(n log n) each). Requirement: q prime with n | (q - 1), so that n-th roots of unity exist in Z_q. For ML-KEM, q = 3329 = 13*256 + 1, so 256-th roots of unity exist; the primitive 256-th root of unity used is 17.
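The arithmetic behind these claims is easy to check directly (nothing here beyond the numbers quoted above):

```python
# q = 3329 is prime and q - 1 = 3328 = 13 * 256, so 256-th roots of unity
# exist mod q. Verify that 17 has multiplicative order exactly 256.
q = 3329
assert q == 13 * 256 + 1
assert pow(17, 256, q) == 1        # 17^256 ≡ 1 (mod 3329)
assert pow(17, 128, q) == q - 1    # 17^128 ≡ -1, so the order is exactly 256
print('17 is a primitive 256-th root of unity mod', q)
```

Since the order of 17 divides 256 but 17^128 is not 1, no proper divisor of 256 can be the order.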
# Forward/inverse NTT roundtrip
rng = np.random.default_rng(42)
p = Poly(rng.integers(0, Q, N), Q)
p_recovered = intt(ntt(p))
print('NTT roundtrip matches:', p == p_recovered)
NTT roundtrip matches: True
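To show what ntt/intt compute without relying on pqc_edu, here is a naive O(n^2) negacyclic transform over the toy ring Z_17[x]/(x^4 + 1) from earlier. Assumption: psi = 2 is a primitive 8th root of unity mod 17 (2^4 = 16 ≡ -1), so omega = psi^2 is a primitive 4th root:

```python
q, n, psi = 17, 4, 2
omega = psi * psi % q

def ntt_naive(a):
    # Twist a_i by psi^i, then take a plain cyclic DFT with omega.
    return [sum(a[i] * pow(psi, i, q) * pow(omega, i * j, q) for i in range(n)) % q
            for j in range(n)]

def intt_naive(A):
    # Inverse cyclic DFT, then untwist by psi^-i and rescale by n^-1.
    n_inv, psi_inv, omega_inv = pow(n, -1, q), pow(psi, -1, q), pow(omega, -1, q)
    return [n_inv * pow(psi_inv, i, q)
            * sum(A[j] * pow(omega_inv, i * j, q) for j in range(n)) % q
            for i in range(n)]

# Same product as the hand-built example: (1 + 2x) * (3 + x^3).
a, b = [1, 2, 0, 0], [3, 0, 0, 1]
prod = intt_naive([x * y % q for x, y in zip(ntt_naive(a), ntt_naive(b))])
print('product via pointwise NTT mul =', prod)   # [1, 6, 0, 1]
```

The twist by psi^i is what turns a cyclic convolution into the negacyclic one required by x^n = -1; a fast NTT folds this into the butterfly structure instead of doing it separately.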
# Timing: schoolbook vs NTT
rng = np.random.default_rng(0)
a = Poly(rng.integers(0, Q, N), Q)
b = Poly(rng.integers(0, Q, N), Q)
iters = 50
t0 = time.time()
for _ in range(iters):
    _ = a * b  # schoolbook
t_school = (time.time() - t0) / iters
t0 = time.time()
for _ in range(iters):
    _ = poly_mul_ntt(a, b)
t_ntt = (time.time() - t0) / iters
print(f'schoolbook: {t_school*1000:.2f} ms/op')
print(f'NTT-based : {t_ntt*1000:.2f} ms/op')
print(f'speedup : {t_school/t_ntt:.1f}x')
schoolbook: 0.39 ms/op
NTT-based : 1.39 ms/op
speedup : 0.3x
Why did our NTT come out slower? The asymptotic cost of NTT-based multiplication is O(n log n) versus O(n²) for schoolbook. But our educational NTT is written as explicit Python loops for readability, while the schoolbook multiplication gets to use a single vectorized numpy broadcast per coefficient. For production Kyber/ML-KEM, the NTT is written in C or hand-optimized assembly with precomputed zetas — and in that setting it is roughly 10–20× faster than schoolbook at n=256. The punchline is asymptotic and algorithmic: doubling n barely slows the NTT (one extra butterfly layer) but quadruples the schoolbook cost. Our measurement reflects Python’s constant-factor cost, not the underlying algorithm.
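One way to see the asymptotic claim without Python timing noise is to count coefficient multiplications: roughly n^2 for schoolbook versus roughly n * log2(n) per transform for a radix-2 NTT (a back-of-the-envelope operation count, ignoring constant factors):

```python
import math

for n in (256, 512, 1024):
    school = n * n                    # schoolbook coefficient multiplies
    ntt_ops = n * int(math.log2(n))   # butterflies per transform, up to constants
    print(f'n={n:5d}  schoolbook ~ {school:8d}  one NTT ~ {ntt_ops:6d}')
```

Doubling n from 256 to 512 quadruples the schoolbook count (65536 to 262144) but only slightly more than doubles the NTT count (2048 to 4608): one extra butterfly layer, as the text says.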
Takeaway: in ML-KEM, operands are stored in NTT domain. Additions are coefficient-wise (same in both domains). Multiplications become pointwise_mul. You only pay the O(n log n) INTT at the end when you need the time-domain polynomial back. Next: how to turn this machinery into a cryptosystem -> 05_ring_lwe.ipynb.
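The claim that additions work the same in both domains follows from linearity of the transform, and can be checked in the toy ring (a sketch using an assumed naive transform with psi = 2 as a primitive 8th root of unity mod 17, not the pqc_edu API):

```python
q, n, psi = 17, 4, 2
omega = psi * psi % q

def ntt_naive(a):
    # Naive O(n^2) negacyclic NTT: twist by psi^i, then cyclic DFT with omega.
    return [sum(a[i] * pow(psi, i, q) * pow(omega, i * j, q) for i in range(n)) % q
            for j in range(n)]

a, b = [1, 2, 0, 0], [3, 0, 0, 1]
lhs = ntt_naive([(x + y) % q for x, y in zip(a, b)])
rhs = [(x + y) % q for x, y in zip(ntt_naive(a), ntt_naive(b))]
print('NTT(a + b) == NTT(a) + NTT(b):', lhs == rhs)   # True
```

This is why ML-KEM can accumulate sums of products entirely in the NTT domain and transform back only once.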