twisted hessian curves over Fq[e]
introduction
most production ECC uses Weierstrass or Montgomery curves over prime fields. twisted hessian curves are a different family, defined by:

a·X³ + Y³ + Z³ = d·X·Y·Z
in projective coordinates. i came across a 2021 paper by Grini, Chillali, and Mouanis that does all the arithmetic over a local ring with a nilpotent element instead of a finite field. i decided to implement it in Rust.
the implementation is no_std, has zero runtime dependencies, and uses const generics to make the field modulus a compile-time parameter. i verified correctness against the test vectors in the paper.
the ring Fq[e]
instead of working over a plain finite field Fq, i work over the local ring Fq[ε] = Fq[x]/(x²), where ε² = 0. elements take the form a + b·ε where both a and b are in Fq. it's the same dual number construction as automatic differentiation, applied to ECC.
since ε² = 0, multiplication collapses to:

(a₁ + b₁ε)(a₂ + b₂ε) = a₁a₂ + (a₁b₂ + a₂b₁)ε

the b₁b₂ε² term vanishes, which means ring multiplication costs three field multiplications and one field addition. the ring is not a field though: elements like b·ε with b ≠ 0 are non-zero yet have no inverse. these zero divisors show up in the point addition formulas.
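to make the collapsed product concrete, here's a minimal stand-alone sketch (plain u64 pairs and raw arithmetic for brevity; the names are mine, not the crate's RingElement API, and the crate uses checked ops instead):

```rust
/// product in Fq[e]: (a1 + b1*e)(a2 + b2*e) = a1*a2 + (a1*b2 + a2*b1)*e;
/// the b1*b2*e^2 term vanishes. pairs are (constant, epsilon coefficient).
pub fn ring_mul(x: (u64, u64), y: (u64, u64), q: u64) -> (u64, u64) {
    let (a1, b1) = x;
    let (a2, b2) = y;
    ((a1 * a2) % q, (a1 * b2 + a2 * b1) % q)
}
```

over F₅, `ring_mul((2, 3), (4, 1), 5)` gives `(3, 4)`, i.e. (2 + 3ε)(4 + ε) = 3 + 4ε; and `ring_mul((0, 1), (0, 1), 5)` gives `(0, 0)` — a zero divisor in action.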
in Rust, this is a pair of field elements parameterized by the prime modulus as a const generic:
```rust
pub struct RingElement<const Q: u64> {
    a: Fq<Q>, // constant part
    b: Fq<Q>, // epsilon coefficient
}
```
an element a + bε is invertible iff its constant part a ≠ 0. the inverse is:

(a + bε)⁻¹ = a⁻¹ − a⁻²·b·ε

which you can verify by expanding: (a + bε)(a⁻¹ − a⁻²bε) = 1 + (a⁻¹b − a⁻¹b)ε = 1. i got this wrong initially by forgetting the sign on the −a⁻²·b·ε term; the test suite caught it. for example, in F₅[ε]: (2 + 3ε)⁻¹ = 3 + 3ε, since 2⁻¹ = 3 in F₅ and the ε coefficient is −2⁻²·3 = −4·3 ≡ 3 (mod 5).
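a stand-alone sketch of the inversion formula (helper names are mine; the field inverse is brute-forced here purely to keep the sketch short):

```rust
/// inverse in Fq[e]: (a + b*e)^-1 = a^-1 - a^-2*b*e, defined iff a != 0 mod q.
pub fn field_inv(a: u64, q: u64) -> u64 {
    (1..q).find(|&x| (a % q) * x % q == 1).expect("no inverse: a is 0 mod q")
}

pub fn ring_inv(x: (u64, u64), q: u64) -> (u64, u64) {
    let (a, b) = x;
    let ai = field_inv(a, q);          // a^-1
    let t = ai * ai % q * (b % q) % q; // a^-2 * b
    (ai, (q - t) % q)                  // epsilon coefficient is -a^-2 * b
}
```

`ring_inv((2, 3), 5)` returns `(3, 3)`, matching the worked inverse above; multiplying back, (2 + 3ε)(3 + 3ε) = 6 + 15ε ≡ 1 in F₅[ε].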
i compute field inverses via extended GCD, with all intermediate arithmetic going through checked_add, checked_mul, and checked_sub to avoid silent overflow. it's slower than Fermat's little theorem (a⁻¹ = a^(q−2) mod q), but correct, and the performance difference is irrelevant at these field sizes.
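here's what an extended-GCD inverse in that checked-arithmetic style can look like — a sketch in signed integers with illustrative names, not the crate's actual code:

```rust
/// field inverse via extended GCD: every subtraction and multiplication goes
/// through checked_* with a loud panic instead of silent wraparound.
pub fn inv_mod(a: i64, q: i64) -> i64 {
    let (mut r0, mut r1) = (q, a.rem_euclid(q));
    let (mut t0, mut t1) = (0i64, 1i64);
    while r1 != 0 {
        let quot = r0 / r1;
        // one step of the extended GCD: x2 = x0 - quot * x1, checked
        let step = |x0: i64, x1: i64| {
            x0.checked_sub(quot.checked_mul(x1).expect("overflow"))
                .expect("overflow")
        };
        let (r2, t2) = (step(r0, r1), step(t0, t1));
        r0 = r1; r1 = r2;
        t0 = t1; t1 = t2;
    }
    assert_eq!(r0, 1, "gcd(a, q) must be 1 for an inverse to exist");
    t0.rem_euclid(q)
}
```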
curve arithmetic
a twisted hessian curve requires parameters a and d such that a·(27a − d³) is invertible in the ring. this is the non-singularity condition, analogous to 4a³ + 27b² ≠ 0 for weierstrass curves. if it fails, the implementation panics at construction time.
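concretely, the non-singularity condition a·(27a − d³) can be evaluated with pair arithmetic; a sketch with illustrative helpers, returning a bool instead of panicking:

```rust
/// non-singularity check a*(27a - d^3) in Fq[e]; a ring element is invertible
/// iff its constant part is non-zero. pairs are (constant, epsilon coefficient).
fn mul(x: (u64, u64), y: (u64, u64), q: u64) -> (u64, u64) {
    ((x.0 * y.0) % q, (x.0 * y.1 + x.1 * y.0) % q)
}

fn sub(x: (u64, u64), y: (u64, u64), q: u64) -> (u64, u64) {
    ((x.0 + q - y.0 % q) % q, (x.1 + q - y.1 % q) % q)
}

pub fn is_nonsingular(a: (u64, u64), d: (u64, u64), q: u64) -> bool {
    let d3 = mul(mul(d, d, q), d, q);
    let a27 = mul((27 % q, 0), a, q);
    let cond = mul(a, sub(a27, d3, q), q); // a * (27a - d^3)
    cond.0 != 0 // invertible iff constant part != 0
}
```

for example, over F₅ the pair a = 1, d = 3 is rejected, since 3³ = 27 ≡ 27·a (mod 5) makes the condition a zero divisor.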
points live in projective coordinates (X : Y : Z) with identity element (0 : −1 : 1). i implemented point addition following "Algorithm 3.1" from the paper, which defines a primary formula and a fallback:
```rust
pub fn add(&self, other: &Self, a: RingElement<Q>) -> Self {
    let x1_squared = self.x.mul(self.x);
    let x2_squared = other.x.mul(other.x);
    // formula (1): X3 = X1^2*Y2*Z2 - X2^2*Y1*Z1
    let x3 = x1_squared.mul(other.y).mul(other.z)
        .sub(x2_squared.mul(self.y).mul(self.z));
    // Y3 and Z3 follow the same pattern
    // ...
    if x3.is_zero() && y3.is_zero() && z3.is_zero() {
        // fallback formula (2) from Theorem 2.1,
        // used when formula (1) produces [0:0:0]
    }
    // ...
}
```
the paper needs two formulas, which surprised me. hessian curves are supposed to have a unified addition law, but operating over Fq[ε] instead of a field changes the picture. there are zero divisors in the ring (i.e. elements x, y where x ≠ 0 and y ≠ 0 but x·y = 0), and these can cause the primary formula to degenerate to [0 : 0 : 0]. when that happens, the fallback formula kicks in. if both formulas produce [0 : 0 : 0], the input points are invalid and the code panics.
negation is swapping Y and Z: −(X : Y : Z) = (X : Z : Y).
scalar multiplication is double-and-add. point doubling reuses the addition function with self as both arguments (i.e. P + P), which works because the formulas don't special-case doubling. windowed methods or Montgomery ladders would be faster, but for small fields the naive approach is fine. i benchmarked the projective operations using divan across 13 primes from 41 to 97, and the overhead of checked arithmetic is negligible at these sizes.
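the double-and-add loop itself is group-agnostic. here's a sketch written against an arbitrary addition closure, demoed on the additive group Z₄₅ as a stand-in for the curve group (the real code works on projective points):

```rust
/// double-and-add: doubling is just add(p, p), mirroring how the curve code
/// reuses its addition formula instead of a dedicated doubling routine.
pub fn scalar_mul<P: Copy>(mut k: u64, p: P, identity: P, add: impl Fn(P, P) -> P) -> P {
    let mut acc = identity;
    let mut base = p;
    while k > 0 {
        if k & 1 == 1 {
            acc = add(acc, base); // add in the bits of k that are set
        }
        base = add(base, base);   // double: P + P through the same formula
        k >>= 1;
    }
    acc
}
```

with Z₄₅ as the stand-in group, `scalar_mul(7, 3, 0, |a, b| (a + b) % 45)` returns `21`.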
a worked example
here's the test vector from Section 3.1 of the paper, using the ring and the curve parameters a, d given there.
first, check the validity condition a·(27a − d³). cubing in the ring collapses nicely: (d₀ + d₁ε)³ = d₀³ + 3d₀²d₁ε, since all higher powers of ε vanish. plugging in the paper's parameters gives an element with non-zero constant part, which is invertible, so the curve is valid.
take the point P given in the paper. to verify it's on the curve, plug its coordinates into a·X³ + Y³ + Z³ − d·X·Y·Z and check that the result is 0 in the ring. computing scalar multiples 2P, 3P, … walks through the group until 45P = (0 : −1 : 1), the identity.
the point P has order 45. the paper uses this for Diffie-Hellman: Alice picks private key 4, Bob picks 35. Alice's public key is A = 4P, Bob's is B = 35P. Alice computes 4·B = 140·P, and 140 ≡ 5 (mod 45), so the shared secret is 5P. Bob arrives at the same point via 35·A = 140·P.
all of this matches the implementation's output. verifying projective point arithmetic over a ring by hand is painful; the paper's test vectors were essential.
diffie-hellman key exchange
i built an ECDH module on top of the curve. given a generator point with known order:
```rust
let curve = TwistedHessianCurve::new(a, d);
let dh = DiffieHellman::new(curve, generator, order);

let (_, alice_public) = dh.generate_keypair(alice_private);
let (_, bob_public) = dh.generate_keypair(bob_private);

let alice_shared = dh.compute_shared_secret(alice_private, &bob_public);
let bob_shared = dh.compute_shared_secret(bob_private, &alice_public);
// alice_shared == bob_shared
```
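a runnable miniature of the same flow, with Z₄₅ (generator 1, using the worked example's order 45 and keys 4 and 35) standing in for the point group; the real module does this with curve points and scalar multiplication:

```rust
/// ECDH skeleton over a stand-in cyclic group Z_order.
pub fn public_key(private: u64, generator: u64, order: u64) -> u64 {
    private * generator % order // public = private * G
}

pub fn shared_secret(private: u64, other_public: u64, order: u64) -> u64 {
    private * other_public % order // both sides compute (a * b) * G
}
```

both sides land on the same scalar: 4·(35·1) ≡ 35·(4·1) ≡ 5 (mod 45).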
the constructor asserts order * generator == identity, which prevents passing in wrong parameters. compile-time enforcement is not possible since the order depends on the curve instance.
the point order itself is computed with a brute-force search up to q², the bound the implementation uses for the group order over Fq[ε]. for the test vector above, that means checking up to 25 scalar multiples, which finishes instantly. for larger fields this becomes the bottleneck, and computing the group order efficiently requires Schoof's algorithm, which is out of scope here.
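the search can be sketched like this, again with Z_n as a stand-in group (the real code walks curve points the same way):

```rust
/// brute-force order search: smallest k <= bound with k*g = identity,
/// or None if the bound is exceeded.
pub fn point_order(g: u64, n: u64, bound: u64) -> Option<u64> {
    let mut acc = 0u64; // identity of the stand-in group Z_n
    for k in 1..=bound {
        acc = (acc + g) % n; // acc = k * g
        if acc == 0 {
            return Some(k);
        }
    }
    None // order exceeds the search bound
}
```

for example, `point_order(9, 45, 45)` returns `Some(5)`, while a too-small bound gives `None`, which is exactly why the bound has to cover the group order.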
design decisions
some of the trade-offs i made:
const generics for the modulus. Fq<5> and Fq<11> are distinct types at compile time. mixing field elements across different moduli is the kind of bug that compiles fine and silently gives wrong answers, so having the type system reject it is worth the ergonomic cost.
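a minimal illustration of the idea (a sketch, not the crate's actual Fq):

```rust
/// const-generic field element: Fq<5> and Fq<11> are distinct types, so
/// mixing moduli is rejected at compile time rather than at runtime.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Fq<const Q: u64>(pub u64);

impl<const Q: u64> Fq<Q> {
    pub fn new(v: u64) -> Self {
        Fq(v % Q)
    }
    pub fn add(self, rhs: Fq<Q>) -> Fq<Q> {
        Fq((self.0 + rhs.0) % Q)
    }
}
// Fq::<5>::new(3).add(Fq::<11>::new(4)) fails to compile:
// expected `Fq<5>`, found `Fq<11>`
```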
#![deny(clippy::arithmetic_side_effects)] everywhere. all arithmetic goes through checked_add, checked_mul, checked_sub with explicit panics on overflow. the clippy lint rejects any raw +, -, * on integers. verbose, but i’d rather have a loud panic than a silent wrong answer in a crypto library.
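in isolation the style looks like this (illustrative helpers, not the crate's code):

```rust
/// modular ops in the checked style the lint enforces: no raw `+`, `*`, or
/// `%`; every operation is a checked_* call with an explicit panic message.
pub fn add_mod(a: u64, b: u64, q: u64) -> u64 {
    a.checked_add(b)
        .expect("overflow in add_mod")
        .checked_rem(q)
        .expect("modulus must be non-zero")
}

pub fn mul_mod(a: u64, b: u64, q: u64) -> u64 {
    a.checked_mul(b)
        .expect("overflow in mul_mod")
        .checked_rem(q)
        .expect("modulus must be non-zero")
}
```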
no_std by default. the library only enables std during tests via #![cfg_attr(not(test), no_std)]. zero runtime dependencies (i.e. an empty [dependencies] section in Cargo.toml). divan for benchmarks, proptest for property-based testing, and rand are dev-dependencies only. i also use crabtime to generate benchmark variants across different field sizes at compile time, which avoids writing the same bench function 13 times.
property-based testing for field inverses. rather than hand-picking test cases, i use proptest to verify a · a⁻¹ = 1 for every non-zero element in F₇₉₁₉. 7919 is large enough that bugs in the modular arithmetic surface quickly, but small enough that the test runs in under a second. i also have algebraic property tests for the ring (commutativity, associativity, distributivity, ε² = 0), which caught a subtle issue with the subtraction implementation early on.
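a crate-free analogue of that property test, checking the inverse law exhaustively over a smaller prime (and using Fermat inverses via square-and-multiply here, rather than extended GCD, purely to keep the sketch self-contained):

```rust
/// square-and-multiply modular exponentiation.
pub fn pow_mod(mut base: u64, mut exp: u64, q: u64) -> u64 {
    let mut acc = 1u64;
    base %= q;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % q;
        }
        base = base * base % q;
        exp >>= 1;
    }
    acc
}

/// the property under test: a * a^-1 = 1 for every non-zero a in F_q,
/// with a^-1 = a^(q-2) by Fermat's little theorem.
pub fn inverse_law_holds(q: u64) -> bool {
    (1..q).all(|a| a * pow_mod(a, q - 2, q) % q == 1)
}
```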
limitations and future work
point order computation is brute-force, iterating up to q². for a curve over F₅[ε], that's 25 iterations; over F₉₇[ε], it's 9409. anything beyond small primes becomes impractical. scalar multiplication is naive double-and-add rather than windowed or multi-scalar multiplication techniques. the u64 representation caps the field size at around 2³² (so that products of reduced elements still fit in a u64), which is far too small for real-world cryptography where primes are typically 256 bits.
that said, the goal was to learn the math and verify correctness against the paper’s test vectors. the code is correct for small fields, and that’s what i was after.
i want to port this to use arkworks field traits, which would allow plugging in arbitrary field backends and supporting larger primes. optimizing scalar multiplication with windowed methods is also on the list. i'm also curious whether the ε component can carry auxiliary information through the computation without affecting the constant part, similar to how dual numbers propagate derivatives in AD.
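the AD analogy is easy to demo with the machinery already described: evaluating x³ at a + ε leaves a³ in the constant slot and the derivative 3a² in the ε slot (a sketch with plain pairs, as above):

```rust
/// dual-number derivative demo over F_q: for f(x) = x^3,
/// f(a + b*e) = a^3 + 3a^2*b*e, since e^2 and e^3 terms vanish.
/// the epsilon slot carries f'(a) = 3a^2 without touching the constant part.
pub fn cube_dual(a: u64, b: u64, q: u64) -> (u64, u64) {
    let a2 = a * a % q;
    (a2 * a % q, 3 * a2 % q * (b % q) % q) // (a^3, 3a^2 * b)
}
```

over F₇, `cube_dual(2, 1, 7)` gives `(1, 5)`: 2³ = 8 ≡ 1, and f′(2) = 3·4 = 12 ≡ 5.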
Thanks for reading!