Things built on the tower substrate — in silicon, over radio, and inside neural networks. The lesson that repeats across everything below: the tower's properties must be wired in by construction, not hoped for emergently. What worked and what didn't are both documented.
Three primorial rings on a $20 Tang Nano 20K FPGA. Each design has per-channel GF(p) multipliers, CRT reconstruction, and ECC testing — every ring element round-tripped and multiplied, every codeword's error correction exercised. Each of the three proofs runs twice: exhaustively in RTL simulation, then on the device at the 27 MHz board clock with the verdict on the LEDs.
Each rung adds one channel, widens every datapath, and adds a reduction stage: roughly +40–60% logic per rung (2,922 → 4,670 → 6,849 LUT4). Three rungs deep, every resource axis stays under half the device. Field-only rings are lean: a historical fattened-ring design used 31% of the FPGA for multiply alone. The silicon builds disable the toolchain's carry-chain inference: running these exhaustive proofs on the device exposed inferred ALU carry cells computing wrong somewhere downstream of synthesis in the open-source flow — a passing gate-level netlist simulation exonerated the synthesizer — routed around, and queued for an upstream report.
The fourth design leaves the primorial path. In Z/65,535 = Z/3 × Z/5 × Z/17 × Z/257 — the internet checksum's ring, four Fermat primes — every channel's unit count is a power of two, so multiplying units reduces to adding their discrete logarithms, and the wrap-around of each addition is plain bit truncation. Per channel: a logarithm table, a masked add, an exponential table. The classic trade is tables instead of multipliers; the question is what the tables cost.
Three ESP32 boards form an ESP-NOW mesh. Each board CRT-encodes data into Z/510,510 residues (7 bytes), broadcasts, and the receiver reconstructs and checks parity. Measurements, not simulations — in three acts: detect, recover, self-check.
| Phase | What it tests | Result |
|---|---|---|
| CRT roundtrip | encode counter → broadcast → reconstruct → verify | 24/24 |
| ECC clean | syndrome = (0,0,0) for valid codewords over radio | 40/40 |
| ECC injection | corrupt one data channel per packet, verify detection | 40/40 |
corrupt mod 2 (+1): syndrome (5, 12, 14) corrupt mod 3 (+1): syndrome (7, 8, 15) corrupt mod 5 (+1): syndrome (6, 4, 10) corrupt mod 7 (+1): syndrome (1, 10, 16)All 3 parity channels fire for every injection — the no-blind-spots property, confirmed over the air.
Detection is half the ECC story; the mesh also recovers. One value's 7 residues are sharded across the 3 boards (channels 2, 3, 5 / 7, 11 / 13, 17), one board is silenced each round, and a surviving board reconstructs the value over radio from the residues that still exist — subset CRT, no retransmission.
NTP-style two-way time transfer, pairwise around the triangle 0→1→2→0. The three pairwise offsets sum to zero identically, so the measured sum — the closure — is pure estimation error: a self-check the mesh computes about its own synchronization with no reference clock, the radio sibling of the parity syndrome below.
The closure's first real catch was not the injected lie. It was a ~5 ms scheduler-tick bias in our own measurement path — blocking radio reads quantized every timestamp pickup — found and fixed because the closure refused to sit at zero when the leader role rotated.
Instead of one monolithic output head, use 7 independent CRT output heads, one per channel, over a shared backbone. The structural numbers below follow from the decomposition itself (computed at Z/510,510); the detection numbers are measured.
Train the 3 parity heads as extra objectives and every prediction carries a free self-check: the syndrome — how many parity heads disagree with the data heads. A token that contradicts itself announces it.
Measured: Elman RNN (h=128), PyTorch, 3 seeds, 2,361 characters of natural English (Carroll, 58-char vocabulary). Detection reaches 100% from 85% model accuracy onward and degrades gracefully below. The larger ring also wins at fixed capacity: at h=64, k=8 beats k=7 on both accuracy (64% vs 47%) and detection (99% vs 95%) — the extra channel adds an independent training signal.
The syndrome answers a per-token question: is this output self-consistent? Run sequentially, the same label-free signal answers a deployment question: has the input distribution changed? Each block of output, the syndrome count places a bet against the hypothesis “operation is in-distribution”; the running wealth the bets accumulate is the alarm, fired when it reaches 1/α. By Ville's inequality, the false-alarm budget α then holds no matter how long you watch or when you stop watching — on one condition: each bet must be fair against the null given the blocks already seen. This demo buys that by design, drawing its blocks independently.
The model here is deliberately under-sized (h = 16, ~47% accuracy), so even in-distribution blocks carry 13–27 syndrome positions — a raw rule that flags any syndrome false-flags every clean stream at block 1. The betting version, reading the same signal on the same streams, stays quiet on 97 of 100. Corruption shows what the accumulation buys: corrupted blocks mostly overlap the clean range, so a fixed threshold waits for rare extreme blocks, while the wealth compounds the persistent mild shift across ordinary ones.
On real text (h = 64), the three parity channels can also bet separately, and how their evidence combines is a design law. Split the channels' input streams by design and the product of per-channel wealths is a valid alarm by construction: it localizes — in 18/18 single-channel shifts the shifted channel's wealth alarms and holds the max, unshifted channels never cross (0/36) — and it beats the pooled bet (median alarm at block 6 vs 20 under corruption: a sum dilutes a one-channel shift under two quiet channels' noise). On a shared stream that multiplication is invalid, measurably — the naive product alarms 10/60 clean streams, 3.3× past the budget — but each channel's wealth alone is still valid, so betting each at a third of the budget and alarming when any crosses needs no independence at all, and the alarm pattern names the broken part: corrupt one parity head's weights and that channel alone alarms (9/9 severe, 9/9 mild — the mild ones with no flaggable typical block); corrupt a data head and all three alarm at once (12/12) — a wrong reconstruction disagrees with every parity check. Naming costs about one block over the pooled bet on loud faults, and wins on mild ones.
Measured: 5 seeds, Elman RNN (h=16) at Z/510,510, a 10-sentence pangram corpus, one sentence per block with state reset — the model is deterministic, so the null distribution of each block's syndrome count is enumerated from its own emissions, not estimated. In-distribution here means the memorized 10 sentences; any unseen text is out-of-distribution for this toy. On real text (h = 64), where the null must be estimated from m calibration draws, false-alarm control survives at the composed budget α + δ (0/180 fresh clean streams); the price lands in detection delay, shrinking as √m. Independent blocks are this rig's choice, not the method's requirement: fairness given the past survives dependent reading policies, at the price of calibrating the null per reading state rather than once — skip that, and a dependent stream lands 6× past the budget.
Tried and conclusively failed — kept here so they stay failed:
CRT decomposition is a structural technique, not a silver bullet. At character-level vocabulary it costs more output parameters than a monolithic head; at V = 1,000 it has 17× fewer, but the monolithic model still converges faster (all per-channel predictions must agree, so errors compound). The durable value is the syndrome: free error detection that no monolithic architecture has.
Same ring arithmetic, three substrates:
The FPGA is the proof engine (exhaustive verification in silicon), the GPU the batch engine, the ESP32 the radio mesh node — throughput was never its job.