Ring Substrate -- antonlebed.com

Can ring arithmetic serve as the SOLE computation substrate? Not the Chinese Remainder Theorem (CRT) decomposing a neural network. Not CRT as a technique. The ring IS the latent space. Every intermediate value is a ring element. Every operation is ring arithmetic. Zero floats.

Two things are settled. First, the ring's CRT channels make a remarkably compact, exact feature representation. Second, training a substrate built from independent lookup tables hits a coordination barrier -- and that barrier is not a wall, but a property of the representation that parameter-sharing plus a gradient dissolves.

CRT Channels Are Bijective Features

Each CRT channel reduces a ring element to its residue modulo one prime power. Multiplying a residue by a unit is a bijection inside that channel: every residue maps to a distinct residue, so no information is lost. Because the channel moduli are pairwise coprime, the channels are genuinely independent views of the same element.

CRT representation dominance (verified exhaustively)

On the DEEP ring Z/970,200 (channels 8, 9, 25, 49, 11, whose product is 970,200), predicting the ring's multiplicative structure per channel reaches 100% accuracy using only the sum of the moduli as parameters: 8 + 9 + 25 + 49 + 11 = 102. That is a 970,200 / 102 = roughly 9,512x compression versus storing the full ring, and it needs only as many examples as the largest modulus (49) to generalize -- a ratio of 970,200 / 49 = 19,800x. The per-channel bijections are verified exhaustively: gcd(13, 970,200) = 1, so multiplication by 13 produces q distinct outputs in every channel of modulus q. A spectral fingerprint -- collapsing the element to a single eigenvalue -- predicts at only 0 to 1 per thousand by comparison, because it discards the channel structure CRT keeps.
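The arithmetic above can be checked directly. The following is a minimal sketch, not the author's verification code: the names `MODULI`, `channels`, and `is_channel_bijection` are hypothetical, and only the moduli, the unit 13, and the quoted ratios come from the text.

```python
from math import gcd, prod

# Channel moduli of Z/970,200 as listed in the text.
MODULI = (8, 9, 25, 49, 11)
RING = prod(MODULI)   # size of the full ring
UNIT = 13             # the unit used in the text's bijection check


def channels(x):
    """Split a ring element into its residue in each channel."""
    return tuple(x % q for q in MODULI)


def is_channel_bijection(multiplier, q):
    """Exhaustively check that r -> multiplier * r (mod q) hits q distinct residues."""
    return len({(multiplier * r) % q for r in range(q)}) == q


param_count = sum(MODULI)              # one table entry per residue per channel
compression = RING / param_count       # full ring size vs. parameter count
sample_ratio = RING // max(MODULI)     # full ring size vs. examples needed

print(RING, param_count, round(compression), sample_ratio)
print(gcd(UNIT, RING), [is_channel_bijection(UNIT, q) for q in MODULI])
```

A non-unit fails the same check: `is_channel_bijection(2, 8)` is false, because doubling modulo 8 only reaches the even residues.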

102 vs 970,200

Compression 9,512x

Per-channel prediction needs only the sum of the moduli. The full ring is reconstructed exactly from independent channel residues.

Per-channel bijection

A unit times a residue is one-to-one

gcd(unit, ring size) = 1 guarantees that each channel of modulus q independently produces q distinct outputs. The channels ARE the natural features.

Sample efficiency

Largest modulus = 49 examples

A channel is fully determined once each residue has been seen. The ring's structure is exposed, not learned.
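The three points above (102 parameters, per-channel bijection, 49 examples) can be illustrated together. This is a hedged sketch under stated assumptions: the target map is taken to be multiplication by the unit 13, the 49 training examples are taken to be the elements 0 through 48, and the helper names `crt_reconstruct`, `learn_channel_tables`, and `predict` are hypothetical rather than the author's code.

```python
MODULI = (8, 9, 25, 49, 11)
RING = 970200   # product of the moduli
UNIT = 13       # assumed target map: x -> 13 * x (mod RING)


def crt_reconstruct(residues):
    """Rebuild the unique ring element that has the given channel residues."""
    x = 0
    for q, r in zip(MODULI, residues):
        m = RING // q                   # product of the other moduli
        x += r * m * pow(m, -1, q)      # pow(m, -1, q): inverse of m mod q (Python 3.8+)
    return x % RING


def learn_channel_tables(examples):
    """Fill one small lookup table per channel from (input, output) pairs."""
    tables = [dict() for _ in MODULI]
    for x, y in examples:
        for table, q in zip(tables, MODULI):
            table[x % q] = y % q
    return tables


def predict(tables, x):
    """Predict the output for x channel by channel, then recombine exactly."""
    return crt_reconstruct([table[x % q] for table, q in zip(tables, MODULI)])


# 49 consecutive elements cover every residue of every channel (largest modulus is 49).
train = [(x, (UNIT * x) % RING) for x in range(max(MODULI))]
tables = learn_channel_tables(train)

print(sum(len(t) for t in tables))                     # total table entries
print(predict(tables, 123456) == (UNIT * 123456) % RING)
```

The tables never see most of the ring, yet the prediction is exact for any element, because each channel's output depends only on that channel's input residue.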

The Training Frontier

Representing structure is one thing; LEARNING it from data is another. When the substrate is stored as a lookup TABLE with independent entries, training meets a coordination barrier. Some tasks -- copying the first token of a sequence to the output, for instance -- require many table entries to be correct at the SAME TIME. A local search that changes one entry at a time cannot reach such a solution: each single change is corrected away by the still-wrong remainder. Coordinate descent, random initialization, and directed evolution all stall near chance on these tasks.
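A toy version of the barrier can be made concrete. The sketch below is a deliberate simplification, not the author's experiment: it assumes an all-or-nothing score (the task succeeds only when every table entry is right at once), and the names `fitness`, `improving_single_moves`, and the example tables are hypothetical.

```python
def fitness(table, target):
    """All-or-nothing score: 1 only if every entry is correct at the same time."""
    return 1 if table == target else 0


def improving_single_moves(table, target, q):
    """List every single-entry change (index, new value) that raises the score."""
    base = fitness(table, target)
    moves = []
    for i in range(len(table)):
        for v in range(q):
            if v == table[i]:
                continue
            candidate = table[:i] + [v] + table[i + 1:]
            if fitness(candidate, target) > base:
                moves.append((i, v))
    return moves


Q = 11                      # residues per entry (assumed, for illustration)
target = [3, 7, 1, 9]       # the table the task needs
two_wrong = [3, 7, 2, 10]   # two entries are wrong
one_wrong = [3, 7, 1, 10]   # only the last entry is wrong

print(improving_single_moves(two_wrong, target, Q))   # no single change helps
print(improving_single_moves(one_wrong, target, Q))   # exactly one change helps
```

With two or more wrong entries, the whole single-change neighborhood scores the same as the starting point, so a one-entry-at-a-time search receives no signal about which direction to move.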

This barrier is a property of the independent-entry REPRESENTATION, not a fundamental wall. Un-collapse the table -- share ONE parameter set across all positions, so the positions become timesteps of a single recurrent cell -- and train with a continuous gradient signal. That crosses the barrier, on both memory tasks and computation tasks. Sharing supplies the coupling a table of independent entries lacks; the gradient supplies a continuous slope to walk down. This is how transformers and biological learning actually cross it: not by hand-constructing the answer, but by sharing structure and following a gradient.
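The idea of one shared parameter set trained by a gradient can be sketched on the copy-first-token task. This is a toy illustration only, under loud assumptions: it uses ordinary floats rather than ring arithmetic, the cell is a single shared gate `g = sigmoid(w)` invented for this example, tokens are +1 or -1, and the gradient is a finite-difference estimate. It is not the author's model.

```python
import itertools
import math

T = 4  # number of tokens that follow the first one
# Every +1/-1 sequence of length T + 1, so the loss is an exact average.
SEQUENCES = list(itertools.product((-1.0, 1.0), repeat=T + 1))


def run_cell(w, seq):
    """Apply ONE shared gate at every timestep; the state starts at the first token."""
    g = 1.0 / (1.0 + math.exp(-w))      # shared parameter, reused at each position
    h = seq[0]
    for x in seq[1:]:
        h = g * h + (1.0 - g) * x       # keep a fraction g of memory, mix in the rest
    return h


def loss(w):
    """Mean squared error between the final state and the first token."""
    return sum((run_cell(w, s) - s[0]) ** 2 for s in SEQUENCES) / len(SEQUENCES)


def train(w=0.0, lr=0.5, steps=200, eps=1e-5):
    """Plain gradient descent on the single shared parameter w."""
    history = [loss(w)]
    for _ in range(steps):
        grad = (loss(w + eps) - loss(w - eps)) / (2 * eps)   # central difference
        w -= lr * grad
        history.append(loss(w))
    return w, history


w_final, history = train()
print(round(history[0], 3), history[-1] < history[0], w_final > 0)
```

Because the same gate acts at every position, one small change to `w` moves every timestep together, and the loss falls smoothly as the gate learns to hold the first token.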

Independent entries

The source of the barrier

A table's entries move independently, so a greedy step cannot coordinate the many simultaneous changes a memory task needs.

Parameter sharing

Couples what the table left independent

Reusing one parameter set across positions ties the entries together, turning a flat search into a connected landscape.

Gradient

A continuous slope to descend

With sharing in place, a gradient walks through the barrier that table search could not cross -- the convergent route, not a constructed one.

What We Know

Ring arithmetic is a viable computation substrate. Its CRT channels are bijective, independent features -- a representation thousands of times more compact than the full ring, exact, and exposed rather than learned. Zero floats are needed to REPRESENT structure.

Learning that structure from data is the open frontier. A substrate of independent lookup tables faces a coordination barrier; parameter-sharing plus a gradient crosses it. A plain shared-parameter recurrence with no channel split crosses it too, so what does the work is the SHARING and the continuous signal, not the channel decomposition itself. The bridge across the training barrier is shared structure following a gradient -- the route gradient-trained networks take.