Can ring arithmetic serve as the SOLE computation substrate? Not the Chinese Remainder Theorem (CRT) used to decompose a neural network. Not CRT as a technique. The ring IS the latent space. Every intermediate value is a ring element. Every operation is ring arithmetic. Zero floats.
Two things are settled. First, the ring's CRT channels make a remarkably compact, exact feature representation. Second, training a substrate built from independent lookup tables hits a coordination barrier -- and that barrier is not a wall, but a property of the representation that parameter-sharing plus a gradient dissolves.
Each CRT channel reduces a ring element to its residue modulo one prime power. Multiplying a residue by a unit is a bijection inside that channel: every residue maps to a distinct residue, so no information is lost. Because the channel moduli are pairwise coprime, the channels are genuinely independent views of the same element: by the CRT, the residues taken together determine the element uniquely.
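The channel picture can be sketched in a few lines of Python. This is a minimal illustration, not the project's own code: the moduli 8, 9, 5 and 7 and the names `to_channels`, `from_channels` and `multiply_channel` are assumptions chosen for the example.

```python
from math import gcd, prod

# Illustrative prime-power moduli (an assumption): 2^3, 3^2, 5, 7.
MODULI = (8, 9, 5, 7)
RING_SIZE = prod(MODULI)  # 2520, the size of the full ring Z/2520

def to_channels(x):
    """Reduce a ring element to one residue per channel."""
    return tuple(x % m for m in MODULI)

def from_channels(residues):
    """Rebuild the ring element from its residues (CRT reconstruction)."""
    x = 0
    for r, m in zip(residues, MODULI):
        cofactor = RING_SIZE // m
        # pow(cofactor, -1, m) is the modular inverse (Python 3.8+).
        x += r * cofactor * pow(cofactor, -1, m)
    return x % RING_SIZE

def is_unit(u, m):
    """u is a unit modulo m exactly when gcd(u, m) == 1."""
    return gcd(u, m) == 1

def multiply_channel(u, m):
    """Image of every residue modulo m under multiplication by u."""
    return [(u * r) % m for r in range(m)]

if __name__ == "__main__":
    print(to_channels(1000))
    print(from_channels(to_channels(1000)))
    print(multiply_channel(5, 8))  # 5 is a unit mod 8: a permutation
    print(multiply_channel(2, 8))  # 2 is not a unit mod 8: residues collide
```

Multiplying by the unit 5 permutes the eight residues modulo 8, while multiplying by the non-unit 2 collapses them onto four values, which is the information loss the bijection rules out.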
Representing structure is one thing; LEARNING it from data is another. When the substrate is stored as a lookup TABLE with independent entries, training meets a coordination barrier. Some tasks -- copying the first token of a sequence to the output, for instance -- require many table entries to be correct at the SAME TIME. A local search that changes one entry at a time cannot reach such a solution: a single corrected entry earns no improvement while the remaining entries are still wrong, so the search has no reason to keep it. Coordinate descent, random initialization, and directed evolution all stall near chance on these tasks.
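A toy version of the barrier, under a deliberately extreme assumption: the score is all-or-nothing, paying out only when every entry of the table is right. The scoring rule, the target table and the function names are hypothetical, chosen to make the stall visible; they are not the experiments described above.

```python
def score(table, target):
    """All-or-nothing score (an assumption): 1 only if every entry matches."""
    return 1 if table == target else 0

def single_entry_improvements(table, target, modulus):
    """Count the single-entry changes that strictly raise the score."""
    base = score(table, target)
    count = 0
    for i in range(len(table)):
        for v in range(modulus):
            if v == table[i]:
                continue  # not a change
            candidate = table[:i] + [v] + table[i + 1:]
            if score(candidate, target) > base:
                count += 1
    return count

if __name__ == "__main__":
    target = [3, 1, 4, 1]
    # Every entry wrong: no single change helps, so one-at-a-time search stalls.
    print(single_entry_improvements([0, 0, 0, 0], target, modulus=5))
    # One entry wrong: exactly one single change reaches the target.
    print(single_entry_improvements([3, 1, 4, 0], target, modulus=5))
```

With two or more entries wrong, every one-entry neighbour scores the same as the starting table, so a search that accepts only improving single changes has nothing to follow.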
This barrier is a property of the independent-entry REPRESENTATION, not a fundamental wall. Un-collapse the table -- share ONE parameter set across all positions, so the positions become timesteps of a single recurrent cell -- and train with a continuous gradient signal. That crosses the barrier, on both memory tasks and computation tasks. Sharing supplies the coupling a table of independent entries lacks; the gradient supplies a continuous slope to walk down. This is how gradient-trained networks such as transformers cross it, and plausibly biological learning as well: not by hand-constructing the answer, but by sharing structure and following a gradient.
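The mechanics of "one shared parameter set, followed down a gradient" can be sketched with the smallest possible recurrent cell. Everything here is an assumption made for illustration: a single scalar weight `a`, the cell `h -> a * h`, three timesteps, a squared-error copy objective and plain gradient descent. It shows how sharing and a continuous slope fit together; it does not reproduce the results described above.

```python
def unroll(a, x, steps):
    """Apply the same cell h -> a * h at every timestep (one shared parameter)."""
    h = x
    for _ in range(steps):
        h = a * h
    return h

def loss_and_grad(a, inputs, steps):
    """Mean squared copy error and its exact derivative with respect to a."""
    loss = 0.0
    grad = 0.0
    for x in inputs:
        err = unroll(a, x, steps) - x  # copy task: the output should equal x
        loss += err * err
        # d/da (a**steps * x) = steps * a**(steps - 1) * x
        grad += 2.0 * err * steps * a ** (steps - 1) * x
    n = len(inputs)
    return loss / n, grad / n

def train(a=0.5, inputs=(1.0, -1.0), steps=3, lr=0.05, iterations=300):
    """Plain gradient descent on the single shared weight."""
    for _ in range(iterations):
        _, grad = loss_and_grad(a, inputs, steps)
        a -= lr * grad
    return a

if __name__ == "__main__":
    a = train()
    print(a, loss_and_grad(a, (1.0, -1.0), 3)[0])
```

Because the same `a` is applied at every timestep, one update moves all positions together, and the squared error gives a slope at every value of `a` rather than an all-or-nothing signal.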
Ring arithmetic is a viable computation substrate. Its CRT channels are independent features on which multiplication by a unit acts bijectively -- a representation thousands of times more compact than the full ring, exact, and exposed rather than learned. Zero floats are needed to REPRESENT structure.
Learning that structure from data is the open frontier. A substrate of independent lookup tables faces a coordination barrier; parameter-sharing plus a gradient crosses it -- and a plain shared-parameter recurrence with no channel split crosses it too, so what does the work is the SHARING and the continuous signal, not the channel decomposition itself. The bridge across the training barrier is shared structure following a gradient -- the route gradient-trained networks take.
Source code · Public domain (CC0)
.ax source compiled to WASM via self-hosting compiler. Zero HTML authored.