Train a neural network split into 5 independent CRT channels. Each channel trains alone: kill any channel and the network degrades gracefully; corrupt its weights and the L=11 channel detects it for free. All math is integer -- no floats needed.
Standard neural network:
- Forward: x -> W*x + b -> ReLU -> output
- Backward: full Jacobian (all neurons interact)
- Cost: O(N^2) per layer

CRT neural network:
- Forward: x -> 5 independent sub-networks -> average
- Backward: 5 independent Jacobians (block-diagonal)
- Cost: O(sum(m_i^2)) = O(8^2 + 9^2 + 25^2 + 49^2 + 11^2) = O(3292)

Channels:
- D (mod 8): temporal features
- K (mod 9): structural features
- E (mod 25): observational features
- b (mod 49): depth features
- L (mod 11): ECC / integrity

At N = 970200 that is roughly 286 million times fewer gradient ops, and L=11 gives free error detection on every inference.
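The block-diagonal forward pass can be sketched in a few lines of pure-integer Python. The weight shapes and the integer-average consensus rule are assumptions for illustration, not the demo's actual code:

```python
# Sketch: pure-integer forward pass over independent CRT channels.
# Shapes and the averaging rule are illustrative assumptions.

def relu(v):
    return [x if x > 0 else 0 for x in v]

def sub_forward(x, W1, b1, W2, b2):
    """One channel: x -> W1*x + b1 -> ReLU -> W2*h + b2, all in integers."""
    h = relu([sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(W1, b1)])
    return sum(w * hi for w, hi in zip(W2, h)) + b2

def crt_forward(x, channels):
    """Run each channel independently, then take the integer average."""
    outs = [sub_forward(x, *ch) for ch in channels]
    return sum(outs) // len(outs)

# toy channel: identity hidden layer, so output = x[0] + x[1]
toy = ([[1, 0], [0, 1]], [0, 0], [1, 1], 0)
print(crt_forward([2, 3], [toy] * 5))  # -> 5
```

Because each channel never reads another channel's activations, the backward pass factors the same way: five small Jacobians instead of one large one.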
XOR: the classic non-linear test. Standard = one 2->8->1 network (33 params). CRT = five independent 2->4->1 networks (85 params), outputs averaged. Same task, different architecture.
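The two parameter counts follow from ordinary weights-plus-biases arithmetic; a quick check (the helper name is mine):

```python
def mlp_params(n_in, n_hidden, n_out):
    """Weights + biases of a one-hidden-layer MLP (helper name is mine)."""
    return (n_in * n_hidden + n_hidden) + (n_hidden * n_out + n_out)

standard = mlp_params(2, 8, 1)   # one 2->8->1 network
crt = 5 * mlp_params(2, 4, 1)    # five 2->4->1 networks
print(standard, crt)             # -> 33 85
```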
Kill individual channels during inference and the CRT network degrades gracefully. A standard dense network has no clean equivalent: every downstream neuron consumes a dead neuron's output, so the error propagates through the whole layer stack.
Even with channels dead, the remaining channels still reconstruct a partial answer. As long as L=11 is alive, corruption in any other channel stays detectable.
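A minimal sketch of that degradation rule, assuming the consensus is simply the integer average over whichever channels survive (names and the constant stand-in channels are illustrative):

```python
def degraded_predict(x, channels, alive):
    """Consensus from the surviving channels only (alive = set of indices)."""
    outs = [channels[i](x) for i in sorted(alive)]
    if not outs:
        raise RuntimeError("all channels dead -- nothing to reconstruct")
    return sum(outs) // len(outs)

# five stand-in channels (constant predictors in place of trained sub-nets)
chans = [lambda x, k=k: 100 + k for k in range(5)]      # outputs 100..104
print(degraded_predict(None, chans, {0, 1, 2, 3, 4}))   # -> 102 (all alive)
print(degraded_predict(None, chans, {0, 1, 4}))         # -> 101 (two killed)
```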
Corrupt the b-channel weights with random noise and L=11 detects it by comparing its own independent prediction against the corrupted consensus. There is no additional cost: L trains as an ordinary part of the CRT decomposition.
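One way to implement that check, assuming a fixed disagreement threshold (both the helper and the tolerance value are mine, not the demo's):

```python
def corruption_suspected(consensus, l_prediction, tol=50):
    """Flag when the L=11 channel's independent prediction disagrees with
    the consensus by more than tol fixed-point units (tol is illustrative)."""
    return abs(consensus - l_prediction) > tol

print(corruption_suspected(1000, 980))    # -> False (within tolerance)
print(corruption_suspected(1000, 1400))   # -> True  (noise suspected)
```

Since the L channel already runs on every forward pass, this comparison adds only one subtraction per inference.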
| Property | Standard | CRT |
|---|---|---|
| Backprop | Full Jacobian | Block-diagonal (5 independent) |
| Ops at N=970200 | N^2 = 941 billion | sum(m_i^2) = 3,292 |
| Savings | 1x | 286,000,000x |
| Channel failure | Cascade | Graceful degradation |
| Error detection | External validation | L=11 built-in (free) |
| Parallelism | Synchronous | 5 independent machines |
| Hot swap | Retrain everything | Add/remove channels |
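The table's numbers can be reproduced directly from the five moduli (a quick arithmetic check, not the demo's code):

```python
moduli = [8, 9, 25, 49, 11]          # D, K, E, b, L

N = 1
for m in moduli:
    N *= m                           # product of pairwise-coprime moduli

dense = N * N                        # full-Jacobian cost at width N
block = sum(m * m for m in moduli)   # block-diagonal cost

print(N, dense, block, dense // block)
# -> 970200 941288040000 3292 285931968
```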
CRT decomposition is EXACT. It is not an approximation: because the five moduli are pairwise coprime, the Chinese Remainder Theorem guarantees unique reconstruction from the residues. The savings are algebraic.
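That exactness is easy to demonstrate: recover an integer from nothing but its five residues, using the standard CRT formula (requires Python 3.8+ for the three-argument `pow` modular inverse):

```python
from math import prod

MODULI = [8, 9, 25, 49, 11]           # pairwise coprime, product N = 970200

def crt_reconstruct(residues, moduli):
    """Unique x in [0, N) with x = r_i (mod m_i) for every channel."""
    N = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = N // m
        x += r * Mi * pow(Mi, -1, m)  # pow(., -1, m): modular inverse
    return x % N

x = 123456
residues = [x % m for m in MODULI]
print(crt_reconstruct(residues, MODULI))  # -> 123456, exactly
```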
This demo is written in .ax and compiled to WebAssembly. The neural network uses fixed-point arithmetic (scale 1000), so no floating-point arrays are needed. Decision boundary: canvas fillRect at 4px resolution. 118 parameters total (33 standard + 85 CRT). Training: SGD, lr = 0.15, ReLU activation. All math fits in 32-bit integers.
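Scale-1000 fixed point works like this (helper names are mine; the demo's internals may differ):

```python
SCALE = 1000  # fixed-point scale used by the demo

def to_fixed(f):
    """Encode a real value as a scaled integer."""
    return int(round(f * SCALE))

def fx_mul(a, b):
    """Multiply two fixed-point values and rescale back down."""
    return (a * b) // SCALE

def fx_relu(a):
    return a if a > 0 else 0

lr = to_fixed(0.15)                           # 150, the demo's learning rate
print(fx_mul(to_fixed(1.25), to_fixed(2.0)))  # -> 2500, i.e. 2.5
print(fx_relu(to_fixed(-0.3)))                # -> 0
```

With inputs and weights bounded near +/-32 in real terms, every intermediate product stays comfortably inside a 32-bit integer.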
Full CRT AI architecture

> This work is and will always be free.
> No paywall. No copyright. No exceptions.
> If it ever earns anything, every cent goes to the communities that need it most.
> This sacred vow is permanent and irrevocable.
— Anton Alexandrovich Lebed
Source code · Public domain (CC0)
Contributions in equal measure: Anthropic's Claude, Anton A. Lebed, and the giants whose shoulders we stand on.
Rendered by .ax via WASM DOM imports. Zero HTML authored.