CRT Neural Network

5 channels. Block-diagonal. L=11 ECC free.

Train a neural network split into 5 independent CRT channels. Each channel trains alone -- kill any channel and the network degrades gracefully. Corrupt the weights -- L=11 detects it for free. All math is integer. No floats needed.

How It Works

STANDARD NEURAL NETWORK:
  Forward:  x -> W*x + b -> ReLU -> output
  Backward: full Jacobian (all neurons interact)
  Cost: O(N^2) per layer

CRT NEURAL NETWORK:
  Forward:  x -> 5 independent sub-networks -> average
  Backward: 5 INDEPENDENT Jacobians (block-diagonal)
  Cost: O(sum(m_i^2)) = 8^2+9^2+25^2+49^2+11^2 = 3292 ops

  Channel D (mod 8):   temporal features
  Channel K (mod 9):   structural features
  Channel E (mod 25):  observational features
  Channel b (mod 49):  depth features
  Channel L (mod 11):  ECC / integrity

At N=970200 (the product 8*9*25*49*11): ~286 MILLION x fewer gradient ops.
L=11 gives free error detection on every inference.
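
The claimed numbers can be checked directly. A quick sketch in Python (the demo itself runs in .ax):

```python
from math import prod

# The five pairwise-coprime channel moduli.
moduli = [8, 9, 25, 49, 11]

N = prod(moduli)                        # full problem size: 970200
dense_ops = N * N                       # full Jacobian: O(N^2)
block_ops = sum(m * m for m in moduli)  # block-diagonal: O(sum m_i^2)

print(N)                       # 970200
print(block_ops)               # 3292
print(dense_ops // block_ops)  # 285931968 -- ~286 million
```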

Live Training: XOR

XOR: the classic non-linear test. Standard = one 2->8->1 network (33 params). CRT = five independent 2->4->1 networks (85 params), outputs averaged. Same task, different architecture.
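
The parameter counts follow from the layer shapes. A sketch in Python (the demo itself stores weights as fixed-point integers):

```python
def mlp_params(n_in, n_hidden, n_out):
    # Weights plus biases for a one-hidden-layer MLP.
    hidden = n_in * n_hidden + n_hidden
    output = n_hidden * n_out + n_out
    return hidden + output

standard = mlp_params(2, 8, 1)  # one 2->8->1 network
crt = 5 * mlp_params(2, 4, 1)   # five 2->4->1 networks

print(standard, crt)  # 33 85
```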


Channel Isolation

Kill individual channels during inference. The CRT network degrades gracefully. A standard network has no equivalent -- kill one neuron, cascade failure.


Even with channels dead, remaining channels reconstruct a partial answer. With L=11 alive, corruption in any other channel is detectable.
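
One way to sketch the surviving-channel consensus (channel names from above; the output values are made up for illustration):

```python
# Hypothetical per-channel outputs for a single input.
outputs = {"D": 0.9, "K": 0.8, "E": 1.0, "b": 0.7, "L": 0.9}

def consensus(outputs, dead=()):
    # Average only the channels that are still alive.
    alive = [v for c, v in outputs.items() if c not in dead]
    return sum(alive) / len(alive)

full = consensus(outputs)                       # all five channels
degraded = consensus(outputs, dead={"K", "b"})  # two channels killed
```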

L=11 Error Detection

Corrupt the b-channel weights with random noise. L=11 detects the corruption by comparing its independent prediction against the corrupted consensus. No additional cost -- L trains as part of the CRT decomposition.
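
A minimal sketch of the check (the threshold and the prediction values are illustrative, not the demo's actual constants):

```python
def corrupted(preds, threshold=0.5):
    # Consensus of the four data channels (D, K, E, b).
    data = [preds[c] for c in "DKEb"]
    consensus = sum(data) / len(data)
    # L's independent prediction should roughly agree with the
    # consensus; a large gap flags corruption.
    return abs(preds["L"] - consensus) > threshold

clean = {"D": 0.9, "K": 0.8, "E": 1.0, "b": 0.7, "L": 0.9}
noisy = dict(clean, b=-7.3)  # corrupt the b channel
```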

Train a model, then click to corrupt and detect.

Standard vs CRT

Property         | Standard             | CRT
-----------------|----------------------|-------------------------------
Backprop         | Full Jacobian        | Block-diagonal (5 independent)
Ops at N=970200  | N^2 = 941 billion    | sum(m_i^2) = 3,292
Savings          | 1x                   | 286,000,000x
Channel failure  | Cascade              | Graceful degradation
Error detection  | External validation  | L=11 built-in (free)
Parallelism      | Synchronous          | 5 independent machines
Hot swap         | Retrain everything   | Add/remove channels

CRT decomposition is EXACT. Not an approximation -- the Chinese Remainder Theorem guarantees unique reconstruction. The savings are algebraic.
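
The exactness is ordinary CRT: the moduli 8, 9, 25, 49, 11 are pairwise coprime, so the residues determine a unique value mod 970200. A sketch using Python's built-in modular inverse (pow(x, -1, m), Python 3.8+):

```python
from math import gcd, prod

moduli = [8, 9, 25, 49, 11]
# Pairwise coprime: 2^3, 3^2, 5^2, 7^2, 11.
assert all(gcd(a, b) == 1
           for i, a in enumerate(moduli) for b in moduli[i + 1:])

def reconstruct(residues, moduli):
    # Standard CRT reconstruction: unique x mod prod(moduli).
    N = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Ni = N // m
        x += r * Ni * pow(Ni, -1, m)  # pow(Ni, -1, m): inverse mod m
    return x % N

x = 123456
assert reconstruct([x % m for m in moduli], moduli) == x  # exact
```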

Implementation

This demo runs in .ax compiled to WebAssembly. The neural network uses fixed-point arithmetic (scale 1000) -- no floating-point arrays needed. Decision boundary: canvas fillRect at 4px resolution. 118 parameters (33 standard + 85 CRT). Training: SGD, lr=0.15, ReLU activation. All math fits in 32-bit integers.
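
Fixed-point at scale 1000 means a real value r is stored as round(r * 1000). A sketch of the primitives (the scale is from the text above; the helper names are an assumption about how the demo implements them):

```python
SCALE = 1000  # fixed-point scale stated above

def to_fx(r):     return int(round(r * SCALE))  # encode a real value
def fx_mul(a, b): return (a * b) // SCALE       # multiply, then rescale
def fx_relu(a):   return a if a > 0 else 0      # ReLU works unchanged

# w*x for w=0.15 (the learning rate) and x=2.0:
y = fx_mul(to_fx(0.15), to_fx(2.0))  # 300, i.e. 0.3 at scale 1000
```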

Full CRT AI architecture >

This work is and will always be free.
No paywall. No copyright. No exceptions.

If it ever earns anything, every cent goes to the communities that need it most.

This sacred vow is permanent and irrevocable.
— Anton Alexandrovich Lebed

Source code · Public domain (CC0)

Contributions in equal measure: Anthropic's Claude, Anton A. Lebed, and the giants whose shoulders we stand on.

Rendered by .ax via WASM DOM imports. Zero HTML authored.