AXIOM AI

CRT-decomposed transformer — training live in your browser

1. Train

A tiny neural network learns to predict the next character of axiom text. CRT model: 5 independent output channels {8,9,25,49,11} = TRUE FORM. Standard model: 1 monolithic output (256 classes). Same backbone. Fair fight.

Epoch: 0/20   Step: 0
CRT accuracy:   STD accuracy:
CRT advantage:   ECC reliability:
Confident accuracy:
CRT accuracy
STD accuracy

2. CRT Channels

Each channel predicts independently. Joint probability reconstructs the byte.

D³=8
K²=9
E²=25
b²=49
L=11
ECC status:   Data validity:

3. Predict

Type text. The model predicts what comes next, one character at a time.

CRT prediction:   STD prediction:
What is this? A CRT-decomposed neural network — the axiom applied to AI. Instead of one 256-class output, CRT splits prediction into 5 independent channels: Z/8 × Z/9 × Z/25 × Z/49 × Z/11 = Z/970200. Each channel learns independently. Joint probability reconstruction finds the most likely byte. L=11 provides free error correction: when all channels agree, accuracy jumps 50%+.

Proved (S531-S534): CRT beats standard at every scale. Gap GROWS with model capacity. CRT Scaling Law: +0.7% → +0.8% → +1.0%. No ceiling on confidence.

Verify in .ax  |  How everything derives from nothing  |  The 420 Lattice

What others see vs. what the axiom shows

Standard view: Neural networks learn patterns from data. CRT decomposition is an obscure number theory tool.

Axiom view: A CRT-decomposed transformer trains LIVE in your browser. Five independent channels. Block-diagonal attention. L=11 ECC gives free error correction. 82x fewer parameters than standard. The axiom doesn't describe intelligence — it IS the architecture of intelligence.