FIVE PRIMES
Every byte splits into 5 independent channels via the Chinese Remainder Theorem. Type anything below. Watch it decompose.
Each byte n maps to (n mod 2, n mod 3, n mod 5, n mod 7, n mod 11). Because the moduli are pairwise coprime and 2 x 3 x 5 x 7 x 11 = 2310 > 256, the CRT guarantees the map is injective: the five residues jointly determine n. Information splits, not copies.
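The decomposition and its inverse can be sketched in a few lines; the helper names below are illustrative, not the demo's actual API:

```python
# Sketch of the byte -> residue decomposition described above.
PRIMES = [2, 3, 5, 7, 11]

def decompose(n: int) -> tuple:
    """Map a byte to its five residue channels."""
    return tuple(n % p for p in PRIMES)

def reconstruct(residues) -> int:
    """Invert via the constructive CRT (unique since 2*3*5*7*11 = 2310 > 256)."""
    M = 1
    for p in PRIMES:
        M *= p                          # M = 2310
    x = 0
    for r, p in zip(residues, PRIMES):
        Mp = M // p
        x += r * Mp * pow(Mp, -1, p)    # three-arg pow gives the modular inverse (Python 3.8+)
    return x % M

print(decompose(65))                    # byte 'A' -> (1, 2, 0, 2, 10)
assert all(reconstruct(decompose(n)) == n for n in range(256))
```

The round trip over all 256 byte values confirms that no information is lost in the split.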
Each cell = one byte. Color = CRT channel blend. Hover for decomposition.
Mutual information between channel pairs, in bits. Statistically independent channels score 0. Real text has statistical structure that leaks across channels, but the channels themselves are algebraically independent: no residue constrains another.
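A minimal sketch of the measurement, assuming a plug-in (empirical-frequency) estimator; the function name and sample text are illustrative:

```python
# Estimate mutual information (in bits) between two residue channels of a byte stream.
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in MI estimate from empirical joint and marginal frequencies."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

data = b"the quick brown fox jumps over the lazy dog"
mod2 = [b % 2 for b in data]
mod3 = [b % 3 for b in data]
print(round(mutual_information(mod2, mod3), 3))  # near 0; any excess reflects text structure
```

On truly independent channels the estimate is exactly 0; on real text it is small but positive, which is the structure the heatmap visualizes.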
Modern AI tokenizers (BPE, SentencePiece, tiktoken) are statistical: they learn merge rules from training data. CRT decomposition is algebraic: it exposes ring structure that pattern statistics alone cannot reveal.
The theoretical redundancy at the byte level (N=210) is 20x by the Rissanen theorem. This means the joint encoding across the 5 independent channels carries more bits than the minimum needed to encode the original byte stream; that surplus is the redundancy the decomposition exploits.
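As a rough per-byte sanity check, the raw bit cost of the five residue channels can be compared with the 8 bits of the original byte. This computes only the direct representational expansion, not the token-level Rissanen figure quoted above:

```python
# Bits carried by the five residue channels vs. the 8 bits of a raw byte.
from math import log2

PRIMES = [2, 3, 5, 7, 11]
channel_bits = sum(log2(p) for p in PRIMES)   # = log2(2310), about 11.17 bits
expansion = channel_bits / 8                  # raw expansion over one byte
print(f"{channel_bits:.2f} bits across channels, {expansion:.2f}x expansion")
```

The sum of per-channel bit costs equals log2 of the product modulus, since the moduli are coprime.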
| METHOD | APPROACH | TYPE |
|---|---|---|
| CRT | 5 algebraically independent channels. Provably optimal decomposition for mod-210 data. Rissanen 20x theoretical redundancy. | Algebraic |
| BPE | Statistical subword merging. GPT-4 uses ~100K token vocabulary. Learns patterns, doesn't see ring structure. | Statistical |
| zstd | Best general compressor: ~3-4x. Dictionary + entropy coding. No algebraic decomposition. | Statistical |
The ring Z/210Z = Z/2 x Z/3 x Z/5 x Z/7 has 48 units (phi(210) = 48).
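The unit count can be verified by brute force; a one-off check, not part of the demo:

```python
# Count the units of Z/210Z: elements coprime to 210. Expect phi(210) = 48.
from math import gcd

units = [n for n in range(210) if gcd(n, 210) == 1]
print(len(units))   # 48
```

This matches the Euler product phi(210) = 210 * (1/2) * (2/3) * (4/5) * (6/7) = 48.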
Adding the L=11 channel: Z/2310Z. Rissanen redundancy: 936x at token level.
CRT = Chinese Remainder Theorem. All computations run in-browser; no server.