Drop any file. Watch every byte decompose into five independent prime channels. The redundancy isn't theoretical — you'll see it.
How much structure CRT reveals. Uniform random data = 1x; the theoretical ceiling is 4x (every independent channel at full entropy relative to the joint).
Each byte b (0-209 in the data ring mod 210) splits into (b%2, b%3, b%5, b%7, b%11). The first four channels are independent and determine b uniquely (CRT); the fifth, b%11, is a redundant check. Each channel carries its own entropy.
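A minimal sketch of the per-byte split (the function name is illustrative, not from the tool):

```python
MODULI = (2, 3, 5, 7, 11)

def decompose(byte):
    """Split one byte into its five prime-residue channels.
    The first four residues determine the ring value uniquely (CRT);
    the mod-11 residue is the redundant check channel."""
    b = byte % 210  # bytes 210-255 wrap into the data ring
    return tuple(b % p for p in MODULI)
```

For example, `decompose(65)` (ASCII `'A'`) yields `(1, 2, 0, 2, 10)`.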
Each pixel = one byte, colored by dominant CRT channel. Structure leaps out of "random" data.
Histogram of all 256 byte values. Uniform data = flat; structured data = peaks. The channel view turns those peaks into exploitable redundancy.
| | CRT Decomposition | Standard (gzip, zstd) |
|---|---|---|
| Method | Algebraic: n → (n%2, n%3, n%5, n%7, n%11) | Statistical: LZ77 + Huffman / FSE |
| Channels | 5 parallel: 4 independent (CRT isomorphism) + 1 redundant check (mod 11) | 1 monolithic stream |
| Error correction | Built-in: redundant mod-11 channel detects+corrects for free | None (CRC detects, doesn't correct) |
| Redundancy seen | Entropy ratio up to 4x; ~7.5x MDL parameter savings (Rissanen, byte ring) | ~3x typical (empirical) |
| Parallelism | 5 channels, 5 cores, zero coordination | Sequential by design |
| Foundation | Chinese Remainder Theorem (Sunzi, ~1,700 years old) | Ad hoc pattern matching (LZ77, 1977) |
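To make the error-correction row concrete, here is a sketch (my naming, not the tool's) of how the redundant residue flags corruption: rebuild the value from the four CRT channels, then compare against the mod-11 check. Locating and repairing the bad channel takes more machinery; this sketch only detects the inconsistency.

```python
def crt_reconstruct(r2, r3, r5, r7):
    """Recover the unique ring value with these four residues.
    Brute force is fine: the ring has only 210 elements."""
    for n in range(210):
        if (n % 2, n % 3, n % 5, n % 7) == (r2, r3, r5, r7):
            return n

def check(channels):
    """Return the ring value if the redundant mod-11 residue agrees,
    or None to signal that some channel was corrupted in transit."""
    r2, r3, r5, r7, r11 = channels
    n = crt_reconstruct(r2, r3, r5, r7)
    return n if n % 11 == r11 else None
```

`check((1, 2, 0, 2, 10))` recovers 65; corrupting the check residue, say to 9, makes the channels inconsistent and `check` returns `None`.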
Every byte lives in Z/210Z ≅ Z/2 × Z/3 × Z/5 × Z/7, plus a redundant mod-11 check channel
The data ring. 210 = 2 × 3 × 5 × 7. Bytes 210-255 wrap (mod 210). The fifth residue, b % 11, carries no new information about b, which is exactly what makes it a free integrity check.
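The isomorphism is easy to verify exhaustively: the map n ↦ (n%2, n%3, n%5, n%7) hits every residue combination exactly once as n runs over the ring.

```python
from itertools import product

# Exhaustive check of the CRT bijection on the 210-element ring:
# each n in 0..209 maps to a distinct 4-tuple of residues, and
# every tuple in Z/2 x Z/3 x Z/5 x Z/7 is hit exactly once.
image = {(n % 2, n % 3, n % 5, n % 7) for n in range(210)}
assert len(image) == 210
assert image == set(product(range(2), range(3), range(5), range(7)))
```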
Joint entropy: H(X) ≤ 8 bits
Channel entropies: H(X%2) + H(X%3) + H(X%5) + H(X%7) (the mod-11 check channel is excluded: it adds nothing the other four don't already determine)
Redundancy ratio = sum of channel entropies / joint entropy.
For uniform data on the ring, the CRT isomorphism makes the four channels genuinely independent, so the sum equals the joint and the ratio is exactly 1x. Any value above 1x is structure CRT can exploit. For structured data (text, images), the small channels stay near full entropy while the joint concentrates, so the ratio climbs toward its 4x ceiling, revealing exploitable redundancy.
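A sketch of the ratio computation (helper names are mine; the mod-11 check channel is left out of the sum, since it adds no information beyond the other four):

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy in bits of an empirical distribution."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())

def redundancy_ratio(data: bytes):
    """Sum of the four independent channel entropies over the joint
    entropy of the ring values. 1.0 means no exploitable structure."""
    ring = [b % 210 for b in data]
    joint = entropy(ring)
    if joint == 0:
        return float("inf")  # constant data: unbounded redundancy
    return sum(entropy([n % p for n in ring]) for p in (2, 3, 5, 7)) / joint
```

Alternating bytes like `b"ab" * 50` hit the 4x ceiling: every channel sees one full bit of entropy while the joint entropy is also one bit.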
The deeper result (Rissanen's MDL): a learner that models the five channels separately needs only sum(p_i) = 28 parameters instead of one per ring element, 210, for the monolithic model: roughly a 7.5x saving in stochastic complexity. This isn't a compression ratio; it's how much faster you learn the data.
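The parameter count is plain arithmetic, assuming one multinomial parameter per outcome in each model:

```python
MODULI = (2, 3, 5, 7, 11)

channel_params = sum(MODULI)  # one small distribution per channel: 28
ring_params = 210             # one parameter per ring element
print(channel_params, ring_params / channel_params)  # prints: 28 7.5
```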