Drop any file. Watch every byte decompose into five independent prime channels. The redundancy isn't theoretical — you'll see it.
How much structure CRT reveals. Uniform random data = 1x; the theoretical ceiling is 4x (every independent channel at full entropy relative to the joint).
Each byte b (0-209 in the data ring mod 210) splits into (b%2, b%3, b%5, b%7, b%11). The first four channels are independent and determine b uniquely (CRT); the fifth, b%11, is a redundant check. Each channel carries its own entropy.
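A minimal sketch of the per-byte split (the function name is illustrative, not from the tool):

```python
MODULI = (2, 3, 5, 7, 11)

def decompose(byte):
    """Split one byte into its five prime-residue channels.
    The first four residues determine the ring value uniquely (CRT);
    the mod-11 residue is the redundant check channel."""
    b = byte % 210  # bytes 210-255 wrap into the data ring
    return tuple(b % p for p in MODULI)
```

For example, `decompose(65)` (ASCII `'A'`) yields `(1, 2, 0, 2, 10)`.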
Each pixel = one byte, colored by dominant CRT channel. Structure leaps out of "random" data.
Histogram of all 256 byte values. Uniform data = flat; structured data = peaks. The channel view turns those peaks into exploitable redundancy.
| | CRT Decomposition | Standard (gzip, zstd) |
|---|---|---|
| Method | Algebraic: n → (n%2, n%3, n%5, n%7, n%11) | Statistical: LZ77 + Huffman / FSE |
| Channels | 5 parallel: 4 independent (CRT isomorphism) + 1 redundant check (mod 11) | 1 monolithic stream |
| Error correction | Built-in: redundant mod-11 channel detects+corrects for free | None (CRC detects, doesn't correct) |
| Redundancy seen | Entropy ratio up to 4x; ~7.5x MDL parameter savings (Rissanen, byte ring) | ~3x typical (empirical) |
| Parallelism | 5 channels, 5 cores, zero coordination | Sequential by design |
| Foundation | Chinese Remainder Theorem (Sunzi, ~1,700 years old) | Ad hoc pattern matching (LZ77, 1977) |
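To make the error-correction row concrete, here is a sketch (my naming, not the tool's) of how the redundant residue flags corruption: rebuild the value from the four CRT channels, then compare against the mod-11 check. Locating and repairing the bad channel takes more machinery; this sketch only detects the inconsistency.

```python
def crt_reconstruct(r2, r3, r5, r7):
    """Recover the unique ring value with these four residues.
    Brute force is fine: the ring has only 210 elements."""
    for n in range(210):
        if (n % 2, n % 3, n % 5, n % 7) == (r2, r3, r5, r7):
            return n

def check(channels):
    """Return the ring value if the redundant mod-11 residue agrees,
    or None to signal that some channel was corrupted in transit."""
    r2, r3, r5, r7, r11 = channels
    n = crt_reconstruct(r2, r3, r5, r7)
    return n if n % 11 == r11 else None
```

`check((1, 2, 0, 2, 10))` recovers 65; corrupting the check residue, say to 9, makes the channels inconsistent and `check` returns `None`.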
Every byte lives in Z/210Z ≅ Z/2 × Z/3 × Z/5 × Z/7, plus a redundant mod-11 check channel
The data ring. 210 = 2 × 3 × 5 × 7. Bytes 210-255 wrap (mod 210). The fifth residue, b % 11, carries no new information about b, which is exactly what makes it a free integrity check.
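The isomorphism is easy to verify exhaustively: the map n ↦ (n%2, n%3, n%5, n%7) hits every residue combination exactly once as n runs over the ring.

```python
from itertools import product

# Exhaustive check of the CRT bijection on the 210-element ring:
# each n in 0..209 maps to a distinct 4-tuple of residues, and
# every tuple in Z/2 x Z/3 x Z/5 x Z/7 is hit exactly once.
image = {(n % 2, n % 3, n % 5, n % 7) for n in range(210)}
assert len(image) == 210
assert image == set(product(range(2), range(3), range(5), range(7)))
```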
Joint entropy: H(X) ≤ 8 bits
Channel entropies: H(X%2) + H(X%3) + H(X%5) + H(X%7) (the mod-11 check channel is excluded: it adds nothing the other four don't already determine)
Redundancy ratio = sum of channel entropies / joint entropy.
For uniform data on the ring, the CRT isomorphism makes the four channels genuinely independent, so the sum equals the joint and the ratio is exactly 1x. Any value above 1x is structure CRT can exploit. For structured data (text, images), the small channels stay near full entropy while the joint concentrates, so the ratio climbs toward its 4x ceiling, revealing exploitable redundancy.
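A sketch of the ratio computation (helper names are mine; the mod-11 check channel is left out of the sum, since it adds no information beyond the other four):

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy in bits of an empirical distribution."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())

def redundancy_ratio(data: bytes):
    """Sum of the four independent channel entropies over the joint
    entropy of the ring values. 1.0 means no exploitable structure."""
    ring = [b % 210 for b in data]
    joint = entropy(ring)
    if joint == 0:
        return float("inf")  # constant data: unbounded redundancy
    return sum(entropy([n % p for n in ring]) for p in (2, 3, 5, 7)) / joint
```

Alternating bytes like `b"ab" * 50` hit the 4x ceiling: every channel sees one full bit of entropy while the joint entropy is also one bit.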
The deeper result (Rissanen's MDL): a learner that models the five channels separately needs only sum(p_i) = 28 parameters instead of one per ring element, 210, for the monolithic model: roughly a 7.5x saving in stochastic complexity. This isn't a compression ratio; it's how much faster you learn the data.
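The parameter count is plain arithmetic, assuming one multinomial parameter per outcome in each model:

```python
MODULI = (2, 3, 5, 7, 11)

channel_params = sum(MODULI)  # one small distribution per channel: 28
ring_params = 210             # one parameter per ring element
print(channel_params, ring_params / channel_params)  # prints: 28 7.5
```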