CRT Neural Network

Train a neural network with 5 independent CRT channels. Block-diagonal. No cross-channel gradients. CC0.

How It Works

STANDARD NEURAL NETWORK:
  Forward:  x -> W*x + b -> activation -> output
  Backward: full Jacobian dL/dW (all neurons interact)
  Cost: O(N^2) per layer for N neurons

CRT NEURAL NETWORK:
  Forward:  x -> CRT decompose -> 5 independent sub-networks -> CRT reconstruct
  Backward: 5 INDEPENDENT Jacobians (block-diagonal)
  Cost: O(sum(m_i^2)) = O(8^2 + 9^2 + 25^2 + 49^2 + 11^2) = O(3292)
  vs standard O(N^2) = O(970200^2) = O(941,288,040,000)

  Channel D (mod 8):  8-dim sub-network   — temporal features
  Channel K (mod 9):  9-dim sub-network   — structural features
  Channel E (mod 25): 25-dim sub-network  — observational features
  Channel b (mod 49): 49-dim sub-network  — depth features
  Channel L (mod 11): 11-dim sub-network  — ECC / integrity
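
The layer structure above can be illustrated with a minimal NumPy sketch of one block-diagonal layer over the five channel sizes (the decompose/reconstruct steps are modeled as a plain split/concatenate; all names here are illustrative, not the project's API):

```python
import numpy as np

# Channel sizes from the table above: D, K, E, b, L.
SIZES = [8, 9, 25, 49, 11]
rng = np.random.default_rng(0)
weights = [rng.standard_normal((m, m)) * 0.1 for m in SIZES]
biases = [np.zeros(m) for m in SIZES]

def split(x):
    """CRT 'decompose': slice the 102-dim vector into the 5 channels."""
    out, i = [], 0
    for m in SIZES:
        out.append(x[i:i + m])
        i += m
    return out

def forward(x):
    """Each channel runs its own sub-network; results are concatenated."""
    parts = [np.tanh(W @ c + b) for W, b, c in zip(weights, biases, split(x))]
    return np.concatenate(parts)          # CRT 'reconstruct'

def backward(x, grad_out):
    """Block-diagonal backprop: channel i's gradient uses only channel i."""
    grads = []
    for W, b, c, g in zip(weights, biases, split(x), split(grad_out)):
        pre = W @ c + b
        local = g * (1 - np.tanh(pre) ** 2)   # derivative of tanh
        grads.append(np.outer(local, c))      # dL/dW_i, an m_i x m_i block
    return grads                              # no cross-channel terms
```

Each dL/dW_i block is m_i x m_i, so one gradient step touches sum(m_i^2) = 3292 weight entries, where a dense 102x102 layer over the same concatenated vector would touch 102^2 = 10404.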

SAVINGS: 941,288,040,000 / 3,292 ≈ 285,932,000x fewer gradient computations
The L=11 channel gives free error detection at inference: if L disagrees with the other channels, the output is corrupt.
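
The quoted cost arithmetic can be checked directly:

```python
# Per-layer gradient cost: block-diagonal CRT network vs a dense network
# over the full CRT range N = 8*9*25*49*11.
moduli = [8, 9, 25, 49, 11]

crt_cost = sum(m * m for m in moduli)     # 64 + 81 + 625 + 2401 + 121
N = 1
for m in moduli:
    N *= m

dense_cost = N * N
print(crt_cost)                  # 3292
print(N)                         # 970200
print(dense_cost)                # 941288040000
print(dense_cost // crt_cost)    # 285931968 (≈ 286 million x)
```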

The 5 channels train INDEPENDENTLY. You can train them on 5 different machines.
You can train them at 5 different speeds. You can add/remove channels without retraining.
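
A minimal sketch of that independence, with two illustrative channels trained at different speeds (the per-channel data and plain SGD on a linear least-squares loss are assumptions for the demo, not the project's training setup):

```python
import numpy as np

# Two of the five channels, each with its own weights and its own data.
rng = np.random.default_rng(1)
sizes = {"D": 8, "K": 9}
W = {c: rng.standard_normal((m, m)) * 0.1 for c, m in sizes.items()}
data = {c: (rng.standard_normal((32, m)), rng.standard_normal((32, m)))
        for c, m in sizes.items()}        # per-channel (inputs, targets)

def train(channel, epochs, lr):
    """Plain SGD on mean squared error, touching ONE channel's weights."""
    X, Y = data[channel]
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ W[channel].T - Y) / len(X)
        W[channel] -= lr * grad.T

snap_K = W["K"].copy()
train("D", epochs=50, lr=0.05)    # channel D trains fast; K is untouched
```

Because `train` only reads and writes one channel's weights and data, each call could run on a separate machine, on its own schedule, with its own hyperparameters.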

This is not an approximation. CRT decomposition is EXACT.
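
The exactness claim is ordinary CRT: because 8, 9, 25, 49, 11 are pairwise coprime, decompose/reconstruct is a lossless round trip for any value in [0, 970200). A minimal sketch (Python 3.8+ for the three-argument `pow`):

```python
from math import prod

MODULI = (8, 9, 25, 49, 11)
N = prod(MODULI)                          # 970200

def decompose(x):
    """Map x to its residue in each channel."""
    return tuple(x % m for m in MODULI)

def reconstruct(residues):
    """Standard CRT reconstruction via per-channel basis elements."""
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = N // m
        x += r * Mi * pow(Mi, -1, m)      # pow(Mi, -1, m): inverse of Mi mod m
    return x % N

assert reconstruct(decompose(123456)) == 123456   # exact round trip
```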

1. Live Training

[Interactive demo: choose a task, epoch count, and learning rate, then train a Standard Network and a CRT Network (5 channels) side by side.]

2. Channel Isolation

Kill individual channels during inference. The network still works (with reduced precision).
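
In CRT terms, "reduced precision" is concrete: with one residue missing, the surviving channels still determine the value, but only modulo the product of the remaining moduli. A sketch of killing channel D (mod 8):

```python
from math import prod

MODULI = (8, 9, 25, 49, 11)

def crt(residues, moduli):
    """CRT reconstruction over an arbitrary coprime modulus set."""
    n = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = n // m
        x += r * Mi * pow(Mi, -1, m)
    return x % n

x = 123456
residues = [x % m for m in MODULI]

# Kill channel D (mod 8): keep channels K, E, b, L only.
alive = MODULI[1:]                        # (9, 25, 49, 11)
partial = crt(residues[1:], alive)

print(prod(alive))                        # 121275 — surviving precision
print(partial == x % 121275)              # True: exact up to the reduced modulus
```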


3. L=11 Error Detection

Corrupt weights in one channel. L=11 detects the corruption.
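
One way to realize this check, assuming the payload lives in the four data channels (mod 8, 9, 25, 49) and L redundantly stores the payload mod 11 like a check digit — this framing is an assumption for illustration, not the page's stated scheme:

```python
from math import prod

DATA = (8, 9, 25, 49)                     # data channels; L=11 is the check

def crt(residues, moduli):
    n = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = n // m
        x += r * Mi * pow(Mi, -1, m)
    return x % n

payload = 54321
channels = [payload % m for m in DATA]
L = payload % 11                          # redundant ECC channel

def check(channels, L):
    """Reconstruct from the data channels and compare against L."""
    return crt(channels, DATA) % 11 == L

ok = check(channels, L)                   # clean inference: check passes
channels[2] = (channels[2] + 1) % 25      # corrupt channel E
bad = check(channels, L)                  # corruption: check fails
```

A single-channel corruption shifts the reconstructed payload by a nonzero amount, and that shift escapes the mod-11 check only about 1 time in 11 — detection is probabilistic, not guaranteed.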

4. Training Speed Comparison
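
For intuition, an assumed micro-benchmark (not the page's live demo) comparing one dense 102x102 matvec against the same map applied as five per-channel matmuls. At this tiny width, Python call overhead can dominate the wall clock; the asymptotic win is the multiply-add count, 3292 vs 10404 per layer:

```python
import numpy as np
from timeit import timeit

SIZES = [8, 9, 25, 49, 11]
D = sum(SIZES)                            # 102
rng = np.random.default_rng(2)
blocks = [rng.standard_normal((m, m)) for m in SIZES]

# Embed the five blocks on the diagonal of a dense matrix; off-diagonal
# blocks are zero, so both paths compute the same linear map.
dense = np.zeros((D, D))
i = 0
for B, m in zip(blocks, SIZES):
    dense[i:i + m, i:i + m] = B
    i += m

x = rng.standard_normal(D)

def dense_apply():
    return dense @ x

def block_apply():
    out, i = [], 0
    for B, m in zip(blocks, SIZES):
        out.append(B @ x[i:i + m])
        i += m
    return np.concatenate(out)

print(np.allclose(dense_apply(), block_apply()))   # same output
print(timeit(dense_apply, number=10000))
print(timeit(block_apply, number=10000))
```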

CC0 1.0 Universal - No Rights Reserved. CRT neural networks are public domain prior art.
5 independent channels. Block-diagonal backprop. L=11 = self-healing inference.
antonlebed.com | CC0 License