CRT Genomic Sequence Alignment

Patent-holding incumbents: Illumina, PacBio, 10x Genomics. This work: CC0.

Conventional genomic sequence alignment relies on empirical substitution matrices such as BLOSUM and on dynamic programming: expensive, heuristic, and patented. The CRT approach encodes each codon as a ring element in Z/214,414,200, where 214,414,200 = 8 * 9 * 25 * 49 * 11 * 13 * 17. The 7 CRT channels separate naturally: codon positions 1, 2, and 3 map to the mod-8, mod-9, and mod-25 channels, amino acid identity to mod 49, wobble degeneracy to mod 11, GC content to mod 13, and polarity class to mod 17. Alignment is then a coupling distance. Synonymous mutations give a small CRT distance because the mod-49 channel is preserved. The genetic code has striking structural parallels to a CRT code.
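A minimal Python sketch of the ring arithmetic behind the channel decomposition. It checks that the seven moduli multiply to 214,414,200 and shows that a ring element can be split into seven residues and rebuilt from them. The function names and the sample element are illustrative assumptions. The text does not specify how a codon is assigned to a ring element, so that mapping is not modeled here.

```python
from math import prod

# The seven pairwise-coprime channel moduli named in the text.
MODULI = (8, 9, 25, 49, 11, 13, 17)
RING_SIZE = prod(MODULI)  # 214,414,200


def to_channels(x):
    """Split a ring element into its residue in each of the 7 channels."""
    return tuple(x % m for m in MODULI)


def from_channels(residues):
    """Rebuild the unique ring element with the given residues (CRT)."""
    x = 0
    for r, m in zip(residues, MODULI):
        partial = RING_SIZE // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        x += r * partial * pow(partial, -1, m)
    return x % RING_SIZE


# Arbitrary sample element: an assumption, not a real codon encoding.
element = 123_456_789
channels = to_channels(element)
assert from_channels(channels) == element
print(RING_SIZE, channels)
```

Because the moduli are pairwise coprime, the round trip is exact for every element of the ring, which is what allows each channel to be inspected independently.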

How It Works

CRT Genetic Code Theorem
The standard genetic code maps 64 codons to 20 amino acids plus stop. This degeneracy (wobble) is read as CRT error tolerance: synonymous codons that differ only at position 3 differ only in the mod-25 channel, while the mod-49 channel (AA identity) is preserved. (Leu, Arg, and Ser also have synonymous codons that differ at position 1 or 2, so for those pairs other position channels change as well.) The CRT distance between two sequences is the sum of per-channel circular distances across all 7 channels. Synonymous mutations have near-zero mod-49 distance. Non-synonymous mutations create large mod-49 jumps. The mod-11 channel acts as a wobble class detector. The 3+4 data/parity split: 3 data channels {mod 8, mod 25, mod 49} carry the genetic data, and 4 parity channels {mod 9, mod 11, mod 13, mod 17} provide validation.
64 codons
Ring elements
Each codon = one number in Z/214,414,200. 7 channels = 7 genetic properties.
Wobble = ECC
mod-11 tolerance
Most synonymous substitutions change only the mod-25 channel. The mod-49 channel (AA identity) is preserved.
Coupling = distance
Evolutionary metric
CRT distance across 7 channels = functional distance. Silent mutations are algebraically close.
No BLOSUM
No matrix
Substitution scoring from ring structure, not empirical log-odds matrices.
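The distance described above (a sum of per-channel circular distances) can be sketched as follows. This is a hedged Python illustration, not the page's own implementation. The function names are assumptions, and the two sample elements are constructed arithmetically so that they differ only in the mod-25 channel. They are not real codon encodings.

```python
# The seven channel moduli named in the text.
MODULI = (8, 9, 25, 49, 11, 13, 17)
RING_SIZE = 214_414_200


def circular_distance(a, b, m):
    """Shortest way around a cycle of length m between residues a and b."""
    d = (a - b) % m
    return min(d, m - d)


def channel_distances(x, y):
    """Per-channel circular distance between two ring elements."""
    return {m: circular_distance(x % m, y % m, m) for m in MODULI}


def crt_distance(x, y):
    """Sum of the circular distances over all 7 channels."""
    return sum(channel_distances(x, y).values())


# Adding RING_SIZE // 25 leaves every residue unchanged except mod 25,
# mimicking a change confined to the position-3 channel (assumed example).
x = 1000
y = x + RING_SIZE // 25
print(channel_distances(x, y))  # only the mod-25 entry is non-zero
print(crt_distance(x, y))       # 7
```

In this constructed pair the mod-49 entry is 0, which is the signature the text associates with a synonymous change.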

Align Sequences

Aligns a selected variant (1-3) against the reference HBB sequence. Shows the codon-by-codon CRT decomposition, the synonymous vs non-synonymous classification, and the coupling distance.
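The synonymous vs non-synonymous classification can be illustrated with a plain lookup in the standard genetic code. The Python sketch below does not use the CRT encoding. It only shows the codon-by-codon labels such an alignment would report. The two short sequences are toy inputs chosen for illustration, not the page's HBB data, and the function names are assumptions.

```python
from itertools import product

# Standard genetic code in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def codons(seq):
    """Split a DNA string into complete codons, dropping any trailing bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def classify(reference, variant):
    """Label each aligned codon pair (gapless, position by position)."""
    report = []
    pairs = zip(codons(reference), codons(variant))
    for index, (ref, var) in enumerate(pairs, start=1):
        if ref == var:
            kind = "identical"
        elif CODON_TABLE[ref] == CODON_TABLE[var]:
            kind = "synonymous"
        else:
            kind = "non-synonymous"
        report.append((index, ref, var, CODON_TABLE[ref], CODON_TABLE[var], kind))
    return report


# Toy sequences (assumed for illustration only).
for row in classify("ATGGTGCATCTG", "ATGGTACATCCG"):
    print(row)
```

Here GTG to GTA keeps Val (synonymous) and CTG to CCG changes Leu to Pro (non-synonymous).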

Codon Table

32 representative codons showing CRT channel decomposition. Synonymous codons share mod-49 channel values.

Evolutionary Distance Matrix

Pairwise CRT distance between all 4 sequences. Silent mutations = small distance, functional mutations = large distance.
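A pairwise matrix of this kind can be sketched in Python as below, assuming each sequence has already been encoded as an equal-length list of ring elements. The sequence names and integer values are placeholders invented for illustration. They are not the page's four sequences, and the encoding step itself is not shown because the text does not define it.

```python
# The seven channel moduli named in the text.
MODULI = (8, 9, 25, 49, 11, 13, 17)
RING_SIZE = 214_414_200


def crt_distance(x, y):
    """Sum of per-channel circular distances between two ring elements."""
    total = 0
    for m in MODULI:
        d = (x - y) % m
        total += min(d, m - d)
    return total


def sequence_distance(a, b):
    """Gapless distance: sum of codon-wise CRT distances."""
    if len(a) != len(b):
        raise ValueError("sequences must contain the same number of codons")
    return sum(crt_distance(x, y) for x, y in zip(a, b))


def distance_matrix(sequences):
    """Nested dict of pairwise distances, keyed by sequence name."""
    names = list(sequences)
    return {
        p: {q: sequence_distance(sequences[p], sequences[q]) for q in names}
        for p in names
    }


# Placeholder encodings (assumptions, not real codon values).
SHIFT = RING_SIZE // 25  # changes the mod-25 residue only
sequences = {
    "reference": [100, 200, 300],
    "variant_1": [100, 200 + SHIFT, 300],
    "variant_2": [100, 249, 300],
    "variant_3": [149, 200, 349],
}
matrix = distance_matrix(sequences)
for name, row in matrix.items():
    print(name, row)
```

The matrix is symmetric with a zero diagonal, so only one triangle needs to be displayed.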

Wobble Decomposition

Wobble-AA Partition (OBSERVED)
The 20 amino acids split as 8 + 12. Eight are fully wobble-tolerant: within their pure codon group, any nucleotide at position 3 preserves the amino acid. The other 12 are position-dependent. Numerically, 20 = 8 + 12 = 4*5 and 64 codons = 4^3. Whether this partition reflects ring structure in the genetic code is an open question.
8 wobble-tolerant
Pure wobble AAs
Ala, Gly, Pro, Thr, Val, Leu*, Arg*, Ser*. All pos-3 variants give same AA.
12 position-dependent
Sensitive AAs
Phe, Ile, Met, Tyr, His, Gln, Asn, Lys, Asp, Glu, Cys, Trp.
20 total
All amino acids
8 + 12 = 4*5 = 20.
64 codons
4^3
Three positions, four nucleotides each.

Exhaustive classification of all 16 codon groups (one group per combination of the first two positions). * = AA also encoded by a non-pure group. OBSERVED: the partition is exact, but the causal mechanism is unproven.
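The 8 + 12 partition can be checked directly against the standard genetic code. The Python sketch below groups the 64 codons by their first two positions and marks a group as pure when all four position-3 variants encode the same amino acid. The variable names are illustrative. The check confirms the counts only and says nothing about the causal question.

```python
from itertools import product

# Standard genetic code in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

pure_groups = 0
pure_aas = set()   # amino acids that own at least one pure group
other_aas = set()  # amino acids seen in a mixed (non-pure) group

# 16 groups: one per choice of the first two codon positions.
for first, second in product(BASES, repeat=2):
    encoded = {CODON_TABLE[first + second + third] for third in BASES}
    if len(encoded) == 1:
        pure_groups += 1
        pure_aas |= encoded
    else:
        other_aas |= encoded - {"*"}  # ignore stop codons

wobble_tolerant = sorted(pure_aas)
starred = sorted(pure_aas & other_aas)       # also encoded by a non-pure group
position_dependent = sorted(other_aas - pure_aas)

print(pure_groups, wobble_tolerant)           # 8 pure groups, 8 amino acids
print(starred)                                # ['L', 'R', 'S']
print(len(position_dependent), position_dependent)  # 12 amino acids
```

In one-letter codes the wobble-tolerant set is A, G, L, P, R, S, T, V, matching the list above, with Leu, Arg, and Ser as the starred entries.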

CRT vs Traditional Alignment

Scoring
Traditional: BLOSUM/PAM empirical log-odds substitution matrices.
CRT: algebraic coupling distance. No empirical fitting.
Algorithm
Traditional: Smith-Waterman, O(mn) dynamic programming.
CRT: O(n) pairwise codon comparison (no insertions/deletions).
Synonymous
Traditional: requires separate dN/dS ratio computation.
CRT: automatic. mod-49 channel (AA identity) preserved = synonymous.
Wobble
Traditional: empirical wobble rules (third position tolerance).
CRT: the mod-25 channel IS the wobble position. Tolerance = modular distance.
Error detection
Traditional: quality scores (Phred), separate pipeline.
CRT: mod-11 channel deviation = sequencing error. Free from ring.
Patent status
Traditional: Illumina (sequencing+alignment), PacBio (HiFi), 10x Genomics.
CRT: CC0. Public domain. Forever.
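To make the algorithm comparison concrete, here is a hedged Python sketch of the two cost profiles: a Smith-Waterman local alignment score, which fills an m-by-n table, next to a gapless codon-by-codon comparison, which is a single linear pass. The match, mismatch, and gap scores are simple assumed values rather than a BLOSUM or PAM matrix, and the gapless function counts differing codons instead of computing the CRT coupling distance.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with linear gaps: O(len(a) * len(b))."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                      # start a new local alignment
                previous[j - 1] + step, # align a[i-1] with b[j-1]
                previous[j] + gap,      # gap in b
                current[j - 1] + gap,   # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


def gapless_codon_mismatches(a, b):
    """Count differing codons position by position: O(n), no indels."""
    if len(a) != len(b) or len(a) % 3:
        raise ValueError("gapless comparison needs equal-length, in-frame input")
    return sum(a[i:i + 3] != b[i:i + 3] for i in range(0, len(a), 3))


# Toy inputs (assumed for illustration).
print(smith_waterman_score("ATGGTGCAT", "ATGCAT"))         # 10
print(gapless_codon_mismatches("ATGGTGCAT", "ATGGTACAT"))  # 1
```

The first call handles a three-base deletion, which the gapless pass cannot represent. That is the trade-off behind the "no insertions/deletions" qualifier in the comparison.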

Source code · Public domain (CC0)


.ax source compiled to WASM via self-hosting compiler. Zero HTML authored.