VARZIN Atlas v2 · Phases I–XX Complete

A symbolic language
with recoverable structure

LUXVAR: 801 roots, Affine Memory Family Aff(Z_N) characterized, Recall Formula derived. Twenty phases. Three research paths forward.

Stable Roots
801
30 in Core registry
Phonotactic Uniqueness
0.738
CI [0.703, 0.781] vs 6 languages
AI Convergence
0.812
inter-model ARI · Claude↔GPT=1.000
Family Accuracy
74.2%
Full corpus N=1M · Δ=+24.2%
Protocol Spine
+0.186
real vs shuffled modularity
Frequency Encoding
ABSENT
5 tests · n=704 · confirmed
Affine Operators
48
Aff(Z₁₂)=Z₁₂⋊Z₁₂× · gcd(a,N)=1
Mirror-13 MDL
13.11 bits
rank #1/1,035 equiv. · OPTIMAL not NECES.
Recall Formula
4-part
Struct+Prior+HypSpace+Obs · POT verified
External Memory
NOT EST.
Phase 19 · z=−1.042 · pipeline verified
Research Statistics
Family Distribution — Core-30
AI Model Agreement (ARI)
Phase Progress
Tests: Positive vs Negative
Corpus Note
Generation: 801 roots × 6 semantic fields × 128 frequency classes × 5 conscious layers = 3,075,840 unique combinations; with variations, 1.3B rows
Sampling: 500K–1M rows per benchmark run · seed-reproducible
Method: Python / NumPy · full scripts in GitHub
Preprint: DOI 10.5281/zenodo.20691858
LUXVAR Dictionary

Core-30 Registry

Frequency values (144–1008 Hz) are historical symbolic design labels — confirmed not recoverable from word form, family, or protocol context (5 independent computational tests, n=704, all negative). They are preserved as part of the original ontological design and appear in word metadata for archival purposes only. No causal or empirical claims are made about these frequencies.

Showing Core-30 (30/801 roots). Full corpus: Preprint dataset

Filter by family: ALL · ELUZ · SHA · NAR · ZAR · RAHT · SAR · OTHER
Columns: Word · Family · Axis · Freq · AI
Morpheme Families

The Recoverable Structure

Independently discovered by Claude, GPT-4, and Gemini. Inter-model ARI = 0.812. Claude↔GPT-4 = 1.000.

// VPE-001B+: 3 independent AI systems, no prior LUXVAR knowledge
Claude ↔ GPT-4    ARI = 1.000   // perfect agreement
Claude ↔ Gemini   ARI = 0.718
GPT-4 ↔ Gemini    ARI = 0.718
Mean inter-model ARI      = 0.812
Mean ARI vs designed axes = 0.146   // much lower
→ Structure is MORPHEMIC, not AXIAL
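For readers unfamiliar with the metric, here is a minimal sketch of how a pairwise Adjusted Rand Index and its mean across models can be computed. The function names and the toy groupings are illustrative assumptions, not the project's scripts or the actual Core-30 clusterings.

```python
from collections import Counter
from itertools import combinations
from math import comb
from statistics import mean


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same items")
    n = len(labels_a)
    # Contingency counts: items sharing cluster i in A and cluster j in B.
    cells = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case: both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def mean_pairwise_ari(groupings):
    """Mean ARI over every pair of named groupings."""
    pairs = combinations(sorted(groupings), 2)
    return mean(adjusted_rand_index(groupings[a], groupings[b]) for a, b in pairs)


# Toy groupings of six items (hypothetical, NOT the Core-30 results).
toy = {
    "model_a": [0, 0, 1, 1, 2, 2],
    "model_b": [1, 1, 0, 0, 2, 2],  # same partition, different cluster names
    "model_c": [0, 0, 1, 1, 1, 2],
}
same_partition = adjusted_rand_index(toy["model_a"], toy["model_b"])
toy_mean = mean_pairwise_ari(toy)

# The reported mean follows from the three pairwise values listed above.
reported_mean = round(mean([1.000, 0.718, 0.718]), 3)
```

ARI ignores cluster names, so `model_a` and `model_b` agree perfectly even though their labels differ. The same property is what lets groupings from different AI systems be compared directly.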
Word Network

Graph View

Word→Family→Protocol ontology graph. Three layers. Click any word to explore.

Engine: Canvas (optimized for 30 nodes). For 800+ nodes: Cytoscape.js or Sigma.js recommended.
Protocol spine (PT-001 / PST-001):
Real modularity = +0.140
Shuffled mean   = −0.046
Delta           = +0.186   ← protocol is REAL structure
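The real-vs-shuffled comparison can be sketched as follows, assuming standard Newman modularity on an undirected graph with community labels permuted across nodes as the control. The toy graph, the 20-trial count, and seed 42 are assumptions for illustration. The PT-001 / PST-001 scripts may differ.

```python
import random
from collections import defaultdict
from statistics import mean


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints in one community
    degree = defaultdict(int)  # summed node degree per community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def shuffled_modularity(edges, community, trials=20, seed=42):
    """Mean modularity after randomly permuting labels across nodes."""
    rng = random.Random(seed)
    nodes = sorted(community)
    labels = [community[n] for n in nodes]
    scores = []
    for _ in range(trials):
        rng.shuffle(labels)
        scores.append(modularity(edges, dict(zip(nodes, labels))))
    return mean(scores)


# Toy graph (hypothetical): two triangles joined by one bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

real_q = modularity(edges, community)
delta = real_q - shuffled_modularity(edges, community)

# Reported figures: +0.140 - (-0.046) = +0.186
reported_delta = round(0.140 - (-0.046), 3)
```

A positive `delta` means the labelled partition is more modular than label permutations of the same graph, which is the sense in which the reported +0.186 is read as real structure.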
Word Comparison

Compare Two Words

Select any two Core-30 words. See phonotactic similarity, shared family/protocol, and structural distance.

VS
Project History

VARZIN Timeline

Complete Results

All Findings — Phases I–XX

Every confirmed result from 20 phases. 28 positive · 10 negative · 2 pending.

Test · Phase · Key Metric · Status
Structural Model

What LUXVAR Is — Evidence Based

CONFIRMED (Phases I–XX):
  Phonotactic fingerprint (uniqueness = 0.738)
  Morpheme families (accuracy = 74.2%)
  Protocol spine (Δmod = +0.186)
  Sub-character grounding (MASK-001 M1+M2 survive)
  Affine family Aff(Z_N) (48 ops · Conj.1 N=5..24 · known algebraic struct.)
  Mirror-13 MDL-minimal (13.11 bits · rank #1/1035 · OPTIMAL not NECES.)
  5 Invariants verified (was_forced_true:False · 0 randomness)
  Recall formula 4-part (Struct+Prior+HypSpace+Obs · POT suite)
  Wrong prior catastrophic (acc=0.000 · POT-004 confirmed)
CONFIRMED ABSENT:
  Frequency encoding (Δ = −3.6%, ×5 tests, n=704)
  Semantic axis recovery (no shuffle-control signal)
  Generative grammar (GVT blind 8.3% < chance 14.3%)
  Hexacore clusters (k=3, not k=6)
  External memory field (Phase 19 · z=−1.042 · NOT ESTABLISHED)
  ETFM as computational core (SIM-ETFM-001 · ΔH=0.012 · viz only)
  Frequency computation (SIM-FREQ-001 · ΔH=0.044 · weak)
OPEN (Three Paths):
  PATH A: Conjecture 1 proof (Lean/Coq · no blocker)
  PATH B: VPE-001A (κ≥0.40 · ≥5 blind raters · blocker: raters)
  PATH C: External memory (z>2.0 · affine receiver · blocker: participant)
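The "48 ops" count for the affine family can be checked directly. This is a minimal sketch that enumerates the maps x → (a·x + b) mod N with gcd(a, N) = 1 for N = 12, using the standard definition of Aff(Z_N). The helper names are illustrative, and nothing here touches Conjecture 1.

```python
from math import gcd


def affine_maps(n):
    """All maps x -> (a*x + b) mod n with gcd(a, n) = 1, as (a, b) pairs."""
    return [(a, b) for a in range(n) if gcd(a, n) == 1 for b in range(n)]


def compose(f, g, n):
    """(f o g)(x) = f(g(x)); returns the (a, b) pair of the composite map."""
    a1, b1 = f
    a2, b2 = g
    return (a1 * a2 % n, (a1 * b2 + b1) % n)


N = 12
ops = affine_maps(N)
op_set = set(ops)

# Units mod 12 are {1, 5, 7, 11}: 4 choices of a times 12 choices of b.
op_count = len(ops)

# Closure: composing any two operators stays inside the family.
closed = all(compose(f, g, N) in op_set for f in ops for g in ops)

# Order matters: multiplying then shifting differs from shifting then multiplying.
f, g = (5, 0), (1, 1)
commutes = compose(f, g, N) == compose(g, f, N)
```

Here `op_count` is 48, `closed` is true, and `commutes` is false. The last point reflects the semidirect-product structure Z₁₂⋊Z₁₂× rather than a direct product.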
Negative Results Archive

15 Confirmed Negatives (Phases I–XX)

Reported as first-class scientific outcomes. These define what LUXVAR is by defining what it is not.

Methodology

How the Research Was Conducted

Corpus Generation
# 801 roots × 6 fields × 128 freq × 5 layers
rows = 801 × 6 × 128 × 5   # = 3,075,840 unique combos
                           # × variations = 1.3B rows
generator = Python / NumPy
seed = reproducible
Full generator scripts available on GitHub. Each benchmark run uses a fixed random seed.
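The corpus arithmetic can be checked in a few lines. The implied variations factor is derived here from the rounded 1.3B figure and is only an approximation, not a documented generator parameter.

```python
# Dimensions as stated in the corpus note.
roots, fields, freq_classes, layers = 801, 6, 128, 5

unique_combos = roots * fields * freq_classes * layers
print(f"{unique_combos:,}")  # 3,075,840

# Assumption: "1.3B rows" is a rounded total, so this factor is approximate.
reported_rows = 1.3e9
implied_variations = reported_rows / unique_combos
print(round(implied_variations))  # 423
```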
Sampling Strategy
sample_size = 500K–1M rows
split = stratified
label = Derived_From field
family_map = W2F dict (30 words)
freq_parser = multi-format
valid_freqs = {144,432,474,528,777,1008,...}
Sampling is stratified to preserve class balance. Frequency parser handles 10+ raw formats.
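A minimal sketch of seed-reproducible stratified sampling on the `Derived_From` label, assuming proportional allocation per class. The function name and the toy rows are hypothetical. The benchmark scripts may allocate differently.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, label_key, sample_size, seed=42):
    """Draw a sample whose class proportions mirror those of `rows`."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    total = len(rows)
    sample = []
    for label in sorted(by_label):  # sorted for a deterministic draw order
        group = by_label[label]
        quota = min(len(group), round(sample_size * len(group) / total))
        sample.extend(rng.sample(group, quota))
    return sample


# Toy rows (hypothetical): an 80/20 class split.
rows = [{"id": i, "Derived_From": "ELUZ"} for i in range(80)]
rows += [{"id": 80 + i, "Derived_From": "SHA"} for i in range(20)]

sample = stratified_sample(rows, "Derived_From", sample_size=10)
class_counts = Counter(row["Derived_From"] for row in sample)
repeat = stratified_sample(rows, "Derived_From", sample_size=10)
```

Because quotas are rounded per class, the returned sample can differ from `sample_size` by a few rows when proportions do not divide evenly.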
ML Pipeline
features = 34-dim phonotactic vector
model = RandomForest(300 trees)
cv = StratifiedKFold(k=3)
metric = accuracy vs majority
shuffle_ctrl = 20 trials per test
significance = Δ > +5% above shuffle
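The shuffle-control rule (20 trials, Δ > +5% above shuffle) can be sketched independently of the classifier. In this illustration a toy leave-one-out nearest-neighbour scorer stands in for the cross-validated RandomForest, and the features and labels are made up. Only the decision rule is meant to mirror the pipeline.

```python
import random
from statistics import mean


def loo_nearest_neighbour_accuracy(features, labels):
    """Leave-one-out 1-NN accuracy on 1-D features (toy stand-in scorer)."""
    hits = 0
    for i, x in enumerate(features):
        others = (k for k in range(len(features)) if k != i)
        nearest = min(others, key=lambda k: abs(features[k] - x))
        hits += labels[nearest] == labels[i]
    return hits / len(features)


def shuffle_control(score_fn, features, labels, trials=20, threshold=0.05, seed=42):
    """Compare the real score with the mean score on label-shuffled data."""
    rng = random.Random(seed)
    real = score_fn(features, labels)
    shuffled = []
    for _ in range(trials):
        permuted = list(labels)
        rng.shuffle(permuted)  # breaks any feature-label relationship
        shuffled.append(score_fn(features, permuted))
    delta = real - mean(shuffled)
    return {
        "real": real,
        "shuffled_mean": mean(shuffled),
        "delta": delta,
        "significant": delta > threshold,
    }


# Toy data (hypothetical): two well-separated classes.
features = [0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 1.3]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
result = shuffle_control(loo_nearest_neighbour_accuracy, features, labels)
```

Swapping `loo_nearest_neighbour_accuracy` for a function that returns cross-validated accuracy leaves the rest unchanged. A result counts as signal only if it clears the shuffled baseline by the threshold.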
AI Clustering (VPE-001B+)
models = Claude, GPT-4, Gemini
prompt = Core-30 words, no context
task = group by similarity
scoring = ARI vs families
        = ARI vs axes
        = inter-model ARI
Each model ran independently with no LUXVAR knowledge. Same 30 words.
Reproducibility
All benchmark scripts: GitHub ↗
Preprint: DOI ↗
Seeds: fixed (42 default)
Scripts: VARZIN_PHASE*.py
Format: CSV with Derived_From col
Data Architecture (Current)
# Current: single-file (portability)
varzin-atlas-v2.html
└── inline JS data

# Production target:
/data/
  words.json        ← Core-30
  roots_full.json   ← 801 roots
  results.json
  timeline.json
  publications.json
Migrating to external JSON enables CDN caching, API updates, and dataset versioning without HTML edits.
MASK-001 Protocol
M1: ELUZ → R1, SHA → R2, NAR → R3
M2: first morpheme → XXXX
M3: all chars → C/V class (CVCVCV)
target = Protocol prediction
n = 27 labeled words
result = M1+M2 survive, M3 collapses
verdict = sub-character grounding
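The three masking conditions can be sketched as string transforms. Several details are assumptions for illustration: the vowel set, detecting the first morpheme by prefix match against the three listed families, and the example word `SHANAR`, which is invented and not a Core-30 entry.

```python
# M1 mapping as listed in the protocol.
FAMILY_TOKENS = {"ELUZ": "R1", "SHA": "R2", "NAR": "R3"}
VOWELS = set("AEIOU")  # assumption: Latin vowels, uppercase words


def mask_m1(word):
    """M1: replace each family morpheme with an opaque token."""
    for morpheme, token in FAMILY_TOKENS.items():
        word = word.replace(morpheme, token)
    return word


def mask_m2(word):
    """M2: blank out the first morpheme (assumed to be a known family prefix)."""
    for morpheme in sorted(FAMILY_TOKENS, key=len, reverse=True):
        if word.startswith(morpheme):
            return "XXXX" + word[len(morpheme):]
    return word  # no known prefix: left unchanged in this sketch


def mask_m3(word):
    """M3: reduce every character to its consonant/vowel class."""
    return "".join("V" if ch in VOWELS else "C" for ch in word)


example = "SHANAR"  # hypothetical word built from two family morphemes
masked = {
    "M1": mask_m1(example),  # R2R3
    "M2": mask_m2(example),  # XXXXNAR
    "M3": mask_m3(example),  # CCVCVC
}
```

M1 and M2 hide morpheme identity but keep the remaining characters, while M3 discards character identity entirely. That contrast is why survival under M1+M2 and collapse under M3 is read as sub-character grounding.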
Publications & Datasets

Archive

Open Question

VPE-001A

The single remaining non-circular test. Every computational question is answered. This one requires humans.

THE QUESTION: Do naive human raters group Core-30 words by…
A) Morpheme families (ELUZ / SHA / NAR) → as AI systems do (inter-model ARI = 0.812)
B) Semantic axes (LIGHT / REFLECTION / SILENCE / GATE / MOTION) → as designed
C) Something else entirely
ALL THREE OUTCOMES ARE SCIENTIFICALLY INTERPRETABLE.
Three Research Paths — Post Phase XX
PATH A · Mathematical (no blocker):
  A1 Formal proof Conjecture 1 for all N (Lean/Coq)
  A2 Aff(Z_N) spectral properties general N
  A3 Paper C → arXiv cs.FL / Zenodo
PATH B · LUXVAR Grammar (blocker: rater recruitment):
  B1 VPE-001A · ≥5 blind raters · Fleiss κ ≥ 0.40
  B2 Map Core-30 to shift family (a=1), not Mirror-13
  B3 Morpho-syntactic rules from phonotactic structure
PATH C · External Memory (blocker: independent participant):
  C1 Sealed targets published before trial
  C2 Independent receiver with affine-compatible prior
  C3 z > 2.0 on ≥3 trials = establish external memory
Min Raters
5
Blind. No prior knowledge.
Words
30
Core-30. Card sort.
Threshold
κ≥0.40
Fleiss κ primary metric.
Status
PENDING
Paper B.
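One way the Fleiss κ threshold could be evaluated on card-sort data is sketched below. Free card sorts have no shared category names, so this sketch makes an assumption: each word pair becomes one item rated "same pile" or "different piles". The preprint protocol may define the reduction differently, and word pairs are not statistically independent. The placeholder words and sorts are hypothetical.

```python
from itertools import combinations


def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = raters assigning item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    # Per-item agreement, then its mean over items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:  # only one category ever used
        return 1.0
    return (p_bar - p_exp) / (1 - p_exp)


def pairwise_counts(sorts, words):
    """Turn card sorts into per-pair [same pile, different piles] counts."""
    counts = []
    for a, b in combinations(words, 2):
        same = sum(1 for sort in sorts if sort[a] == sort[b])
        counts.append([same, len(sorts) - same])
    return counts


# Hypothetical placeholder words and five identical sorts.
words = ["w1", "w2", "w3", "w4"]
sorts = [{"w1": 0, "w2": 0, "w3": 1, "w4": 1} for _ in range(5)]
kappa_identical = fleiss_kappa(pairwise_counts(sorts, words))

# Hand-checkable mixed case: two unanimous items, two 3-vs-2 splits.
kappa_mixed = fleiss_kappa([[5, 0], [0, 5], [3, 2], [2, 3]])
```

Identical sorts give κ = 1, and the mixed case works out to about 0.4, which shows how much disagreement sits near the κ ≥ 0.40 threshold.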
VPE-001A — Open Call

Participate in the Study

No linguistics background needed. ~20 minutes. If you have never seen LUXVAR before — you qualify.

Time
~20 min
Card sort. 30 words.
Requirements
0
No linguistics background.
Current Raters
0/5
Minimum 5 needed.
Downloads
↓ Read Protocol (Preprint) ↑ Submit Results
The 30 Words
Why This Matters

Three AI systems grouped these 30 words and converged on largely the same morpheme families: ELUZ, SHA, NAR, ZAR, RAHT (mean inter-model ARI = 0.812). But LUXVAR was designed around five semantic axes: LIGHT, REFLECTION, SILENCE, GATE, MOTION. Your grouping, whatever it is, is a data point that computational analysis cannot produce.

→ Contact to Participate
Builder Mode

Create a LUXVAR Word

Design a new word, assign its family and protocol, and generate its symbolic card — entirely local, no API key required.

Scientific note: Frequency values assigned here are symbolic design labels from the original LUXVAR ontology. Computational analysis confirmed they are not encoded in word structure (5 tests, n=704). Builder output is creative/archival, not empirical.
Word Definition
Local generator — runs entirely in your browser, no API key needed.
For Claude-powered generation: deploy varzin-proxy as a Cloudflare Worker and set PROXY_URL in the source.
Symbolic Card
Enter a word and generate →
Builder · Saved Cards

Your Generated Words

Cards persist in browser storage. Export all as JSON or TXT.

Full Registry

801 Root Explorer [DEMO]

Synthetic expansion — demo only. The 30 Core words (★) are the verified research registry. The remaining 771 entries are algorithmically generated phonotactic extensions for preview purposes. To replace with real data: import roots_full.json using the button below. No empirical claims are made about synthetic roots.
The full LUXVAR corpus contains 801 stable roots across 6 semantic fields.
Load real data: import a JSON/CSV root list below to replace synthetic entries.
Frequency note: values shown are historical symbolic labels — not empirically recoverable.
Mirror-13 · Phase X

Mirror-13 Simulator

The 312-state memory machine. Watch D12 ring + Mirror-13 center unfold in real time. MDL=13.11 bits · ΔEntropy=+127.6% · Rank #1/1035 by simplicity.

States
312
288 ring + 24 center
MDL
13.11
bits to specify Mirror-13
ΔEntropy
+127.6%
over D12
Simplicity Rank
#1
of 1,035 equivalent systems
Steps: 0
Current State
Live Entropy
Polarity Balance
Center Visits
Invariants — Live Verification
// Mirror-13 formal definition
State  = (gate ∈ {0..11}, phase ∈ {0..11}, polarity ∈ {+1,−1})
Center = gate 12 (24 center states)
IN(gate,phase,pol) → (12, gate, −pol)             // records entry, flips polarity
OUT(12,phase,pol)  → ((12−phase)%12, phase, pol)  // exits to mirror of entry
Invariant A: polarity = (−1)^n_IN × initial_polarity [100% verified]
Invariant B: exit = mirror(entry) [bijection 12/12]
Invariant C: D12 → 48 orbits | D12+M13 → 1 orbit
Invariant D: R∘IN ≠ IN∘R [100% of states]
Invariant E: D12 reaches 12 states | D12+M13 reaches 312
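The IN/OUT rules above are enough to check the state count and Invariants A and B directly. This is a minimal sketch under that definition only. The D12 rotation and reflection actions are not specified in this section, so Invariants C, D, and E are not modelled, and the function names are illustrative.

```python
from itertools import product

CENTER = 12  # the Mirror-13 center gate


def step_in(state):
    """IN: ring state -> center; records the entry gate and flips polarity."""
    gate, phase, pol = state
    if gate == CENTER:
        raise ValueError("IN applies to ring states only")
    return (CENTER, gate, -pol)


def step_out(state):
    """OUT: center -> ring; exits at the mirror of the recorded entry gate."""
    gate, phase, pol = state
    if gate != CENTER:
        raise ValueError("OUT applies to center states only")
    return ((12 - phase) % 12, phase, pol)


# State count: 12 gates x 12 phases x 2 polarities, plus 12 x 2 center states.
ring = list(product(range(12), range(12), (+1, -1)))
center = [(CENTER, phase, pol) for phase in range(12) for pol in (+1, -1)]
total_states = len(ring) + len(center)

# Invariant B: entry gate -> exit gate is a bijection on the 12 gates.
exit_gate = {g: step_out(step_in((g, 0, +1)))[0] for g in range(12)}
is_bijection = sorted(exit_gate.values()) == list(range(12))

# Invariant A: polarity after n IN steps equals (-1)^n x initial polarity.
initial_polarity = +1
state = (3, 0, initial_polarity)
invariant_a_holds = True
for n_in in range(1, 6):
    state = step_out(step_in(state))
    invariant_a_holds &= state[2] == (-1) ** n_in * initial_polarity
```

Starting from gate 3, the walk alternates between gates 9 and 3. After five passes through the center it ends at (9, 3, −1).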