Topology of Thought — Three-Instrument Convergence

Three independent measurement instruments (persistent homology, SIPIT invertibility, and SAE decomposition) converge on a shared three-phase structure inside transformer residual streams. The structure appears across four attention-based architectures and is absent in Mamba, which has no attention.

By Adam Kruger

Three independent measurement instruments — persistent homology, SIPIT invertibility, and SAE decomposition — converge on the same structure inside transformer residual streams. The model thinks in three phases: Integration, Thinking, Codec. The pattern holds across four attention-based architectures and breaks exactly where theory predicts it should: in Mamba, where there is no attention.

This post is a summary. The full interactive site, the paper, and the underlying datasets live at topology-of-thought.com.

The three-phase model

Every transformer we measured organizes its residual stream into three computational phases:

Phase | What seems to happen | Where (Qwen3-4B)
Integration | Topological cluster collapse: hundreds of independent clusters merge into one unified manifold | L0–L16
Thinking | Deep computation on the unified representation; maximum intrinsic dimensionality, peak persistent loops | L16–L24
Codec | Human-readable encoding: the model reformats its internal state back into token probabilities | L24–L35

The integration boundary sits at roughly 40–44% of depth in every architecture that shows one. We did not choose that depth in advance; it emerges from the measurements.
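The depth percentages can be checked with a few lines of arithmetic. This is a minimal sketch: the integration layers are the ones reported in this post, while the total layer counts are assumptions taken from the public model configurations (Qwen3-4B's 36 layers is consistent with the L0–L35 range above), and `relative_depth` and `depth_range` are hypothetical helper names.

```python
# Total layer counts are ASSUMPTIONS from public model configs, not from this post.
ASSUMED_NUM_LAYERS = {"Qwen3-4B": 36, "Gemma-3-1B": 26, "Mistral-7B": 32}

# Integration layers as reported in this post; Mistral-7B is given as a range.
INTEGRATION_LAYERS = {
    "Qwen3-4B": (16, 16),
    "Gemma-3-1B": (11, 11),
    "Mistral-7B": (12, 14),
}


def relative_depth(layer: int, num_layers: int) -> float:
    """Layer index expressed as a fraction of total depth."""
    return layer / num_layers


def depth_range(model: str) -> tuple[float, float]:
    """Lowest and highest relative depth of the reported integration layer(s)."""
    lo, hi = INTEGRATION_LAYERS[model]
    n = ASSUMED_NUM_LAYERS[model]
    return relative_depth(lo, n), relative_depth(hi, n)


for model in INTEGRATION_LAYERS:
    lo, hi = depth_range(model)
    print(f"{model}: {lo:.1%} to {hi:.1%} of depth")
```

Under these assumed layer counts, all three models land in or just below the 40–44% band.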

Three instruments, one structure

Each instrument measures something different. All three find the same phase boundaries.

1. Persistent homology

Tracks connected components (H0 clusters) and loops (H1 features) in the activation manifold across layers.

  • Qwen3-4B: 517 clusters → 1 cluster at L16 (complete collapse)
  • Gemma-4-31B: stays at ~1,900 clusters (sliding-window attention prevents full integration)
  • Mamba-370m: 551 → 987 clusters (diverges — no collapse at all; see the Mamba counterexample)

Cluster collapse is the topological signature of integration. When it reaches 1, the model has built a single unified representation.
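To make "cluster count" concrete: at a single distance scale, the H0 count is the number of connected components in the graph that links any two points closer than that scale. The sketch below counts those components with union-find. It is an illustration only, not the pipeline behind the numbers above; full persistent homology sweeps the scale rather than fixing it, and the `eps` value, the toy points, and the name `count_components` are all assumptions.

```python
import math


def count_components(points, eps):
    """Count connected components when points within distance eps are linked."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj  # merge the two components

    return len({find(i) for i in range(len(points))})


# Toy stand-in for activations: two tight groups that sit far apart.
toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
print(count_components(toy, eps=0.5))   # the two groups stay separate
print(count_components(toy, eps=10.0))  # everything merges into one component
```

In this picture, "collapse to 1" means every activation vector ends up in the same component at the chosen scale.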

2. SIPIT invertibility curve

SIPIT measures how much of the original input can be recovered from a given layer's activations. High invertibility means the layer preserves input information; low invertibility means the representation has moved far from the input.

Phase | SIPIT score | Reading
Early layers | 0.403 | Moderate: input still recognizable
Integration peak (L16) | 0.208 | Minimum: representation maximally transformed
Late layers | 0.917 | High: information re-encoded for output

The curve traces the same three phases: input preservation drops through integration, bottoms out at L16 where deep thinking begins, then recovers as the model builds its output encoding.
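This post does not spell out how the SIPIT score is computed, so the sketch below is a stand-in rather than the SIPIT algorithm. It shows one simple way to turn "how much input can be recovered" into a number between 0 and 1: guess each token by nearest-neighbor lookup against candidate embeddings and report the fraction guessed correctly. The function name, the toy vocabulary, and the toy activations are all hypothetical.

```python
def recovery_score(activations, candidate_embeddings, true_token_ids):
    """Fraction of positions whose nearest candidate embedding is the true token."""

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    hits = 0
    for vec, true_id in zip(activations, true_token_ids):
        # Guess the token whose embedding lies closest to this activation.
        guess = min(
            range(len(candidate_embeddings)),
            key=lambda t: sq_dist(vec, candidate_embeddings[t]),
        )
        hits += guess == true_id
    return hits / len(true_token_ids)


# Toy setup (ASSUMPTION): a 3-token vocabulary embedded in 2 dimensions.
vocab = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
tokens = [0, 1, 2, 1]

# Activations that still sit near their input embeddings.
near_input = [(0.9, 0.1), (0.1, 0.8), (-0.7, 0.2), (0.2, 0.9)]
# Activations that have drifted into one shared region.
mixed = [(0.1, 0.6), (0.2, 0.7), (0.1, 0.9), (0.0, 0.8)]

print(recovery_score(near_input, vocab, tokens))
print(recovery_score(mixed, vocab, tokens))
```

The first set of activations scores higher than the second, which is the qualitative shape of the early-layer versus L16 readings in the table.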

3. SAE feature decomposition

Sparse autoencoders decompose each layer's activations into interpretable features. The character of those features shifts across phases.

  • SAE loss gradient: 0.072 (early) to 0.044 (late) — features become more compressible as the model moves toward output
  • 225 features labeled across 9 layers, revealing a shift from abstract to concrete
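For readers new to sparse autoencoders, here is a minimal sketch of the forward pass and loss in the common ReLU formulation: encode the activation into non-negative features, decode it back, and penalize both reconstruction error and feature magnitude. The weights, the `l1_coeff` value, and the function names are illustrative assumptions; the SAEs used for the numbers above may differ in architecture and loss.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sae_forward(x, w_enc, b_enc, w_dec, b_dec, l1_coeff):
    """Return (features, reconstruction, loss) for one activation vector."""
    features = relu([z + b for z, b in zip(matvec(w_enc, x), b_enc)])
    recon = [z + b for z, b in zip(matvec(w_dec, features), b_dec)]
    mse = sum((a - r) ** 2 for a, r in zip(x, recon)) / len(x)
    sparsity = sum(features)  # L1 norm; features are non-negative after ReLU
    return features, recon, mse + l1_coeff * sparsity


# Toy weights (ASSUMPTION): 2-d activations, 4 features, one per signed axis.
w_enc = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b_enc = [0.0, 0.0, 0.0, 0.0]
w_dec = [[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]
b_dec = [0.0, 0.0]

features, recon, loss = sae_forward([1.0, -2.0], w_enc, b_enc, w_dec, b_dec, l1_coeff=0.01)
print(features)  # only two of the four features are active
print(recon)     # the input is reconstructed exactly in this toy case
print(loss)
```

In this toy case the reconstruction is exact, so the whole loss comes from the sparsity penalty on the two active features.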

Representative feature labels:

Layer | Example features | Character
L6 | Semantic grouping, syntactic structure | Abstract, relational
L16 | Conceptual integration, entity binding | Peak abstraction
L34 | "Finality / irreversibility" | High-level but output-oriented
L50 | "Punctuation / whitespace" | Pure formatting codec

The transition from L34 features like "finality / irreversibility" to L50 features like "punctuation / whitespace" is the Thinking-to-Codec boundary made visible at the feature level.

Mamba falsification

Mamba-370m is a state space model with zero attention. It processes sequences through selective state transitions instead of token-to-token attention.

Result: no cluster collapse. Clusters go from 551 to 987 — the opposite direction.

This is the falsification test we wanted, and the hypothesis survives it. The three-phase structure appears to require attention as the mechanism for integration. Without direct token-to-token interaction, there is no manifold unification and no phase transition. Mamba processes information; it just doesn't integrate it topologically.

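Two conditions appear to be required, and they must hold together:

  1. A direct interaction mechanism (attention)
  2. Optimization (training)

Neither is sufficient alone. Mamba is trained but has no attention, and its clusters diverge (551 → 987). Untrained transformers have attention but no optimization, and they show the same opposite pattern (503 → 968 clusters).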
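The cluster counts quoted in this post can be turned into a simple label for each model. The sketch below does that; the counts are the ones reported here, while the 10% `tolerance` threshold and the name `cluster_trend` are assumptions made for illustration.

```python
def cluster_trend(first: int, last: int, tolerance: float = 0.1) -> str:
    """Label how the H0 cluster count changes between two measured layers."""
    if last == 1:
        return "full collapse"
    change = (last - first) / first
    if change > tolerance:
        return "divergence"
    if change < -tolerance:
        return "partial collapse"
    return "stable"


# Cluster counts as reported in this post.
REPORTED = {
    "Qwen3-4B (trained, attention)": (517, 1),
    "Mamba-370m (trained, no attention)": (551, 987),
    "Untrained transformer": (503, 968),
}

for name, (first, last) in REPORTED.items():
    print(f"{name}: {first} -> {last} clusters, {cluster_trend(first, last)}")
```

Only the model that has both attention and training is labeled as a collapse; the other two are labeled as divergence.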

Cross-architecture results

Architecture | Type | Integration layer | Cluster collapse | Phases present
Qwen3-4B | Dense transformer | L16 (44%) | 517 → 1 | All three
Gemma-3-1B | Sliding attention | L11 (42%) | Partial | Modified
Gemma-4-31B | Sliding window | n/a | 1,900 stable | Attenuated
Mistral-7B | Grouped-query attention | L12–14 (~40%) | Full collapse | All three
Mamba-370m | SSM (no attention) | n/a | 551 → 987 | None

Sliding-window attention (Gemma-4-31B) appears to prevent full collapse because each token can attend only to a local neighborhood, so the integration mechanism is constrained by the architecture itself. This is architecture, not scale: a 31B model with sliding windows shows less integration than a 4B model with full attention.

The progenitor — Gemma-4 writing Mojo kernels

The measurement instruments were built with help from Gemma-4-27B, which wrote the Mojo kernels used in the analysis pipeline:

Kernel | Latency | Purpose
SIPIT | 148 µs | Invertibility scoring
SAE | 949 µs | Sparse feature extraction
Activation tap | 2 ms | Real-time layer hooking

These are production kernels — compiled Mojo, not Python prototypes. The model that helped build the instruments is itself a subject of the measurements.

Where this leaves us

The three-phase structure isn't an artifact of any one measurement technique. Three independent instruments find the same boundaries. The pattern holds across dense transformers, breaks in SSMs, and degrades predictably with architectural constraints on attention span.

The model isn't just processing tokens. It's building a unified internal world, thinking in that space, then translating back for human consumption. Topology is one way to make that visible.

The full paper, methodology details, and interactive visualizations are at topology-of-thought.com.

Built by Light of Baldr LLC. Get in touch if any of this is useful to you.