May 15, 2026 | memory, inference, interpretability, architecture

What If a Model Could Remember What It Learned?

We're building activation-level memory for AI inference — a model that thinks differently because of what it has experienced before, not just one with more text in the prompt. Early results, a real selectivity number, and a provisional patent on the way.

By Adam Kruger

Large language models don't learn from experience. Every conversation starts fresh. Every session begins from the same weights, the same blank slate. The model that helped you debug a tricky Rust lifetime issue yesterday has no memory of the solution today.

Fine-tuning helps, but it's expensive, slow, and blunt. RAG (retrieval-augmented generation) helps, but it operates at the token level — stuffing context into the prompt window. The model doesn't actually think differently because of retrieved information. It just has more text to condition on.

We got curious about a different question: what if memory operated at the activation level, not the token level?

The Idea

During inference, a transformer processes information through layers. Early layers handle syntax and basic patterns. Middle layers perform deep integration — abstract reasoning that's far removed from surface-level tokens. Late layers convert the abstract thought back into token probabilities.

What if, at one of those middle layers, the model could recall a similar thought it had before? Not a similar text — a similar activation pattern. The internal representation of the thought, not its surface form.
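To make "activation pattern" concrete, here is a toy sketch of tapping a middle layer during a forward pass. The layers are plain Python functions standing in for transformer blocks, and `run_with_tap` and `tap_index` are hypothetical names chosen for illustration. It shows only what reading an intermediate representation means, not how any real model or the system described here does it.

```python
def run_with_tap(layers, x, tap_index):
    """Run x through the layers in order.

    Returns (final_output, activation_after_layer_tap_index).
    The second value is None if tap_index is out of range.
    """
    tapped = None
    for i, layer in enumerate(layers):
        x = layer(x)
        if i == tap_index:
            tapped = list(x)  # copy, so later layers cannot alter the snapshot
    return x, tapped


# Toy stand-ins for early, middle, and late layers (assumed, purely illustrative).
layers = [
    lambda v: [2 * a for a in v],   # "early" layer
    lambda v: [a + 1 for a in v],   # "middle" layer: the one we tap
    lambda v: [a * a for a in v],   # "late" layer
]

output, middle = run_with_tap(layers, [1.0, 2.0], tap_index=1)
```

The snapshot in `middle` is the kind of object an activation-level memory would store and compare, as opposed to the input text or the final output.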

That's what we're building. A memory system that:

Stores activation patterns from previous inference runs
Retrieves similar patterns during new inference
Gently influences the model's processing based on what it recalls

The key word is "gently." We're not overwriting the model's output. We're nudging it — the way a faint memory nudges your thinking without you consciously recalling it.
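The three steps can be sketched generically. This is not the mechanism we are building, which remains undisclosed. It is a minimal illustration that assumes activations are flat vectors, retrieval is nearest-neighbor by cosine similarity, and the nudge is a small similarity-weighted blend. The class name, `strength`, and `threshold` are all hypothetical.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class ActivationMemory:
    """Hypothetical activation store, for illustration only."""
    strength: float = 0.1    # assumed cap on how far a memory can move an activation
    threshold: float = 0.5   # assumed minimum similarity before a memory has any effect
    patterns: list = field(default_factory=list)

    def store(self, activation):
        """Step 1: keep a copy of an activation vector from a previous run."""
        self.patterns.append(list(activation))

    def retrieve(self, activation):
        """Step 2: return (similarity, pattern) for the closest match, or None if empty."""
        if not self.patterns:
            return None
        scored = ((cosine(activation, p), p) for p in self.patterns)
        return max(scored, key=lambda pair: pair[0])

    def nudge(self, activation):
        """Step 3: blend slightly toward the best match, scaled by its similarity."""
        hit = self.retrieve(activation)
        if hit is None or hit[0] < self.threshold:
            return list(activation)  # nothing relevant: leave the activation unchanged
        similarity, pattern = hit
        weight = self.strength * similarity
        return [(1 - weight) * a + weight * p for a, p in zip(activation, pattern)]
```

With the default `strength` of 0.1, even a perfect match moves the activation only a tenth of the way toward the stored pattern, and anything below the similarity threshold is ignored. That is the sense of "gently" in this sketch.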

Early Results

We've been testing on Gemma-4-31B running locally on a DGX Spark. Same prompt, same model, with and without memory active. The results are subtle but measurable:

The model's framing shifts toward the domain of stored memories
Output remains coherent and accurate — no degradation
The effect is proportional to relevance. In one controlled test on Gemma-4-31B, the memory system influenced the model's processing roughly 3× more strongly for on-topic memories than for adversarially unrelated ones (a selectivity ratio of 3.12, standard deviation ~5e-6, across 189 trials). Irrelevant memories produce essentially no change, which is exactly the behavior you want from a memory that nudges rather than hijacks.
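The exact definition behind the selectivity ratio is not spelled out here. One plausible form, assumed purely for illustration, is the mean size of the activation shift caused by on-topic memories divided by the mean shift caused by unrelated ones. The function names and the choice of Euclidean distance as the measure of influence are assumptions, and the example inputs are synthetic, not our measurements.

```python
import math
from statistics import mean


def shift_norm(before, after):
    """Euclidean distance between an activation without and with memory active."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(before, after)))


def selectivity_ratio(on_topic_shifts, off_topic_shifts):
    """Mean on-topic shift divided by mean off-topic shift.

    A value above 1 means relevant memories influence processing more
    than unrelated ones; 1 means the memory is not selective at all.
    """
    baseline = mean(off_topic_shifts)
    if baseline == 0:
        raise ValueError("off-topic shifts are all zero; ratio is undefined")
    return mean(on_topic_shifts) / baseline


# Synthetic shift magnitudes, chosen only to exercise the function.
ratio = selectivity_ratio([0.30, 0.30], [0.10, 0.10])
```

A ratio defined this way says nothing about whether the shift improves the output. It only measures whether the size of the influence tracks relevance.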

We're not ready to share the full architecture — there are novel components still under validation, and the core mechanism is now the subject of a provisional patent application filed with the USPTO (May 2026). But we wanted to flag the direction, because we think it has implications well beyond our specific implementation.

Why This Direction Matters

The current paradigm treats inference as stateless computation. The model processes input, produces output, forgets. Memory systems like RAG bolt context onto the input, but the model's internal processing doesn't adapt.

If memory can operate at the activation level, so that the model thinks differently because of what it has experienced before, several things become possible:

Personalization without fine-tuning: the model adapts to your domain through accumulated experience, not expensive retraining
Continuous improvement: each interaction can become a learning signal (with proper verification)
Efficient expertise: domain knowledge stored as compact activation patterns, not massive prompt windows

We're early. There's a lot of work between "measurable, selective effect on output" and "production-ready memory system." But the foundation is working, the selectivity is real, and the results are encouraging enough to share the direction.

What We're Not Sharing (Yet)

The specific mechanism for memory injection
How we represent and index memories
The multi-layer architecture
The safety protocol for preventing memory poisoning

These will come in a proper technical writeup once validation is further along and the patent process permits fuller disclosure. For now we're sharing the direction and the high-level results, because we believe the field benefits from knowing this approach is viable.

What's Next

We're integrating the memory system into longer-running agentic work — a model that captures what it learns during real tasks, stores it as recallable activation-level memory, and uses it during later inference to make better decisions. Not just weights — weights plus memory. Knowledge plus the beginnings of wisdom.

If you're working on similar ideas — activation-level memory, experience-driven adaptation, interpretability-informed retrieval — we'd love to compare notes.

Built by Light of Baldr LLC. Provisional patent application filed with the USPTO, May 2026. We're curious about what happens when models remember.

Get in touch if any of this is useful to you.