Why we built a 7-layer cognitive memory for AI agents
The Problem
It's 2026 and AI agents still have amnesia. Every session starts from zero. Every conversation is a blank slate. Your agent might be brilliant at reasoning, but it can't remember what happened yesterday.
The standard answer is RAG — Retrieval-Augmented Generation. Stuff your documents into a vector database, retrieve the top-k results, inject them into the prompt. It works for search. But search is not memory.
Memory means prioritization. Memory means forgetting what doesn't matter. Memory means understanding relationships between people, events, and patterns over time. RAG gives you none of that. It's a flat list of chunks ranked by cosine similarity.
We needed something fundamentally different. Not a better vector database. A cognitive layer that thinks about information the way a human brain does.
Why existing solutions fall short
We evaluated every serious memory solution on the market before building our own. Here's what we found:
ChatGPT Memory
A flat list of notes. No API. Locked into OpenAI's ecosystem. No graph, no temporal logic, no decay. It remembers that you like Python — it doesn't learn how your business operates over time.
Mem0
Interesting product, but $249/month for Pro. Every memory operation calls an LLM to decide what to store. That means latency, cost, and a dependency on external inference for something that should be pure database math. Vector + graph, but no forgetting, no temporal weighting, no dream cycle.
Zep / Graphiti
The best graph approach we found. Graphiti's bi-temporal model is genuinely excellent — we actually use it as inspiration for our knowledge graph layer. But Zep Pro is $50/month, the system only adds and never removes, and there's no consolidation or decay mechanism. Over time, the graph grows without bound.
MemGPT / Letta
Clever OS-inspired memory tiers (main memory, archival, recall). But no temporal awareness, no entity extraction, no pattern detection. It manages context windows well — it doesn't build institutional knowledge.
None of these systems forget. None of them consolidate. None of them predict. They store — they don't think about what they store.
Our approach: cognitive neuroscience
Instead of starting from database architecture, we started from neuroscience. How does a human brain actually handle information?
The answer is: aggressively. Your brain filters out 99% of sensory input before it ever reaches conscious awareness. Of what remains, most decays within minutes unless reinforced. Only emotionally significant, novel, or frequently accessed information makes it into long-term storage. And even then, memories are actively rewritten every time you recall them.
We modeled Agent Brain on four core principles from cognitive science:
Perception Gate — Not everything deserves to be stored. Human sensory gating filters irrelevant stimuli before they consume cognitive resources. We score every input on emotion, novelty, urgency, and source trust. Low-value information gets a low initial weight and decays fast.
Miller's Law (1956) — Working memory holds 7 ± 2 items. We enforce this literally. Our working memory buffer holds exactly 7 items. When an 8th arrives, the lowest-weight item gets evicted. This prevents token bloat and forces prioritization.
Ebbinghaus Decay — Hermann Ebbinghaus showed in 1885 that memory fades exponentially unless reinforced. We apply his forgetting curve to every memory node. Unimportant, unaccessed memories lose weight over time until they are purged.
Dream Cycle (REM parallel) — During sleep, the human brain consolidates memories, strengthens important connections, and discovers non-obvious links between experiences. Our Dream Cycle runs nightly at 02:00 and does exactly this: consolidation, decay, and creative synthesis.
The 7 layers
1. Perception Gate
Every incoming piece of data gets scored on four dimensions: emotion (-1 to +1), novelty (0-1), urgency (0-1), and source trust (0-1). High-emotion, high-trust inputs are anchored deeply. Routine greetings and small talk get low scores and decay quickly. This is sensory filtering — not everything deserves cognitive resources.
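As a sketch, a gate like this can be pure arithmetic — no LLM call required. The weighting scheme, field names, and example scores below are illustrative, not the production formula:

```python
from dataclasses import dataclass

@dataclass
class Scores:
    emotion: float  # -1..+1, emotional valence of the input
    novelty: float  # 0..1
    urgency: float  # 0..1
    trust: float    # 0..1, reliability of the source

def initial_weight(s: Scores) -> float:
    # Illustrative blend: emotional *intensity* matters, not its sign,
    # and low-trust sources are discounted multiplicatively.
    salience = 0.4 * abs(s.emotion) + 0.3 * s.novelty + 0.3 * s.urgency
    return salience * s.trust

# A routine greeting scores near zero; an urgent, trusted alert anchors deeply.
greeting = initial_weight(Scores(emotion=0.1, novelty=0.0, urgency=0.0, trust=0.9))
alert = initial_weight(Scores(emotion=0.8, novelty=0.9, urgency=1.0, trust=0.9))
```

The point of the sketch: the gate's output becomes the memory's starting weight, so low-value inputs enter the system already close to the purge threshold.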
2. Working Memory
A 7-item buffer that holds the most immediately relevant information. Items decay in minutes if not reactivated. When the buffer is full, the lowest-weight item is evicted. This protects the agent's context window from bloat and ensures only high-stakes information is front and center.
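The eviction rule fits in a few lines. This is a minimal sketch of the capacity logic only — the time-based decay the buffer also performs is omitted here:

```python
CAPACITY = 7  # Miller's Law: 7 +/- 2 items, enforced at exactly 7

class WorkingMemory:
    def __init__(self):
        self.items = {}  # key -> weight

    def add(self, key, weight):
        if key not in self.items and len(self.items) >= CAPACITY:
            # Buffer full: evict the lowest-weight item to make room.
            weakest = min(self.items, key=self.items.get)
            del self.items[weakest]
        self.items[key] = weight

wm = WorkingMemory()
for i in range(8):
    wm.add(f"item{i}", weight=i)  # item0 has the lowest weight
# The 8th insertion evicts item0, the weakest entry.
```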
3. Episodic Memory
The autobiographical record. Each memory stores content, timestamp, participants, embeddings, and weight. Built on pgvector with cosine similarity search. The key feature: reconsolidation. Every time a memory is retrieved, it gets updated with the current context. Frequently accessed memories become stronger and more precisely tuned.
4. Knowledge Graph
Entities (people, places, concepts) and their relationships, extracted automatically with spaCy NER. Not a static snapshot — a living web with bi-temporal logic. When relationships change, old edges are invalidated and new ones created. The graph grows with the agent's experience.
5. Procedural Memory
Patterns detected after 3+ repetitions. If a sequence of actions consistently leads to positive outcomes, it becomes encoded as institutional intuition. No hard-coded rules — the system learns what works by watching what works.
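The 3-repetition threshold reduces to a count over the episode log. The data shapes below are assumptions for illustration:

```python
from collections import Counter

PATTERN_THRESHOLD = 3  # promote after 3+ repetitions with positive outcomes

def detect_procedures(episodes):
    """episodes: list of (action_sequence, outcome) pairs.
    Returns action sequences that consistently led to positive outcomes."""
    counts = Counter(seq for seq, outcome in episodes if outcome == "positive")
    return {seq for seq, n in counts.items() if n >= PATTERN_THRESHOLD}

log = [
    (("check_invoice", "send_reminder"), "positive"),
    (("check_invoice", "send_reminder"), "positive"),
    (("escalate_immediately",), "negative"),
    (("check_invoice", "send_reminder"), "positive"),
]
rules = detect_procedures(log)  # the reminder sequence crosses the threshold
```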
6. Predictive Engine
Analyzes temporal patterns every 60 minutes. If something happened at the same time last month, and the month before, the engine flags it as a prediction. Proactive alerts let the agent intervene before problems occur, not after.
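A simplified sketch of the monthly-recurrence check — the real engine looks at more granular temporal patterns, and the one-day tolerance here is an assumption:

```python
from datetime import date

def predict_recurrence(event_dates: list[date], today: date) -> bool:
    """Flag a prediction if an event occurred around the same day of
    month in each of the two previous months."""
    def month_back(d: date, n: int) -> tuple[int, int]:
        total = d.year * 12 + (d.month - 1) - n
        return total // 12, total % 12 + 1

    hits = 0
    for n in (1, 2):
        y, m = month_back(today, n)
        if any(e.year == y and e.month == m and abs(e.day - today.day) <= 1
               for e in event_dates):
            hits += 1
    return hits == 2

history = [date(2026, 1, 15), date(2026, 2, 15)]
flagged = predict_recurrence(history, today=date(2026, 3, 15))
```

One hit is a coincidence; two consecutive monthly hits become a prediction the agent can act on before the event recurs.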
7. Dream Cycle
Runs nightly at 02:00. Four operations: pattern consolidation (merge similar memories), Ebbinghaus decay (fade unimportant ones), creative synthesis (find non-obvious connections between unrelated entities), and rule derivation (promote consistent observations to procedural memory). This is what makes the system human-like.
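Of the four operations, pattern consolidation is the easiest to sketch: merge memories whose embeddings are nearly identical, keeping the heavier one. The similarity threshold and merge rule below are illustrative:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def consolidate(memories, threshold=0.95):
    """Merge near-duplicate memories: keep the heavier one and fold in
    part of the weaker one's weight."""
    merged = []
    for m in sorted(memories, key=lambda m: -m["weight"]):
        dup = next((k for k in merged if cosine(k["vec"], m["vec"]) >= threshold), None)
        if dup:
            dup["weight"] = min(1.0, dup["weight"] + 0.5 * m["weight"])
        else:
            merged.append(m)
    return merged

mems = [
    {"vec": [1.0, 0.0], "weight": 0.8},
    {"vec": [0.99, 0.05], "weight": 0.3},  # near-duplicate of the first
    {"vec": [0.0, 1.0], "weight": 0.6},
]
out = consolidate(mems)  # three memories collapse to two
```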
Why forgetting is a feature
Every other memory system only adds. They never remove. This seems like an advantage until you realize what it means in practice: unbounded growth, deteriorating retrieval quality, and context windows stuffed with irrelevant historical noise.
Token bloat is real. If your agent has 50,000 memories and you inject the top 20 into every prompt, those 20 slots better contain the right information. A system that never forgets treats a casual Tuesday greeting with the same weight as a critical compliance deadline. The signal-to-noise ratio collapses.
We apply the Ebbinghaus forgetting curve to every memory:
    weight = initial_weight * e^(-decay_rate * time_since_last_access)

Memories that are emotionally significant, frequently accessed, or recently reinforced maintain their weight. Everything else fades. The agent stays sharp, not bloated. It remembers what matters and lets go of what doesn't.
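In code, the curve is one line of math. The decay rate below is an illustrative value, not the production default:

```python
import math

def decayed_weight(initial_weight: float, decay_rate: float,
                   hours_since_access: float) -> float:
    """Ebbinghaus forgetting curve: exponential fade since last access."""
    return initial_weight * math.exp(-decay_rate * hours_since_access)

# With an illustrative decay rate of 0.01/hour, a week untouched
# drops a memory to under a fifth of its original weight.
w = decayed_weight(1.0, 0.01, 7 * 24)
```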
This is not data loss. This is curation. The same thing your brain does every single night.
The numbers
Memory infrastructure shouldn't cost more than the LLM it serves. Here's where Agent Brain sits:
| Solution | Price | Notes |
|---|---|---|
| Agent Brain | CHF 29/mo | All 7 layers. Swiss hosting. Open source. |
| Mem0 Pro | $249/mo | LLM dependency. US hosting. Vendor lock-in. |
| Zep Pro | $50/mo | Strong graph. No forgetting. No dream cycle. |
| MemGPT / Letta | $100+/mo | Self-host required. GPU costs. No temporal logic. |
The cost difference comes from architecture. Agent Brain runs on local sentence-transformers for embeddings and spaCy for entity extraction. No LLM calls for memory operations. Pure database math. That means no per-token costs, no inference latency, and no dependency on external APIs.
Try it
Agent Brain is open source under AGPLv3. Self-host it, fork it, audit the code. Or use the managed version at CHF 29/month and skip the infrastructure work.
We built Agent Brain because we needed it ourselves — for our AI agents at Valtis AG. Now we're making it available to everyone. If you're building agents that need to remember, learn, and anticipate, this is the infrastructure layer we wish had existed when we started.