Memory your agents don't have to rebuild.
Most agents forget the moment a session ends. Crowkis gives them long-term memory that survives restarts, consolidates contradictions instead of hoarding them, and recalls the right fact at the right time — all from one self-hosted binary, with zero external API calls.
The numbers, drawn
One donut and three gauges — the whole memory story in a glance, then the full bars below.
LoCoMo · how 70.4% recall@10 is built
- Found by bi-encoder: 25%
- Added by reranking: 45.4%
- Not recalled: 29.6%
The cross-encoder reranker contributes the largest slice — the lift from ~25% to 70.4%.
[Gauges: LongMemEval recall@5 (oracle) · LongMemEval recall@5 (hard) · Temporal recall@5 (best question type)]
A memory layer, not a pile of embeddings
A pile of embeddings is not memory. Real memory knows that “I moved to Berlin” retires “I live in Munich,” that a preference stated today outranks one from six months ago, and that the question “where do they live?” should surface the current answer, not both.
Crowkis memory is scoped to (agent, user), bounded per user, consolidating by default, and bi-temporal: it remembers not just what is true, but what was believed true, and when. All of it persists across restarts and runs entirely on the bundled ONNX embedder.
What makes it memory
- Consolidating. A contradicting fact retires the old one — no stale pile-up.
- Bi-temporal. Recall what was believed true at any past instant.
- Per-user bounded. Scoped to (agent, user), capped so memory can't sprawl.
- Rerank-boosted. A cross-encoder sharpens recall over the top candidates.
- Graph edges. Subject→relation→object links, traversed multi-hop.
- Zero-egress. Bundled models. Nothing leaves your machine.
How the memory actually works
Six design decisions separate a memory layer from a vector dump. Each one is a default you can tune, not a black box you have to trust.
Facts flow left-to-right into the store; a question pulls them back through recall and reranking. Consolidation keeps the picture current; the graph keeps it connected.
Consolidation, not accumulation
When a new fact contradicts an old one above a similarity threshold, Crowkis retires the old version instead of storing both. 'Lives in Munich' becomes 'moved to Berlin', and a later question gets the current answer, not both.
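A minimal sketch of the retire-on-contradiction idea, not Crowkis's implementation. The `Fact` type, the `remember` function, the hand-written vectors, and the 0.85 threshold are all hypothetical; the sketch also treats high similarity as a stand-in for "newer statement on the same topic", since how contradictions are detected is not described here.

```python
import math
from dataclasses import dataclass


@dataclass
class Fact:
    text: str
    vector: list[float]   # embedding; hand-written toy values in this sketch
    active: bool = True   # False once a newer fact has retired this one


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def remember(store: list[Fact], new: Fact, threshold: float = 0.85) -> None:
    """Retire active facts that are similar enough to the new one, then store it.

    Assumption: the 0.85 threshold is illustrative, not a Crowkis default.
    """
    for old in store:
        if old.active and cosine(old.vector, new.vector) >= threshold:
            old.active = False  # retired, not deleted, so history survives
    store.append(new)
```

Storing 'lives in Munich', then 'likes tea', then 'moved to Berlin' (with a vector close to the Munich one) leaves only the tea and Berlin facts active.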
Recency-blended relevance
Recall ranks facts by semantic relevance blended with recency, with a configurable half-life (30 days by default). A preference stated today outranks one from six months ago, even when both match the query.
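One way to picture the blend, assuming relevance is multiplied by an exponential decay. Only the 30-day half-life comes from the text above; the multiplicative form and the function names are assumptions for illustration.

```python
def recency_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Weight that halves every half_life_days (30 days is the stated default)."""
    return 0.5 ** (age_days / half_life_days)


def blended_score(relevance: float, age_days: float,
                  half_life_days: float = 30.0) -> float:
    """Assumed blend: semantic relevance scaled by the recency weight."""
    return relevance * recency_weight(age_days, half_life_days)
```

Under this assumption, a fact stated today with relevance 0.80 scores 0.80, while a slightly better match (0.90) from 180 days ago is six half-lives old and scores 0.90 / 64, so the newer preference ranks first.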
Bi-temporal recall
Memory keeps validity windows, so you can ask what was believed true at any past instant. The agent can reason about the present and reconstruct the past — useful for audits, disputes, and 'what did we know when?'
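An as-of query over validity windows can be sketched as below. The `FactVersion` type, its field names, and the dates are hypothetical. A full bi-temporal model usually tracks two time axes (when something was true and when it was recorded); this sketch shows a single window per fact to keep the idea visible.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FactVersion:
    text: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means still current


def as_of(versions: list[FactVersion], instant: datetime) -> list[str]:
    """Return the facts whose window [valid_from, valid_to) contains instant."""
    return [
        v.text
        for v in versions
        if v.valid_from <= instant and (v.valid_to is None or instant < v.valid_to)
    ]
```

With a Munich version closed on the day a Berlin version opens, a query before that day returns Munich and a query on or after it returns Berlin.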
Cross-encoder reranking
A second, sharper model re-scores the top candidates before they're returned. It's the single change that roughly tripled LoCoMo recall@10 (from ~25% to 70.4%), and it is bounded to the top-K candidates so the cost stays small.
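The two-stage shape can be sketched with the two models passed in as plain functions. The names `retrieve_and_rerank`, `cheap_score`, and `sharp_score` and the default sizes are hypothetical; in a real setup the first would be the bi-encoder similarity and the second a cross-encoder.

```python
from typing import Callable

Scorer = Callable[[str, str], float]  # (query, fact) -> score


def retrieve_and_rerank(query: str, facts: list[str],
                        cheap_score: Scorer, sharp_score: Scorer,
                        top_k: int = 20, final_n: int = 10) -> list[str]:
    """Rank everything cheaply, then re-score only the top_k candidates."""
    # Stage 1: the cheap scorer sees every fact; only top_k survive.
    candidates = sorted(facts, key=lambda f: cheap_score(query, f),
                        reverse=True)[:top_k]
    # Stage 2: the expensive scorer runs at most top_k times.
    reranked = sorted(candidates, key=lambda f: sharp_score(query, f),
                      reverse=True)
    return reranked[:final_n]
```

The cost bound is the point of the design: however large the store grows, the expensive scorer is called on no more than `top_k` facts per query.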
Graph edges
Facts can be linked as subject→relation→object edges and traversed multi-hop, so 'who works at the customer's company?' is a graph walk, not a guess. Fan-out is bounded to 512 edges per (agent, user).
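A multi-hop walk over subject→relation→object triples might look like the sketch below. The `hop` helper, the relation name `works_at`, and the sample names are hypothetical, and the 512-edge bound is not modelled.

```python
Edge = tuple[str, str, str]  # (subject, relation, object)


def hop(edges: list[Edge], nodes: set[str], relation: str,
        reverse: bool = False) -> set[str]:
    """Follow one relation from a set of nodes.

    Forward: subject -> object. Reverse: object -> subject.
    """
    if reverse:
        return {s for s, r, o in edges if r == relation and o in nodes}
    return {o for s, r, o in edges if r == relation and s in nodes}


edges: list[Edge] = [
    ("dana", "works_at", "acme"),
    ("lee", "works_at", "acme"),
    ("sam", "works_at", "globex"),
]

# "Who works at Dana's company?" is two hops: Dana -> company -> people.
companies = hop(edges, {"dana"}, "works_at")
colleagues = hop(edges, companies, "works_at", reverse=True)
```

Here `companies` is `{"acme"}` and `colleagues` is `{"dana", "lee"}`; the answer comes from following edges rather than from embedding similarity.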
Bounded & durable
Memory is capped per user (500 facts by default) so it can't sprawl, and it persists across restarts — a rescheduled pod comes back remembering exactly what it knew.
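A per-scope cap can be sketched with one bounded queue per (agent, user) key. The `BoundedMemory` class is hypothetical, and dropping the oldest fact when the cap is hit is an assumption: the eviction policy is not stated above. Persistence across restarts is not modelled.

```python
from collections import defaultdict, deque


class BoundedMemory:
    """Toy store: one capped queue of facts per (agent, user) scope."""

    def __init__(self, cap: int = 500):  # 500 is the stated default cap
        self._facts = defaultdict(lambda: deque(maxlen=cap))

    def remember(self, agent: str, user: str, fact: str) -> None:
        # Assumption: when the scope is full, the oldest fact is dropped.
        self._facts[(agent, user)].append(fact)

    def recall_all(self, agent: str, user: str) -> list[str]:
        # .get() avoids creating an empty queue for scopes never written to.
        return list(self._facts.get((agent, user), ()))
```

Because every read and write goes through the (agent, user) key, one user's facts never show up in another user's recall in this sketch.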
Use cases that need real memory
Anywhere an agent should know something on the next session that it learned on the last one.
Remember the customer across tickets
Channel preference, past issues, account context — recalled on the next ticket without re-asking. Consolidation keeps 'their address' current as it changes.
Remember the codebase's conventions
Which test runner, which lint rules, which patterns the team rejected last week. The agent stops relearning your repo on every session.
Remember the human, not just the chat
Preferences, relationships, recurring context that should outlive a single conversation — recalled by meaning, ranked by recency.
Shared memory, isolated per tenant
A swarm of agents writes to one store scoped by (agent, user), with no cross-tenant leakage: a 16-thread concurrency test recorded zero leaks.
Memory that never leaves the building
Bundled models mean recall happens locally — no conversation shipped to a hosted memory API. CMEMFORGET executes erasure on request.
Recall past the context window
Semantic search over the whole history surfaces what was said forty messages ago, so 'as I mentioned earlier' actually works.
What we've measured, not just claimed
Every number here comes from an independent harness on public datasets, run on a CPU-only laptop with the bundled models. The full method is in the benchmark write-up.
The benchmarks that actually matter
Snap Research's LoCoMo (1,986 QA pairs over multi-session dialogue) and LongMemEval. CPU-only laptop, bundled embedder, no cloud.
LoCoMo — retrieval recall@10
The cross-encoder reranker roughly triples overall recall — from ~25% to 70.4%.
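For readers unfamiliar with the metric, here is one common definition of recall@k, sketched under the assumption that each question comes with a set of evidence ids. The exact scoring used by the benchmark harness may differ, and the function names are hypothetical.

```python
def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    return len(set(ranked_ids[:k]) & relevant_ids) / len(relevant_ids)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (ranked_ids, relevant_ids) pairs, one per question."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)
```

On this definition, a reranker raises recall@10 whenever it moves evidence that the bi-encoder placed outside the first ten results into them.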
LongMemEval-S (hard) — recall@5 by question type
~49 sessions · ~500 turns / question
92.7% recall@5 in oracle mode; 84.3% on the stratified hard split. That is competitive with hosted memory services, with nothing leaving your machine.
How it compares to the dedicated memory tools
Mem0, Zep, and Letta are good at memory — but they're memory only, and most lean on a hosted API or an external model to do their work. Crowkis matches them on the memory features and adds the part nobody else has: it's also your cache, your guardrails, and your gateway — one self-hosted binary, with nothing leaving your machine.
| Capability | Crowkis | Mem0 | Zep | Letta |
|---|---|---|---|---|
| Runs fully self-hosted | ● | ◐ | ◐ | ● |
| Zero external API calls (local models) | ● | ○ | ○ | ◐ |
| Consolidates contradictions | ● | ● | ● | ◐ |
| Bi-temporal recall (as-of a past time) | ● | ◐ | ● | ◐ |
| Graph memory | ● | ● | ● | ◐ |
| Cross-encoder reranking | ● | ◐ | ◐ | ○ |
| Also a Redis-compatible cache | ● | ○ | ○ | ○ |
| Guardrails + evals built in | ● | ○ | ○ | ○ |
| Reasoning reuse | ● | ○ | ○ | ○ |
Honest take: on raw headline recall, the hosted leaders post strong numbers too. Crowkis wins on a different axis — comparable recall while running fully local with zero egress, and folding memory, semantic caching, guardrails, evals, and an AI gateway into one Redis-compatible process instead of four services to wire together.
Give your agents a memory that lasts.
Self-hosted, zero-egress, and free to run. Install it from the Usage page and the CMEM commands are live in seconds.