One signed binary. Every feature compiled in. Free to run.
Crowkis · Agent Memory

Memory your agents don't have to rebuild.

Most agents forget the moment a session ends. Crowkis gives them long-term memory that survives restarts, consolidates contradictions instead of hoarding them, and recalls the right fact at the right time — all from one self-hosted binary, with zero external API calls.

70.4% LoCoMo recall@10
92.7% LongMemEval recall@5
~3× lift from reranking
0 external API calls

Recall at a glance

The numbers, drawn

One donut and three gauges: the whole memory story at a glance, then the full bars below.

LoCoMo · how 70.4% recall@10 is built

70.4% recall@10
  • Found by bi-encoder: 25%
  • Added by reranking: 45.4%
  • Not recalled: 29.6%

The cross-encoder reranker contributes the largest slice — the lift from ~25% to 70.4%.

  • LongMemEval, recall@5 · oracle: 92.7%
  • LongMemEval, recall@5 · hard: 84.3%
  • Temporal, recall@5 · best type: 95.5%

Not a vector dump

A memory layer, not a pile of embeddings

A pile of embeddings is not memory. Real memory knows that “I moved to Berlin” retires “I live in Munich,” that a preference stated today outranks one from six months ago, and that the question “where do they live?” should surface the current answer, not every address ever stored.

Crowkis memory is scoped to (agent, user), bounded per user, consolidating by default, and bi-temporal: it remembers not just what is true, but what was believed true, and when. All of it persists across restarts and runs entirely on the bundled ONNX embedder.

What makes it memory

  • Consolidating. A contradicting fact retires the old one — no stale pile-up.
  • Bi-temporal. Recall what was believed true at any past instant.
  • Per-user bounded. Scoped to (agent, user), capped so memory can't sprawl.
  • Rerank-boosted. A cross-encoder sharpens recall over the top candidates.
  • Graph edges. Subject→relation→object links, traversed multi-hop.
  • Zero-egress. Bundled models. Nothing leaves your machine.

Under the hood

How the memory actually works

Six design decisions separate a memory layer from a vector dump. Each one is a default you can tune, not a black box you have to trust.

the memory brain — how a fact gets in, and back out

Facts flow left-to-right into the store; a question pulls them back through recall and reranking. Consolidation keeps the picture current; the graph keeps it connected.

01

Consolidation, not accumulation

When a new fact contradicts an old one above a similarity threshold, Crowkis retires the old version instead of storing both. 'Lives in Munich' gives way to 'moved to Berlin', and a later question gets the current answer, not both.
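
As a rough illustration of that rule (a sketch, not Crowkis's implementation): the word-overlap `similarity`, the 0.5 threshold, and the names `Fact` and `remember` are all assumptions made for this example. The real store compares embeddings, and this toy version treats any sufficiently similar fact as a contradiction.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fact:
    text: str
    retired: bool = False


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets; a stand-in for embedding similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def remember(store: List[Fact], text: str, threshold: float = 0.5) -> None:
    """Retire live facts that are similar enough to the new one, then append it."""
    for fact in store:
        if not fact.retired and similarity(fact.text, text) >= threshold:
            fact.retired = True  # kept in the store, but no longer current
    store.append(Fact(text))


store: List[Fact] = []
remember(store, "user lives in Munich")
remember(store, "user likes green tea")
remember(store, "user lives in Berlin")

# Only facts that have not been retired are returned as current.
current = [f.text for f in store if not f.retired]
```

Here `current` holds the tea preference and the Berlin address; the Munich fact stays in `store` but is flagged as retired.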

02

Recency-blended relevance

Recall ranks facts by semantic relevance blended with recency, with a configurable half-life (30 days by default). A preference stated today outranks one from six months ago, even when both match the query.
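
One way to picture the blend, as a sketch only: multiply relevance by an exponential decay that halves every half-life. The page does not publish the blending formula, so the multiplicative form and the name `blended_score` are assumptions; only the 30-day default comes from the text above.

```python
def blended_score(relevance: float, age_days: float,
                  half_life_days: float = 30.0) -> float:
    """Relevance discounted by age: the weight halves every `half_life_days`."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return relevance * 0.5 ** (age_days / half_life_days)


# A slightly weaker match from today vs. a stronger match from six months ago.
today = blended_score(relevance=0.80, age_days=0)
six_months_ago = blended_score(relevance=0.90, age_days=180)
```

After six half-lives the older fact keeps only 1/64 of its relevance, so the fresher preference ranks first even though its raw match is weaker.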

03

Bi-temporal recall

Memory keeps validity windows, so you can ask what was believed true at any past instant. The agent can reason about the present and reconstruct the past — useful for audits, disputes, and 'what did we know when?'
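
A minimal sketch of an as-of lookup over validity windows. The `Belief` record, its field names, and the `as_of` helper are hypothetical, and the sketch models a single window per fact (when it was believed) rather than a full two-axis bi-temporal schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Belief:
    text: str
    believed_from: datetime
    believed_until: Optional[datetime] = None  # None means still believed


def as_of(beliefs: List[Belief], instant: datetime) -> List[str]:
    """Return the facts whose window contains `instant` (start inclusive, end exclusive)."""
    return [
        b.text
        for b in beliefs
        if b.believed_from <= instant
        and (b.believed_until is None or instant < b.believed_until)
    ]


move = datetime(2025, 3, 1)
beliefs = [
    Belief("lives in Munich", datetime(2024, 1, 1), move),  # closed at the move
    Belief("lives in Berlin", move),                        # open-ended
]

then = as_of(beliefs, datetime(2024, 6, 1))
now = as_of(beliefs, datetime(2025, 6, 1))
```

Retiring a fact closes its window instead of deleting it, which is what lets `then` still answer with Munich while `now` answers with Berlin.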

04

Cross-encoder reranking

A second, sharper model re-scores the top candidates before they're returned. It's the single change that roughly tripled LoCoMo recall@10 (from ~25% to 70.4%), and it is bounded to the top-K candidates so the cost stays small.
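
The two-stage shape can be sketched as follows. Both scorers are toy stand-ins chosen so the example runs without any model: `word_overlap` plays the bi-encoder and `overlap_density` plays the cross-encoder. Those names, `retrieve_then_rerank`, and the `top_k` and `limit` values are assumptions for illustration, not Crowkis's API or defaults.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(query: str, docs: List[str], cheap: Scorer, sharp: Scorer,
                         top_k: int = 3, limit: int = 2) -> List[str]:
    """Rank every document with the cheap scorer, then re-score only the top_k."""
    candidates = sorted(docs, key=lambda d: cheap(query, d), reverse=True)[:top_k]
    return sorted(candidates, key=lambda d: sharp(query, d), reverse=True)[:limit]


def word_overlap(query: str, doc: str) -> float:
    """Stand-in for the first stage: number of words shared with the query."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


sharp_calls: List[str] = []  # records which documents the second stage scored


def overlap_density(query: str, doc: str) -> float:
    """Stand-in for the reranker: shared words per document word."""
    sharp_calls.append(doc)
    return word_overlap(query, doc) / len(doc.split())


docs = [
    "the user said the weather where they live is fine and the food is good",
    "the user lives in Berlin",
    "the user does live in Berlin now",
    "favourite colour is green",
]
results = retrieve_then_rerank("where does the user live", docs,
                               word_overlap, overlap_density)
```

The long, noisy sentence ties for first place in the cheap pass but drops out after reranking, and `sharp_calls` shows the second scorer ran on three documents rather than all four. That is the cost bound the top-K cut provides.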

05

Graph edges

Facts can be linked as subject→relation→object edges and traversed multi-hop, so 'who works at the customer's company?' is a graph walk, not a guess. Fan-out is bounded to 512 edges per (agent, user).
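
A sketch of a bounded multi-hop walk over subject→relation→object triples. The function name `multi_hop`, the choice to follow edges in both directions, and reading the 512 bound as a total edge budget are assumptions for illustration; the page does not specify how the bound is applied.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (subject, relation, object)


def multi_hop(edges: List[Edge], start: str,
              max_hops: int = 2, max_edges: int = 512) -> List[str]:
    """Breadth-first walk from `start`, returning nodes reached within max_hops."""
    neighbours: Dict[str, List[str]] = defaultdict(list)
    for subject, _relation, obj in edges[:max_edges]:  # honour the edge budget
        neighbours[subject].append(obj)
        neighbours[obj].append(subject)  # allow walking an edge backwards

    seen = {start}
    reached: List[str] = []
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                reached.append(nxt)
                frontier.append((nxt, depth + 1))
    return reached


edges = [
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("carol", "works_at", "globex"),
]
colleagues = multi_hop(edges, "alice")
```

Starting from the customer `alice`, the first hop reaches her company and the second reaches `bob`, who works there; `carol` is never visited because no edge connects her to that company.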

06

Bounded & durable

Memory is capped per user (500 facts by default) so it can't sprawl, and it persists across restarts — a rescheduled pod comes back remembering exactly what it knew.
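
A sketch of the per-user cap, assuming the oldest fact is dropped first once the limit is hit; the page does not say which fact is evicted, and the class `BoundedMemory` and its methods are hypothetical. Durability is out of scope here: this version lives in process memory only.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class BoundedMemory:
    """Facts keyed by (agent, user), each key capped at `cap` entries."""

    def __init__(self, cap: int = 500) -> None:
        # deque(maxlen=cap) silently discards the oldest item when full.
        self._facts: Dict[Tuple[str, str], Deque[str]] = defaultdict(
            lambda: deque(maxlen=cap)
        )

    def add(self, agent: str, user: str, fact: str) -> None:
        self._facts[(agent, user)].append(fact)

    def facts(self, agent: str, user: str) -> List[str]:
        # .get avoids creating an empty entry for an unknown scope.
        return list(self._facts.get((agent, user), ()))


memory = BoundedMemory(cap=3)
for fact in ["f1", "f2", "f3", "f4"]:
    memory.add("support", "u1", fact)
memory.add("support", "u2", "x")

u1_facts = memory.facts("support", "u1")
u2_facts = memory.facts("support", "u2")
```

Keying every read and write by the `(agent, user)` pair is also what keeps one user's facts out of another's results.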

Where it earns its place

Use cases that need real memory

Anywhere an agent should know something on the next session that it learned on the last one.

Support agents

Remember the customer across tickets

Channel preference, past issues, account context — recalled on the next ticket without re-asking. Consolidation keeps 'their address' current as it changes.

Coding agents

Remember the codebase's conventions

Which test runner, which lint rules, which patterns the team rejected last week. The agent stops relearning your repo on every session.

Personal assistants

Remember the human, not just the chat

Preferences, relationships, recurring context that should outlive a single conversation — recalled by meaning, ranked by recency.

Multi-agent systems

Shared memory, isolated per tenant

A swarm of agents writes to one (agent, user)-scoped store, and a 16-thread concurrency test recorded zero cross-tenant leaks.

Compliance-bound apps

Memory that never leaves the building

Bundled models mean recall happens locally — no conversation shipped to a hosted memory API. CMEMFORGET executes erasure on request.

Long-running chats

Recall past the context window

Semantic search over the whole history surfaces what was said forty messages ago, so 'as I mentioned earlier' actually works.

The receipts

What we've measured, not just claimed

Every number here comes from an independent harness on public datasets, run on a CPU-only laptop with the bundled models. The full method is in the benchmark write-up.

70.4% LoCoMo recall@10: roughly 3× the bi-encoder baseline (~25%)
92.7% LongMemEval recall@5 (oracle); 84.3% on the hard split
95.5% recall@5 on temporal-reasoning questions
0 cross-tenant leaks under 16-thread concurrency
29/29 free features pass; 84/84 stress checks green
0 external API calls: fully self-hosted, zero egress

Measured, not asserted

The benchmarks that actually matter

Snap Research's LoCoMo (1,986 QA pairs over multi-session dialogue) and LongMemEval. CPU-only laptop, bundled embedder, no cloud.

LoCoMo — retrieval recall@10

bi-encoder + rerank
  • Overall: 70.4%
  • Single-hop: 73.7%
  • Temporal: 71.3%
  • Multi-hop: 67%
  • Open-domain: 47.8%

The cross-encoder reranker roughly triples overall recall — from ~25% to 70.4%.
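
For readers new to the metric, here is one common definition of recall@k as a sketch: the share of a question's relevant items that appear in the top k results. Whether the benchmark harness uses exactly this form (rather than, say, a hit rate per question) is an assumption, and `recall_at_k` is a hypothetical helper.

```python
from typing import List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant items found among the first k retrieved items."""
    if not relevant:
        raise ValueError("relevant must contain at least one item")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


retrieved = ["a", "b", "c", "d"]  # ranked results for one question
relevant = {"b", "d", "z"}        # gold evidence for that question

at_3 = recall_at_k(retrieved, relevant, k=3)  # only "b" is in the top 3
at_4 = recall_at_k(retrieved, relevant, k=4)  # "b" and "d" are in the top 4
```

A benchmark figure is this value averaged over all questions, so widening k can only hold the score steady or raise it.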

LongMemEval-S (hard) — recall@5 by question type

~49 sessions · ~500 turns / question
  • Temporal reasoning: 95.5%
  • Knowledge update: 90.9%
  • Single-session user: 81.8%
  • Multi-session: 76.2%
  • Single-session preference: 60%

92.7% recall@5 in oracle mode; 84.3% on the stratified hard split — competitive with hosted memory services, with nothing leaving your machine.

Why Crowkis

How it compares to the dedicated memory tools

Mem0, Zep, and Letta are good at memory — but they're memory only, and most lean on a hosted API or an external model to do their work. Crowkis matches them on the memory features and adds the part nobody else has: it's also your cache, your guardrails, and your gateway — one self-hosted binary, with nothing leaving your machine.

Capability · Crowkis · Mem0 · Zep · Letta
  • Runs fully self-hosted
  • Zero external API calls (local models)
  • Consolidates contradictions
  • Bi-temporal recall (as-of a past time)
  • Graph memory
  • Cross-encoder reranking
  • Also a Redis-compatible cache
  • Guardrails + evals built in
  • Reasoning reuse
Legend: yes · partial / plan-dependent · no
As of June 2026; competitor features vary by plan & version.

Honest take: on raw headline recall, the hosted leaders post strong numbers too. Crowkis wins on a different axis — comparable recall while running fully local with zero egress, and folding memory, semantic caching, guardrails, evals, and an AI gateway into one Redis-compatible process instead of four services to wire together.

Give your agents a memory that lasts.

Self-hosted, zero-egress, and free to run. Install it from the Usage page and the CMEM commands are live in seconds.