The cache with a brain · built in Rust

STOP PAYING TWICE FOR THE SAME ANSWER.

Crowkis understands what your LLM is being asked — and serves the answer it already has, only when it's safe to. Your bill drops. Your users stop waiting.

Run it free · See the problem

docker pull crowkis/crowkis:latest · works with your Redis client

CAW

crowkis cli — redis clients work too

semantic hit · 0.4ms

poison blocked · stage 3

saved today · $1,240.80

lines of Rust, no GC pauses

integration tests in the suite

intent classes scored per query

anti-poisoning stages per write

protocols — RESP3 · gRPC · REST

image, every feature compiled in

It matches meaning

“How do refunds work?” and “What's your refund window?” become one answer, not two bills.

It refuses unsafe reuse

Five checks gate every hit — wrong, stale, or poisoned answers never leave the cache.

It drops into your stack

Speaks Redis, gRPC, REST, and MCP. One Docker image, one port change, zero rewrites.

One cache · every door in

Whatever you already use, it already speaks.

Python, Node, the Redis CLI, gRPC, REST, and MCP all plug into the same engine. Point a client at one port and you have a semantic cache — no rewrite, no new mental model.

The problem, in one line

Most of your LLM bill is reruns.

The same questions, rephrased all day, billed at full price every time. The obvious fixes fail — exact-match caches miss the rephrasing, similarity caches serve answers they shouldn't. We built the cache that does neither.

The full story, with diagrams →

“how do refunds work?” → paid compute

“what's the refund window?” → paid again

“refund timeline?” → paid again

with Crowkis → one bill, three answers
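
The three-phrasings example above can be sketched with a toy similarity cache. This is a minimal illustration and not Crowkis's implementation: the word-overlap `embed` function, the `STOPWORDS` set, the `0.4` threshold, and the `ToySemanticCache` class are all assumptions made for the demo. A real semantic cache would use a learned embedding model and, as the page notes, would need extra safety checks before reusing an answer.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a learned embedding model.
STOPWORDS = {"how", "do", "what", "s", "the", "your", "a", "is"}

def embed(text: str) -> Counter:
    """Toy 'embedding': bag of lowercase words, stopwords dropped, plural 's' stripped."""
    words = re.findall(r"[a-z]+", text.lower())
    stemmed = (w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words)
    return Counter(w for w in stemmed if w not in STOPWORDS)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToySemanticCache:
    def __init__(self, threshold: float = 0.4):
        self.threshold = threshold  # assumed cutoff, chosen only for this demo
        self.entries = []           # list of (embedding, answer) pairs

    def get_or_compute(self, query, compute):
        q = embed(query)
        for vec, answer in self.entries:
            if cosine(q, vec) >= self.threshold:
                return answer       # similar enough: reuse the stored answer
        answer = compute(query)     # miss: pay for one model call and store it
        self.entries.append((q, answer))
        return answer

paid_calls = []

def fake_llm(query):
    """Stands in for a billed model call; records every query it is asked."""
    paid_calls.append(query)
    return "placeholder refund-policy answer"

cache = ToySemanticCache()
for question in ["how do refunds work?", "what's the refund window?", "refund timeline?"]:
    cache.get_or_compute(question, fake_llm)

print(len(paid_calls))  # 1: only the first phrasing reached the model
```

Note that a bare threshold like this is exactly the "similarity cache" failure mode the page describes: it will happily reuse an answer for any query that shares enough words, whether or not reuse is safe.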

Where it earns, in the real world

If your app answers questions, Crowkis pays for itself.

Six production workloads where teams deploy Crowkis today — each one a repetition engine wearing a product's clothes.

SaaS · e-commerce · fintech support desks

Customer support bots

Refunds, resets, shipping windows — the same fifty intents in thousands of phrasings. The repeats become instant, free answers; only new questions reach the model.

→ the highest hit rates of any workload

HR, IT, and engineering assistants

Internal copilots

Your whole company asks the same policy and how-to questions. One shared memory across Slack bots, portals, and IDE plugins — the first answer serves everyone.

→ one answer, four hundred askers

docs assistants · knowledge products

RAG applications

Retrieval is cheap; the synthesis step is the bill. Crowkis caches the finished answer, version-pinned to your docs, so popular questions skip the whole pipeline.

→ cache the synthesis, not just the chunks
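
One way to read "version-pinned to your docs" is that the docs version becomes part of the cache key, so publishing new docs makes older answers unreachable. The sketch below shows only that idea. The `cache_key` helper, the `docs-v1` and `docs-v2` labels, and the normalization step are hypothetical, and nothing here describes how Crowkis handles pinning internally.

```python
import hashlib

def cache_key(question: str, docs_version: str) -> str:
    """Hypothetical key: hash of the docs version plus the case/whitespace-normalized question."""
    normalized = " ".join(question.lower().split())
    return hashlib.sha256(f"{docs_version}\x00{normalized}".encode("utf-8")).hexdigest()

answers = {}
answers[cache_key("How do I rotate an API key?", "docs-v1")] = "synthesized answer for v1"

# Same docs version, trivially different spelling: same key, so the answer is reused.
same_version = cache_key("how do I  rotate an API key?", "docs-v1")
# New docs version: different key, so the old synthesis is never served.
new_version = cache_key("How do I rotate an API key?", "docs-v2")

print(same_version in answers, new_version in answers)  # True False
```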

automation · multi-agent platforms

Agent fleets

Agents re-ask, re-plan, and re-fetch relentlessly. Semantic hits, reasoning reuse, and tool-call caching deflate the 10–50× call multiplier that breaks agent economics.

→ five agents, one model call

engineering teams on Claude Code & friends

AI coding assistants

Ten developers, one codebase, the same questions. Behind MCP, the team shares a local memory — doc lookups and code explanations stop billing per person.

→ one config block via MCP

consumer apps · voice assistants

High-traffic chat & voice

At scale, traffic converges on shared intents while every millisecond counts. Sub-millisecond streamed hits keep the experience instant and the unit economics sane.

→ <1ms hits inside a 1s voice budget

Don't see yours? The Roost covers twenty more — browse by use case →

Adoption is one port change

It speaks Redis, so your code already speaks Crowkis.

Crowkis serves RESP3 — the Redis wire protocol — alongside gRPC and a REST management API. Point your existing client at port 6383 and you have a semantic cache. The Python and Node SDKs add get_or_compute, streaming, and multimodal helpers on top.
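
Because the transport is the Redis wire protocol, a client request is a RESP array of bulk strings sent over TCP. The sketch below encodes one by hand to show what pointing an existing client at port 6383 amounts to on the wire. The `encode_command` helper and the key name are illustrative. Which commands Crowkis treats as semantic lookups is not stated here, so plain `GET` is used only as a familiar example.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings, the form Redis clients send."""
    parts = [f"*{len(args)}\r\n".encode("ascii")]  # array header: number of arguments
    for arg in args:
        data = arg.encode("utf-8")
        # each argument is a bulk string: $<byte length>\r\n<bytes>\r\n
        parts.append(b"$" + str(len(data)).encode("ascii") + b"\r\n" + data + b"\r\n")
    return b"".join(parts)

frame = encode_command("GET", "faq:refunds")  # hypothetical key name
print(frame)
```

Against a running instance, these are the bytes a Redis client library would write to the socket on your behalf, which is why changing the port is the only client-side change the page asks for.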

Python SDK Node SDK

from crowkis import CrowkisClient

cache = CrowkisClient(host="127.0.0.1", port=6383, tenant="demo", model="gpt-4o")

# one call: serve from cache, or compute and store
# call_llm is your own function that sends the query to the model
answer = cache.get_or_compute(
    "Explain vector caches",
    lambda query: call_llm(query),
    ttl=3600,
)
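
The contract behind a `get_or_compute` call with a `ttl` can be sketched with an in-memory stand-in. This is not the Crowkis SDK: `ToyTtlCache` matches queries exactly rather than by meaning, `ttl` is assumed to be in seconds, and the injectable `clock` parameter is an assumption added so that expiry can be shown without sleeping.

```python
import time

class ToyTtlCache:
    """In-memory stand-in illustrating get-or-compute with expiry (exact-match only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the demo can control time
        self._store = {}      # query -> (expires_at, answer)

    def get_or_compute(self, query, compute, ttl):
        now = self._clock()
        hit = self._store.get(query)
        if hit is not None and hit[0] > now:
            return hit[1]                       # fresh entry: no model call
        answer = compute(query)                 # missing or expired: compute again
        self._store[query] = (now + ttl, answer)
        return answer

fake_now = [0.0]
calls = []

def fake_llm(query):
    """Stands in for call_llm; numbers its answers so reuse is visible."""
    calls.append(query)
    return f"answer #{len(calls)}"

cache = ToyTtlCache(clock=lambda: fake_now[0])
first = cache.get_or_compute("Explain vector caches", fake_llm, ttl=3600)
second = cache.get_or_compute("Explain vector caches", fake_llm, ttl=3600)
fake_now[0] = 3601.0  # jump past the TTL
third = cache.get_or_compute("Explain vector caches", fake_llm, ttl=3600)

print(first, second, third)  # answer #1 answer #1 answer #2
```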

Official Docker image

Free. Hardened. One pull away.

Community edition runs at full power with no license, no sign-up, no phone-home. Non-root, read-only, every capability dropped — before you ask.

$ docker pull crowkis/crowkis:latest

then one docker run — full guide on the Docker page

The Docker guide

Crowkis MCP · for AI apps

Claude Code asks. The cache answers.

The binary ships an MCP server, so AI assistants and agents check the cache before burning tokens — repeated lookups become free, locally.

$ claude mcp add crowkis -- crowkis mcp

two minutes in any MCP-capable app

Set up MCP

Your LLM bill has a cache-shaped hole in it.

Two commands to a running instance. Your Redis client already knows how to talk to it.

Start with Docker · Read the quickstart
