The cache with a brain · built in Rust
Crowkis understands what your LLM is being asked — and serves the answer it already has, only when it's safe to. Your bill drops. Your users stop waiting.
docker pull crowkis/crowkis:latest · works with your Redis client
Rust, no GC pauses
integration tests in the suite
intent classes scored per query
anti-poisoning stages per write
protocols: RESP3 · gRPC · REST
one image, every feature compiled in
01
“How do refunds work?” and “What's your refund window?” become one answer, not two bills.
02
Five checks gate every hit — wrong, stale, or poisoned answers never leave the cache.
03
Speaks Redis, gRPC, REST, and MCP. One Docker image, one port change, zero rewrites.
One cache · every door in
Python, Node, the Redis CLI, gRPC, REST, and MCP all plug into the same engine. Point a client at one port and you have a semantic cache — no rewrite, no new mental model.
The problem, in one line
The same questions, rephrased all day, billed at full price every time. The obvious fixes fail — exact-match caches miss the rephrasing, similarity caches serve answers they shouldn't. We built the cache that does neither.
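A toy sketch of the two failure modes, assuming hand-written three-dimensional vectors in place of a real embedding model. The vectors, the 0.8 threshold, and the helper names are made up for illustration and are not part of Crowkis:

```python
import hashlib
import math

# Hypothetical, hand-written "embeddings"; a real system would use a model.
TOY_VECTORS = {
    "How do refunds work?": (0.9, 0.1, 0.1),
    "What's your refund window?": (0.8, 0.2, 0.1),
    "Can I cancel my refund?": (0.7, 0.1, 0.6),
}

def exact_key(query: str) -> str:
    """Exact-match caches key on the literal text of the query."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()

def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def similarity_lookup(query: str, cached_query: str, threshold: float) -> bool:
    """Naive similarity cache: hit whenever the cosine clears the threshold."""
    return cosine(TOY_VECTORS[query], TOY_VECTORS[cached_query]) >= threshold

cached = "How do refunds work?"
rephrased = "What's your refund window?"
different = "Can I cancel my refund?"

# Failure 1: the rephrasing hashes to a different key, so exact match misses it.
exact_hit = exact_key(rephrased) == exact_key(cached)

# Failure 2: a loose threshold matches the rephrasing, but it also matches
# a question that needs a different answer.
rephrase_hit = similarity_lookup(rephrased, cached, threshold=0.8)
wrong_hit = similarity_lookup(different, cached, threshold=0.8)
```

Here the exact-match lookup misses the rephrasing, while the naive similarity lookup hits both the rephrasing and the unrelated cancellation question. Tightening the threshold only trades one error for the other, which is why a hit needs more than a similarity score.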
Where it earns, in the real world
Six production workloads where teams deploy Crowkis today — each one a repetition engine wearing a product's clothes.
SaaS · e-commerce · fintech support desks
Refunds, resets, shipping windows — the same fifty intents in thousands of phrasings. The repeats become instant, free answers; only new questions reach the model.
→ the highest hit rates of any workload
HR, IT, and engineering assistants
Your whole company asks the same policy and how-to questions. One shared memory across Slack bots, portals, and IDE plugins — the first answer serves everyone.
→ one answer, four hundred askers
docs assistants · knowledge products
Retrieval is cheap; the synthesis step is the bill. Crowkis caches the finished answer, version-pinned to your docs, so popular questions skip the whole pipeline.
→ cache the synthesis, not just the chunks
automation · multi-agent platforms
Agents re-ask, re-plan, and re-fetch relentlessly. Semantic hits, reasoning reuse, and tool-call caching deflate the 10–50× call multiplier that breaks agent economics.
→ five agents, one model call
engineering teams on Claude Code & friends
Ten developers, one codebase, the same questions. Behind MCP, the team shares a local memory — doc lookups and code explanations stop billing per person.
→ one config block via MCP
consumer apps · voice assistants
At scale, traffic converges on shared intents while every millisecond counts. Sub-millisecond streamed hits keep the experience instant and the unit economics sane.
→ <1ms hits inside a 1s voice budget
Don't see yours? The Roost covers twenty more — browse by use case →
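Two ideas from the workloads above, version-pinned answers and tool-call caching, both come down to what goes into the cache key. The following is a minimal in-memory sketch under that assumption; PinnedCache, synthesize, lookup_order, and the version strings are hypothetical names, and how Crowkis implements either feature is not described here:

```python
import json

class PinnedCache:
    """In-memory stand-in; keys combine a namespace, a version pin, and a payload."""

    def __init__(self):
        self._store = {}
        self.computed = 0  # counts how often the expensive function actually ran

    @staticmethod
    def _key(namespace, version, payload):
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} share one key.
        return json.dumps([namespace, version, payload], sort_keys=True)

    def get_or_compute(self, namespace, version, payload, compute):
        key = self._key(namespace, version, payload)
        if key not in self._store:
            self._store[key] = compute(payload)
            self.computed += 1
        return self._store[key]

cache = PinnedCache()

def synthesize(payload):
    # Placeholder for the expensive retrieval + synthesis step.
    return f"answer to {payload['question']}"

question = {"question": "How do I rotate API keys?"}
first = cache.get_or_compute("docs", "v1.4", question, synthesize)
second = cache.get_or_compute("docs", "v1.4", question, synthesize)  # same pin: cached
after_release = cache.get_or_compute("docs", "v1.5", question, synthesize)  # new pin: recomputed

def lookup_order(args):
    # Placeholder for a slow tool call that several agents might repeat.
    return {"order_id": args["order_id"], "status": "shipped"}

# Two agents issue the same tool call with arguments in a different order.
a = cache.get_or_compute("tool:lookup_order", "n/a", {"order_id": 7, "region": "eu"}, lookup_order)
b = cache.get_or_compute("tool:lookup_order", "n/a", {"region": "eu", "order_id": 7}, lookup_order)
```

Putting the docs version into the key means a new release never serves an answer synthesized from old pages, and canonicalizing the arguments lets repeated tool calls collapse into one execution.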
Adoption is one port change
Crowkis serves RESP3 — the Redis wire protocol — alongside gRPC and a REST management API. Point your existing client at port 6383 and you have a semantic cache. The Python and Node SDKs add get_or_compute, streaming, and multimodal helpers on top.
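As a rough mental model for get_or_compute, here is a minimal stand-in that follows the cache-aside pattern with a TTL. It does not use the Crowkis SDK and matches identical strings only, with no semantic matching; TinyCache, fake_llm, and the clock parameter are illustrative names:

```python
import time

class TinyCache:
    """Exact-match, in-memory stand-in used only to illustrate the call pattern."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # query -> (expires_at, answer)

    def get_or_compute(self, query, compute, ttl):
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and entry[0] > now:
            return entry[1]      # fresh hit: skip the expensive call
        answer = compute(query)  # miss or expired: compute once
        self._entries[query] = (now + ttl, answer)
        return answer

calls = []

def fake_llm(query):
    # Placeholder for a real model call; records each invocation.
    calls.append(query)
    return f"(model answer for: {query})"

cache = TinyCache()
first = cache.get_or_compute("Explain vector caches", fake_llm, ttl=3600)
second = cache.get_or_compute("Explain vector caches", fake_llm, ttl=3600)
```

The second call returns the stored answer without invoking fake_llm again. The SDK call has the same shape: a query, a function to run on a miss, and a TTL.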
from crowkis import CrowkisClient

cache = CrowkisClient(host="127.0.0.1", port=6383, tenant="demo", model="gpt-4o")

# one call: serve from cache, or compute and store
# (call_llm stands for your own function that calls the model)
answer = cache.get_or_compute(
    "Explain vector caches",
    lambda query: call_llm(query),
    ttl=3600,
)

Official Docker image
Community edition runs at full power with no license, no sign-up, no phone-home. Non-root, read-only, every capability dropped — before you ask.
$ docker pull crowkis/crowkis:latest
Crowkis MCP · for AI apps
The binary ships an MCP server, so AI assistants and agents check the cache before burning tokens — repeated lookups become free, locally.
$ claude mcp add crowkis -- crowkis mcp
Two commands to a running instance. Your Redis client already knows how to talk to it.