Notes from the nest · 161 posts

The Roost

Engineering notes written by the people building Crowkis. Comparisons with everything else, use cases, economics, internals, security, operations — and nothing written to rank on a search engine.

latest · June 24, 2026 · 9 min · benchmarks

How good is Crowkis agent memory, really? The LoCoMo and LongMemEval numbers

We ran Crowkis memory against two public, hostile retrieval benchmarks, Snap Research's LoCoMo and LongMemEval, on a laptop with no cloud calls. Here are the recall numbers, by question type, with the reranker on and off.

guidesJun 23, 2026· 8m

The crowkis CLI: every subcommand, with the flags that matter

The binary is the whole product — server, REPL, doctor, bench, and the inspect tools. A tour of the crowkis command line, from cold start to debugging a missed hit.

referenceJun 22, 2026· 3m

How to use CSET: store an answer the safe way

CSET writes an answer into the semantic cache — running the five-stage anti-poisoning pipeline before it accepts anything.

featuresJun 22, 2026· 3m

CGUARD: an input guardrail that survives leetspeak and zero-width tricks

Prompt injection rarely arrives in plain English. CGUARD normalizes the evasion first — whitespace, leetspeak, zero-width characters — then scans for jailbreaks, overrides, and system-prompt exfiltration.

guidesJun 21, 2026· 5m

Cache an LLM call in three lines: the Python SDK

The Python SDK wraps the semantic cache in an ergonomic client — get-or-compute, streaming, tenants, models. Here's the three-line version and the production version.

referenceJun 21, 2026· 3m

How to use CGET: a lookup that matches meaning

CGET finds a cached answer by meaning, not exact bytes — and can return the confidence behind the hit so you decide whether to trust it.

referenceJun 20, 2026· 3m

How to use CSIM: score how close two queries are

CSIM returns the semantic similarity between two strings — the primitive behind every hit decision, exposed so you can calibrate thresholds.

featuresJun 20, 2026· 3m

COUTCHECK: catching the PII leak and the toxic line before your user does

The model's output is the other trust boundary. COUTCHECK scans responses for PII leakage and toxicity, and optionally validates JSON, returning a structured verdict you can act on.

guidesJun 19, 2026· 5m

A drop-in CachedOpenAI for Node, in one wrapper

The Node SDK ships a typed client and a CachedOpenAI wrapper — keep your OpenAI calls exactly as they are, and a semantic cache slips in underneath.

referenceJun 19, 2026· 3m

How to use CVECCOUNT: see how many vectors are live

CVECCOUNT returns the live entry count in the vector index — the quickest health signal for a semantic cache.

referenceJun 18, 2026· 3m

How to use CFLUSH: clear the semantic cache, by tenant

CFLUSH empties the semantic cache — globally, or scoped to a single tenant so one customer's reset doesn't touch another's.

featuresJun 18, 2026· 3m

CEVAL: nine evaluators that grade your LLM output without a second LLM

LLM-as-judge is expensive and leaks your data. CEVAL ships nine deterministic evaluators — toxicity, PII, injection-safety, relevance, JSON validity and more — that score input/output pairs locally and track the results over time.

guidesJun 17, 2026· 5m

Giving an agent memory from Python: CMEM in practice

The memory commands from application code — store facts, recall them semantically, and watch consolidation retire the stale ones. A worked example in Python.

referenceJun 17, 2026· 3m

How to use CTHINK and CREUSE: bank a chain of thought

CTHINK stores a reasoning trace as a reusable step graph; CREUSE fetches the matching plan for a new query at a fraction of the token cost.

benchmarksJun 16, 2026· 7m

Where the milliseconds go: an honest latency profile

A semantic cache hit isn't free — it has to embed your query first. We measured every operation's percentiles so you know exactly what you're paying for, and where the cache engine itself is microsecond-fast.

referenceJun 16, 2026· 3m

How to use CSTALE: serve slightly-old, refresh behind it

CSTALE returns a cached answer even past its TTL, flagged as stale — so an expired entry is a snappy answer plus a refresh signal, not a cold miss.

referenceJun 15, 2026· 3m

How to use CINVALIDATE: purge by meaning, preview first

CINVALIDATE clears entries whose meaning matches a natural-language instruction — and previews exactly what it would remove until you add COMMIT.

featuresJun 15, 2026· 3m

CPROMPT: version your prompts and A/B test them like code

Prompts are production logic edited like sticky notes. CPROMPT gives them named templates, automatic versioning, rollback, variable rendering, and sticky weighted A/B splits — all surviving restart.

guidesJun 14, 2026· 5m

Guardrails in your request path: CGUARD and COUTCHECK in code

Two commands wrap your model call in an input and an output gate — prompt-injection scanning before, PII and toxicity scanning after. No second model, no egress.

referenceJun 14, 2026· 3m

How to use CWHYEVICT: ask why an entry would be dropped

CWHYEVICT explains the retention maths for an entry — recency, frequency, isolation, and cost — so eviction is auditable, not mysterious.

referenceJun 13, 2026· 3m

How to use CFLAG and CCHECKBAD: a memory for wrong answers

CFLAG records a known-bad answer in the negative cache; CCHECKBAD catches every paraphrase of the question that would reproduce it.

featuresJun 13, 2026· 3m

CDOC: a self-hosted RAG store that chunks, filters, and reranks

You don't always need a separate vector database to do retrieval. CDOC is a mini RAG store inside Crowkis — auto-chunking, metadata filtering, and optional cross-encoder reranking — sharing the cache's embedder.

benchmarksJun 12, 2026· 6m

The throughput ceiling we won't hide — and the fix

On v0.2.2, throwing 16 threads at Crowkis got the same throughput as one. That's a real ceiling, we found it in our own harness, and here's both why it happened and how the embedding-deferral work lifts it.

referenceJun 12, 2026· 3m

How to use CPIN: serve a human-approved answer verbatim

CPIN pins a golden answer that's served word-for-word for matching questions, with an audit trail of who approved it.

referenceJun 11, 2026· 3m

How to use CSOURCE: tie answers to their source and cascade-purge

CSOURCE links cache entries to the source they derived from, so when the source changes you can purge everything built on it in one move.

featuresJun 11, 2026· 3m

CSESSION: conversation buffers with semantic recall built in

Chat history is more than the last N turns. CSESSION stores a multi-turn buffer per session, bounded and TTL'd, with both recent-window reads and semantic search across the whole conversation.

guidesJun 10, 2026· 5m

Self-hosted RAG in twenty lines with CDOC

Add documents, auto-chunk them, search with metadata filters and reranking — a working retrieval pipeline without a separate vector database.

referenceJun 10, 2026· 3m

How to use CTOOLSET and CTOOLGET: cache a tool call

CTOOLSET caches a tool result keyed by tool plus exact arguments; CTOOLGET returns it, so a deterministic call runs once and serves many.

referenceJun 9, 2026· 3m

How to use CDOC: a RAG store in the CLI

CDOC adds documents with auto-chunking and metadata, then searches them with filters and optional reranking — no separate vector database.

engineeringJun 9, 2026· 3m

Why Crowkis is Rust all the way down

A cache lives in the hot path of every request. The language choice isn't aesthetic — it's the difference between predictable microseconds and mystery pauses.

featuresJun 9, 2026· 3m

CPIN: golden answers that are served verbatim, with an audit trail

For the questions where 'close enough' is unacceptable — pricing, legal, brand lines — CPIN serves a human-approved answer verbatim, records who approved it, and never lets the model improvise.

benchmarksJun 8, 2026· 6m

84 of 84: correctness and isolation under a hostile harness

Empty strings, 100 KB values, null bytes, emoji, 16 threads hammering across tenants. The stress harness throws 84 nasty checks at Crowkis and counts the cross-tenant leaks. The leak count is zero.

referenceJun 8, 2026· 3m

How to use CSESSION: a conversation buffer with recall

CSESSION stores a multi-turn conversation, reads the recent window, and semantically searches the whole thing — so 'as I mentioned earlier' works.

vs the fieldJun 8, 2026· 3m

Crowkis vs Redis: same protocol, different century

Redis is magnificent infrastructure for exact-match workloads. LLM traffic isn't one. Here's why speaking the same protocol doesn't mean solving the same problem.

operationsJun 8, 2026· 3m

The five-minute deploy, timed honestly

Pull, run, first hit in the dashboard — with no config file, no signup, and no environment variables you're required to set. We timed it. It holds.

referenceJun 7, 2026· 3m

How to use the CMEM commands: long-term agent memory

CMEMSET stores a fact scoped to (agent, user); CMEMGET recalls by meaning, recency-blended; consolidation retires contradictions automatically.

use casesJun 7, 2026· 3m

Support bots are the single best caching workload in software

Nowhere else do thousands of people ask the same fifty questions, all day, in every phrasing imaginable. Crowkis was practically designed in a support queue.

featuresJun 7, 2026· 3m

CFLAG and CCHECKBAD: a memory for the answers that were wrong

Most caches only remember good answers. Crowkis also remembers bad ones — flag a hallucinated or harmful response once, and CCHECKBAD catches every paraphrase of the question that would have reproduced it.

guidesJun 6, 2026· 5m

A/B testing prompts in production with CPROMPT

Version a prompt, split traffic across versions with sticky per-user bucketing, render variables, and roll back — without a deploy or a feature-flag service.

referenceJun 6, 2026· 3m

How to use CGUARD: scan input for prompt injection

CGUARD checks a prompt for jailbreaks and injections, normalizing leetspeak and zero-width tricks first, and returns a verdict, category, and match.

economicsJun 6, 2026· 3m

The token math of repetition: what your duplicate questions actually cost

Take your monthly query volume, multiply by the repeat fraction, multiply by your blended price per call. That number, twelve times a year, is the cache argument.

securityJun 6, 2026· 3m

Prompt injection meets your cache: the attack nobody threat-modeled

Injected instructions in one response become served truth for every similar query — unless the cache can smell an answer that doesn't answer.

referenceJun 5, 2026· 3m

How to use COUTCHECK: scan output for leaks before you send it

COUTCHECK scans a response for PII and toxicity, optionally validating JSON, and returns the entities found so you can redact, block, or regenerate.

vs the fieldJun 5, 2026· 3m

Crowkis vs GPTCache: the difference between a library and infrastructure

GPTCache proved developers want semantic caching. Crowkis is what happens when that idea grows up, moves out of your Python process, and gets a security model.

operationsJun 5, 2026· 3m

Upgrades as non-events: the binary-swap contract

docker pull, restart, done — no schema migrations, no export/import, no upgrade runbook. The on-disk format is a stability promise, not an implementation detail.

featuresJun 5, 2026· 3m

CSOURCE: answer lineage and cascade-purge when the source changes

Cached answers derive from sources — a doc, a config, an API. CSOURCE ties answers to their origin so that when the source changes, every answer built on it can be purged in one move.

benchmarksJun 4, 2026· 5m

The 150-second stall we found in our own benchmark

CDEDUP works — and at 1,340 vectors it froze the whole server for 150 seconds in our harness. Here's the honest finding, why it happens, and what it means for how you should run dedup.

referenceJun 4, 2026· 3m

How to use CEVAL: grade output without a second model

CEVAL runs deterministic evaluators — toxicity, PII, relevance, JSON validity and more — over an input/output pair, and tracks the results on /metrics.

use casesJun 4, 2026· 3m

Internal copilots: your whole company asks the same questions

HR policy, expense rules, deploy commands, VPN setup — every employee rediscovers them through your copilot, billed per discovery. Give the company one memory.

referenceJun 3, 2026· 3m

How to use CPROMPT: version and A/B test prompts

CPROMPT stores named prompt templates with versioning, variable rendering, sticky A/B splits, and rollback — all from the CLI, all surviving restart.

use casesJun 3, 2026· 3m

RAG apps: cache the synthesis, not just the retrieval

Your vector store finds the chunks fast. Then the model re-synthesizes the same answer from the same chunks, thousands of times. That second step is the bill.

securityJun 3, 2026· 3m

The supply-chain argument, made carefully

After the 2026 gateway compromise, 'how many packages are in your hot path?' became a real procurement question. Our answer is a number: zero.

featuresJun 3, 2026· 3m

CTOOLSET: cache the tool call so the agent stops paying for it twice

Agents call the same tools with the same arguments constantly. CTOOLSET and CTOOLGET cache tool results keyed by tool plus exact arguments, so a deterministic call runs once and serves many.

engineeringJun 2, 2026· 8m

How Crowkis earned the right to sit in your critical path

347 integration tests, a smoke suite that kills the process on purpose, and a Docker image hardened before anyone asked. The receipts behind 'production-ready.'

guidesJun 2, 2026· 4m

Let Claude Code use the cache: Crowkis over MCP

One config block turns Crowkis into a tool an AI assistant can hold — check the cache, store the answer — over MCP, with the same trust pipeline as every other write.

referenceJun 2, 2026· 3m

How to use CBUDGET: read per-tenant spend and alerts

CBUDGET reports token and dollar consumption per tenant in real time, and surfaces the tenants approaching or crossing their thresholds.

economicsJun 2, 2026· 3m

Why Crowkis refuses to meter you

A cache exists to make costs predictable. Metering the cache would be self-defeating. So Community is free and Enterprise is flat per cluster — priced on a call, not a meter.

operationsJun 2, 2026· 3m

Three windows into one cache: dashboard, Prometheus, logs

The built-in dashboard for humans, /metrics for your Grafana, one JSON line per event for your pipeline — same truth, three consumers, zero adapters.

referenceJun 1, 2026· 3m

How to use CDEDUP: collapse near-duplicate answers

CDEDUP finds entries that mean the same thing and collapses them, reporting clusters and memory reclaimed — best run as off-peak maintenance.

vs the fieldJun 1, 2026· 3m

Crowkis vs LiteLLM-style gateways: caching is not a checkbox

Python gateways treat caching as one feature among forty. Crowkis treats it as the product — and ships it without a Python supply chain attached.

engineeringJun 1, 2026· 3m

HNSW without the network hop: why the vector index lives inside the engine

Most semantic caches call out to a vector database. Crowkis embeds the HNSW graph in-process — and that placement decision is worth more than any algorithm tweak.

featuresJun 1, 2026· 3m

CINVALIDATE: purge the cache by meaning, with a preview before you commit

Sometimes you need to clear 'everything about the old pricing' — a fuzzy, semantic set. CINVALIDATE takes a natural-language instruction, previews what it would purge, and only acts on COMMIT.

benchmarksMay 31, 2026· 5m

Does the vector index go cold under churn? We tried to break it

A v0.2.1 bug let the HNSW index go cold under heavy write-and-flush churn — semantic search silently stopped finding neighbours. Here's the soak test that reproduces it and proves v0.2.2 fixed it.

referenceMay 31, 2026· 3m

How to use CINFO: the Crowkis-flavoured INFO

CINFO returns server, cache, savings, security, db, and license sections in one call — the fastest read on what the cache is doing right now.

use casesMay 31, 2026· 3m

Agent fleets are token furnaces. Crowkis is the heat exchanger.

Agents re-ask, re-plan, and re-fetch with industrial enthusiasm. Multiply by a fleet and you get the most cacheable traffic in existence — if the cache understands agents.

securityMay 31, 2026· 3m

Tenant isolation as physics, not policy

A WHERE clause is a promise; a namespace is a wall. How Crowkis makes cross-tenant leakage structurally impossible rather than procedurally unlikely.

economicsMay 30, 2026· 3m

Replay: the demo that uses your data instead of our slides

Every cache vendor promises a hit rate. Crowkis Replay computes yours — on your real queries, before you spend anything. The pitch is a number with your name on it.

operationsMay 30, 2026· 3m

Crowkis on Kubernetes: a well-behaved citizen

One container, a PVC, real health probes, hard memory bounds, graceful shutdown. Everything your cluster expects from a tenant that's read the manual.

featuresMay 30, 2026· 3m

CSTALE: serve the slightly-old answer now, refresh it behind the scenes

A hard TTL turns a one-second-expired answer into a full model call. CSTALE serves the cached answer past its TTL with a stale flag, so you choose freshness versus latency per request.

engineeringMay 29, 2026· 3m

The write-ahead log: how the cache survives a kill -9

Durability isn't a checkbox — it's a sequence of writes in the right order with checksums at every step. Here's the boring machinery that makes restarts uneventful.

vs the fieldMay 28, 2026· 3m

Crowkis vs Portkey: the gateway routes, the cache remembers

Portkey is a control panel for LLM calls. Crowkis is the memory underneath them. Confusing the two costs you the savings both promise.

securityMay 28, 2026· 3m

PII in a cache: scrub, isolate, erase, prove

Users put personal data in prompts whether you like it or not. The cache's job is a full lifecycle: keep it out of shared entries, find it on demand, erase it provably.

featuresMay 28, 2026· 3m

CBUDGET: per-tenant spend you can see before the invoice does

Token spend is usually a month-end surprise. CBUDGET tracks per-tenant token and dollar consumption in real time and surfaces alerts, so a runaway tenant is a notification, not a billing shock.

use casesMay 27, 2026· 3m

AI coding assistants: the cache your team didn't know it was sharing

Every developer on your team asks the assistant the same questions about the same codebase. With Crowkis behind MCP, the second ask is free for everyone.

engineeringMay 27, 2026· 3m

Bloom filters: how the engine knows what it doesn't know

The fastest disk read is the one that never happens. A few bits per key let Crowkis skip files that can't contain your answer — at a 1% false-positive cost we chose on purpose.

benchmarksMay 26, 2026· 6m

A million vectors on a laptop: the honest vector-search numbers

Crowkis is a cache with a vector index, not a vector database — but it should still hold up at scale. We indexed 100K and 1M vectors and measured build time, search latency, and recall. Including where dedicated vector DBs still win.

economicsMay 26, 2026· 3m

Budgets with teeth: why your LLM spend needs a circuit breaker

Every team has a runaway-loop story that ends with a shocking invoice. Per-key budgets with hard TPM and dollar walls end the genre.

operationsMay 26, 2026· 3m

Running a model canary: the operator's walkthrough

Slice the traffic, compare against cached baselines, promote or retreat — model upgrades as a controlled experiment with the cache as your measuring instrument.

vs the fieldMay 25, 2026· 3m

Crowkis vs Helicone-style observability: seeing the waste isn't saving it

Observability tools show you beautiful charts of money leaving. Crowkis is the component that makes the chart go down.

securityMay 25, 2026· 3m

Fail closed: why misconfiguring Crowkis locks it instead of opening it

Most self-hosted breaches are defaults, not exploits. Crowkis inverts the failure direction: forget to configure auth and you get a locked deployment, not an open one.

featuresMay 25, 2026· 3m

The AI Gateway: a semantic cache in front of any OpenAI-compatible API

Point your existing OpenAI client at Crowkis and change nothing else. The gateway proxies /v1/chat/completions, serves semantic hits without an upstream call, and adds retries, routing, and rate limits.

use casesMay 24, 2026· 3m

E-commerce assistants: catalog questions on repeat, margins on the line

Shipping times, return windows, size guides, 'does this come in blue?' — commerce traffic is seasonal, spiky, and gloriously repetitive. Cache accordingly.

engineeringMay 24, 2026· 3m

Twelve intents: why the cache treats a poem differently from a fact

One similarity threshold for all traffic is how caches embarrass themselves. Crowkis classifies every query into one of twelve intents, each with its own rules of reuse.

economicsMay 23, 2026· 3m

Latency is money: the second invoice nobody itemizes

Every multi-second model wait is paid twice — once in tokens, once in user patience. The cache refunds both, but only one shows up in accounting.

operationsMay 23, 2026· 3m

Fallback routing: surviving your provider's bad day

Providers have incidents; your product doesn't have to. Health-aware backend routing plus a warm cache turns upstream outages into degraded modes users barely notice.

featuresMay 23, 2026· 3m

CEMBED: free local embeddings, cached, with no API key

Embeddings usually mean an API key and a per-token bill. CEMBED turns text into vectors using the bundled ONNX model, locally, for free — and caches repeats so the second call is instant.

vs the fieldMay 22, 2026· 3m

Crowkis vs Pinecone: a vector database is not a cache

Pinecone answers 'what's similar?'. A production cache must answer 'is this safe to serve?'. Those are different questions with different architectures.

securityMay 22, 2026· 3m

The trust ledger: institutional memory for an immune system

Every accept and refuse, per source, append-only. Trust with memory changes attacker economics — and gives auditors the artifact they actually want.

use casesMay 21, 2026· 3m

EdTech tutors: a thousand students, one curriculum, one cache

Every cohort asks why the quadratic formula works. Teach the model once per concept, not once per student — while keeping personalized work personal.

engineeringMay 21, 2026· 3m

Structural templates: the matching layer vectors can't see

Embeddings blur exactly where caches need precision — numbers, dates, entities. Template abstraction catches what cosine similarity structurally cannot.

economicsMay 20, 2026· 3m

Before you downgrade the model, cache the good one

Cost pressure pushes teams toward cheaper, dumber models. Caching offers the opposite trade: keep frontier quality, pay small-model prices on the traffic that repeats.

operationsMay 20, 2026· 3m

Memory governance: a cache that respects its container

CROWKIS_MEMORY_LIMIT means what it says — no GC mood swings, no mystery RSS, eviction that engages before the kernel has opinions.

featuresMay 20, 2026· 3m

Caching what the model saw: multimodal image-plus-text lookups

Vision queries are expensive and repetitive — the same product photo, the same screenshot, asked about again and again. Crowkis caches image-plus-text lookups so a repeated visual question is a hit.

vs the fieldMay 19, 2026· 3m

Crowkis vs Weaviate, Qdrant, and Milvus: stop assembling your cache from parts

Every DIY semantic cache is a vector database, a Redis, a cron job, and a prayer. Crowkis is the version where the parts were designed for each other.

securityMay 19, 2026· 3m

Air-gapped by design: AI caching where the internet isn't invited

No phone-home, offline license verification, one binary. The deployment story for networks that treat outbound packets as incidents.

engineeringMay 18, 2026· 11m

Why we wrote our own LSM tree instead of bolting onto RocksDB

Every sane checklist says don't write your own storage engine. We did it anyway. Here's the actual reasoning, the architecture, and the parts that were painful.

use casesMay 18, 2026· 3m

Healthcare AI: caching under HIPAA without holding your breath

Clinical-adjacent assistants repeat administrative and informational answers constantly — but every cached byte is regulated. This is what compliance-mode caching looks like.

engineeringMay 18, 2026· 3m

Reasoning reuse: caching how the model thinks, not just what it says

Chain-of-thought tokens are the most expensive ones you buy. Crowkis extracts the thought's skeleton, abstracts the specifics, and recomposes it for the next input that shares its shape.

economicsMay 17, 2026· 3m

Agent unit economics: making the per-task math survive contact with reality

Agents multiply model calls per user action by 10–50x. Without aggressive reuse, the unit economics of agentic products simply don't close.

operationsMay 17, 2026· 3m

A tour of the dashboard: six panels, zero mysteries

Live verdicts, hit-type economics, top misses, safety blocks, tenant accounting, system pressure — what each panel answers and who keeps it open.

featuresMay 17, 2026· 3m

Confidence scoring: every hit arrives with a number you can gate on

A cache that only says 'hit' or 'miss' makes you trust it blindly. Crowkis returns a confidence score per hit — a geometric mean of five signals — so you decide the bar reuse must clear.

vs the fieldMay 16, 2026· 3m

Crowkis vs pgvector: your database deserves better than your cache traffic

pgvector is a lovely extension for storing embeddings next to your data. Routing every LLM query through Postgres is how lovely things die.

securityMay 16, 2026· 3m

Compliance modes: HIPAA, SOC2, GDPR-EU, FedRAMP as configuration

Each regime wants specific retention, audit, and erasure behavior. Enterprise compliance modes preset the whole posture, so the auditor's checklist maps to a flag.

use casesMay 15, 2026· 3m

Fintech assistants: fast answers, frozen correctness

Money questions repeat endlessly and tolerate zero staleness. Fintech is where freshness control stops being a feature and becomes the product.

engineeringMay 15, 2026· 3m

Eviction with a ledger: why LRU is the wrong instinct for an LLM cache

LRU evicts by recency and nothing else. But cache entries have wildly different replacement costs — and forgetting a $0.40 answer to keep a $0.0004 one is just bad accounting.

economicsMay 14, 2026· 3m

Why Community is actually free: the honest economics of our free tier

Full engine, production use, no license, no meter, no time bomb. Here's why giving the small end away is the rational structure, not a teaser.

operationsMay 14, 2026· 3m

The world's shortest cache runbook

Fail-open design means most 'incidents' are the absence of savings, not the presence of errors. Here's the whole decision tree, which fits on an index card.

featuresMay 14, 2026· 3m

Adaptive thresholds: the cache tunes its own reuse bar over time

A fixed similarity threshold is wrong the day after you set it. Crowkis uses a three-tier scheme — per-intent base, complexity adjustment, and an EMA feedback loop — that learns the right bar and persists it.

vs the fieldMay 13, 2026· 3m

Crowkis vs Momento: your cache shouldn't bill like the thing it's saving you from

Serverless caches meter every operation. A cache that charges per request in front of an API that charges per request is a strange kind of savings.

securityMay 13, 2026· 3m

Four doors, four locks: the authentication architecture

RESP, gRPC, REST, and the dashboard each get auth that fits their use — constant-time tokens for the data plane, RBAC for the control plane, mandatory locks past loopback.

use casesMay 12, 2026· 3m

Government and defense: the cache that works where the internet doesn't

Air-gapped networks, FedRAMP postures, and zero phone-home tolerance rule out most AI infrastructure on page one. Crowkis was designed to pass that page.

engineeringMay 12, 2026· 3m

Five TTL policies: engineering the shelf life of truth

Answers age at different speeds — prices in days, math never. A single TTL knob can't express that, so Crowkis ships five policies plus version pinning and webhooks.

economicsMay 11, 2026· 3m

The CFO pitch: explaining the cache to the person who signs things

Three sentences, one dashboard number, and a flat price. The rare infrastructure purchase that finance understands faster than engineering does.

operationsMay 11, 2026· 3m

Boring on purpose: the operational philosophy

Exciting infrastructure is a contradiction in terms. Every Crowkis design decision optimizes for the same review: 'it just runs.'

featuresMay 11, 2026· 3m

CDEDUP: collapsing the answers that mean the same thing

A semantic cache slowly accumulates near-duplicate answers. CDEDUP finds the clusters that mean the same thing and collapses them, reclaiming memory — and Crowkis is honest about its cost.

vs the fieldMay 10, 2026· 3m

Crowkis vs ElastiCache: managed Redis is still Redis

AWS will happily run an exact-match cache for you at any scale. It will miss your LLM traffic at any scale, too.

securityMay 10, 2026· 3m

Closed-source as a security posture, argued honestly

'Many eyes' assumes the eyes show up. For your hot path, a signed single binary with zero dependencies is a smaller attack surface than a thousand auditable packages nobody audits.

use casesMay 9, 2026· 3m

Multi-tenant SaaS: one cache, many customers, zero leaks

Caching across customers multiplies savings and multiplies risk. Tenant isolation has to be architecture, not a WHERE clause.

engineeringMay 9, 2026· 3m

Why we kept the Redis protocol instead of inventing an API

Every new API is a tax on adoption: clients, docs, muscle memory, tooling. RESP3 meant inheriting twenty years of all four on day one.

economicsMay 8, 2026· 3m

Provider arbitrage: paying frontier prices only for frontier questions

Model prices vary 50x for overlapping quality on easy queries. The arbitrage router exploits the spread automatically, with a quality bar you set per intent.

featuresMay 8, 2026· 3m

CPII: scrubbing personal data and honouring the right to be forgotten

A cache of LLM traffic is a cache of whatever users typed — including PII. CPII reports what personal data is present and executes right-to-erasure, so compliance is a command, not a project.

vs the fieldMay 7, 2026· 3m

Crowkis vs Memcached: a beautiful fossil meets a new workload

Memcached is the purest cache ever written — and purity is exactly the problem when your keys are sentences.

use casesMay 6, 2026· 3m

Startups: your LLM bill is eating runway you'll want back

Seed-stage AI products routinely spend salary-sized sums recomputing known answers. Free Community edition exists precisely for this moment of your company.

engineeringMay 6, 2026· 3m

One actor, no locks across await: the concurrency design

Crowkis serves thousands of connections through async IO — then funnels every cache decision through a single deterministic actor. Here's why that's a feature.

economicsMay 5, 2026· 3m

The hidden invoice of a cold cache: what model migrations really cost

Swap models with a normal cache and you re-purchase your entire corpus at the new model's prices. Migration leasing is the line item that prevents the line item.

featuresMay 5, 2026· 3m

CINFO and the dashboard: a cache you can actually watch work

Infrastructure you can't observe is infrastructure you don't trust. CINFO and the built-in dashboard expose hit rate, saved spend, safety blocks, memory pressure, and license state in real time.

vs the fieldMay 4, 2026· 3m

Crowkis vs Dragonfly, Valkey, and KeyDB: faster exact-matching is still exact-matching

The new Redis-compatibles race each other on throughput. On LLM traffic they all hit the same wall at full speed: the keys never repeat.

use casesMay 3, 2026· 3m

Platform teams: make caching a paved road, not a per-team adventure

Every product team is duct-taping its own LLM cache right now. Platform engineering exists to end exactly this kind of duplication.

engineeringMay 3, 2026· 3m

Designing the MCP server: a cache as a tool the model can hold

MCP turns Crowkis into something an AI assistant can use deliberately — check the cache, store the answer — over plain stdio, with the banner silenced so JSON-RPC stays clean.

economicsMay 2, 2026· 3m

The ROI timeline: hour one, week one, quarter one

Caching ROI isn't a hockey stick — it's a staircase that starts the first hour. Here's the honest schedule of when each saving shows up.

featuresMay 2, 2026· 3m

CKEYLIMIT: per-tenant rate limits that stop the runaway before it starts

A runaway agent or a noisy tenant can torch a budget in minutes. CKEYLIMIT sets per-tenant requests-per-minute and tokens-per-minute ceilings, enforced locally before the spend happens.

vs the fieldMay 1, 2026· 3m

Crowkis vs OpenAI prompt caching: a discount is not a cache

Provider prompt caching discounts your repeated prefixes. You still call the model, still wait, and still pay — just slightly less. There's a bigger idea available.

securityApr 30, 2026· 9m

Cache poisoning is the whole problem

Semantic caching has an obvious failure mode nobody likes to talk about: one bad write, served forever to everyone nearby. This is how Crowkis decides what to trust.

use casesApr 30, 2026· 3m

Consumer chat at scale: when every millisecond and every token multiply

At consumer scale, traffic converges on shared intents while costs and latency multiply by millions. The cache becomes load-bearing infrastructure.

engineeringApr 30, 2026· 3m

Three levels, one strategy: compaction without the tuning PhD

LSM compaction is where storage engines breed complexity. Crowkis ships exactly one strategy across three levels — chosen for cache workloads, closed for configuration.

featuresApr 29, 2026· 3m

CTHINK and CREUSE: banking a chain of thought and replaying it

The reasoning is the expensive part of a hard answer. CTHINK stores a chain-of-thought trace as a reusable step graph; CREUSE fetches the matching plan for a new query at a fraction of the original token cost.

vs the fieldApr 28, 2026· 3m

Crowkis vs Anthropic prompt caching: cache writes that bill you are telling you something

Anthropic's prompt caching is excellent at its actual job — cheap long contexts. It was never designed to be your response cache, and the pricing says so.

use casesApr 27, 2026· 3m

Voice assistants: caching as a conversational necessity

Voice gives you about a second before silence feels broken. Model round-trips don't fit. Cache hits do — with room to spare for the speech stack.

engineeringApr 27, 2026· 3m

Streaming cache hits: instant answers that still feel like typing

Users expect LLM answers to arrive as a typing stream. CGETSTREAM serves cached answers chunk by chunk, so a sub-millisecond hit doesn't break the interface's rhythm.

vs the fieldApr 25, 2026· 3m

Crowkis vs Gemini context caching: renting memory by the hour

Google bills cached context per token per hour — a parking meter for your own prompts. Compare that with a cache you simply own.

use casesApr 24, 2026· 3m

Translation pipelines: the same strings, the same languages, every release

Product copy, help docs, and templates get re-translated continuously as releases churn. Most of the content didn't change. Stop paying as if it did.

engineeringApr 24, 2026· 3m

347 tests and a murder weapon: how the suite is organized

Bottom-heavy by design: the layers that hold your data get the most hostile coverage, and the smoke suite's signature move is killing the process to prove a point.

vs the fieldApr 22, 2026· 3m

Crowkis vs vLLM prefix caching: different layers, different physics

vLLM's prefix caching saves GPU work inside one inference server. Crowkis saves the inference itself. You probably want both — but only one cuts the bill to zero on a hit.

use casesApr 21, 2026· 3m

Summarization at scale: the same documents keep getting summarized

Reports, tickets, calls, and articles get summarized on every view, by every viewer, in every digest. The document didn't change between viewers. The bill did.

vs the fieldApr 19, 2026· 3m

Crowkis vs LangSmith: tracing the waste vs deleting it

LangSmith shows you every span of every chain, beautifully. The spans are still billed. There's a component whose job is making the spans not happen.

use casesApr 18, 2026· 3m

Classification and extraction: high-volume, low-variance, born to be cached

Routing tickets, tagging content, extracting fields — LLM classification runs millions of small calls over heavily repeating inputs. The cache hit rate is absurd, in your favor.

vs the fieldApr 16, 2026· 3m

Crowkis vs Cloudflare AI Gateway: the edge is the wrong place for trust decisions

Cloudflare's gateway adds caching at the CDN layer: exact-match, eventually evicted, on someone else's network. Useful plumbing; not a reuse brain.

use casesApr 15, 2026· 3m

Docs assistants: your documentation has a top-40 chart

Every docs site has the same hit parade — auth, rate limits, pagination, that one confusing endpoint. The assistant answering them should not bill like a consultant.

vs the fieldApr 13, 2026· 3m

Crowkis vs Kong AI Gateway: plugins are not engines

Kong added AI plugins to a great API gateway. A semantic-cache plugin in a proxy is a feature; a semantic cache engine is a product. The difference shows in production.

use casesApr 12, 2026· 3m

Answer-engine products: when the answer is the product, margin is the moat

If your product is answering questions, your COGS is the model bill and your UX is the latency. The cache moves both — which makes it strategy, not plumbing.

vs the fieldApr 10, 2026· 3m

Crowkis vs building it yourself: a love letter to the repo you'll abandon

Every team builds the in-house semantic cache once. The prototype takes a week. The production version takes the year you didn't budget. We know — we budgeted it.

vs the fieldApr 7, 2026· 3m

Crowkis vs Redis LangCache: when the incumbent validates the category

Redis shipping a semantic cache service confirms the problem is real. Their answer is a managed add-on; ours is a from-scratch engine. The difference is in the bones.

vs the fieldApr 4, 2026· 3m

Crowkis vs framework caches: your framework should not own your memory

LangChain, LlamaIndex, and Semantic Kernel all offer cache hooks. Framework caches live and die with the framework. Infrastructure shouldn't.

vs the fieldApr 1, 2026· 3m

Crowkis vs AWS Bedrock prompt caching: the cloud's cache serves the cloud

Bedrock's caching cuts repeated-prefix costs inside one cloud's model garden. Your cache strategy deserves a longer horizon than a vendor's feature page.

vs the fieldMar 29, 2026· 3m

Crowkis vs LangChain InMemoryCache: the default that quietly costs the most

One import gives you LangChain's in-memory exact cache. It's the caching equivalent of a sticky note — gone on restart, blind to paraphrase, local to one process.

vs the fieldMar 26, 2026· 3m

Crowkis vs Upstash: pay-per-request caching meets the request firehose

Serverless Redis with per-request pricing is elegant for occasional workloads. An LLM cache is the opposite of an occasional workload.

vs the fieldMar 23, 2026· 3m

Crowkis vs the dedup script: the cron job that thinks it's a cache

Somewhere in your repo is a script that hashes prompts and skips duplicates. It's doing its best. Here's everything it can't see.

vs the fieldMar 20, 2026· 3m

Crowkis vs Chroma: the prototype's best friend meets the production path

Chroma is wonderful for getting embeddings working before lunch. The qualities that make it great for prototypes are the ones a cache in production can't keep.

vs the fieldMar 17, 2026· 3m

Crowkis vs doing nothing: the most expensive cache is no cache

The default strategy — every query goes to the model — has a precise cost. It's on your invoice, itemized as everything.

vs the fieldMar 14, 2026· 3m

Crowkis vs fine-tuning your way to cheaper inference

Fine-tuning a smaller model is a months-long bet on cheaper tokens. Caching is a five-minute bet on zero tokens. One of these compounds weekly.

vs the fieldMar 11, 2026· 3m

Crowkis vs stuffing the context window: memory is not a prompt

Million-token contexts tempt teams to ship the whole knowledge base with every call. That's not memory — that's paying to re-read the library daily.

161 posts in the roost · crows remember faces. we remember production incidents.