features · June 13, 2026 · 3 min read

CDOC: a self-hosted RAG store that chunks, filters, and reranks

You don't always need a separate vector database to do retrieval. CDOC is a mini RAG store inside Crowkis — auto-chunking, metadata filtering, and optional cross-encoder reranking — sharing the cache's embedder.

Retrieval-augmented generation usually means standing up a whole second system: a vector database, an embedding pipeline, a chunker, a reranker. For a working set that fits a cache, that's a lot of moving parts. CDOC folds the common case into Crowkis itself — a self-hosted mini vector store that speaks the same RESP you already use.

CDOC ADD takes an id and text, attaches whatever META key=value pairs you pass, and, with CHUNK and OVERLAP, splits a long document into overlapping passages automatically. CDOC SEARCH runs an approximate-nearest-neighbour query with field-level FILTER predicates and an optional RERANK pass, returning [id, text, score] triples, the shape most RAG pipelines expect.
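To make the add-then-search flow concrete, here is a minimal sketch in plain Python. It models the behaviour described above, not Crowkis's implementation or wire syntax. The function names (chunk_text, search), the record layout, and the rerank callable are all hypothetical. It also assumes character-based windows, and it uses an exact dot-product scan as a stand-in for the approximate-nearest-neighbour index.

```python
from typing import Callable, Optional


def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` characters; consecutive windows
    share `overlap` characters. (Character windows are an assumption.)"""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def dot(a: list[float], b: list[float]) -> float:
    """Plain dot product, used here as the similarity score."""
    return sum(x * y for x, y in zip(a, b))


def search(
    store: list[dict],
    query_vec: list[float],
    filters: dict[str, str],
    k: int,
    rerank: Optional[Callable[[str], float]] = None,
) -> list[tuple[str, str, float]]:
    """Return up to k (id, text, score) triples.

    store entries are hypothetical records: {"id", "text", "meta", "vec"}.
    An exact scan stands in for the approximate index.
    """
    # 1. Metadata filter: keep records whose meta matches every predicate.
    candidates = [
        d for d in store
        if all(d["meta"].get(field) == value for field, value in filters.items())
    ]
    # 2. Vector scoring, best first, truncated to k.
    scored = [(d["id"], d["text"], dot(query_vec, d["vec"])) for d in candidates]
    scored.sort(key=lambda triple: triple[2], reverse=True)
    top = scored[:k]
    # 3. Optional rerank: rescore the survivors with a second model.
    if rerank is not None:
        top = [(doc_id, text, rerank(text)) for doc_id, text, _ in top]
        top.sort(key=lambda triple: triple[2], reverse=True)
    return top
```

For example, chunk_text("abcdefghij", size=4, overlap=1) yields the passages "abcd", "defg", and "ghij": each one starts on the last character of the one before it, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough. In the sketch the reranker only rescores the k survivors of the vector pass, which is the usual reason to make reranking optional: it is the expensive step, so it runs on a short list.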

[Figure: the Crowkis read path. Five gates, every one can veto. Reuse only when meaning, structure, confidence, and trust all agree.]

The quiet win is that CDOC shares the cache's bundled ONNX embedder and, for reranking, its cross-encoder — the same models proven on the memory benchmarks. So retrieval inherits the same zero-egress property: your documents are embedded and searched locally, which matters when 'the documents' are contracts, tickets, or anything you can't ship to a hosted API.

The bottom line

CDOC isn't trying to be Pinecone. It's trying to delete Pinecone from the architecture diagram for the many apps whose corpus is small enough to live beside the cache — and for those apps, one fewer system is the whole feature.