CEMBED: free local embeddings, cached, with no API key
Embeddings usually mean an API key and a per-token bill. CEMBED turns text into vectors locally, using the bundled ONNX model, with no per-token cost, and it caches repeats so a second call with the same text does not run the model again.
Embedding text is table-stakes infrastructure that quietly costs money and leaks data: most teams call a hosted embeddings API, paying per token and shipping their text to a third party. CEMBED removes both costs — it embeds text with the bundled all-MiniLM-L6-v2 ONNX model, in-process, with no API key and no egress.
It also remembers. Repeated text is common in production traffic, and CEMBED serves it from an embedding micro-cache, so the second CEMBED of the same string returns the stored vector instead of running the model again. That's the same micro-cache that turns exact-repeat semantic lookups from milliseconds into microseconds, exposed as a primitive you can call directly.
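To make the repeat path concrete, here is a minimal Python sketch of an exact-match embedding micro-cache. It is an illustration, not CEMBED's actual implementation: the `EmbeddingMicroCache` class, the `toy_embed` stand-in (a hash-derived vector, not all-MiniLM-L6-v2), the LRU eviction policy, and the default capacity are all assumptions made for the example.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, Sequence, Tuple

Vector = Tuple[float, ...]


def toy_embed(text: str, dim: int = 8) -> Vector:
    """Hypothetical stand-in embedder: a deterministic vector derived from a hash.

    It only mimics "same text in, same vector out"; it carries no meaning.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return tuple(byte / 255.0 for byte in digest[:dim])


class EmbeddingMicroCache:
    """Exact-match cache in front of an embedder, with LRU eviction (assumed policy)."""

    def __init__(self, embed_fn: Callable[[str], Sequence[float]], capacity: int = 1024):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._embed_fn = embed_fn
        self._capacity = capacity
        self._entries: "OrderedDict[str, Vector]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def embed(self, text: str) -> Vector:
        if text in self._entries:
            # Repeat: return the stored vector and mark it as recently used.
            self._entries.move_to_end(text)
            self.hits += 1
            return self._entries[text]

        # First sighting: run the embedder once and remember the result.
        self.misses += 1
        vector = tuple(self._embed_fn(text))
        self._entries[text] = vector
        if len(self._entries) > self._capacity:
            # Drop the least recently used entry to stay within capacity.
            self._entries.popitem(last=False)
        return vector
```

Wrapping any embedder this way means the model runs once per distinct string for as long as that string stays in the cache, and the `hits` and `misses` counters show how much repeat traffic is being absorbed.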
Reuse only when meaning, structure, confidence, and trust all agree.
Having a free local embedder as a command, not just an internal detail, is more useful than it sounds. It means the embedding behind your retrieval, your clustering, your dedup, or your own semantic feature comes from the same model the cache uses — consistent vectors, one dependency, no bill — and you can build on it without standing up an embedding service.
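As one example of building on a shared embedder, the sketch below removes near-duplicate strings by comparing the cosine similarity of their vectors. Everything named here is hypothetical: `toy_embed` is a tiny bag-of-words stand-in over a fixed vocabulary (in practice the vectors would come from CEMBED), and `cosine`, `dedup`, and the 0.9 threshold are illustrative choices rather than part of the product.

```python
import math
import re
from typing import Callable, List, Sequence

# Hypothetical fixed vocabulary for the stand-in embedder.
VOCAB = ("reset", "password", "cancel", "subscription", "refund", "order")


def toy_embed(text: str) -> List[float]:
    """Stand-in embedder: counts of VOCAB words in the text. NOT a real model."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [float(tokens.count(word)) for word in VOCAB]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dedup(
    texts: Sequence[str],
    embed_fn: Callable[[str], Sequence[float]],
    threshold: float = 0.9,
) -> List[str]:
    """Keep each text unless it is too similar to one that was already kept."""
    kept_texts: List[str] = []
    kept_vectors: List[Sequence[float]] = []
    for text in texts:
        vector = embed_fn(text)
        if all(cosine(vector, kept) < threshold for kept in kept_vectors):
            kept_texts.append(text)
            kept_vectors.append(vector)
    return kept_texts
```

Because `dedup` takes the embedder as a parameter, the same function works unchanged once `toy_embed` is swapped for real vectors, and retrieval or clustering code can share that one source of embeddings.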
The bottom line
The cheapest embedding is the one you already computed; the second cheapest is the one that never left your machine. CEMBED is both, which is why it's the quiet foundation the rest of the intelligence layer stands on.