CGUARD: an input guardrail that survives leetspeak and zero-width tricks
Prompt injection rarely arrives in plain English. CGUARD normalizes the evasion first — whitespace, leetspeak, zero-width characters — then scans for jailbreaks, overrides, and system-prompt exfiltration.
The naive prompt-injection filter loses to a five-minute workaround: insert a zero-width space, swap an 'o' for a '0', pad with newlines, and the regex that caught 'ignore previous instructions' sails right past 'ignore prev1ous 1nstructions'. CGUARD assumes the attacker knows this, so it normalizes before it matches — collapsing whitespace, folding leetspeak, stripping zero-width characters — and only then runs the scan.
What it scans for is the real catalogue: DAN-style jailbreaks, developer-mode prompts, instruction overrides detected by verb-plus-noun co-occurrence rather than fixed strings, and attempts to exfiltrate the system prompt. The return is structured — a verdict, a category, and the matched span — so your app can log the category, show the user a clean refusal, and route the rest onward.
Five stages score every write before it can ever be served.
It runs as a command, CGUARD, which means it composes anywhere: in front of a cache write, in front of a model call, or as a standalone gate in a pipeline that isn't even using Crowkis for caching yet. No model call, no egress — the detection is deterministic and local, so it costs microseconds and leaks nothing.
The bottom line
A guardrail is only worth shipping if it assumes an adversary, not a typo. CGUARD is built for the person actively trying to break your agent, which is exactly the person a string-match filter was never going to stop.