cognitive-cache
Algorithmic context-window selection for LLM coding tools. Treats context as a constrained optimization problem, not retrieval.
Every LLM tool right now (Cursor, Claude Code, Copilot, all of them) decides what to put in the context window using heuristics: grep for some symbols, embed and cosine-similarity search, or just cram as many files as will fit. Nobody has an actual algorithm for this.
This project is an attempt to build one. It runs entirely locally, with no LLM calls, API keys, or cloud dependencies, and supports Python, JavaScript, TypeScript, Go, Rust, Java, Ruby, C, and C++.
Think of it this way:
| Classic OS | LLM Equivalent | Current State |
|---|---|---|
| RAM | Context window | Manually managed |
| Virtual memory / page swaps | Context eviction + retrieval | Crude summarization |
| Process scheduler | Agent orchestration | Hand-coded loops |
| File system cache | Knowledge retrieval | Cosine similarity |
| Memory allocator | Token budget allocation | Nobody does this |
Early computers had programmers manually managing memory addresses. Then virtual memory was invented: a single algorithm that unlocked everything we know as modern computing.
The LLM ecosystem is at the “manual memory management” stage right now. Context is the single most important resource, because it is where reasoning happens, yet every tool manages it with heuristics instead of optimization.
Cognitive-cache is building the “virtual memory” for LLM reasoning.
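To make the "constrained optimization" framing concrete, here is a minimal sketch of one way to pose context selection: as a 0/1 knapsack over a token budget. It is an illustration under stated assumptions, not cognitive-cache's actual algorithm or API. `Candidate`, `select_context`, the relevance scores, and the token counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    path: str     # hypothetical file path
    tokens: int   # token cost of including the whole file
    score: float  # hypothetical relevance estimate; higher is better


def select_context(candidates, budget):
    """Exact 0/1 knapsack via dynamic programming over the token budget.

    Returns (chosen_candidates, total_score), maximizing the total score
    subject to sum(tokens) <= budget.
    """
    # best[b] = (best total score, indices chosen) using at most b tokens
    best = [(0.0, ())] * (budget + 1)
    for i, c in enumerate(candidates):
        if c.tokens <= 0:
            raise ValueError("token cost must be positive")
        # Walk the budget downward so each file is used at most once.
        for b in range(budget, c.tokens - 1, -1):
            prev_score, prev_idx = best[b - c.tokens]
            if prev_score + c.score > best[b][0]:
                best[b] = (prev_score + c.score, prev_idx + (i,))
    total, idx = best[budget]
    return [candidates[i] for i in idx], total


# Made-up example data: three files competing for a 100-token budget.
files = [
    Candidate("a.py", tokens=60, score=9.0),
    Candidate("b.py", tokens=50, score=6.0),
    Candidate("c.py", tokens=50, score=6.0),
]
chosen, total = select_context(files, budget=100)
print([c.path for c in chosen], total)
```

With this made-up data the sketch picks `b.py` and `c.py` (combined score 12.0) rather than `a.py`. A rank-by-score heuristic would take `a.py` first, then have no room for anything else and stop at 9.0. That gap is the difference between ranking and optimizing under a budget.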
Benchmark — 23 real bug-fix PRs
| approach | avg file recall |
|---|---|
| cognitive-cache | 34.7% |
| embedding RAG | 25.9% |
| llm-triage | 25.7% |
| grep | 22.8% |
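The metric is not defined here, so the following is a sketch under an assumption: that "file recall" for one PR is the fraction of files changed by the fix that also appear in the selected context, and that the table reports the mean over all PRs. The function names and sample paths are hypothetical and are not taken from the benchmark harness.

```python
def file_recall(selected, changed):
    """Fraction of the changed files that appear in the selected context."""
    changed = set(changed)
    if not changed:
        raise ValueError("a bug-fix PR must change at least one file")
    return len(changed & set(selected)) / len(changed)


def average_file_recall(cases):
    """Mean recall over (selected, changed) pairs, one pair per PR."""
    if not cases:
        raise ValueError("need at least one PR")
    return sum(file_recall(s, c) for s, c in cases) / len(cases)


# Made-up example: two PRs, not real benchmark data.
cases = [
    (["a.py", "b.py", "c.py"], ["b.py", "d.py"]),  # 1 of 2 changed files found
    (["x.py", "y.py"], ["y.py"]),                  # 1 of 1 changed files found
]
print(average_file_recall(cases))
```

Under this definition, selecting extra irrelevant files does not lower the score, so recall is only meaningful when every approach is compared at the same token budget.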
Available as a Python library, CLI, MCP server (Claude Code / Cursor), and a GitHub Action. License: Apache-2.0.