STONE
Semantic Token Optimization and Natural Encoding
STONE sits between your application and the AI provider. It takes the full conversation history and restructures it into a bounded representation before the provider ever sees it. Not a cache. Not a summary. A fundamentally different way to manage context.
AI APIs charge for every word. Every time.
Every time you send a message, the entire conversation goes with it. Turn 1, turn 5, turn 20 — resent in full. Your provider charges for every token, every time. By turn 10, you're paying for context the model has already processed nine times.
Longer sessions hit a ceiling. Context fills up, the model starts forgetting earlier decisions, and you're forced to start over. Everything you invested — explaining your codebase, establishing requirements, building up a shared understanding — vanishes when the session resets.
Restructure. Don't truncate.
STONE takes the raw conversation and restructures it into a compact, bounded form that keeps what matters — facts, decisions, intent, domain context — while removing redundancy. This restructured context replaces the raw history in the API call.
The AI's response passes through unmodified. STONE only touches the input side.
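The restructuring step can be sketched in a few lines. This is an illustrative outline under assumed behavior, not Lexi's actual algorithm; the function name, the character budget, and the one-line digest heuristic are all hypothetical.

```python
# Illustrative sketch of bounded restructuring (hypothetical, not Lexi's code).
# Raw history grows without bound; the restructured payload does not.

def restructure(history, pinned_facts, max_chars=4000):
    """Build a bounded context: pinned facts first, then as many recent
    turns as fit verbatim, then a one-line digest of everything older.
    Greedy newest-first packing; a sketch, not a production policy."""
    recent, digest, used = [], [], 0
    for turn in reversed(history):                      # newest first
        if used + len(turn["content"]) <= max_chars:
            recent.append(turn)                         # keep verbatim
            used += len(turn["content"])
        else:
            # Older turns collapse to a gist: role-free first sentence.
            digest.append(turn["content"].split(".")[0])
    header = {"role": "system",
              "content": "Facts: " + "; ".join(pinned_facts)
                         + "\nEarlier: " + " | ".join(reversed(digest))}
    return [header] + list(reversed(recent))            # chronological order
```

The point of the sketch is the shape of the output: a fixed-size header carrying facts and gist, plus a bounded tail of verbatim turns, regardless of how long `history` has grown.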
Standard API vs Lexi
The difference is structural. Standard APIs accumulate cost linearly. Lexi flattens the curve.
What sets STONE apart
The payload sent to your provider stays constant no matter how long the conversation runs. A 50-turn session costs roughly the same per request as a 5-turn session. Without STONE, context grows linearly. With STONE, it flatlines.
14.4 KB of memory per session — constant, regardless of conversation length. CPU per request is also constant. Lexi handles thousands of concurrent sessions on a single node without degradation.
STONE detects typed entities — currency values, dates, version strings, port numbers, metrics, SLA targets — and pins them into a permanent anchor. These facts survive every restructuring pass. In a 75-turn blind benchmark against GPT-4o, Lexi averaged 8.4/10 recall accuracy — the gap to full-context (9.0/10) is in response detail, not factual correctness.
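Typed-entity detection of this kind can be approximated with pattern matching. The patterns below are a minimal sketch of the idea, not Lexi's detector, and the entity types and regexes are illustrative.

```python
import re

# Minimal typed-entity detector (illustrative patterns, not Lexi's).
PATTERNS = {
    "currency": r"\$\d[\d,]*(?:\.\d+)?",
    "date":     r"\b\d{4}-\d{2}-\d{2}\b",
    "version":  r"\bv?\d+\.\d+\.\d+\b",
    "port":     r"\bport\s+(\d{2,5})\b",
    "percent":  r"\b\d+(?:\.\d+)?%",
}

def pin_entities(text):
    """Scan one turn and return (type, value) pairs for the anchor.
    Anchored facts would survive every later restructuring pass."""
    found = []
    for kind, pat in PATTERNS.items():
        for m in re.finditer(pat, text, re.IGNORECASE):
            found.append((kind, m.group(1) if m.groups() else m.group(0)))
    return found
```

Running it over a turn like "Deploy v2.1.3 on port 8080 by 2026-03-01" pins the version, port, and date, so a later compaction pass cannot discard them.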
If restructuring can't help, the original is sent unchanged. Quality never drops below baseline.
Every message is permanently archived. When you reference something from turn 3 at turn 70, STONE retrieves the original — not a summary — and extracts exactly the content the model needs to answer your specific question. Three layers of recall back each other up, and each retrieval makes the next one faster.
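The three-layer fallback described above can be sketched as a chain of lookups with a cache. The class, the substring matching, and the layer contents are hypothetical simplifications; real retrieval would be semantic, not substring-based.

```python
# Layered recall sketch (hypothetical structure, not Lexi's implementation):
# layer 1 is the pinned anchor, layer 2 the restructured context, layer 3
# the full archive. A miss falls through to the next layer; archive hits
# are cached, so the same lookup is faster next time.

class Recall:
    def __init__(self, anchor, context, archive):
        self.anchor, self.context, self.archive = anchor, context, archive
        self.cache = {}                      # query -> previously found hits

    def lookup(self, query):
        q = query.lower()
        if q in self.cache:                  # prior retrievals answer instantly
            return self.cache[q]
        hits = [f for f in self.anchor if q in f.lower()]      # layer 1
        hits += [t for t in self.context if q in t.lower()]    # layer 2
        if not hits:                                           # layer 3
            hits = [t for t in self.archive if q in t.lower()]
        self.cache[q] = hits
        return hits
```

The design choice worth noting: the archive returns originals, not summaries, so a layer-3 hit recovers exactly what was said at turn 3 even if the compacted context dropped it.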
Live request benchmarks
75-turn multi-domain conversation on GPT-4o, March 2026, blind-judged against direct OpenAI. Total: 1.6M tokens reduced to 135K. Provider cost: $0.52 via Lexi vs $4.37 direct, an 88% reduction. Results vary by content and conversation pattern.
Where STONE makes the biggest difference
AI coding agents
Agents that plan, write, test, and iterate devour context. STONE keeps it bounded so they loop longer without hitting limits. Finish a full feature in one session instead of re-explaining your codebase across five.
Customer support bots
Long support threads pile up history. STONE retains the customer's issue, account details, and prior resolutions while keeping per-request cost flat. Scale support without scaling your AI bill linearly.
Research assistants
Deep research involves dozens of turns. STONE keeps key findings, sources, and conclusions pinned while restructuring the exploratory dialogue that produced them.
Multi-turn workflows
Tutoring, brainstorming, document drafting, data analysis — any workflow where conversations run long benefits from bounded context. The longer the session, the more STONE saves.
See it in your own code.
Two minutes to integrate.
$10 free credit. No card required. Every response shows the exact breakdown.
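If integration works as a drop-in proxy, it would look something like the sketch below with the OpenAI Python SDK. The base_url and key handling here are placeholders, not Lexi's real endpoint or auth scheme; check the actual docs before use.

```python
# Drop-in integration sketch. The base_url and API key below are
# hypothetical placeholders; consult Lexi's documentation for the
# real endpoint and auth scheme.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lexi.example/v1",  # hypothetical proxy endpoint
    api_key="YOUR_LEXI_KEY",                 # placeholder credential
)

def ask(question: str) -> str:
    """Same SDK, same chat call; only the endpoint changed."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content
```

Nothing else in the application changes: STONE restructures the context server-side before the provider sees it, and the response comes back untouched.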