STONE

Semantic Token Optimization and Natural Encoding

STONE sits between your application and the AI provider. It takes the full conversation history and restructures it into a bounded representation before the provider ever sees it. Not a cache. Not a summary. A fundamentally different way to manage context.

Raw conversation (turn 50): 21,400 tokens
After STONE: 1,800 tokens (91.6% fewer tokens sent)

AI APIs charge for every word. Again.

Every time you send a message, the entire conversation goes with it. Turn 1, turn 5, turn 20 — resent in full. Your provider charges for every token, every time. By turn 10, you're paying for context the model has already processed nine times.

Longer sessions hit a ceiling. Context fills up, the model starts forgetting earlier decisions, and you're forced to start over. Everything you invested — explaining your codebase, establishing requirements, building up a shared understanding — vanishes when the session resets.

Restructure. Don't truncate.

STONE takes the raw conversation and restructures it into a compact, bounded form that keeps what matters — facts, decisions, intent, domain context — while removing redundancy. This restructured context replaces the raw history in the API call.

The AI's response passes through unmodified. STONE only touches the input side.
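The input-side flow can be sketched in a few lines of Python. This is a toy illustration only: `restructure`, the payload shape, and the pinned-fact anchor are assumptions for the sake of the sketch, not STONE's actual internals.

```python
def restructure(history, pinned_facts, recent_n=4):
    """Toy sketch of the input side: the payload sent upstream is a
    pinned-fact anchor plus only the most recent turns, so it stays
    bounded while the raw history keeps growing. Every name here is
    illustrative; STONE's real restructuring logic is internal."""
    anchor = {
        "role": "system",
        "content": "Pinned facts: " + "; ".join(pinned_facts),
    }
    return [anchor] + history[-recent_n:]

# Payload size is independent of session length:
history = [{"role": "user", "content": f"turn {i}"} for i in range(75)]
payload = restructure(history, ["port 8080", "v2.3.1 deployed"])
# len(payload) is 5 whether the history holds 75 turns or 750.
```

The raw `history` list keeps growing turn after turn, but what crosses the wire is always the anchor plus a bounded tail.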

Turn 1: Passive. Your first message goes through unchanged. Zero overhead, zero cost. STONE observes but doesn't intervene.
Turn 2: STONE activates. The conversation is restructured into a bounded form. In our GPT-4o benchmark, reduction starts at 25-35% and climbs rapidly.
Turn 10+: The raw history keeps growing, but what's sent stays bounded. 68% at turn 10, 91% at turn 35, 93% at turn 75. Cost flatlines.

Standard API vs Lexi

The difference is structural. Standard APIs accumulate cost linearly. Lexi flattens the curve.

Standard approach
× Token count grows linearly with each turn
× Every request costs more than the last
× Context limit forces session reset
× Earlier context silently falls away

With STONE
✓ Tokens stay bounded regardless of session length
✓ Per-request cost flatlines after turn 2
✓ Sessions run as long as the work requires
✓ Facts and decisions survive via fact pinning

What sets STONE apart

Bounded context

The payload sent to your provider stays constant no matter how long the conversation runs. A 50-turn session costs roughly the same per request as a 5-turn session. Without STONE, context grows linearly. With STONE, it flatlines.
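A toy cost model shows why the curve flattens. The numbers here are illustrative, not the benchmark's: assume each turn adds roughly 300 tokens of raw history, and the bounded payload caps out around 1,800 tokens.

```python
def cumulative_tokens(per_turn, turns, bounded_at=None):
    """Total tokens sent across a session. Without a bound, turn t
    resends roughly t * per_turn tokens; with a bound, every request
    past the cap costs about the same. Illustrative model only."""
    total = 0
    for t in range(1, turns + 1):
        sent = t * per_turn
        if bounded_at is not None:
            sent = min(sent, bounded_at)
        total += sent
    return total

direct = cumulative_tokens(per_turn=300, turns=50)                 # 382,500
via_stone = cumulative_tokens(per_turn=300, turns=50, bounded_at=1800)  # 85,500
```

Linear growth per turn makes the cumulative total quadratic in session length; a bounded payload makes it linear, which is why long sessions are where the gap widens.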

O(1) resource usage

14.4 KB of memory per session — constant, regardless of conversation length. CPU per request is also constant. Lexi handles thousands of concurrent sessions on a single node without degradation.

Fact pinning

STONE detects typed entities — currency values, dates, version strings, port numbers, metrics, SLA targets — and pins them into a permanent anchor. These facts survive every restructuring pass. In a 75-turn blind benchmark against GPT-4o, Lexi averaged 8.4/10 recall accuracy — the gap to full-context (9.0/10) is in response detail, not factual correctness.
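Typed-entity detection of this kind can be approximated with pattern matching. A minimal sketch, assuming simple regex patterns; STONE's actual entity typing is internal, and these patterns are illustrative, not its real rules.

```python
import re

# Illustrative patterns for a few of the typed entities named above.
PATTERNS = {
    "currency": r"\$\d[\d,]*(?:\.\d+)?",
    "version":  r"\bv?\d+\.\d+\.\d+\b",
    "port":     r"\bport\s+(\d{2,5})\b",
    "date":     r"\b\d{4}-\d{2}-\d{2}\b",
}

def pin_facts(text):
    """Scan a message for typed entities worth pinning into the anchor."""
    found = {}
    for kind, pattern in PATTERNS.items():
        hits = re.findall(pattern, text)
        if hits:
            found[kind] = hits
    return found

facts = pin_facts("Deploy v2.3.1 on port 8080 by 2026-03-01; budget is $1,200.")
# {'currency': ['$1,200'], 'version': ['v2.3.1'], 'port': ['8080'],
#  'date': ['2026-03-01']}
```

Anything matched this way would be carried in the permanent anchor rather than left to survive (or not) a restructuring pass.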

Zero-negative guarantee

If restructuring can't help, the original is sent unchanged. Quality never drops below baseline.

Cold recall

Every message is permanently archived. When you reference something from turn 3 at turn 70, STONE retrieves the original — not a summary — and extracts exactly the content the model needs to answer your specific question. Three layers of recall back each other up, and each retrieval makes the next one faster.

Live request benchmarks

75-turn multi-domain conversation on GPT-4o, March 2026. Blind-judged against direct OpenAI. Results vary by content and conversation pattern.

Turn    Direct    Via Lexi    Reduction
1       186       186         0%
5       2,469     1,519       37%
10      5,946     1,628       68%
35      18,028    1,837       91%
75      21,440    1,495       93%

Session total: 1.6M raw tokens reduced to 135K sent. Provider cost: $0.52 via Lexi vs $4.37 direct, an 88% reduction.


Resource usage: 14.4 KB per session
Cost reduction: 88%
Token reduction: 91.6%

Where STONE makes the biggest difference


AI coding agents

Agents that plan, write, test, and iterate devour context. STONE keeps it bounded so they loop longer without hitting limits. Finish a full feature in one session instead of re-explaining your codebase across five.


Customer support bots

Long support threads pile up history. STONE retains the customer's issue, account details, and prior resolutions while keeping per-request cost flat. Scale support without scaling your AI bill linearly.


Research assistants

Deep research involves dozens of turns. STONE keeps key findings, sources, and conclusions pinned while restructuring the exploratory dialogue that produced them.


Multi-turn workflows

Tutoring, brainstorming, document drafting, data analysis — any workflow where conversations run long benefits from bounded context. The longer the session, the more STONE saves.

Get started free

See it in your own code.
Two minutes to integrate.

$10 free credit. No card required. Every response shows the exact breakdown.
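For an OpenAI-style chat endpoint, integration usually means pointing requests at a different base URL while keeping the payload shape unchanged. A minimal sketch using only the standard library; the base URL and key format below are placeholders, not Lexi's real values (those come from your dashboard).

```python
import json
import urllib.request

# Hypothetical values for illustration only.
LEXI_BASE_URL = "https://api.lexi.example/v1"  # assumption, not the real URL
API_KEY = "lexi_sk_..."                        # your key from signup

def build_chat_request(messages, model="gpt-4o"):
    """Build the HTTP request. Only the base URL differs from a direct
    OpenAI-style call; the JSON body is unchanged."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{LEXI_BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request([{"role": "user", "content": "Hello"}])
# urllib.request.urlopen(req) would send it; omitted here, since it
# needs a real key and endpoint.
```

Because the request shape is unchanged, swapping back to a direct provider call is a one-line change to the base URL.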

$10 free credit on signup
No card required
33 models supported
0 facts lost