Billing model

Transparent, per-request pricing. We take 40% of the savings STONE creates and you keep 60%. If STONE saves nothing, you pay no Lexi fee, only the provider cost.

Pricing formula (H.1)

We charge list price for what actually reaches the provider, plus 40% of the gross savings STONE creates. The customer always nets 60% of the savings.

provider_cost      = actual_input_tokens × P_in + output_tokens × P_out
gross_savings      = max(0, (original_tokens − actual_input_tokens)) × P_in
customer_savings   = gross_savings × 0.60
lexico_profit      = gross_savings × 0.40
customer_charge    = provider_cost + lexico_profit   (credit customer)

// P_in and P_out are list prices per 1M tokens for the selected model, so each token count is priced at (list price / 1,000,000) per token.
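As a worked example, the formula above can be sketched in Python. This is a minimal illustration rather than the billing service's actual code: the function name `price_request` and the sample prices ($3 and $15 per 1M tokens) are assumptions chosen for the arithmetic, and `Decimal` is used only to avoid floating-point rounding in the example.

```python
from decimal import Decimal

CUSTOMER_SHARE = Decimal("0.60")
LEXICO_SHARE = Decimal("0.40")
PER_MILLION = Decimal(1_000_000)


def price_request(original_tokens, actual_input_tokens, output_tokens, p_in, p_out):
    """Apply the pricing formula; p_in and p_out are USD list prices per 1M tokens."""
    per_token_in = Decimal(p_in) / PER_MILLION
    per_token_out = Decimal(p_out) / PER_MILLION

    provider_cost = actual_input_tokens * per_token_in + output_tokens * per_token_out
    # max(0, ...) clamps the savings to zero if the payload grew.
    gross_savings = max(0, original_tokens - actual_input_tokens) * per_token_in
    lexico_profit = gross_savings * LEXICO_SHARE
    return {
        "provider_cost": provider_cost,
        "gross_savings": gross_savings,
        "customer_savings": gross_savings * CUSTOMER_SHARE,
        "lexico_profit": lexico_profit,
        "customer_charge": provider_cost + lexico_profit,
    }


# Illustrative numbers only: 10,000 input tokens reduced to 4,000, 500 output
# tokens, at hypothetical list prices of $3 (input) and $15 (output) per 1M tokens.
quote = price_request(10_000, 4_000, 500, "3.00", "15.00")
```

With these sample numbers the unreduced request would have cost $0.0375 at list price, while customer_charge comes to $0.0267. The difference, $0.0108, is exactly customer_savings (60% of the $0.018 gross savings).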

Turn 1 is always passive

The first request in a new conversation (turnNumber = 1) has no accumulated context to restructure. STONE forwards your payload unchanged and the bomb-proof first-turn guard forces gross_savings = 0, lexico_profit = 0. You pay provider_cost only.

X-Lexi-Mode: first-turn
Turn 1: passive — you pay provider_cost only.

Passive mode

When STONE cannot reduce tokens (e.g. a short single-turn prompt, or a payload that grew after restructuring), gross_savings is clamped to 0 and no Lexi fee applies. The response carries X-Lexi-Mode: passive.
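The first-turn and passive rules can be summarised in a few lines of Python. This is a hedged sketch of the decision as described in this section, not the service's implementation: `classify_turn` is a hypothetical helper, and the passthrough mode that appears in the header list is not covered because this section does not define it.

```python
def classify_turn(turn_number, original_tokens, actual_input_tokens):
    """Return (mode, billable_saved_tokens) following the rules described above."""
    if turn_number == 1:
        # First-turn guard: savings are forced to zero regardless of token counts.
        return "first-turn", 0
    saved = original_tokens - actual_input_tokens
    if saved <= 0:
        # No reduction, or the payload grew: savings are clamped to zero.
        return "passive", 0
    return "normal", saved
```

In both the first-turn and passive cases the billable savings are zero, so gross_savings and the Lexi fee are zero and the charge equals provider_cost.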

Provider errors

Provider 5xx: the request is not billed. No provider_cost, no Lexi fee. The upstream status is relayed to your client.
Provider 4xx (auth, invalid model, over-quota): no Lexi fee. You pay only what the provider itself billed for the request.
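The two error rules can be expressed as a small lookup. The sketch below is illustrative only: `billing_for_status` and its field names are hypothetical, and it assumes every status outside the 4xx and 5xx ranges is billed by the normal pricing formula.

```python
def billing_for_status(status):
    """Map an upstream HTTP status to which cost components can apply."""
    if 500 <= status <= 599:
        # Provider 5xx: nothing is billed.
        return {"provider_cost_billed": False, "lexi_fee_billed": False}
    if 400 <= status <= 499:
        # Provider 4xx: only whatever the provider itself billed, no Lexi fee.
        return {"provider_cost_billed": True, "lexi_fee_billed": False}
    # Otherwise the normal formula applies (the fee may still be zero).
    return {"provider_cost_billed": True, "lexi_fee_billed": True}
```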

Streaming abort

If your client disconnects mid-stream, billing runs on the tokens actually delivered. The invariant customer_charge ≥ provider_cost always holds: the charge never falls below the provider cost of what was produced, and you are never charged for output you did not receive.

invariant: customer_charge ≥ provider_cost   (always)
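The invariant follows from the pricing formula: customer_charge minus provider_cost equals 0.40 × gross_savings, and gross_savings is clamped at zero. The sketch below checks this for several abort points. It assumes that "tokens actually delivered" means the output tokens streamed before the disconnect; the function name and sample prices are hypothetical.

```python
from decimal import Decimal


def charge_for_stream(original_tokens, actual_input_tokens, delivered_output_tokens,
                      p_in, p_out):
    """Return (provider_cost, customer_charge) for a possibly aborted stream."""
    per_token_in = Decimal(p_in) / 1_000_000
    per_token_out = Decimal(p_out) / 1_000_000
    # Only the output tokens delivered before the disconnect are counted.
    provider_cost = (actual_input_tokens * per_token_in
                     + delivered_output_tokens * per_token_out)
    gross_savings = max(0, original_tokens - actual_input_tokens) * per_token_in
    customer_charge = provider_cost + gross_savings * Decimal("0.40")
    return provider_cost, customer_charge


# Check the invariant at several hypothetical abort points (0 to 500 output tokens).
checks = []
for delivered in (0, 1, 250, 500):
    cost, charge = charge_for_stream(10_000, 4_000, delivered, "3.00", "15.00")
    checks.append(charge >= cost)
```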

Refunds

EU/EEA consumers have a 14-day right of withdrawal from signup. Manage refund requests at lexico.no/refund. Usage consumed before the refund request is non-refundable under EU Directive 2011/83/EU, Art. 16(m).

Rate limits per tier

Limits auto-scale with lifetime spend. No tiers to pick.

Lifetime spend    Requests / minute    Max API keys
Free              120                  10,000
$5+               2,000                10,000
$50+              10,000               10,000
$500+             30,000               10,000
$1,000+           60,000               10,000
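For client-side throttling, the table can be turned into a simple lookup. This is a sketch under two assumptions that the table does not state explicitly: thresholds are inclusive (so "$5+" applies at exactly $5.00 of lifetime spend), and "Free" covers everything below $5. The names are hypothetical.

```python
# (minimum lifetime spend in USD, requests per minute), highest tier first.
RATE_TIERS = [
    (1000, 60_000),
    (500, 30_000),
    (50, 10_000),
    (5, 2_000),
    (0, 120),
]


def requests_per_minute(lifetime_spend_usd):
    """Return the per-minute request limit for a given lifetime spend."""
    for minimum_spend, limit in RATE_TIERS:
        if lifetime_spend_usd >= minimum_spend:
            return limit
    raise ValueError("lifetime spend cannot be negative")
```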

Response headers (16)

All 16 canonical X-Lexi headers are exposed on every /v1/chat/completions and /v1/messages response. Full type and unit reference in the headers section of /docs.

X-Lexi-Original-Tokens       Pre-STONE input tokens.
X-Lexi-Compressed-Tokens     Tokens sent to provider.
X-Lexi-Tokens-Saved          original − compressed.
X-Lexi-Compression-Ratio     saved / original (0–1, 4 dp).
X-Lexi-Provider-Cost-Usd     List-price USD for tokens actually sent.
X-Lexi-Gross-Savings-Usd     tokens_saved × input_price.
X-Lexi-Customer-Savings-Usd  gross × 0.60 (your share).
X-Lexi-Customer-Charge-Usd   provider_cost + lexico_profit.
X-Lexi-Lexico-Profit-Usd     gross × 0.40 (our share).
X-Lexi-Customer-Type         credit | api.
X-Lexi-Balance-After         Credit balance after this charge (USD).
X-Lexi-Memory-Usage          Tokens in session memory.
X-Lexi-Recall-Count          Recalled memory count.
X-Lexi-Mode                  first-turn | passive | normal | passthrough.
X-Lexi-First-Turn            Only when turn 1: true.
X-Lexi-Request-Id            Correlation UUID; matches X-Request-Id.
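Because the headers expose every term of the pricing formula, a client can cross-check a response against the documented relationships. The following sketch is an assumption-laden example, not an official client: it assumes header values arrive as plain decimal strings, allows a small tolerance for rounding, and uses made-up sample values. The helper name `check_lexi_headers` is hypothetical.

```python
from decimal import Decimal


def check_lexi_headers(headers, tolerance=Decimal("0.000001")):
    """Return the names of headers that disagree with the documented formulas."""
    h = {name.lower(): value for name, value in headers.items()}
    problems = []

    original = int(h["x-lexi-original-tokens"])
    compressed = int(h["x-lexi-compressed-tokens"])
    if int(h["x-lexi-tokens-saved"]) != original - compressed:
        problems.append("X-Lexi-Tokens-Saved")

    gross = Decimal(h["x-lexi-gross-savings-usd"])
    profit = Decimal(h["x-lexi-lexico-profit-usd"])
    if abs(Decimal(h["x-lexi-customer-savings-usd"]) - gross * Decimal("0.60")) > tolerance:
        problems.append("X-Lexi-Customer-Savings-Usd")
    if abs(profit - gross * Decimal("0.40")) > tolerance:
        problems.append("X-Lexi-Lexico-Profit-Usd")

    expected_charge = Decimal(h["x-lexi-provider-cost-usd"]) + profit
    if abs(Decimal(h["x-lexi-customer-charge-usd"]) - expected_charge) > tolerance:
        problems.append("X-Lexi-Customer-Charge-Usd")
    return problems


# Made-up sample values for illustration; not captured from a real response.
sample = {
    "X-Lexi-Original-Tokens": "10000",
    "X-Lexi-Compressed-Tokens": "4000",
    "X-Lexi-Tokens-Saved": "6000",
    "X-Lexi-Provider-Cost-Usd": "0.0195",
    "X-Lexi-Gross-Savings-Usd": "0.018",
    "X-Lexi-Customer-Savings-Usd": "0.0108",
    "X-Lexi-Lexico-Profit-Usd": "0.0072",
    "X-Lexi-Customer-Charge-Usd": "0.0267",
}
mismatches = check_lexi_headers(sample)
```

An empty list means the sampled headers are internally consistent. Header names are lowercased before lookup because HTTP header names are case-insensitive.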