Settings

Compression

Controls how aggressively prompts are compressed before they are forwarded to the LLM provider. A lower target ratio = more compression = more savings but higher risk of quality loss.

Target Ratio

0.5

Fraction of tokens to KEEP. 0.5 = keep 50%, remove 50%. Lower value = more aggressive compression, more savings, but higher risk of losing important context.
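
To make the arithmetic concrete, here is a minimal sketch of how a target ratio maps to a kept-token estimate. The helper name `estimate_kept_tokens` and the rounding rule are assumptions for illustration; the real compressor may not hit the target exactly.

```python
def estimate_kept_tokens(token_count: int, target_ratio: float) -> int:
    """Estimate how many tokens survive compression (hypothetical helper)."""
    if not 0.0 < target_ratio <= 1.0:
        raise ValueError("target_ratio must be in the range (0, 1]")
    # Assumption: round to the nearest whole token.
    return round(token_count * target_ratio)


kept = estimate_kept_tokens(1200, 0.5)
removed = 1200 - kept
print(f"kept={kept} removed={removed}")
```

With a ratio of 0.5, a 1200-token block keeps about 600 tokens and drops about 600.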

Min Tokens Threshold

Minimum token count before a text block is considered for compression. Blocks shorter than this are passed through unchanged. Too low = compressing tiny texts wastes CPU. Too high = missing savings on medium texts.

Compression Enabled

Master switch. When OFF, naxxen acts as a pure passthrough proxy — no compression, no latency overhead. Useful for debugging or A/B testing.
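
The master switch and the minimum-token threshold together decide whether a block is compressed at all. The sketch below shows one way that gate could look. `CompressionSettings`, `should_compress`, and the default of 200 tokens are hypothetical, not taken from the naxxen codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressionSettings:
    enabled: bool = True       # Compression Enabled (master switch)
    target_ratio: float = 0.5  # Target Ratio
    min_tokens: int = 200      # Min Tokens Threshold (hypothetical default)


def should_compress(token_count: int, settings: CompressionSettings) -> bool:
    """Return True only when the block qualifies for compression."""
    if not settings.enabled:
        # Master switch OFF: pure passthrough, nothing is compressed.
        return False
    # Blocks shorter than the threshold are passed through unchanged.
    return token_count >= settings.min_tokens
```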

Rate Limits — Free Tier

Limits for API keys with tier 'free'. Protects against abuse and controls costs for non-paying users.

Requests per Minute (RPM)

Maximum API requests per minute for free tier keys. Exceeding this returns HTTP 429. Enforced with a sliding window counter in Redis.
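
For readers unfamiliar with sliding-window limiting, here is a minimal in-memory sketch of the idea. It is not the Redis implementation: it keeps a log of request timestamps in a deque, and the class name `SlidingWindowLimiter` is hypothetical.

```python
import time
from collections import deque
from typing import Deque, Optional


class SlidingWindowLimiter:
    """In-memory sliding-window limiter (illustrative, single process only)."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: Deque[float] = deque()

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the window.
        while self._hits and self._hits[0] <= now - self.window:
            self._hits.popleft()
        if len(self._hits) >= self.limit:
            return False  # the caller would answer with HTTP 429
        self._hits.append(now)
        return True
```

Rejected requests are not recorded, so a client that keeps hammering the API does not extend its own lockout in this sketch.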

Requests per Day (RPD)

Maximum API requests per day (rolling 24h window) for free tier keys. Hard daily cap to prevent cost runaway.

Rate Limits — Pro Tier

Limits for API keys with tier 'pro'. Paying customers get higher limits, but limits are still needed to prevent abuse by a single user.

Requests per Minute (RPM)

Maximum API requests per minute for pro tier keys. Set higher than free tier. 0 = unlimited.

Requests per Day (RPD)

Maximum API requests per day for pro tier keys. 0 = unlimited.
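
The "0 = unlimited" convention is easy to get wrong in a comparison, because a naive `count < limit` check would block everything when the limit is 0. A minimal sketch, with the hypothetical helper name `is_within_limit`:

```python
def is_within_limit(current_count: int, limit: int) -> bool:
    """Return True if one more request is allowed under `limit`."""
    if limit == 0:
        # 0 is a sentinel for "unlimited", not "zero requests allowed".
        return True
    return current_count < limit
```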

Access Control

Controls account signups and how many API keys each user can create.

Signups Enabled

When OFF, new users cannot register. Existing users can still sign in. Email addresses submitted during this time are recorded on the waiting list.

Signup Disabled Message

Shown to users who try to sign up while signups are disabled.

Max API keys — Free tier

Maximum number of active API keys a free-tier user can create. 0 = unlimited.

Max API keys — Pro tier

Maximum number of active API keys a pro-tier user can create. 0 = unlimited.

Body Size Limits

Maximum allowed sizes for request and response bodies. Protects the API against OOM denial-of-service attacks via oversized payloads.

Max Request Body (bytes)

Maximum request body size accepted by the proxy. Default 52428800 (50 MB). Requests exceeding this limit receive HTTP 413.

Max Response Body (bytes)

Maximum provider response body size for non-streaming responses. Default 104857600 (100 MB). Responses exceeding this are truncated.
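
The two limits behave differently: oversized requests are rejected, while oversized responses are cut short. The sketch below illustrates that difference using the documented defaults. The function names are hypothetical, and a real proxy would enforce the limits while streaming bytes rather than on a fully buffered body.

```python
from typing import Optional

MAX_REQUEST_BODY_BYTES = 52_428_800    # 50 MB
MAX_RESPONSE_BODY_BYTES = 104_857_600  # 100 MB


def request_rejection_status(
    body_size: int, max_bytes: int = MAX_REQUEST_BODY_BYTES
) -> Optional[int]:
    """Return 413 if the request body is too large, otherwise None."""
    return 413 if body_size > max_bytes else None


def truncate_response(
    body: bytes, max_bytes: int = MAX_RESPONSE_BODY_BYTES
) -> bytes:
    """Cut a non-streaming response body down to the configured maximum."""
    return body[:max_bytes]
```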

Data Retention & Export

Controls how long raw prompt data (training_data) stays in production and when it is exported for offline training. Legal basis: Purpose A (service delivery / QA) = GDPR Art. 6(1)(b) and (f); Purpose B (training) = GDPR Art. 6(1)(a) consent. See docs/privacy-and-training.md.

Raw Data Retention (days)

Days to keep raw training_data rows in production. After this cutoff the hourly TTL purger hard-deletes rows. Lower = more privacy, shorter debugging window. 30 days is a common choice.
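
As an illustration of the cutoff logic, the sketch below computes which rows a purger run would consider expired. The function names and the `created_at` field are assumptions; the actual purger and schema are not shown in this document.

```python
from datetime import datetime, timedelta, timezone


def purge_cutoff(now: datetime, retention_days: int) -> datetime:
    """Rows created before this instant are eligible for hard deletion."""
    return now - timedelta(days=retention_days)


def is_expired(created_at: datetime, now: datetime, retention_days: int) -> bool:
    # `created_at` is a hypothetical column name for the row timestamp.
    return created_at < purge_cutoff(now, retention_days)


now = datetime(2024, 3, 31, 12, 0, tzinfo=timezone.utc)
print(purge_cutoff(now, 30).isoformat())
```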

Export File Retention (days)

Days to keep exported tar.gz files on S3/MinIO before auto-deletion. Admins can delete files manually at any time after downloading them.

Daily Export Cron

Master switch for the 02:00 UTC export job that moves consented training_data rows older than 24h into tar.gz files and deletes the exported rows from production.

PII Stripping Engine

Which engine removes personal data from exported training files. 'presidio' = Microsoft Presidio (best recall ~90%); 'regex' = naive regex + small NER (~70%); 'none' = DEBUG ONLY, never in prod.

Consent Policy Version

Current version of the privacy policy / consent text. Bumping the MAJOR or MINOR part triggers a re-consent banner for all users on next login. Patch bumps (third segment) are silent. Do not edit without a policy text change in git.
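
The re-consent rule can be expressed as a comparison on the first two version segments. A minimal sketch, assuming versions are written as MAJOR.MINOR.PATCH with integer segments; `needs_reconsent` is a hypothetical name.

```python
from typing import Tuple


def _parse(version: str) -> Tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def needs_reconsent(accepted_version: str, current_version: str) -> bool:
    """True when MAJOR or MINOR moved past what the user accepted."""
    accepted = _parse(accepted_version)
    current = _parse(current_version)
    # Only the first two segments matter; patch bumps are silent.
    return current[:2] > accepted[:2]
```

Under this rule a user who accepted 1.2.3 sees no banner for 1.2.4 but is asked again for 1.3.0 or 2.0.0.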

Cache

Controls Redis caching of compressed text. Caching prevents re-compressing the same content and is critical for Anthropic's prompt caching (cache_control markers).

Default Cache TTL (seconds)

How long compressed text stays in Redis cache. When the same text appears again within this window, the cached compressed version is used instead of running ONNX again. Saves CPU and ensures deterministic output.
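
The lookup pattern described here can be sketched with an in-memory stand-in for Redis. The cache key (a SHA-256 hash of the input text) and the class name `CompressionCache` are assumptions for illustration; the real key may also include settings such as the target ratio.

```python
import hashlib
import time
from typing import Dict, Optional, Tuple


class CompressionCache:
    """In-memory stand-in for the Redis cache (illustrative only)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def key_for(text: str) -> str:
        # Assumption: the cache key is a hash of the original text.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def put(self, text: str, compressed: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._store[self.key_for(text)] = (now + self.ttl, compressed)

    def get(self, text: str, now: Optional[float] = None) -> Optional[str]:
        now = time.monotonic() if now is None else now
        key = self.key_for(text)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, compressed = entry
        if now >= expires_at:
            del self._store[key]  # expired: behave like a cache miss
            return None
        return compressed
```

On a hit the stored output is returned as-is, which is what makes repeated requests produce identical compressed text within the TTL.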

Anthropic Cache TTL (seconds)

TTL for blocks with Anthropic cache_control markers. MUST be longer than Anthropic's server-side cache lifetime (300 sec / 5 min). If the same block is compressed differently within that window, the prompt no longer matches Anthropic's cached prefix and the user pays for cache_creation tokens (25% surcharge) instead of cache_read tokens (90% discount). Default 600 = 10 min, safely above Anthropic's 5 min.
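
To show why this setting matters, the sketch below checks a configured TTL against the 300-second window and compares the cost of a cache write with a cache read. The multipliers come from the percentages quoted above. The function names are hypothetical and the price is in arbitrary units.

```python
ANTHROPIC_SERVER_CACHE_SECONDS = 300
CACHE_WRITE_MULTIPLIER = 1.25  # cache_creation: 25% surcharge on base input price
CACHE_READ_MULTIPLIER = 0.10   # cache_read: 90% discount on base input price


def is_ttl_safe(local_ttl_seconds: int) -> bool:
    """The local TTL must outlive Anthropic's server-side cache window."""
    return local_ttl_seconds > ANTHROPIC_SERVER_CACHE_SECONDS


def cached_block_cost(tokens: int, base_price_per_token: float, cache_hit: bool) -> float:
    multiplier = CACHE_READ_MULTIPLIER if cache_hit else CACHE_WRITE_MULTIPLIER
    return tokens * base_price_per_token * multiplier


miss = cached_block_cost(10_000, 1.0, cache_hit=False)
hit = cached_block_cost(10_000, 1.0, cache_hit=True)
print(f"miss={miss:.0f} hit={hit:.0f} units")
```

For the same block, a miss costs roughly 12.5 times as much as a hit (1.25 / 0.10), which is why a local TTL shorter than Anthropic's window is expensive.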