Settings

Compression

Controls how aggressively prompts are compressed before they are forwarded to the LLM provider. The ratio is the fraction of tokens kept, so a lower ratio = more compression = more savings but higher risk of quality loss.

0.5

Fraction of tokens to KEEP. 0.5 = keep 50%, remove 50%. Lower value = more aggressive compression, more savings, but higher risk of losing important context.

Minimum token count before a text block is considered for compression. Blocks shorter than this are passed through unchanged. Too low = compressing tiny texts wastes CPU. Too high = missing savings on medium texts.

Master switch. When OFF, naxxen acts as a pure passthrough proxy: no compression and no compression-related latency. Useful for debugging or A/B testing.
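How the three compression settings above interact can be sketched as follows. This is a minimal illustration, not naxxen's actual code: the function name, the parameter names, and the choice to round the kept-token count up are all assumptions.

```python
import math


def planned_token_count(token_count: int, enabled: bool,
                        min_tokens: int, keep_ratio: float) -> int:
    """Return how many tokens a block is expected to keep.

    All names here are hypothetical; rounding up is an assumption.
    """
    if not enabled:
        # Master switch OFF: pure passthrough.
        return token_count
    if token_count < min_tokens:
        # Blocks shorter than the minimum are passed through unchanged.
        return token_count
    # keep_ratio is the fraction of tokens to KEEP.
    return math.ceil(token_count * keep_ratio)


if __name__ == "__main__":
    print(planned_token_count(1000, True, 200, 0.5))
    print(planned_token_count(150, True, 200, 0.5))
```

With a ratio of 0.5 and a minimum of 200 tokens, a 1000-token block would be cut to about 500 tokens, while a 150-token block is left alone.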

Rate Limits — Free Tier

Limits for API keys with tier 'free'. Protects against abuse and controls costs for non-paying users.

Maximum API requests per minute for free tier keys. Exceeding this returns HTTP 429. Enforced with a sliding-window counter in Redis.
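The sliding-window idea can be sketched in plain Python. This is an in-memory stand-in for illustration only: the real counter lives in Redis, and the class and method names below are hypothetical.

```python
from collections import deque


class SlidingWindowLimiter:
    """In-memory sketch of a per-key sliding-window limiter (hypothetical)."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.hits = {}  # API key -> deque of request timestamps

    def allow(self, key: str, now: float) -> bool:
        q = self.hits.setdefault(key, deque())
        # Drop timestamps that have slid out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # the caller would answer with HTTP 429
        q.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(limit=2, window_seconds=60.0)
    for t in (0.0, 1.0, 2.0, 60.5):
        print(t, limiter.allow("free-key", t))
```

Unlike a fixed per-minute bucket, the window moves with each request, so a burst at the end of one minute cannot be followed immediately by a second full burst.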

Maximum API requests per day (rolling 24h window) for free tier keys. Hard daily cap to prevent cost runaway.

Rate Limits — Pro Tier

Limits for API keys with tier 'pro'. Higher limits for paying customers, but still needed to prevent single-user abuse.

Maximum API requests per minute for pro tier keys. Set higher than free tier. 0 = unlimited.

Maximum API requests per day for pro tier keys. 0 = unlimited.
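The "0 = unlimited" convention is easy to get wrong, because a naive comparison against 0 would reject every request. A minimal sketch of the check, with a hypothetical function name:

```python
def exceeds_limit(current_count: int, limit: int) -> bool:
    """Return True if the next request should be rejected (hypothetical helper).

    current_count is the number of requests already counted in the window.
    A limit of 0 means unlimited, as documented for the pro tier settings.
    """
    if limit == 0:
        return False
    return current_count >= limit


if __name__ == "__main__":
    print(exceeds_limit(5000, 0))
    print(exceeds_limit(60, 60))
```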

Access Control

Controls account signups and how many API keys each user can create.

When OFF, new users cannot register. Existing users can still sign in. Email addresses submitted in the meantime are recorded on the waiting list.

Shown to users who try to sign up while signups are disabled.

Maximum number of active API keys a free-tier user can create. 0 = unlimited.

Maximum number of active API keys a pro-tier user can create. 0 = unlimited.

Body Size Limits

Maximum allowed sizes for request and response bodies. Protects the API against OOM denial-of-service attacks via oversized payloads.

Maximum request body size accepted by the proxy. Default 52428800 (50 MB). Requests exceeding this limit receive HTTP 413.

Maximum provider response body size for non-streaming responses. Default 104857600 (100 MB). Responses exceeding this are truncated.
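The two defaults are 50 MiB and 100 MiB written out in bytes. The sketch below shows the arithmetic and one possible shape for the checks. The constant and function names are hypothetical, and truncating by byte slice is an assumption about how "truncated" is implemented.

```python
MAX_REQUEST_BYTES = 50 * 1024 * 1024    # 52428800
MAX_RESPONSE_BYTES = 100 * 1024 * 1024  # 104857600


def request_too_large(content_length: int,
                      max_bytes: int = MAX_REQUEST_BYTES) -> bool:
    """True if the request should be rejected with HTTP 413."""
    return content_length > max_bytes


def truncate_response(body: bytes,
                      max_bytes: int = MAX_RESPONSE_BYTES) -> bytes:
    """Cut a non-streaming provider response down to the configured maximum."""
    return body[:max_bytes]


if __name__ == "__main__":
    print(MAX_REQUEST_BYTES, MAX_RESPONSE_BYTES)
    print(request_too_large(MAX_REQUEST_BYTES + 1))
```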

Data Retention & Export

Controls how long raw prompt data (training_data) stays in production and when it is exported for offline training. Legal basis under GDPR: Purpose A (service delivery / QA) = Art. 6(1)(b) and (f); Purpose B (training) = Art. 6(1)(a) consent. See docs/privacy-and-training.md.

Days to keep raw training_data rows in production. After this cutoff the hourly TTL purger hard-deletes rows. Lower = more privacy, less debugging window. Industry norm is 30 days.
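The cutoff the purger works against is simply "now minus the retention period". A minimal sketch, assuming rows carry a timezone-aware creation timestamp; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def purge_cutoff(now: datetime, retention_days: int) -> datetime:
    """Rows created before this instant are eligible for hard deletion."""
    return now - timedelta(days=retention_days)


def is_expired(created_at: datetime, now: datetime, retention_days: int) -> bool:
    return created_at < purge_cutoff(now, retention_days)


if __name__ == "__main__":
    now = datetime(2024, 3, 31, tzinfo=timezone.utc)
    print(purge_cutoff(now, 30))
```

Because the purger runs hourly, a row may outlive its cutoff by up to about an hour before it is actually deleted.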

Days to keep exported tar.gz files on S3/MinIO before auto-deletion. Admin can always manually delete earlier after download.

Master switch for the 02:00 UTC export job, which moves consented training_data older than 24h into tar.gz files and deletes the exported rows from production.

Which engine removes personal data from exported training files. 'presidio' = Microsoft Presidio (best recall ~90%); 'regex' = naive regex + small NER (~70%); 'none' = DEBUG ONLY, never in prod.

Current version of the privacy policy / consent text. Bumping the MAJOR or MINOR part triggers a re-consent banner for all users on next login. Patch bumps (third segment) are silent. Do not edit without a policy text change in git.
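The re-consent rule compares only the first two segments of the version. A minimal sketch, assuming versions are plain three-segment strings such as "1.4.2"; both function names are hypothetical.

```python
def parse_version(version: str):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def needs_reconsent(accepted_version: str, current_version: str) -> bool:
    """True if MAJOR or MINOR differs; patch-only changes are silent."""
    return parse_version(accepted_version)[:2] != parse_version(current_version)[:2]


if __name__ == "__main__":
    print(needs_reconsent("1.2.3", "1.2.4"))
    print(needs_reconsent("1.2.3", "1.3.0"))
```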

Cache

Controls Redis caching of compressed text. Caching prevents re-compressing the same content and is critical for Anthropic's prompt caching (cache_control markers).

How long compressed text stays in Redis cache. When the same text appears again within this window, the cached compressed version is used instead of running ONNX again. Saves CPU and ensures deterministic output.
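The lookup-then-compress flow can be sketched with an in-memory dictionary standing in for Redis. The class, the "cmp:" key prefix, and the use of SHA-256 as the cache key are assumptions for illustration; with Redis the expiry would typically be set on the key itself.

```python
import hashlib


class CompressionCache:
    """In-memory stand-in for the Redis cache (hypothetical)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expires_at, compressed_text)

    @staticmethod
    def key_for(text: str) -> str:
        # Identical input text always maps to the same key.
        return "cmp:" + hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_or_compress(self, text: str, compress, now: float) -> str:
        key = self.key_for(text)
        entry = self.store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the compressor entirely
        result = compress(text)
        self.store[key] = (now + self.ttl, result)
        return result


if __name__ == "__main__":
    cache = CompressionCache(ttl_seconds=300)
    print(cache.get_or_compress("hello world", str.upper, now=0.0))
```

Serving the stored result is what makes output deterministic within the window: the same input yields byte-identical compressed text even if the compressor itself would not.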

TTL for blocks with Anthropic cache_control markers. MUST be longer than Anthropic's server-side cache (300 sec / 5 min). If we compress differently within their cache window, the user pays cache_creation tokens (25% surcharge) instead of cache_read tokens (90% discount). Default 600 = 10 min, safely above Anthropic's 5 min.
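The cost difference can be made concrete with the multipliers quoted above: a cache write is billed at 1.25x the base input price and a cache read at 0.10x. The sketch below is illustrative arithmetic only; the names are hypothetical and the base price is a placeholder, not a real rate.

```python
ANTHROPIC_CACHE_SECONDS = 300   # server-side cache window stated above
CACHE_WRITE_MULTIPLIER = 1.25   # 25% surcharge on cache_creation tokens
CACHE_READ_MULTIPLIER = 0.10    # 90% discount on cache_read tokens


def ttl_is_safe(ttl_seconds: int) -> bool:
    """The local TTL must outlast Anthropic's cache window."""
    return ttl_seconds > ANTHROPIC_CACHE_SECONDS


def cached_prefix_cost(tokens: int, base_price_per_token: float,
                       cache_hit: bool) -> float:
    multiplier = CACHE_READ_MULTIPLIER if cache_hit else CACHE_WRITE_MULTIPLIER
    return tokens * base_price_per_token * multiplier


if __name__ == "__main__":
    price = 1.0  # placeholder unit price per token
    print(ttl_is_safe(600))
    print(cached_prefix_cost(10_000, price, cache_hit=False))
    print(cached_prefix_cost(10_000, price, cache_hit=True))
```

A miss on a cached prefix therefore costs 12.5 times as much as a hit for the same tokens, which is why the local TTL has to outlast the provider's window.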