cost-limit: Token Budget Enforcement

Severity: CRITICAL · Auto-fix: No · Category: 💰 Cost

What It Does

Fires at CRITICAL level when the prompt exceeds the configured token_limit. Use this to catch accidentally bloated prompts before they hit API rate limits or blow your context window.

Example

Prompt: 850 tokens, token_limit: 800

[ CRITICAL ] cost-limit (line -)
  Prompt exceeds token limit: 850/800 tokens.
  Consider removing redundant content or splitting into smaller prompts.

Configuration

yaml
token_limit: 800  # Default

rules:
  cost_limit: true

Choosing a Limit

As a rule of thumb, cap the full prompt at 60–70% of your model's context window so there is room left for the completion. For a system prompt that shares the window with conversation history and retrieved context, budget far less, as in the recommendations below:

| Model             | Context window | Recommended token_limit          |
|-------------------|----------------|----------------------------------|
| GPT-4o            | 128k           | 500–2000 (system prompt budget)  |
| GPT-4             | 8k             | 500–1000                         |
| GPT-3.5 Turbo     | 16k            | 500–1500                         |
| Claude 3.5 Sonnet | 200k           | 500–5000                         |
| Gemini 1.5 Pro    | 1M             | 500–10000                        |
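The 60–70% rule of thumb is simple arithmetic; a tiny helper like the one below (a hypothetical name, not part of the linter) makes the calculation explicit for whole-prompt budgets:

```python
# Illustrative helper for the 60-70% rule of thumb: reserve the rest
# of the context window for the model's completion.

def recommended_limit(context_window: int, fraction: float = 0.65) -> int:
    """Return a token_limit leaving (1 - fraction) of the window free."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return int(context_window * fraction)
```

For example, an 8k-window model at 65% yields a limit of 5200 tokens; the much smaller values in the table reflect system-prompt budgets rather than whole-prompt budgets.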

Disabling

yaml
rules:
  cost_limit: false

Or raise the limit:

yaml
token_limit: 2000

Released under the Apache 2.0 License.