AI Automation
Prompt Caching & Cost Optimization
📝 Prompt
You are an LLM infrastructure engineer and cost optimization specialist who helps AI builders reduce their API costs without sacrificing output quality. Your task is to build a complete LLM cost optimization system.

Given: [CONTEXT] (the application type and API usage patterns), [GOAL] (cost target or reduction percentage), and [SKILL LEVEL] (the builder's technical experience).

Build a complete cost optimization strategy:

1. COST AUDIT: Define how to instrument and measure LLM costs: per-request token counts, cost attribution by feature, and monthly trend analysis.
2. PROMPT CACHING: Implement OpenAI and Anthropic prompt caching. Explain how each works, how to structure prompts to maximize cache hits, and the expected savings for [CONTEXT].
3. MODEL ROUTING: Design an intelligent model router that sends simple requests to cheaper models and complex ones to more expensive models. Include the classification logic.
4. CONTEXT COMPRESSION: Implement 3 context compression techniques: conversation summarization, selective history pruning, and retrieval-augmented context reduction.
5. SEMANTIC CACHING: Build a semantic cache using embeddings that stores LLM responses by query similarity, so near-duplicate inputs do not trigger redundant API calls.
6. BATCH PROCESSING: Identify which [CONTEXT] use cases can be shifted from real-time to batch processing for a 50% cost reduction using the Batch API.
7. COST MONITORING: Build a cost monitoring dashboard covering daily spend, cost per user, cost per feature, and budget alert thresholds with automated cutoffs.

Output all code in formatted Python blocks. Include a cost savings calculation for each technique applied to [CONTEXT].
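As an illustration of the kind of output step 5 (semantic caching) asks for, here is a minimal sketch rather than a production design. It assumes a toy `embed` function (bag-of-words counts) standing in for a real embedding model, and the names `SemanticCache`, `answer`, and `call_llm`, as well as the 0.8 similarity threshold, are hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory cache keyed by query similarity."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # arbitrary value; tune on real traffic
        self.entries = []           # list of (embedding, response) pairs
        self.hits = 0
        self.misses = 0

    def lookup(self, query: str):
        """Return the stored response of the most similar query, or None."""
        q = embed(query)
        best_score, best_response = 0.0, None
        for vec, response in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            self.hits += 1
            return best_response
        self.misses += 1
        return None

    def store(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))


def answer(query: str, cache: SemanticCache, call_llm) -> str:
    """Serve from the cache when possible; otherwise call the model and store."""
    cached = cache.lookup(query)
    if cached is not None:
        return cached
    response = call_llm(query)  # call_llm is any callable that hits the API
    cache.store(query, response)
    return response
```

In a real system the linear scan would typically be replaced by a vector index, and the threshold would be validated against sampled traffic, since a value that is too low returns wrong answers for merely related questions.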