LLM & Embedding Pricing Guide
A comparison of API pricing across OpenAI, Anthropic, Google, DeepSeek, and xAI, plus embedding models from OpenAI, Cohere, Voyage AI, and Google. Use it to compare models on cost, context window, and capabilities.
Last updated: January 2026 • Prices in USD per million tokens
OpenAI Models
GPT-4.1 family plus the o3 and o4-mini reasoning models
| Model | Input/1M | Output/1M | Cached | Context | Notes |
|---|---|---|---|---|---|
| GPT-4.1 | $2.00 | $8.00 | $0.50 | 1M | Long-context, improved coding |
| GPT-4.1 Mini | $0.40 | $1.60 | $0.10 | 1M | Cost-efficient variant |
| GPT-4.1 Nano | $0.10 | $0.40 | $0.025 | 128K | Entry-level, fastest |
| o3 | $2.00 | $8.00 | $0.50 | 200K | Reasoning model |
| o3-pro | $20.00 | $80.00 | - | 200K | Deep reasoning |
| o4-mini | $1.10 | $4.40 | $0.275 | 200K | Efficient reasoning |
| GPT-4o | $5.00 | $20.00 | - | 128K | Scheduled for deprecation Feb 2026 |
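As a rough illustration of how the Input, Output, and Cached columns combine into a per-request bill, here is a minimal Python sketch. The function name `request_cost` and the token counts are hypothetical. It assumes cached tokens are a subset of the input tokens and are billed at the Cached rate instead of the Input rate.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price,
                 cached_tokens=0, cached_price=None):
    """Return the USD cost of one request. Prices are USD per 1M tokens."""
    if cached_price is None:
        # No cached rate listed ("-" in the table): bill cached tokens at full price.
        cached_price = input_price
    uncached_tokens = input_tokens - cached_tokens
    total = (uncached_tokens * input_price
             + cached_tokens * cached_price
             + output_tokens * output_price)
    return total / 1_000_000


# GPT-4.1 row: $2.00 input, $8.00 output, $0.50 cached.
# Hypothetical request: 10,000 input tokens (8,000 of them cached), 1,000 output tokens.
gpt41 = request_cost(10_000, 1_000, 2.00, 8.00,
                     cached_tokens=8_000, cached_price=0.50)
print(f"GPT-4.1 request: ${gpt41:.4f}")
```

The same function works for any row in this guide that lists a flat input and output price; pass `cached_price=None` for models without a cached rate.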
Anthropic Claude Models
Opus, Sonnet, and Haiku tiers
| Model | Input/1M | Output/1M | Context | Notes |
|---|---|---|---|---|
| Claude Opus 4.1 | $15.00 | $75.00 | 200K | Frontier reasoning |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200K (1M beta) | Best for coding, flagship |
| Claude Sonnet 4 | $3.00 | $15.00 | 200K | Balanced mid-tier |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | 4-5x faster than Sonnet |
| Claude Haiku 3.5 | $0.80 | $4.00 | 200K | Previous fast model |
| Claude Haiku 3 | $0.25 | $1.25 | 200K | Cheapest Claude |
Claude Discounts: The Batch API offers a 50% discount. Prompt caching bills cache writes at 1.25x and cache hits at 0.1x the base input price.
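To make the caching multipliers concrete, the sketch below compares sending the same prompt prefix repeatedly with and without caching. The function names and the workload (a 50,000-token prefix sent 10 times) are hypothetical. It assumes the multipliers apply to the base input price, that the first call is the only cache write, and that the cache stays warm for every later call.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # cache writes, from the discount note
CACHE_HIT_MULTIPLIER = 0.10    # cache hits, from the discount note
BATCH_DISCOUNT = 0.50          # Batch API discount, from the discount note


def cached_prefix_cost(prefix_tokens, base_input_price, calls):
    """Cost of sending one prefix `calls` times: one cache write, then cache hits."""
    if calls < 1:
        return 0.0
    write = prefix_tokens * base_input_price * CACHE_WRITE_MULTIPLIER
    hits = (calls - 1) * prefix_tokens * base_input_price * CACHE_HIT_MULTIPLIER
    return (write + hits) / 1_000_000


def uncached_prefix_cost(prefix_tokens, base_input_price, calls, batch=False):
    """Cost of sending the prefix `calls` times with no caching."""
    cost = calls * prefix_tokens * base_input_price / 1_000_000
    return cost * (1 - BATCH_DISCOUNT) if batch else cost


# Claude Sonnet 4.5 row: $3.00 per 1M input tokens.
with_cache = cached_prefix_cost(50_000, 3.00, calls=10)
without_cache = uncached_prefix_cost(50_000, 3.00, calls=10)
with_batch = uncached_prefix_cost(50_000, 3.00, calls=10, batch=True)
print(f"cached: ${with_cache:.4f}  uncached: ${without_cache:.2f}  batch: ${with_batch:.2f}")
```

Under these assumptions caching pays for itself on the first reuse: one write plus one hit costs 1.35x the base price, against 2x for two uncached calls.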
Google Gemini Models
Pro and Flash models with up to 2M context
| Model | Input/1M | Output/1M | Context | Notes |
|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25 (≤200K) / $2.50 (>200K) | $10.00 (≤200K) / $15.00 (>200K) | 2M | Reasoning flagship |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M | Hybrid reasoning |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M | Most cost-efficient |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1M | Balanced multimodal |
Google Batch: 50% discount on batch processing. Gemini 2.5 Flash is scheduled for deprecation in June 2026.
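The tiered Gemini 2.5 Pro pricing can be sketched as a small function. The name `gemini_25_pro_cost` is hypothetical. It assumes the tier is chosen by prompt size and that the higher rate then applies to every input and output token in the request, which is one reading of the table rather than a confirmed billing rule.

```python
TIER_THRESHOLD = 200_000  # prompt-size boundary from the Gemini 2.5 Pro row


def gemini_25_pro_cost(prompt_tokens, output_tokens):
    """Return the USD cost of one Gemini 2.5 Pro request under tiered pricing."""
    long_prompt = prompt_tokens > TIER_THRESHOLD
    input_price = 2.50 if long_prompt else 1.25     # USD per 1M input tokens
    output_price = 15.00 if long_prompt else 10.00  # USD per 1M output tokens
    return (prompt_tokens * input_price + output_tokens * output_price) / 1_000_000


short_request = gemini_25_pro_cost(100_000, 2_000)  # below the threshold
long_request = gemini_25_pro_cost(300_000, 2_000)   # above the threshold
print(f"short: ${short_request:.3f}  long: ${long_request:.2f}")
```

Because the rate changes for the whole request, a prompt just over 200K tokens costs about twice as much as one just under it.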
DeepSeek Models
Ultra cost-effective with aggressive caching discounts
| Model | Input/1M | Output/1M | Context | Notes |
|---|---|---|---|---|
| DeepSeek V3.2-Exp | $0.028 (hit) / $0.28 (miss) | $0.42 | 128K | 50% cheaper than V3.1 |
| deepseek-chat | $0.07 (hit) / $0.27 (miss) | $1.10 | 64K | Non-thinking mode |
| deepseek-reasoner | $0.14 (hit) / $0.55 (miss) | $2.19 | 64K | Thinking mode |
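💡 Best Value: DeepSeek V3.2-Exp is among the lowest-priced models in this guide, with cache hits at $0.028 per 1M input tokens and output at $0.42 per 1M. Cache misses cost $0.28 per 1M, above the $0.10 input price of GPT-4.1 Nano and Gemini 2.5 Flash-Lite.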
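Since DeepSeek input pricing depends on whether a token hits the cache, the effective input price is a weighted average of the hit and miss rates in the table. The sketch below is a hypothetical helper, and the 50% hit rate is an illustrative assumption, not a measured figure.

```python
def blended_input_price(hit_price, miss_price, hit_rate):
    """Effective USD price per 1M input tokens for a given cache hit rate (0.0 to 1.0)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0.0 and 1.0")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


# DeepSeek V3.2-Exp row: $0.028 on a cache hit, $0.28 on a miss.
half_cached = blended_input_price(0.028, 0.28, hit_rate=0.5)
print(f"Effective input price at a 50% hit rate: ${half_cached:.3f} per 1M tokens")
```

With no cache hits the effective price is the full $0.28 miss rate, so workloads with little prompt reuse should compare on the miss price.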
xAI Grok Models
Grok 4 with 256K context and real-time capabilities
| Model | Input/1M | Output/1M | Cached | Context | Notes |
|---|---|---|---|---|---|
| Grok 4 | $3.00 | $15.00 | $0.75 | 256K | Latest flagship |
| Grok 4 Fast | $0.20 | $0.50 | - | 256K | High-throughput |
| Grok 3 | $3.00 | $15.00 | - | 131K | Standard tier |
| Grok 3 Mini | $0.30 | $0.50 | - | 131K | Lightweight |
Grok Live Search: $25 per 1,000 sources ($0.025/source) for real-time web access.
Embedding Models
For RAG, semantic search, and vector databases
| Model | Provider | Price/1M | Dimensions | Notes |
|---|---|---|---|---|
| text-embedding-3-small | OpenAI | $0.02 | 1536 | Best cost-to-performance |
| text-embedding-3-large | OpenAI | $0.13 | 3072 | Highest quality |
| text-embedding-ada-002 | OpenAI | $0.10 | 1536 | Legacy model |
| embed-v4 | Cohere | $0.12 | 1024 | Multimodal (text + image) |
| voyage-3-large | Voyage AI | $0.22 | 2048 | Highest quality |
| voyage-3 | Voyage AI | $0.06 | 1024 | Balanced performance |
| voyage-3-lite | Voyage AI | $0.02 | 512 | Budget option |
| Gemini Embedding | Google | $0.15 | 768 | Native Gemini |
🎯 Best Value: OpenAI text-embedding-3-small costs $0.02 per 1M tokens, tied with voyage-3-lite for the lowest price in the table, and produces 1536-dimensional vectors versus 512. Voyage AI offers the first 200M tokens free.
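For a sense of scale, the following sketch estimates the one-time cost of embedding a corpus at the prices in the table. The function name and the 500M-token corpus are hypothetical. It also assumes the Voyage AI free allowance applies to voyage-3-lite and is simply subtracted from the billable total.

```python
def embedding_cost(total_tokens, price_per_million, free_tokens=0):
    """Return the USD cost of embedding `total_tokens` after a free allowance."""
    billable_tokens = max(0, total_tokens - free_tokens)
    return billable_tokens * price_per_million / 1_000_000


corpus_tokens = 500_000_000  # hypothetical corpus size

openai_small = embedding_cost(corpus_tokens, 0.02)
voyage_lite = embedding_cost(corpus_tokens, 0.02, free_tokens=200_000_000)
voyage_large = embedding_cost(corpus_tokens, 0.22, free_tokens=200_000_000)
print(f"text-embedding-3-small: ${openai_small:.2f}")
print(f"voyage-3-lite:          ${voyage_lite:.2f}")
print(f"voyage-3-large:         ${voyage_large:.2f}")
```

Embedding cost is paid again whenever the corpus is re-indexed with a different model, so the estimate is per indexing pass rather than a lifetime total.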
Quick Comparison: Best for Your Use Case
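| Use Case | Model | Key Figure |
|---|---|---|
| Cheapest Overall | DeepSeek V3.2-Exp | $0.028 input (cache hit) |
| Best for Coding | Claude Sonnet 4.5 | $3.00 / $15.00 (input / output) |
| Largest Context | Gemini 2.5 Pro | 2M tokens |
| Fastest Reasoning | o4-mini | $1.10 / $4.40 (input / output) |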