Back to Learning Hub
Day 4: The Economics

LLM & Embedding Pricing Guide

A comprehensive comparison of API pricing across all major providers. Make informed decisions based on cost, context window, and capabilities.

Last updated: January 2026 • Prices in USD per million tokens

OpenAI Models

GPT-4.1, o3, o4-mini and reasoning models

ModelInput/1MOutput/1MCachedContextNotes
GPT-4.1$2.00$8.00$0.501MLong-context, improved coding
GPT-4.1 Mini$0.40$1.60$0.101MCost-efficient variant
GPT-4.1 Nano$0.10$0.40$0.025128KEntry-level, fastest
o3$2.00$8.00$0.50200KReasoning model
o3-pro$20.00$80.00-200KDeep reasoning
o4-mini$1.10$4.40$0.275200KEfficient reasoning
GPT-4o$5.00$20.00-128KDeprecated Feb 2026

Anthropic Claude Models

Opus, Sonnet, and Haiku tiers

ModelInput/1MOutput/1MContextNotes
Claude Opus 4.1$15.00$75.00200KFrontier reasoning
Claude Sonnet 4.5$3.00$15.00200K (1M beta)Best for coding, flagship
Claude Sonnet 4$3.00$15.00200KBalanced mid-tier
Claude Haiku 4.5$1.00$5.00200K4-5x faster than Sonnet
Claude Haiku 3.5$0.80$4.00200KPrevious fast model
Claude Haiku 3$0.25$1.25200KCheapest Claude

Claude Discounts: Batch API offers 50% discount. Prompt caching: writes at 1.25x, hits at only 0.1x base price.

Google Gemini Models

Pro, Flash, and embedding models with up to 2M context

ModelInput/1MOutput/1MContextNotes
Gemini 2.5 Pro$1.25 (≤200K) / $2.50 (>200K)$10.00 / $15.002MReasoning flagship
Gemini 2.5 Flash$0.30$2.501MHybrid reasoning
Gemini 2.5 Flash-Lite$0.10$0.401MMost cost-efficient
Gemini 2.0 Flash$0.10$0.401MBalanced multimodal

Google Batch: 50% discount on batch processing. Gemini 2.5 Flash deprecating June 2026.

DeepSeek Models

Ultra cost-effective with aggressive caching discounts

ModelInput/1MOutput/1MContextNotes
DeepSeek V3.2-Exp$0.028 (hit) / $0.28 (miss)$0.42128K50% cheaper than V3.1
deepseek-chat$0.07 (hit) / $0.27 (miss)$1.1064KNon-thinking mode
deepseek-reasoner$0.14 (hit) / $0.55 (miss)$2.1964KThinking mode

💡 Best Value: DeepSeek V3.2 offers the lowest cost per token among all providers. Cache hits as low as $0.028/1M tokens!

xAI Grok Models

Grok 4 with 256K context and real-time capabilities

ModelInput/1MOutput/1MCachedContextNotes
Grok 4$3.00$15.00$0.75256KLatest flagship
Grok 4 Fast$0.20$0.50-256KHigh-throughput
Grok 3$3.00$15.00-131KStandard tier
Grok 3 Mini$0.30$0.50-131KLightweight

Grok Live Search: $25 per 1,000 sources ($0.025/source) for real-time web access.

Embedding Models

For RAG, semantic search, and vector databases

ModelProviderPrice/1MDimensionsNotes
text-embedding-3-smallOpenAI$0.021536Best cost-to-performance
text-embedding-3-largeOpenAI$0.133072Highest quality
text-embedding-ada-002OpenAI$0.101536Legacy model
embed-v4Cohere$0.121024Multimodal (text + image)
voyage-3-largeVoyage AI$0.222048Highest quality
voyage-3Voyage AI$0.061024Balanced performance
voyage-3-liteVoyage AI$0.02512Budget option
Gemini EmbeddingGoogle$0.15768Native Gemini

🎯 Best Value: OpenAI text-embedding-3-small at $0.02/1M tokens offers excellent quality at the lowest price. Voyage AI offers first 200M tokens free!

Quick Comparison: Best for Your Use Case

Cheapest Overall

DeepSeek V3.2

$0.028 input

Best for Coding

Claude Sonnet 4.5

$3/$15

Largest Context

Gemini 2.5 Pro

2M tokens

Fastest Reasoning

o4-mini

$1.10/$4.40

Previous Day 3: Prompt Engineering
NextDay 5: Coming Soon...
Chat with us on WhatsApp