Day 4: The Economics

LLM & Embedding Pricing Guide

A comprehensive comparison of API pricing across all major providers. Make informed decisions based on cost, context window, and capabilities.

Last updated: January 2026 • Prices in USD per million tokens

OpenAI Models

GPT-4.1, o3, o4-mini and reasoning models

Model	Input/1M	Output/1M	Cached	Context	Notes
GPT-4.1	$2.00	$8.00	$0.50	1M	Long-context, improved coding
GPT-4.1 Mini	$0.40	$1.60	$0.10	1M	Cost-efficient variant
GPT-4.1 Nano	$0.10	$0.40	$0.025	128K	Entry-level, fastest
o3	$2.00	$8.00	$0.50	200K	Reasoning model
o3-pro	$20.00	$80.00	-	200K	Deep reasoning
o4-mini	$1.10	$4.40	$0.275	200K	Efficient reasoning
GPT-4o	$5.00	$20.00	-	128K	Deprecated Feb 2026

Anthropic Claude Models

Opus, Sonnet, and Haiku tiers

Model	Input/1M	Output/1M	Context	Notes
Claude Opus 4.1	$15.00	$75.00	200K	Frontier reasoning
Claude Sonnet 4.5	$3.00	$15.00	200K (1M beta)	Best for coding, flagship
Claude Sonnet 4	$3.00	$15.00	200K	Balanced mid-tier
Claude Haiku 4.5	$1.00	$5.00	200K	4-5x faster than Sonnet
Claude Haiku 3.5	$0.80	$4.00	200K	Previous fast model
Claude Haiku 3	$0.25	$1.25	200K	Cheapest Claude

Claude Discounts: Batch API offers 50% discount. Prompt caching: writes at 1.25x, hits at only 0.1x base price.

Google Gemini Models

Pro, Flash, and embedding models with up to 2M context

Model	Input/1M	Output/1M	Context	Notes
Gemini 2.5 Pro	$1.25 (≤200K) / $2.50 (>200K)	$10.00 / $15.00	2M	Reasoning flagship
Gemini 2.5 Flash	$0.30	$2.50	1M	Hybrid reasoning
Gemini 2.5 Flash-Lite	$0.10	$0.40	1M	Most cost-efficient
Gemini 2.0 Flash	$0.10	$0.40	1M	Balanced multimodal

Google Batch: 50% discount on batch processing. Gemini 2.5 Flash deprecating June 2026.

DeepSeek Models

Ultra cost-effective with aggressive caching discounts

Model	Input/1M	Output/1M	Context	Notes
DeepSeek V3.2-Exp	$0.028 (hit) / $0.28 (miss)	$0.42	128K	50% cheaper than V3.1
deepseek-chat	$0.07 (hit) / $0.27 (miss)	$1.10	64K	Non-thinking mode
deepseek-reasoner	$0.14 (hit) / $0.55 (miss)	$2.19	64K	Thinking mode

💡 Best Value: DeepSeek V3.2 offers the lowest cost per token among all providers. Cache hits as low as $0.028/1M tokens!

xAI Grok Models

Grok 4 with 256K context and real-time capabilities

Model	Input/1M	Output/1M	Cached	Context	Notes
Grok 4	$3.00	$15.00	$0.75	256K	Latest flagship
Grok 4 Fast	$0.20	$0.50	-	256K	High-throughput
Grok 3	$3.00	$15.00	-	131K	Standard tier
Grok 3 Mini	$0.30	$0.50	-	131K	Lightweight

Grok Live Search: $25 per 1,000 sources ($0.025/source) for real-time web access.

Embedding Models

For RAG, semantic search, and vector databases

Model	Provider	Price/1M	Dimensions	Notes
text-embedding-3-small	OpenAI	$0.02	1536	Best cost-to-performance
text-embedding-3-large	OpenAI	$0.13	3072	Highest quality
text-embedding-ada-002	OpenAI	$0.10	1536	Legacy model
embed-v4	Cohere	$0.12	1024	Multimodal (text + image)
voyage-3-large	Voyage AI	$0.22	2048	Highest quality
voyage-3	Voyage AI	$0.06	1024	Balanced performance
voyage-3-lite	Voyage AI	$0.02	512	Budget option
Gemini Embedding	Google	$0.15	768	Native Gemini

🎯 Best Value: OpenAI text-embedding-3-small at $0.02/1M tokens offers excellent quality at the lowest price. Voyage AI offers first 200M tokens free!

Quick Comparison: Best for Your Use Case

Cheapest Overall

DeepSeek V3.2

$0.028 input

Best for Coding

Claude Sonnet 4.5

$3/$15

Largest Context

Gemini 2.5 Pro

2M tokens

Fastest Reasoning

o4-mini

$1.10/$4.40

Previous Day 3: Prompt Engineering

NextDay 5: Coming Soon...