AI Fundamentals: The Terms You Need
A complete glossary of the most important concepts in Generative AI, from the basic building blocks to advanced production deployment.
Core LLM Building Blocks
The fundamental components that make up large language models.
Transformer Architecture
The core design: encoder-only (BERT), decoder-only / autoregressive (GPT, Claude, Gemini), or encoder-decoder (T5)
Tokenization
Break text into tokens and map them to numbers
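A toy sketch of the idea; real tokenizers learn subword vocabularies (BPE, SentencePiece), but the text-to-integer mapping looks like this:

```python
# Real tokenizers (BPE, SentencePiece) learn subword vocabularies;
# this toy version just splits on whitespace.
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3}

def tokenize(text: str) -> list[int]:
    """Map each word to its vocabulary ID, unknown words to <unk>."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

print(tokenize("The cat sat"))  # [1, 2, 3]
```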
Embedding Spaces
Map tokens into a semantic space where closer vectors mean similar meaning
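A quick illustration with made-up vectors: cosine similarity scores semantically related tokens higher:

```python
import math

# Made-up 3-D embeddings; trained models use hundreds of dimensions.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.95],
}

def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # high (~1.0)
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # low  (~0.3)
```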
Positional Encoding
Adds sequence order information to token embeddings
Attention
Highlights the most relevant tokens in context
Self-Attention
Each token attends to every other token for context
Cross-Attention
Lets the decoder attend to the encoder's outputs (in encoder-decoder models)
Multi-Head Attention
Several attention heads capture different patterns in parallel
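A minimal NumPy sketch of the attention entries above: scaled dot-product self-attention for one head; multi-head attention runs several such heads in parallel and concatenates the results (shapes are illustrative):

```python
import numpy as np

def self_attention(x: np.ndarray, Wq, Wk, Wv) -> np.ndarray:
    """Scaled dot-product self-attention for a single head.
    x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (seq_len, seq_len)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax over the keys
    return w @ V                                     # weighted sum of values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                          # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)           # (4, 8)
```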
Feed-Forward Networks
Nonlinear layers that transform representations between attention blocks
Residual Connections
Shortcut links that preserve signals and help gradient flow
Layer Normalization
Normalizes activations to stabilize and speed up training
Output Projection (LM Head)
Final linear layer mapping hidden states into logits
Logits
Raw prediction scores for each token before probabilities
Softmax
Turns logits into a probability distribution
Sampling from Probabilities
Chooses the next token based on probability weights
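A minimal NumPy sketch of the logits → softmax → sampling pipeline described in the three entries above (toy 3-token vocabulary):

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.1])      # raw scores over a 3-token vocabulary

# Softmax: exponentiate and normalize (subtract the max for stability).
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs)                             # ~[0.66, 0.24, 0.10]

# Sampling: draw the next token ID in proportion to its probability.
next_token = int(rng.choice(len(probs), p=probs))
```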
RoPE
Rotary Position Embedding; rotates query/key vectors to encode relative positions, helping models handle longer sequences
ALiBi / Relative Positional Encoding
Attention with Linear Biases and other relative-position schemes; alternatives to RoPE for long contexts
Linear / Performer Attention
Efficient attention variants for very long sequences
Grouped Query Attention (GQA)
Reduces memory use by sharing keys/values across heads
Multi-Head Latent Attention
Compresses keys and values into a low-rank latent vector to shrink the KV cache (popularized by DeepSeek)
SwiGLU/GeLU Activations
Smooth nonlinear functions that improve expressiveness
RMSNorm
Root-mean-square normalization, a lighter alternative to LayerNorm
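A minimal NumPy sketch of RMSNorm; unlike LayerNorm there is no mean subtraction and no bias:

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Rescale by the root mean square; no mean subtraction, no bias."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

print(rms_norm(np.array([1.0, -2.0, 3.0]), np.ones(3)))
```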
Advanced Architectures
Modern variations and specialized model types.
Diffusion Models
Generate images/video by learning to reverse a gradual noising process (DALL-E, Midjourney, Stable Diffusion)
VAEs (Variational Autoencoders)
Probabilistic generative models with latent spaces
GANs (Generative Adversarial Networks)
Generator vs discriminator training for realistic outputs
State Space Models (Mamba/SSMs)
Efficient alternatives to transformers for long sequences
Autoregressive Models
Generate sequences one token at a time (GPT family, Claude, Gemini, etc.)
Flow-Based Models
Invertible neural networks for exact likelihood estimation
Hybrid Architectures
Combine multiple approaches (transformer + CNN, etc.)
Training & Tuning
How models learn and adapt to specific tasks.
Pretraining
Build general world knowledge from large datasets
Continual / Lifelong Learning
Update models without forgetting old knowledge
Self-Instruction / Self-Play
AI generates its own training examples
Fine-Tuning
Adapt the model for specific domains or tasks
Supervised Fine-Tuning (SFT)
Train on curated input-output pairs
LoRA
Parameter-efficient adapters for cheap fine-tuning
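A minimal sketch of the LoRA idea: freeze the pretrained weight and train only a low-rank update (dimensions are illustrative):

```python
import numpy as np

d_in, d_out, r = 512, 512, 8
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))               # trainable; zero-init so the update starts at 0

def lora_forward(x: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Frozen path plus the scaled low-rank update B @ A."""
    return W @ x + scale * (B @ (A @ x))

print(lora_forward(rng.normal(size=d_in)).shape)  # (512,)
```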
QLoRA
LoRA + quantization, enabling fine-tuning of huge models on modest hardware
PEFT
Family of methods (e.g., LoRA, QLoRA, adapters) updating only small parts of the model
Instruction Tuning
Teach models to follow natural language instructions
RLHF
Align model outputs with human preferences via feedback
Constitutional AI
AI-written principles for safer alignment
DPO / PPO / GRPO
Optimization algorithms for preference alignment
Distillation
Transfer knowledge from a large model into a smaller one
Gradient Descent & Backpropagation
Core optimization mechanics
Loss Functions
Cross-entropy loss, perplexity, etc. that guide learning
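A minimal sketch of token-level cross-entropy, the standard language-modeling loss:

```python
import numpy as np

def cross_entropy(logits: np.ndarray, target: int) -> float:
    """Negative log-probability assigned to the correct next token."""
    logits = logits - logits.max()               # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-log_probs[target])

print(cross_entropy(np.array([2.0, 0.5, 0.1]), target=0))  # ~0.32
```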
Learning Rate Scheduling
Adjust training speed for stability
Batch Size & Gradient Accumulation
Control efficiency and memory use
Scaling Laws
Predictable links between size, data, and performance
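One widely cited form is the Chinchilla parameterization (Hoffmann et al., 2022), which models loss as a function of parameter count N and training tokens D, with E, A, B, α, β fitted empirically:

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```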
Mixture-of-Experts Fine-Tuning
Activate only subsets of parameters for efficiency
Preference Optimization (IPO, KTO, etc.)
Newer RLHF alternatives without full RL
Curriculum Learning
Strategic ordering of training data for better convergence
Meta-Learning
Learning to learn quickly from few examples
Transfer Learning
Leveraging pre-trained models for new tasks
Domain Adaptation
Adapting models across different domains
Generation Controls
Techniques to control how the model generates text.
Entropy
Measures uncertainty; higher entropy = more diverse but less predictable outputs
Temperature
Controls randomness; higher = creative, lower = precise
Top-k / Top-p
Sampling filters: top-k keeps the k most likely tokens, top-p keeps the smallest set with cumulative probability p; tighter = safer, looser = more diverse
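A minimal sketch of temperature scaling plus nucleus (top-p) filtering; the logits are made up for illustration:

```python
import numpy as np

def sample(logits: np.ndarray, temperature: float = 1.0,
           top_p: float = 1.0, seed: int | None = None) -> int:
    """Temperature scaling, then nucleus (top-p) filtering, then sampling."""
    rng = np.random.default_rng(seed)
    z = logits / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]          # token IDs, most likely first
    cum = np.cumsum(probs[order])
    keep = (cum - probs[order]) < top_p      # keep tokens until mass hits top_p
    kept = order[keep]
    return int(rng.choice(kept, p=probs[kept] / probs[kept].sum()))

print(sample(np.array([2.0, 1.0, 0.1, -1.0]), temperature=0.7, top_p=0.9))
```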
Repetition Penalty
Discourages loops; too strong may block valid words
Beam Search
Finds higher-probability outputs, safer but less creative
Stop Sequences
Force the model to stop at defined markers; overly broad markers may cut output off early
Context Window
The number of tokens the model can attend to at once; longer = more context, but costlier
Chain-of-Thought
Stepwise reasoning, improves logic but can slow generation
Speculative Decoding
A smaller model drafts tokens that the main model verifies; faster, with rejected drafts regenerated
Contrastive Decoding
Balances fluency vs factuality
Medusa / Speculative Multi-Path Decoding
Multi-hypothesis generation for speed + accuracy
Knowledge & Retrieval
Connecting LLMs to the outside world and private data.
RAG
Combine LLMs with external knowledge sources for up-to-date answers
Vector Databases
Store embeddings and perform fast similarity search
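A toy end-to-end retrieval step covering the two entries above; embed() is a stand-in hash function, and a real system would use a trained embedding model plus a vector database instead of a brute-force scan:

```python
import numpy as np

docs = ["Paris is the capital of France.",
        "The Eiffel Tower was completed in 1889.",
        "Python is a programming language."]

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: hash characters into a unit vector."""
    v = np.zeros(64)
    for i, ch in enumerate(text):
        v[(i + ord(ch)) % 64] += 1.0
    return v / np.linalg.norm(v)

index = np.stack([embed(d) for d in docs])   # the "vector database"

query = "When was the Eiffel Tower built?"
scores = index @ embed(query)                # cosine similarity (unit vectors)
best = docs[int(np.argmax(scores))]          # retrieve the closest document

prompt = f"Context: {best}\n\nQuestion: {query}\nAnswer:"  # goes to the LLM
```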
In-Context Learning
Model adapts using examples provided in the prompt
Knowledge Graphs
Represent facts as structured entities and relationships
Semantic Search
Retrieve information based on meaning, not exact keywords
Few-Shot Learning
Learn from a small number of examples
Zero-Shot Learning
Generalize to new tasks without any examples
Hybrid Search
Combine semantic, keyword, and graph-based retrieval
Retrieval Augmentation via Loops
Iteratively refine queries to improve results
Reranker
Re-rank retrieved results to prioritize the most relevant content
Dense Passage Retrieval (DPR)
Embedding-based retrieval for large corpora
Sparse Retrieval / BM25
Keyword-based retrieval for efficiency and baseline
Feedback-Augmented Retrieval
Use human or model feedback to improve relevance
Chain-of-Retrieval
Multi-step retrieval pipelines for complex queries
Retriever-Generator Loops
Iterative retrieval + generation for difficult queries
Hybrid Dense-Sparse Retrieval
Combine semantic + keyword retrieval for better accuracy
Multi-hop Reasoning
Retrieve multiple connected facts to answer complex queries
Embedding Updates / Incremental Indexing
Keep retrieval database fresh efficiently
Efficiency & Scaling
Making models faster, smaller, and cheaper to run.
Quantization
Compress model size and memory usage with minimal accuracy loss
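A minimal sketch of symmetric int8 weight quantization, storing int8 values plus a single float scale per tensor:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric int8 quantization: int8 weights plus one float scale."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)
print(np.abs(w - dequantize(q, scale)).max())    # error is at most ~scale/2
```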
Sparse Models / MoE
Activate only relevant experts to save compute
Model Parallelism
Distribute large models across multiple devices
Data Parallelism
Process multiple batches simultaneously for faster training
Gradient Checkpointing
Save memory by recomputing intermediate activations on demand
KV Caching
Cache key-value pairs to accelerate inference
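A minimal sketch of the caching pattern: each decoding step computes keys and values only for the newest token and appends them to the cache:

```python
import numpy as np

d = 8
rng = np.random.default_rng(0)
Wk, Wv = rng.normal(size=(d, d)), rng.normal(size=(d, d))
k_cache, v_cache = [], []

def decode_step(x_new: np.ndarray):
    """Compute K/V for the new token only and append to the cache."""
    k_cache.append(x_new @ Wk)
    v_cache.append(x_new @ Wv)
    return np.stack(k_cache), np.stack(v_cache)  # full K/V for attention

for _ in range(5):                               # five decoding steps
    K, V = decode_step(rng.normal(size=d))
print(K.shape)                                   # (5, 8): one new row per step
```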
Latency vs. Cost Tradeoff
Larger models = slower & costlier, smaller = faster & lighter
Model Serving
Efficient deployment using batching and request queuing
Edge Deployment
Run models on local or mobile devices with limited resources
Mixed Precision Training
Use FP16/BF16 to speed up training and reduce memory
Sharded / Distributed Training
Split model parameters across devices for massive models
Pipeline Parallelism
Split layers across devices and stream micro-batches through them to keep every device busy
Activation & Parameter Offloading
Move parts of model to CPU to save GPU memory
Sparse Attention
Compute attention only for relevant tokens, reducing cost
Memory-Mapped Checkpoints
Load large models efficiently without full RAM usage
Elastic / Dynamic Batching
Adjust batch size dynamically to optimize throughput
Inference Optimization
Operator fusion, kernel tuning, and caching for faster runtime
Quantization-Aware Training (QAT)
Fine-tune models with quantization to retain accuracy
Data & Preprocessing
Preparing the fuel (data) for the engine (model).
Data Cleaning & Filtering
Ensure high-quality, relevant, and consistent training data
Tokenizer Training
Build vocabularies using BPE, SentencePiece, or Unigram models
Data Deduplication
Remove repeated or near-duplicate examples to improve learning
Data Mixing & Curriculum Learning
Present data strategically for better convergence
Data Augmentation
Expand dataset with synthetic or modified examples
Synthetic Data Generation
AI-generated data to fill gaps or rare cases
Balanced Sampling
Ensure diverse representation across domains/classes
Data Quality Assessment
Measuring and ensuring training data meets standards
Active Learning
Strategically selecting the most informative data for labeling
Data Versioning
Track changes and maintain reproducibility in datasets
Evaluation & Benchmarks
Measuring how good the model actually is.
Perplexity
Measures how well a model predicts text (core LM metric)
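Concretely, perplexity is the exponential of the average negative log-likelihood over a held-out text; the log-probabilities below are made up for illustration:

```python
import numpy as np

# Perplexity = exp(mean negative log-likelihood of the true tokens).
# These per-token log-probabilities would come from the model.
token_log_probs = np.array([-0.5, -1.2, -0.3, -2.0])
perplexity = np.exp(-token_log_probs.mean())
print(perplexity)                                # ~2.72; lower is better
```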
BLEU / ROUGE / BERTScore
Compare generated text against reference texts for quality
Benchmark Suites
Standardized tests like MMLU, HellaSwag, BIG-bench for model evaluation
Human Evaluation
Collect human judgments for accuracy, coherence, and safety
Factuality / Truthfulness Metrics
Specialized evaluation for hallucination-prone outputs
Consistency / Contradiction Metrics
Check if model outputs are logically consistent across queries
Bias & Fairness Metrics
Quantify demographic or cultural biases in outputs
Adversarial Robustness
Test resilience to malicious or tricky prompts
Knowledge Probing
Evaluate stored factual knowledge (e.g., LAMA, TruthfulQA)
Efficiency & Cost Metrics
Measure latency, throughput, memory, and compute requirements
Explainability / Interpretability Evaluation
Assess clarity and transparency of model reasoning
Extensions
Expanding capabilities beyond just text (Agents, Tools, etc.).
Multimodality
Combine text, images, audio, and video for richer understanding
Agents
LLMs that plan, reason, and take actions autonomously
Agentic AI
LLMs with autonomous decision-making, memory, and goal-oriented behaviour
Multi-Agent Systems
Teams of LLMs with specialized roles for complex tasks
Tool Use / Function Calling
LLMs interact with APIs, databases, and external tools
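A sketch of the pattern; the schema format and the model's JSON reply are illustrative stand-ins, not any specific vendor's API:

```python
import json

tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {"type": "object",
                   "properties": {"city": {"type": "string"}},
                   "required": ["city"]},
}]

def get_weather(city: str) -> str:
    return f"22°C and sunny in {city}"           # stub implementation

# Suppose the model, shown the tools and a user question, replies with:
model_reply = '{"tool": "get_weather", "arguments": {"city": "Paris"}}'

call = json.loads(model_reply)
if call["tool"] == "get_weather":
    result = get_weather(**call["arguments"])    # run the tool...
    print(result)                                # ...and feed the result back
```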
Prompt Engineering
Design inputs to guide models toward better outputs
Few-Shot & One-Shot Learning
Adapt from minimal examples for quick generalization
Auto-Prompting / Self-Instruction
Models generate their own prompts to improve learning
Chain of Thought (CoT)
Break problems into step-by-step reasoning
ReAct
Combine reasoning with tool use and actions
Self-Consistency
Sample multiple reasoning paths → pick the majority answer
Tree of Thoughts (ToT)
Explore many reasoning branches before deciding
Reflexion
Model critiques, learns, and refines its own answers
Memory
Agents recall past interactions for long-term context
Production & MLOps
Managing the lifecycle of models in the real world.
Model Versioning & Lineage
Track model evolution and dependencies
A/B Testing for Models
Compare model performance in production environments
Model Monitoring & Drift Detection
Ensure continued performance over time
Continuous Learning
Updating models with new data without full retraining
Model Registry
Centralized storage and management of model artifacts
Feature Stores
Manage and serve features for model inference
Model Serving Infrastructure
Scalable deployment and serving systems
Experiment Tracking
Log and compare training runs and hyperparameters
Data Pipelines
Automated data processing and validation workflows
Model Governance
Policies and processes for responsible model deployment
Rollback Strategies
Safe deployment and quick recovery from model issues
Performance Monitoring
Track model accuracy, latency, and resource usage
Safety & Limits
Ensuring models are safe, fair, and reliable.
Hallucination
Fluent but incorrect or nonsensical outputs
Alignment
Ensure model behaviour is safe, ethical, and policy-compliant
Guardrails
Rule-based or learned filters to prevent harmful outputs
Bias & Fairness
Detect and mitigate demographic or cultural prejudices
Privacy & Data Leakage
Protect sensitive information during training and inference
Adversarial Attacks
Malicious inputs designed to mislead or exploit the model
Interpretability
Understand how and why the model makes decisions
Calibration / Uncertainty Estimation
Avoid overconfident wrong predictions
Red-Teaming
Stress-test models for safety, robustness, and alignment
Robustness Evaluation
Test model resilience against noise, domain shifts, and edge cases
Mechanistic Interpretability
Understanding model internals and how features are computed
Watermarking
Methods to detect AI-generated content
Model Editing
Updating specific knowledge without full retraining
Constitutional AI
Training models to follow ethical principles and safety guidelines
Jailbreaking Prevention
Protecting against attempts to bypass safety measures
Whew! That was a lot.
Don't worry about memorizing everything. Use this page as a reference. Ready to start applying this knowledge?