AI Fundamentals: The Terms You Need
A complete glossary of the most important concepts in Generative AI, from the basic building blocks to advanced production deployment.
Core LLM Building Blocks
The fundamental components that make up large language models.
Transformer Architecture
The core design: encoder-only (BERT), decoder-only / autoregressive (GPT, Claude, Gemini), or encoder-decoder (T5)
Tokenization
Break text into tokens and map them to numbers
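A toy sketch of the idea; real tokenizers learn subword vocabularies (BPE, SentencePiece), but the text-to-integer mapping looks like this:

```python
# Real tokenizers (BPE, SentencePiece) learn subword vocabularies;
# this toy version just splits on whitespace.
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3}

def tokenize(text: str) -> list[int]:
    """Map each word to its vocabulary ID, unknown words to <unk>."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

print(tokenize("The cat sat"))  # [1, 2, 3]
```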
Embedding Spaces
Map tokens into a semantic space where closer vectors mean similar meaning
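A quick illustration with made-up vectors: cosine similarity scores semantically related tokens higher:

```python
import math

# Made-up 3-D embeddings; trained models use hundreds of dimensions.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.95],
}

def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # high (~1.0)
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # low  (~0.3)
```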
Positional Encoding
Adds sequence order information to token embeddings
Attention
Highlights the most relevant tokens in context
Self-Attention
Each token attends to every other token for context
Cross-Attention
Lets the decoder attend to the encoder's outputs (in encoder-decoder models)
Multi-Head Attention
Several attention heads capture different patterns in parallel
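A minimal NumPy sketch of the attention entries above: scaled dot-product self-attention for one head; multi-head attention runs several such heads in parallel and concatenates the results (shapes are illustrative):

```python
import numpy as np

def self_attention(x: np.ndarray, Wq, Wk, Wv) -> np.ndarray:
    """Scaled dot-product self-attention for a single head.
    x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (seq_len, seq_len)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax over the keys
    return w @ V                                     # weighted sum of values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                          # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)           # (4, 8)
```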
Feed-Forward Networks
Nonlinear layers that transform representations between attention blocks
Residual Connections
Shortcut links that preserve signals and help gradient flow
Layer Normalization
Normalizes activations to stabilize and speed up training
Output Projection (LM Head)
Final linear layer mapping hidden states into logits
Logits
Raw prediction scores for each token before probabilities
Softmax
Turns logits into a probability distribution
Sampling from Probabilities
Chooses the next token based on probability weights
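A minimal NumPy sketch of the logits → softmax → sampling pipeline described in the three entries above (toy 3-token vocabulary):

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.1])      # raw scores over a 3-token vocabulary

# Softmax: exponentiate and normalize (subtract the max for stability).
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs)                             # ~[0.66, 0.24, 0.10]

# Sampling: draw the next token ID in proportion to its probability.
next_token = int(rng.choice(len(probs), p=probs))
```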
RoPE
Rotary Position Embedding; rotates query/key vectors to encode relative positions, helping models handle longer sequences
ALiBi / Relative Positional Encoding
Attention with Linear Biases and other relative-position schemes; alternatives to RoPE for long contexts
Linear / Performer Attention
Efficient attention variants for very long sequences
Grouped Query Attention (GQA)
Reduces memory use by sharing keys/values across heads
Multi-Head Latent Attention
Compresses keys and values into a low-rank latent vector to shrink the KV cache (popularized by DeepSeek)
SwiGLU/GeLU Activations
Smooth nonlinear functions that improve expressiveness
RMSNorm
Root-mean-square normalization, a lighter alternative to LayerNorm
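A minimal NumPy sketch of RMSNorm; unlike LayerNorm there is no mean subtraction and no bias:

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Rescale by the root mean square; no mean subtraction, no bias."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

print(rms_norm(np.array([1.0, -2.0, 3.0]), np.ones(3)))
```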
Advanced Architectures
Modern variations and specialized model types.
Diffusion Models
Generate images/video by learning to reverse a gradual noising process (DALL-E, Midjourney, Stable Diffusion)
VAEs (Variational Autoencoders)
Probabilistic generative models with latent spaces
GANs (Generative Adversarial Networks)
Generator vs discriminator training for realistic outputs
State Space Models (Mamba/SSMs)
Efficient alternatives to transformers for long sequences
Autoregressive Models
Generate sequences one token at a time (GPT family, Claude, Gemini, etc.)
Flow-Based Models
Invertible neural networks for exact likelihood estimation
Hybrid Architectures
Combine multiple approaches (transformer + CNN, etc.)
Training & Tuning
How models learn and adapt to specific tasks.
Pretraining
Build general world knowledge from large datasets
Continual / Lifelong Learning
Update models without forgetting old knowledge
Self-Instruction / Self-Play
AI generates its own training examples
Fine-Tuning
Adapt the model for specific domains or tasks
Supervised Fine-Tuning (SFT)
Train on curated input-output pairs
LoRA
Parameter-efficient adapters for cheap fine-tuning
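A minimal sketch of the LoRA idea: freeze the pretrained weight and train only a low-rank update (dimensions are illustrative):

```python
import numpy as np

d_in, d_out, r = 512, 512, 8
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))               # trainable; zero-init so the update starts at 0

def lora_forward(x: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Frozen path plus the scaled low-rank update B @ A."""
    return W @ x + scale * (B @ (A @ x))

print(lora_forward(rng.normal(size=d_in)).shape)  # (512,)
```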
QLoRA
LoRA + quantization, enabling fine-tuning of huge models on modest hardware
PEFT
Family of methods (e.g., LoRA, QLoRA, adapters) updating only small parts of the model
Instruction Tuning
Teach models to follow natural language instructions
RLHF
Align model outputs with human preferences via feedback
Constitutional AI
AI-written principles for safer alignment
DPO / PPO / GRPO
Optimization algorithms for preference alignment
Distillation
Transfer knowledge from a large model into a smaller one
Gradient Descent & Backpropagation
Core optimization mechanics
Loss Functions
Cross-entropy loss, perplexity, etc. that guide learning
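A minimal sketch of token-level cross-entropy, the standard language-modeling loss:

```python
import numpy as np

def cross_entropy(logits: np.ndarray, target: int) -> float:
    """Negative log-probability assigned to the correct next token."""
    logits = logits - logits.max()               # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-log_probs[target])

print(cross_entropy(np.array([2.0, 0.5, 0.1]), target=0))  # ~0.32
```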
Learning Rate Scheduling
Adjust training speed for stability
Batch Size & Gradient Accumulation
Control efficiency and memory use
Scaling Laws
Predictable links between size, data, and performance
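One widely cited form is the Chinchilla parameterization (Hoffmann et al., 2022), which models loss as a function of parameter count N and training tokens D, with E, A, B, α, β fitted empirically:

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```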
Mixture-of-Experts Fine-Tuning
Activate only subsets of parameters for efficiency
Preference Optimization (IPO, KTO, etc.)
Newer RLHF alternatives without full RL
Curriculum Learning
Strategic ordering of training data for better convergence
Meta-Learning
Learning to learn quickly from few examples
Transfer Learning
Leveraging pre-trained models for new tasks
Domain Adaptation
Adapting models across different domains
Generation Controls
Techniques to control how the model generates text.
Entropy
Measures uncertainty; higher entropy = more diverse but less predictable outputs
Temperature
Controls randomness; higher = creative, lower = precise
Top-k / Top-p
Sampling filters: top-k keeps the k most likely tokens, top-p keeps the smallest set with cumulative probability p; tighter = safer, looser = more diverse
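A minimal sketch of temperature scaling plus nucleus (top-p) filtering; the logits are made up for illustration:

```python
import numpy as np

def sample(logits: np.ndarray, temperature: float = 1.0,
           top_p: float = 1.0, seed: int | None = None) -> int:
    """Temperature scaling, then nucleus (top-p) filtering, then sampling."""
    rng = np.random.default_rng(seed)
    z = logits / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]          # token IDs, most likely first
    cum = np.cumsum(probs[order])
    keep = (cum - probs[order]) < top_p      # keep tokens until mass hits top_p
    kept = order[keep]
    return int(rng.choice(kept, p=probs[kept] / probs[kept].sum()))

print(sample(np.array([2.0, 1.0, 0.1, -1.0]), temperature=0.7, top_p=0.9))
```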
Repetition Penalty
Discourages loops; too strong may block valid words
Beam Search
Finds higher-probability outputs, safer but less creative
Stop Sequences
Force the model to stop at defined markers; overly broad markers may cut output off early
Context Window
The number of tokens the model can attend to at once; longer = more context, but costlier
Chain-of-Thought
Stepwise reasoning, improves logic but can slow generation
Speculative Decoding
A smaller model drafts tokens that the main model verifies; faster, with rejected drafts regenerated
Contrastive Decoding
Balances fluency vs factuality
Medusa / Speculative Multi-Path Decoding
Multi-hypothesis generation for speed + accuracy
Knowledge & Retrieval
Connecting LLMs to the outside world and private data.
RAG
Combine LLMs with external knowledge sources for up-to-date answers
Vector Databases
Store embeddings and perform fast similarity search
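A toy end-to-end retrieval step covering the two entries above; embed() is a stand-in hash function, and a real system would use a trained embedding model plus a vector database instead of a brute-force scan:

```python
import numpy as np

docs = ["Paris is the capital of France.",
        "The Eiffel Tower was completed in 1889.",
        "Python is a programming language."]

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: hash characters into a unit vector."""
    v = np.zeros(64)
    for i, ch in enumerate(text):
        v[(i + ord(ch)) % 64] += 1.0
    return v / np.linalg.norm(v)

index = np.stack([embed(d) for d in docs])   # the "vector database"

query = "When was the Eiffel Tower built?"
scores = index @ embed(query)                # cosine similarity (unit vectors)
best = docs[int(np.argmax(scores))]          # retrieve the closest document

prompt = f"Context: {best}\n\nQuestion: {query}\nAnswer:"  # goes to the LLM
```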
In-Context Learning
Model adapts using examples provided in the prompt
Knowledge Graphs
Represent facts as structured entities and relationships
Semantic Search
Retrieve information based on meaning, not exact keywords
Few-Shot Learning
Learn from a small number of examples
Zero-Shot Learning
Generalize to new tasks without any examples
Hybrid Search
Combine semantic, keyword, and graph-based retrieval
Retrieval Augmentation via Loops
Iteratively refine queries to improve results
Reranker
Re-rank retrieved results to prioritize the most relevant content
Dense Passage Retrieval (DPR)
Embedding-based retrieval for large corpora
Sparse Retrieval / BM25
Keyword-based retrieval for efficiency and baseline
Feedback-Augmented Retrieval
Use human or model feedback to improve relevance
Chain-of-Retrieval
Multi-step retrieval pipelines for complex queries
Retriever-Generator Loops
Iterative retrieval + generation for difficult queries
Hybrid Dense-Sparse Retrieval
Combine semantic + keyword retrieval for better accuracy
Multi-hop Reasoning
Retrieve multiple connected facts to answer complex queries
Embedding Updates / Incremental Indexing
Keep retrieval database fresh efficiently
Efficiency & Scaling
Making models faster, smaller, and cheaper to run.
Quantization
Compress model size and memory usage with minimal accuracy loss
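A minimal sketch of symmetric int8 weight quantization, storing int8 values plus a single float scale per tensor:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric int8 quantization: int8 weights plus one float scale."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)
print(np.abs(w - dequantize(q, scale)).max())    # error is at most ~scale/2
```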
Sparse Models / MoE
Activate only relevant experts to save compute
Model Parallelism
Distribute large models across multiple devices
Data Parallelism
Process multiple batches simultaneously for faster training
Gradient Checkpointing
Save memory by recomputing intermediate activations on demand
KV Caching
Cache key-value pairs to accelerate inference
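A minimal sketch of the caching pattern: each decoding step computes keys and values only for the newest token and appends them to the cache:

```python
import numpy as np

d = 8
rng = np.random.default_rng(0)
Wk, Wv = rng.normal(size=(d, d)), rng.normal(size=(d, d))
k_cache, v_cache = [], []

def decode_step(x_new: np.ndarray):
    """Compute K/V for the new token only and append to the cache."""
    k_cache.append(x_new @ Wk)
    v_cache.append(x_new @ Wv)
    return np.stack(k_cache), np.stack(v_cache)  # full K/V for attention

for _ in range(5):                               # five decoding steps
    K, V = decode_step(rng.normal(size=d))
print(K.shape)                                   # (5, 8): one new row per step
```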
Latency vs. Cost Tradeoff
Larger models = slower & costlier, smaller = faster & lighter
Model Serving
Efficient deployment using batching and request queuing
Edge Deployment
Run models on local or mobile devices with limited resources
Mixed Precision Training
Use FP16/BF16 to speed up training and reduce memory
Sharded / Distributed Training
Split model parameters across devices for massive models
Pipeline Parallelism
Split layers across devices and stream micro-batches through them to keep every device busy
Activation & Parameter Offloading
Move parts of model to CPU to save GPU memory
Sparse Attention
Compute attention only for relevant tokens, reducing cost
Memory-Mapped Checkpoints
Load large models efficiently without full RAM usage
Elastic / Dynamic Batching
Adjust batch size dynamically to optimize throughput
Inference Optimization
Operator fusion, kernel tuning, and caching for faster runtime
Quantization-Aware Training (QAT)
Fine-tune models with quantization to retain accuracy
Data & Preprocessing
Preparing the fuel (data) for the engine (model).
Data Cleaning & Filtering
Ensure high-quality, relevant, and consistent training data
Tokenizer Training
Build vocabularies using BPE, SentencePiece, or Unigram models
Data Deduplication
Remove repeated or near-duplicate examples to improve learning
Data Mixing & Curriculum Learning
Present data strategically for better convergence
Data Augmentation
Expand dataset with synthetic or modified examples
Synthetic Data Generation
AI-generated data to fill gaps or rare cases
Balanced Sampling
Ensure diverse representation across domains/classes
Data Quality Assessment
Measuring and ensuring training data meets standards
Active Learning
Strategically selecting the most informative data for labeling
Data Versioning
Track changes and maintain reproducibility in datasets
Evaluation & Benchmarks
Measuring how good the model actually is.
Perplexity
Measures how well a model predicts text (core LM metric)
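Concretely, perplexity is the exponential of the average negative log-likelihood over a held-out text; the log-probabilities below are made up for illustration:

```python
import numpy as np

# Perplexity = exp(mean negative log-likelihood of the true tokens).
# These per-token log-probabilities would come from the model.
token_log_probs = np.array([-0.5, -1.2, -0.3, -2.0])
perplexity = np.exp(-token_log_probs.mean())
print(perplexity)                                # ~2.72; lower is better
```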
BLEU / ROUGE / BERTScore
Compare generated text against reference texts for quality
Benchmark Suites
Standardized tests like MMLU, HellaSwag, BIG-bench for model evaluation
Human Evaluation
Collect human judgments for accuracy, coherence, and safety
Factuality / Truthfulness Metrics
Specialized evaluation for hallucination-prone outputs
Consistency / Contradiction Metrics
Check if model outputs are logically consistent across queries
Bias & Fairness Metrics
Quantify demographic or cultural biases in outputs
Adversarial Robustness
Test resilience to malicious or tricky prompts
Knowledge Probing
Evaluate stored factual knowledge (e.g., LAMA, TruthfulQA)
Efficiency & Cost Metrics
Measure latency, throughput, memory, and compute requirements
Explainability / Interpretability Evaluation
Assess clarity and transparency of model reasoning
Extensions
Expanding capabilities beyond just text (Agents, Tools, etc.).
Multimodality
Combine text, images, audio, and video for richer understanding
Agents
LLMs that plan, reason, and take actions autonomously
Agentic AI
LLMs with autonomous decision-making, memory, and goal-oriented behaviour
Multi-Agent Systems
Teams of LLMs with specialized roles for complex tasks
Tool Use / Function Calling
LLMs interact with APIs, databases, and external tools
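A sketch of the pattern; the schema format and the model's JSON reply are illustrative stand-ins, not any specific vendor's API:

```python
import json

tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {"type": "object",
                   "properties": {"city": {"type": "string"}},
                   "required": ["city"]},
}]

def get_weather(city: str) -> str:
    return f"22°C and sunny in {city}"           # stub implementation

# Suppose the model, shown the tools and a user question, replies with:
model_reply = '{"tool": "get_weather", "arguments": {"city": "Paris"}}'

call = json.loads(model_reply)
if call["tool"] == "get_weather":
    result = get_weather(**call["arguments"])    # run the tool...
    print(result)                                # ...and feed the result back
```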
Prompt Engineering
Design inputs to guide models toward better outputs
Few-Shot & One-Shot Learning
Adapt from minimal examples for quick generalization
Auto-Prompting / Self-Instruction
Models generate their own prompts to improve learning
Chain of Thought (CoT)
Break problems into step-by-step reasoning
ReAct
Combine reasoning with tool use and actions
Self-Consistency
Sample multiple reasoning paths → pick the majority answer
Tree of Thoughts (ToT)
Explore many reasoning branches before deciding
Reflexion
Model critiques, learns, and refines its own answers
Memory
Agents recall past interactions for long-term context
Production & MLOps
Managing the lifecycle of models in the real world.
Model Versioning & Lineage
Track model evolution and dependencies
A/B Testing for Models
Compare model performance in production environments
Model Monitoring & Drift Detection
Ensure continued performance over time
Continuous Learning
Updating models with new data without full retraining
Model Registry
Centralized storage and management of model artifacts
Feature Stores
Manage and serve features for model inference
Model Serving Infrastructure
Scalable deployment and serving systems
Experiment Tracking
Log and compare training runs and hyperparameters
Data Pipelines
Automated data processing and validation workflows
Model Governance
Policies and processes for responsible model deployment
Rollback Strategies
Safe deployment and quick recovery from model issues
Performance Monitoring
Track model accuracy, latency, and resource usage
Safety & Limits
Ensuring models are safe, fair, and reliable.
Hallucination
Fluent but incorrect or nonsensical outputs
Alignment
Ensure model behaviour is safe, ethical, and policy-compliant
Guardrails
Rule-based or learned filters to prevent harmful outputs
Bias & Fairness
Detect and mitigate demographic or cultural prejudices
Privacy & Data Leakage
Protect sensitive information during training and inference
Adversarial Attacks
Malicious inputs designed to mislead or exploit the model
Interpretability
Understand how and why the model makes decisions
Calibration / Uncertainty Estimation
Avoid overconfident wrong predictions
Red-Teaming
Stress-test models for safety, robustness, and alignment
Robustness Evaluation
Test model resilience against noise, domain shifts, and edge cases
Mechanistic Interpretability
Understanding model internals and how features are computed
Watermarking
Methods to detect AI-generated content
Model Editing
Updating specific knowledge without full retraining
Constitutional AI
Training models to follow ethical principles and safety guidelines
Jailbreaking Prevention
Protecting against attempts to bypass safety measures
Whew! That was a lot.
Don't worry about memorizing everything. Use this page as a reference. Ready to start applying this knowledge?