Advanced Workflow

WanVideo SCAIL Pose Control

A complete resource list for setting up the WanVideo SCAIL workflow. Install every dependency listed below before running pose-to-video generation.

Required Resources & Explanations

Repository

ComfyUI-WanAnimatePreprocess

Essential for data preparation. Handles face cropping, masking, and initial pose detection before passing data to the animation model.

Repository

ComfyUI-SCAIL-pose

The core nodes for SCAIL. Features advanced 3D pose extraction and multi-character support for high-fidelity animation control.

Repository

ComfyUI-WanVideoWrapper

The main wrapper that lets Wan2.1 models run inside ComfyUI. Required to load the diffusion models.

Repository

SCAIL (Official)

The official research repository. Useful for understanding the underlying technology of Spatially Consistent Animation.

Model

YOLO v10m (ONNX)

Object detection model used to find people/bodies in the frame before pose estimation runs.

Model

ViTPose-L Wholebody (ONNX)

Large Vision Transformer for pose estimation. A solid balance of accuracy and speed for body tracking.

Model

ViTPose-H Model (ONNX)

Huge Vision Transformer model. Provides the highest accuracy for whole-body pose detection but requires more compute.

Config

ViTPose-H Data (.bin)

Required weights file for ViTPose-H. The model is split into two files because of its size, so this .bin must sit in the same folder as the ONNX model.
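A quick way to confirm the two ViTPose-H files ended up side by side is a small check like the one below. This is a minimal sketch, not part of any of the listed node packs: the function name `missing_sidecar` and the file names in the usage example are hypothetical placeholders, so substitute the names of the files you actually downloaded.

```python
from pathlib import Path


def missing_sidecar(onnx_path: str, sidecar_name: str) -> bool:
    """Return True if the sidecar weights file is NOT in the .onnx file's folder."""
    onnx_file = Path(onnx_path)
    # The sidecar is expected in the same directory as the ONNX model.
    return not (onnx_file.parent / sidecar_name).is_file()


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    model = "ComfyUI/models/detection/vitpose_h.onnx"
    if missing_sidecar(model, "vitpose_h.bin"):
        print("The .bin file is not next to the .onnx model; move it there.")
    else:
        print("Both files are in the same folder.")
```

If the check reports a missing file, move the .bin into the folder that holds the .onnx model rather than renaming either file.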

Model

Wan2.1 VAE (BF16)

Variational Autoencoder. Compresses video frames into latent space for processing and decodes them back to pixels.
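To get a feel for how much the VAE shrinks a clip, the sketch below computes a latent shape from a video size. The compression factors are assumptions that do not come from this page: it uses the commonly cited Wan2.1 figures of 4x temporal and 8x spatial compression with 16 latent channels, and the helper name `wan_latent_shape` is made up for illustration. Check the model card if you need exact values.

```python
def wan_latent_shape(frames: int, height: int, width: int,
                     t_stride: int = 4, s_stride: int = 8,
                     channels: int = 16) -> tuple:
    """Estimate the latent shape (channels, frames, height, width).

    Assumed factors (not from the source page): the first frame is kept
    and every following group of t_stride frames becomes one latent
    frame; height and width are each divided by s_stride.
    """
    if frames < 1 or (frames - 1) % t_stride != 0:
        raise ValueError("frames should be of the form t_stride * n + 1")
    if height % s_stride != 0 or width % s_stride != 0:
        raise ValueError("height and width should be multiples of s_stride")
    latent_frames = (frames - 1) // t_stride + 1
    return (channels, latent_frames, height // s_stride, width // s_stride)


if __name__ == "__main__":
    # 81 frames at 832x480 under the assumed factors
    print(wan_latent_shape(81, 480, 832))  # (16, 21, 60, 104)
```

Under these assumptions, frame counts such as 49 or 81 and dimensions that are multiples of 8 map cleanly to whole latent sizes.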

Model

UMT5-XXL Text Encoder

Large text encoder (a T5 variant) that converts your prompts into embeddings the Wan2.1 model can use.

Model

Wan2.1 I2V 14B (Lightx2v)

The main diffusion model (14 billion parameters). Generates the video frames from the text and image inputs.

Model

CLIP Vision (H)

Vision encoder that 'sees' your reference image and uses it to guide the generation (IPAdapter-style functionality).

Model

SCAIL Adapter (FP8 Scaled)

The specific SCAIL model weights. Enables the 'Pose Control' capability within the Wan2.1 architecture.

Repository

WanVideo SCAIL Tree

File browser for the SCAIL directory. Useful to check for updates or alternative model versions.

Important: Model Placement

Checkpoints/Diffusion: ComfyUI/models/diffusion_models/ or ComfyUI/models/loras/ (check which folder the loader node reads from).
VAE: ComfyUI/models/vae/
Text Encoders: ComfyUI/models/text_encoders/
Pose/Detection: ComfyUI/models/detection/ (Keep .onnx and .bin together).
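The placement rules above can be turned into a small pre-flight check. This is a minimal sketch under stated assumptions: the `PLACEMENT` table, its category keys, and the helper names are hypothetical and simply mirror the folders listed above, and the ComfyUI root path in the usage example is a placeholder for your own install location.

```python
from pathlib import Path

# Folders taken from the placement list above; the keys are illustrative labels.
# Some nodes may read from models/loras instead of models/diffusion_models,
# so check the loader node's input before moving files.
PLACEMENT = {
    "diffusion": "models/diffusion_models",
    "vae": "models/vae",
    "text_encoder": "models/text_encoders",
    "detection": "models/detection",  # keep .onnx and .bin together here
}


def target_dir(comfy_root: str, kind: str) -> Path:
    """Return the folder where a model of the given kind should be placed."""
    return Path(comfy_root) / PLACEMENT[kind]


def missing_dirs(comfy_root: str) -> list:
    """List the expected model folders that do not exist yet."""
    return [str(target_dir(comfy_root, kind))
            for kind in PLACEMENT
            if not target_dir(comfy_root, kind).is_dir()]


if __name__ == "__main__":
    root = "ComfyUI"  # placeholder: path to your ComfyUI installation
    for folder in missing_dirs(root):
        print("Missing folder:", folder)
```

Running it before downloading the models shows which folders still need to be created.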
