Advanced Workflow

WanVideo SCAIL Pose Control

A complete resource list for setting up the WanVideo SCAIL workflow. Install every dependency listed below before running pose-to-video generation.

Required Resources & Explanations

Repository

ComfyUI-WanAnimatePreprocess

Essential for data preparation. Handles face cropping, masking, and initial pose detection before passing data to the animation model.

Repository

ComfyUI-SCAIL-pose

The core nodes for SCAIL. Features advanced 3D pose extraction and multi-character support for high-fidelity animation control.

Repository

ComfyUI-WanVideoWrapper

The wrapper node pack that lets Wan2.1 models run inside ComfyUI. Required to load the diffusion models.

Repository

SCAIL (Official)

The official research repository. Useful for understanding the underlying technology of Spatially Consistent Animation.

Model

YOLO v10m (ONNX)

Object detection model used to find people/bodies in the frame before pose estimation runs.

Model

ViTPose-L Wholebody (ONNX)

Large Vision Transformer for pose estimation. A solid balance of accuracy and speed for body tracking.

Model

ViTPose-H Model (ONNX)

Huge Vision Transformer model. Provides the highest accuracy for whole-body pose detection but requires more compute.

Config

ViTPose-H Data (.bin)

Required weights file for ViTPose-H. The model is split across two files because of its size, so the .bin file must sit in the same folder as the .onnx file.

Model

Wan2.1 VAE (BF16)

Variational Autoencoder. Compresses video frames into latent space for processing and decodes them back to pixels.

Model

UMT5-XXL Text Encoder

Massive text encoder (T5) that translates your prompts into embeddings the Wan2.1 model can understand.

Model

Wan2.1 I2V 14B (Lightx2v)

The main diffusion model (14 billion parameters). Generates the video frames from text and image input.

Model

CLIP Vision (H)

Vision encoder that 'sees' your reference image to guide the generation (IPAdapter-style functionality).

Model

SCAIL Adapter (FP8 Scaled)

The specific SCAIL model weights. Enables the 'Pose Control' capability within the Wan2.1 architecture.

Repository

WanVideo SCAIL Tree

File browser for the SCAIL directory. Useful to check for updates or alternative model versions.

Important: Model Placement
  • Checkpoints/Diffusion: ComfyUI/models/diffusion_models/ or ComfyUI/models/loras/ (check which folder the loader node reads from).
  • VAE: ComfyUI/models/vae/
  • Text Encoders: ComfyUI/models/text_encoders/
  • Pose/Detection: ComfyUI/models/detection/ (Keep .onnx and .bin together).
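
The placement list above can be checked with a short script. This is a minimal sketch, not part of any of the listed repositories: the folder names come from the list, while the root path "ComfyUI", the reading of "loras/" as ComfyUI/models/loras/, and the helper names missing_model_dirs and detection_files are assumptions made for illustration.

```python
from pathlib import Path

# Folder names taken from the placement list; adjust if your install differs.
EXPECTED_DIRS = [
    "models/diffusion_models",
    "models/loras",  # assumption: "loras/" means ComfyUI/models/loras/
    "models/vae",
    "models/text_encoders",
    "models/detection",
]


def missing_model_dirs(comfy_root):
    """Return the expected model folders that do not exist under comfy_root."""
    root = Path(comfy_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


def detection_files(detection_dir):
    """List the .onnx and .bin file names found directly in detection_dir."""
    folder = Path(detection_dir)
    if not folder.is_dir():
        return {"onnx": [], "bin": []}
    return {
        "onnx": sorted(p.name for p in folder.glob("*.onnx")),
        "bin": sorted(p.name for p in folder.glob("*.bin")),
    }


if __name__ == "__main__":
    root = "ComfyUI"  # assumption: run from the folder that contains ComfyUI
    print("Missing folders:", missing_model_dirs(root))
    print("Detection files:", detection_files(Path(root) / "models" / "detection"))
```

If the detection listing shows the ViTPose-H .onnx file but no .bin file next to it, the weights file is probably in the wrong folder.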

Get the Workflow

Download the preconfigured JSON file, then drag and drop it directly into ComfyUI.
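
Before loading the file, it can help to see which node types it uses, so you know which of the repositories above must be installed. The sketch below assumes the workflow JSON has a top-level "nodes" list whose entries carry a "type" field, as ComfyUI's saved workflows usually do; the node names in the usage example are made up, and the helper names are hypothetical.

```python
import json
from collections import Counter


def load_workflow(path):
    """Read a workflow JSON file from disk and return it as a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def node_type_counts(workflow):
    """Count node types in a workflow dict (assumes a top-level "nodes" list)."""
    nodes = workflow.get("nodes", [])
    return Counter(
        node.get("type", "<unknown>") for node in nodes if isinstance(node, dict)
    )


if __name__ == "__main__":
    # In-memory example with made-up node type names.
    example = {
        "nodes": [
            {"id": 1, "type": "ExampleModelLoader"},
            {"id": 2, "type": "ExamplePoseNode"},
            {"id": 3, "type": "ExamplePoseNode"},
        ]
    }
    for node_type, count in sorted(node_type_counts(example).items()):
        print(f"{node_type}: {count}")
```

For a real file, pass the result of load_workflow to node_type_counts. Any node type that ComfyUI reports as missing after loading points to a custom node pack that still needs installing.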


Setup Tips

  • Ensure ViTPose-H .onnx and .bin files are in the same folder.
  • ComfyUI-WanVideoWrapper changes quickly; update it often through ComfyUI Manager.
  • High VRAM (24GB+) is recommended for the 14B model variants.
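
A rough back-of-the-envelope calculation shows where the 24GB+ recommendation comes from. The sketch below counts only the weights of a 14-billion-parameter model at one or two bytes per parameter. It ignores activations, the text encoder, the VAE, and any offloading, and which precision your diffusion model file actually uses is an assumption to check against the file you downloaded.

```python
def weight_gib(params, bytes_per_param):
    """Approximate size of the model weights alone, in GiB."""
    return params * bytes_per_param / 1024**3


PARAMS_14B = 14e9  # 14 billion parameters

if __name__ == "__main__":
    for name, nbytes in [("FP8", 1), ("BF16/FP16", 2)]:
        print(f"{name}: {weight_gib(PARAMS_14B, nbytes):.1f} GiB")
    # FP8: 13.0 GiB
    # BF16/FP16: 26.1 GiB
```

At one byte per parameter the weights fit on a 24 GB card with room left for activations; at two bytes per parameter they do not fit without offloading.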