WanVideo SCAIL Pose Control
A complete resource list for setting up the WanVideo SCAIL workflow. Ensure you have all dependencies installed for proper Pose-to-Video generation.
Required Resources & Explanations
ComfyUI-WanAnimatePreprocess
Essential for data preparation. Handles face cropping, masking, and initial pose detection before passing data to the animation model.
ComfyUI-SCAIL-pose
The core nodes for SCAIL. Features advanced 3D pose extraction and multi-character support for high-fidelity animation control.
ComfyUI-WanVideoWrapper
The main wrapper that allows Wan2.1 models to run inside ComfyUI. Required to load the diffusion models.
SCAIL (Official)
The official research repository. Useful for understanding the underlying technology of Spatially Consistent Animation.
YOLO v10m (ONNX)
Object detection model used to find people/bodies in the frame before pose estimation runs.
ViTPose-L Wholebody (ONNX)
Large Vision Transformer for pose estimation. A solid balance of accuracy and speed for body tracking.
ViTPose-H Model (ONNX)
Huge Vision Transformer model. Provides the highest accuracy for whole-body pose detection but requires more compute.
ViTPose-H Data (.bin)
Required weights file for ViTPose-H. Must be in the same folder as the ONNX model due to file size splitting.
Wan2.1 VAE (BF16)
Variational Autoencoder. Compresses video frames into latent space for processing and decodes them back to pixels.
UMT5-XXL Text Encoder
Massive text encoder (T5) that translates your prompts into embeddings the Wan2.1 model can understand.
Wan2.1 I2V 14B (Lightx2v)
The main diffusion model (14 billion parameters). Generates the video frames from text and image input.
CLIP Vision (H)
Vision encoder that 'sees' your reference image to guide the generation (IPAdapter-style functionality).
SCAIL Adapter (FP8 Scaled)
The specific SCAIL model weights. Enables the 'Pose Control' capability within the Wan2.1 architecture.
WanVideo SCAIL Tree
File browser for the SCAIL directory. Useful to check for updates or alternative model versions.
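To make the VAE entry in the list above more concrete, here is a minimal sketch of how a clip's size maps to latent size. It assumes the commonly cited Wan2.1 VAE factors (4x temporal, 8x spatial, 16 latent channels); these numbers are not stated on this page, so treat them as assumptions and check them against your model files. The function name is hypothetical.

```python
def latent_shape(frames, height, width, t_stride=4, s_stride=8, channels=16):
    """Estimate the latent tensor shape for a video clip.

    Assumed factors (not confirmed by this page): 4x temporal and
    8x spatial compression, 16 latent channels. The first frame is
    encoded on its own, hence the (frames - 1) term.
    """
    latent_frames = (frames - 1) // t_stride + 1
    return (channels, latent_frames, height // s_stride, width // s_stride)


# Example: an 81-frame clip at 832x480
print(latent_shape(81, 480, 832))  # (16, 21, 60, 104)
```

Under these assumptions the diffusion model works on 21 latent frames instead of 81 pixel frames, which is why frame counts of the form 4n + 1 map cleanly to latent frames.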
Important: Model Placement
- Checkpoints/Diffusion: ComfyUI/models/diffusion_models/ or loras/ (check specific node inputs).
- VAE: ComfyUI/models/vae/
- Text Encoders: ComfyUI/models/text_encoders/
- Pose/Detection: ComfyUI/models/detection/ (keep .onnx and .bin together).
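A quick way to confirm the placement is to list what each folder actually contains. The sketch below is not part of any ComfyUI node pack; `check_layout` and `EXPECTED_DIRS` are hypothetical names, and the folder paths are simply the ones listed above, so adjust them if your node inputs point elsewhere.

```python
from pathlib import Path

# Folders taken from the placement list above (relative to the ComfyUI root).
EXPECTED_DIRS = {
    "diffusion": "models/diffusion_models",
    "vae": "models/vae",
    "text_encoders": "models/text_encoders",
    "detection": "models/detection",
}


def check_layout(comfy_root):
    """Return {label: sorted file names}, with None for a missing folder."""
    root = Path(comfy_root)
    report = {}
    for label, rel in EXPECTED_DIRS.items():
        folder = root / rel
        if folder.is_dir():
            report[label] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            report[label] = None
    return report


if __name__ == "__main__":
    # "ComfyUI" is an assumed install path; change it to your own.
    for label, files in check_layout("ComfyUI").items():
        status = "MISSING FOLDER" if files is None else ", ".join(files) or "(empty)"
        print(f"{label}: {status}")
```

For the detection folder, check in the printed list that the ViTPose-H .onnx file and its .bin file both appear.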
Get the Workflow
Download the configured JSON file to drag-and-drop directly into ComfyUI.
Setup Tips
- Ensure ViTPose-H .onnx and .bin files are in the same folder.
- ComfyUI-WanVideoWrapper is evolving fast; update frequently via Manager.
- High VRAM (24GB+) is recommended for the 14B model variants.
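The VRAM recommendation follows from simple arithmetic on the parameter count. The sketch below estimates the size of the raw weights only; activations, the text encoder, the VAE, and any adapters add to it, so treat the result as a lower bound rather than a measured figure. The function name is hypothetical.

```python
def weight_size_gb(num_params, bytes_per_param):
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS_14B = 14_000_000_000

# 2 bytes per parameter for fp16/bf16, 1 byte for fp8.
for name, nbytes in [("fp16/bf16", 2), ("fp8", 1)]:
    print(f"{name}: ~{weight_size_gb(PARAMS_14B, nbytes):.0f} GB of weights")
```

At 16-bit precision a 14B model's weights alone come to about 28 GB, more than a 24 GB card holds, while 8-bit weights come to about 14 GB.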