I've been away for a few months but I've still been making clips. These were made using Wan2.2 and a custom slime Lora trained for Wan2.1. The Lora was trained solely on stills from videos that were made using ImageFX and Kling. No photos of any real person were used in Lora training, and no likeness of any real person is intended.
These aren't 100% perfect, but I think they demonstrate what is possible if you put the effort in and don't give all your money to OpenAI. Each of these clips cost less than 10c to generate; at a lower resolution, you can make them for less than 2c. Training the Lora probably cost about $4.
These are looking great. I know GIFs are compressed, but you can see the quality of the underlying videos in them. Have you experimented with training on video clips yet? Training will take longer but should give a decent improvement once you've gone through the pain of formatting and captioning. I keep saying I'll move to Wan2.2 training next but keep getting distracted by Qwen_image. They use similar captioning and the VAE is the same too, so it should train pretty well without too many changes. I probably should resize my image set and give it a go. I find Qwen needs additional realism Loras that Wan2.2 doesn't.
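For the resize, something like this with ImageMagick would probably do the whole set in one go (untested, and the 1024px long edge is just my guess at a sensible Qwen training size):

mkdir -p resized
# ">" only shrinks images that are larger than 1024px on the long edge
mogrify -path resized -resize "1024x1024>" *.png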
messg said: These are looking great. I know GIFs are compressed, but you can see the quality of the underlying videos in them. Have you experimented with training on video clips yet? Training will take longer but should give a decent improvement once you've gone through the pain of formatting and captioning. I keep saying I'll move to Wan2.2 training next but keep getting distracted by Qwen_image. They use similar captioning and the VAE is the same too, so it should train pretty well without too many changes. I probably should resize my image set and give it a go. I find Qwen needs additional realism Loras that Wan2.2 doesn't.
I have experimented with training from video, but ideally I'd like to train from real video to get realistic results, and that runs into consent and copyright issues. Formatting was relatively quick as I could just write a simple bash script for ffmpeg to convert to 16 FPS, then use Shotcut to crop the best 81 frames from each clip. Captioning manually was a couple of minutes per clip but that's no different to images. Training was indeed substantially longer per iteration but it's exactly the same process.
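The ffmpeg side is nothing fancy, roughly this (filenames are placeholders):

# convert every source clip to 16 fps and drop the audio
for f in *.mp4; do
  ffmpeg -i "$f" -vf fps=16 -an "fps16_${f}"
done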
Wan2.2 is good as is for realism; it doesn't really need any 'details' Loras or 'nudity helpers' that Wan2.1 needs. Wan2.2 also works perfectly with Loras trained on Wan2.1. Choice of scheduler seems to make more of a difference, and I get the best results with Euler: 4 steps on high noise and 10 steps on low noise.
The biggest advantage is that it's also substantially faster than Wan2.1: something like 1700 seconds with Wan2.1 vs 120 seconds with Wan2.2 for comparable quality on an A40 GPU.
thereald said: I have experimented with training from video, but ideally I'd like to train from real video to get realistic results, and that runs into consent and copyright issues. Formatting was relatively quick as I could just write a simple bash script for ffmpeg to convert to 16 FPS, then use Shotcut to crop the best 81 frames from each clip. Captioning manually was a couple of minutes per clip but that's no different to images. Training was indeed substantially longer per iteration but it's exactly the same process.
Wan2.2 is good as is for realism; it doesn't really need any 'details' Loras or 'nudity helpers' that Wan2.1 needs. Wan2.2 also works perfectly with Loras trained on Wan2.1. Choice of scheduler seems to make more of a difference, and I get the best results with Euler: 4 steps on high noise and 10 steps on low noise.
The biggest advantage is that it's also substantially faster than Wan2.1: something like 1700 seconds with Wan2.1 vs 120 seconds with Wan2.2 for comparable quality on an A40 GPU.
I've used Wan2.1 and 2.2 a lot. I actually use Wan2.2 in some of my Comfy workflows with Qwen: I generate a lower-resolution Qwen image and pass the latent to Wan2.2 for a second pass. This gives me the best of both worlds: Qwen's prompt adherence with my Loras and Wan's better realism.
I've an RTX 6000 Pro, so training and running resources are fine; it's just a time thing and whether the results are worth the effort.