I trained a custom lora for Wan2.1! It's just stupid how good this is already. For a rookie effort I mean. This is for the text-to-video model, and it's trained on **images**, not even video. I just pulled in the dataset I had from my last Flux lora. Feels wonky in some ways, and there are limitations, but mostly, wowzers.
Watch this thread! I will put more gifs on it as I explore different stuff.
Here: gifs from some of the first text-to-video generations I ran, fiddling with different settings. These aren't the most creative scenarios. But, you see what it can do. From just a text prompt! And the hit rate was really high for these.
So if I'm understanding correctly it's using an image based lora for prompt reference, but all the motion is based on the model's existing dataset?
Looking at using Runpod myself - noting that you pay a very affordable rate for GPU access, what would you say the cost per clip or cost per second of video is?
It's a little more complex than that. The image set also trains the visuals along with the caption understanding. With Wan you can also train on videos, which will capture the specific motions as well.
I think it's basically telling Wan "for this prompt, create video frames that look like this image"? ...
You can train Wan on short videos too, but when I tried adding a few of them, it immediately caused video card memory problems that I can't solve. (File size? Frame rate?) I bet this gets easier over time though, with more tutorials or training frameworks.
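One thing I haven't tried yet: shrinking the clips before training, in case it really is resolution or frame rate. A rough sketch of what I mean (untested on my end; the folder names and the 512 px / 16 fps / 3 second limits are just guesses, and it assumes ffmpeg is installed):

```python
# Untested sketch: shrink training clips on the guess that resolution,
# frame rate, or length is what blows up VRAM. Needs ffmpeg on the PATH.
# Folder names and the 512 px / 16 fps / 3 s limits are arbitrary guesses.
import subprocess
from pathlib import Path

SRC = Path("train_videos")        # hypothetical folder of original clips
DST = Path("train_videos_small")  # hypothetical output folder
DST.mkdir(exist_ok=True)

for clip in SRC.glob("*.mp4"):
    subprocess.run([
        "ffmpeg", "-y",
        "-i", str(clip),
        "-t", "3",              # keep at most 3 seconds
        "-vf", "scale=-2:512",  # 512 px tall, width rounded to an even number
        "-r", "16",             # drop to 16 fps
        "-an",                  # strip audio
        str(DST / clip.name),
    ], check=True)
```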
Runpod cost: I actually use Google Colab mostly (I might switch), but broadly I think they're comparable, and it's something like 5 to 10 cents an image? It's waaay cheaper than Kling anyway. I would recommend it. It takes some time to learn ComfyUI too -- I started back in the Stable Diffusion days -- but that has been a worthwhile investment; they've added support for new models as they roll out. I trained my Flux loras in a Comfy template, for example.
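If you want to sanity-check that against whatever Runpod charges, the math is just the hourly GPU rate times how long one clip takes to generate. The numbers below are placeholders, not real prices or timings:

```python
# Back-of-envelope cost per clip: hourly GPU rate x generation time.
# All three numbers are placeholders; plug in your own.
gpu_rate_per_hour = 0.60  # hypothetical $/hour for a rented GPU
minutes_per_clip = 6      # hypothetical generation time for one clip
clip_seconds = 5          # hypothetical length of the finished clip

cost_per_clip = gpu_rate_per_hour * minutes_per_clip / 60
cost_per_video_second = cost_per_clip / clip_seconds

print(f"~${cost_per_clip:.2f} per clip")            # ~$0.06 per clip
print(f"~${cost_per_video_second:.3f} per second")  # ~$0.012 per second of video
```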
It's a little more complex than that. The image set also trains the visuals along with the caption understanding. With Wan you can also train on videos, which will capture the specific motions as well.
That's kind of what I meant, but you put it better - I was thinking of the image data and caption data as acting like a 'super prompt image': the equivalent of an image prompt in image-to-video, but with much more data for the model to work with (I know this is just thinking by analogy and probably not what's actually happening). Wan2.1 is looking very, very promising for what I make, but I'm wary of it being a massive time-sink...
wammypinupart said: Broadly I think they're comparable, and it's something like 5 to 10 cents an image? It's waaay cheaper than Kling anyway. I would recommend it.
That is cheaper, cheers. Certainly worth exploring
Wan2.1 is looking very, very promising for what I make, but I'm wary of it being a massive time-sink...
Unfortunately, you can't avoid it being a time-sink. On the plus side, you're investing in learning that will help with many things in the future; what we learn from image training translates to video, and so on. Just start small and simple and build on it. It's vastly more rewarding than relying on commercial models, where you're constrained by the whims of the company.
Gif share number two. Buckle up!
Wan has two different models for text-to-video (t2v) and image-to-video (i2v). My lora is for the t2v version. But I read that you can use a t2v wan lora on i2v generations too, so of course I had to try that out.
Because it would be cool if I could start with a single clean image, and turn that into a slime video. Like kind of the dream AI generator. But there's no way that this will actually work, right? With this jenky first-try lora, and this weird type of niche video I want to make? No way. Right?
... Right?
wammypinupart said: Gif share number two. Buckle up!
Wan has two different models for text-to-video (t2v) and image-to-video (i2v). My lora is for the t2v version. But I read that you can use a t2v wan lora on i2v generations too, so of course I had to try that out.
Because it would be cool if I could start with a single clean image, and turn that into a slime video. Like kind of the dream AI generator. But there's no way that this will actually work, right? With this jenky first-try lora, and this weird type of niche video I want to make? No way. Right?
... Right?
Really impressive stuff! Seems like the animation is slightly inferior to the T2V, but with the obvious bonus of the "holy grail" workflow of generating clean images and then just applying the mess in the vid gen. Any tips about the curation of the image dataset to include in the LoRA? Does it need examples of different stages of a sliming, and does that need to be explicitly captioned? Or is it just a variety of high-quality slime images? Managed to get it going on Runpod, but haven't actually trained anything yet, because now I need to go back and actually collect the training data.
5-ht said: Any tips about the curation of the image dataset to include in the LoRA? Does it need examples of different stages of a sliming, and does that need to be explicitly captioned?
Your guess is as good as mine; that's what I'd try if I were doing another one. I didn't do anything that fancy this time around. Maybe there are some tips out there on the internet, but it's all pretty new, and the more typical lora use case is a style or a character.
Looking forward to seeing what you come up with! Good luck!
5-ht said: Any tips about the curation of the image dataset to include in the LoRA? Does it need examples of different stages of a sliming, and does that need to be explicitly captioned? Or is it just a variety of high-quality slime images?
The same principles apply as for training an image model: large, diverse image sets are best. High-quality images, a variety of subjects, backgrounds, and mess. Captions should be detailed descriptions of the images. Keywords do work, but they won't create the most flexible of models.
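If it helps, here's a quick way to sanity-check the captions before training, assuming the usual layout of a .txt caption file next to each image (the folder name and the 20-word cutoff are arbitrary):

```python
# Rough dataset check, assuming the common "image + same-name .txt caption"
# layout. The folder name and the 20-word threshold are arbitrary choices.
from pathlib import Path

DATASET = Path("dataset")  # hypothetical training folder
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

for img in sorted(DATASET.iterdir()):
    if img.suffix.lower() not in IMAGE_EXTS:
        continue
    caption = img.with_suffix(".txt")
    if not caption.exists():
        print(f"{img.name}: missing caption")
        continue
    words = caption.read_text(encoding="utf-8").split()
    if len(words) < 20:
        print(f"{img.name}: caption looks keyword-style ({len(words)} words)")
```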
Another image-to-video batch. Simple scenes again, just stretching it out a tiny bit: can it do different colors? (I only have green in the training set.)
One last gif share for the thread! I'm off and running after this. I'm still not entirely sure what to do with this format! ... but there will be more.
Trying out some slightly more elaborate scenes to see what happens, back to text-to-video except for a couple I extended (i2v on the last frame) and combined. It continues to be surprisingly coherent. The success rate for getting usable results is lower the more complicated the prompt is, but it's still pretty high; I'm happy with about a third to a half of what comes back. I have to learn to use a video editor now, lol.
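Though for the two mechanical bits, grabbing the last frame to feed i2v and gluing the clips together, plain ffmpeg might be enough. Untested sketch; the filenames are placeholders, and the concat step assumes both clips share the same codec and resolution:

```python
# Sketch of the two fiddly steps: grab the last frame of a clip to feed i2v,
# then join the original clip and its i2v continuation. Needs ffmpeg on PATH.
# Filenames are placeholders; "-c copy" only works if the clips match in
# codec and resolution, otherwise re-encode instead.
import subprocess

# 1. Last frame of clip_a.mp4 -> last_frame.png (seek ~0.1 s before the end).
subprocess.run([
    "ffmpeg", "-y", "-sseof", "-0.1", "-i", "clip_a.mp4",
    "-frames:v", "1", "last_frame.png",
], check=True)

# 2. Concatenate clip_a.mp4 + clip_b.mp4 (the i2v continuation) losslessly.
with open("clips.txt", "w") as f:
    f.write("file 'clip_a.mp4'\nfile 'clip_b.mp4'\n")
subprocess.run([
    "ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", "clips.txt",
    "-c", "copy", "combined.mp4",
], check=True)
```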
wammypinupart said: Another image-to-video batch. Simple scenes again, just stretching it out a tiny bit: can it do different colors? (I only have green in the training set.)
You can't go wrong with black, or blue, but... old school green is so much fun.