Had a play with this. It's not there yet, but it has potential; it's too unstable frame-to-frame, and the results have an absurd, dreamlike quality. Giving it an image to work from is better than a prompt. This is one of the better ones I made.
I tried the paid version of Luma for a month, it's cool but it has a lot of limitations at the moment:
- You can't do actions other than walking and idle animations, so you can forget about pie throwing or any other behavior (or at least I haven't managed it, no matter how hard I tried).
- If you start from an image with a lot of mess, character recognition will fail. The mess gets treated as some kind of removable mask and you end up with a completely different character.
- The system struggles with scenes that have characters in the background; it will often pick a random character.
- Image + prompt barely works; the generation will just do whatever it wants.
- Since it works frame by frame, once it loses the model on a frame, the rest of the video will be a monstrosity beyond human comprehension.
- Text prompting is really limited: way worse than DALL-E 3 and closer to SD.
So you're kinda limited to model-playing-with-pies scenarios.
On the bright side, the tech is really promising, with impressive consistency across characters, scenes, and even photography style. It also has very loose content filtering (for now).
Wait until you see Runway Gen3: it looks leaps ahead of Luma and is closing in on Sora. Video is definitely coming along quickly, but we'll see how it goes. With how censored most image generators are, I wouldn't hold my breath for anything groundbreaking.
kortanklein said: I tried the paid version of Luma for a month, it's cool but it has a lot of limitations at the moment:
- You can't do actions other than walking and idle animations, so you can forget about pie throwing or any other behavior (or at least I haven't managed it, no matter how hard I tried).
- If you start from an image with a lot of mess, character recognition will fail. The mess gets treated as some kind of removable mask and you end up with a completely different character.
- The system struggles with scenes that have characters in the background; it will often pick a random character.
- Image + prompt barely works; the generation will just do whatever it wants.
- Since it works frame by frame, once it loses the model on a frame, the rest of the video will be a monstrosity beyond human comprehension.
- Text prompting is really limited: way worse than DALL-E 3 and closer to SD.
So you're kinda limited to model-playing-with-pies scenarios.
On the bright side, the tech is really promising, with impressive consistency across characters, scenes, and even photography style. It also has very loose content filtering (for now).
It makes me wonder how cool Sora will be.
This is exactly what I found. An image with slime falling gets translated into a video with a solid column of slime. I had some OK results with gunge-dunk pics, but more like the playful pleading before the dunk than the dunk itself.
Nice article on the current state of the art in AI video generation. It's gonna be a while before you can post a short script and have it turned into a reasonable video.
So I've been playing with it a bit more. Once you know what you can and cannot do, it's kinda decent, and it seems to be improving rapidly, at least from an interface perspective.
If you want to use it, some advice:
- Add 'no camera movement' to every image prompt; camera movement almost never works.
- Use images that don't have too much mess; liquid mess works better than chunky mess.
- POV kinda works, as long as you don't expect actions.
- The system is trained on Caucasian models, so other phenotypes might turn Caucasian during the video.
One thing to say about LumaLabs is that it's quite expensive per generation: it works out to about 20c per 5-second video clip at the lowest tier of paid membership. I burnt through my 150 prompts quite quickly, and my takeaway is similar to kortanklein's.
It seems less is more when it comes to prompts. Beyond very basic direction, it felt like pot luck regardless of what I prompted: I got the subject moving about in an essentially uncontrollable way. A few times I got mess moving in a plausible manner, but rarely did I get mess leaving a mess behind. Most of the time falling mess actually cleaned the subject: she started off messy and ended clean. Sometimes the results were pure horror, with deformed bodies and melted faces. I'll post the best results in a video to my profile when I have time.