Here is a quick, messy video compilation showing the latest version of Stable Video Diffusion img2vid-xt v1.1, which was released a few days ago. It's definitely not quite there yet with video, but decent progress is being made on making things easier and more consistent between frames. This tool takes a single image (AI generated) and turns it into a short video. There is not much control over the process and no prompt can be provided, so it just does what it thinks needs doing with the image.
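For anyone who wants to try it, here's a minimal sketch of that single-image-to-video step using the diffusers library; the model ID matches the v1.1 release on Hugging Face, but the input file name, resolution, seed, and fps are just placeholder choices:

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the img2vid-xt v1.1 weights (gated on Hugging Face; you need to accept the licence first).
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt-1-1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Any AI-generated still works as input; 1024x576 is the resolution the model was trained at.
image = load_image("input.png").resize((1024, 576))

# No text prompt is accepted -- the model decides the motion on its own.
generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]
export_to_video(frames, "output.mp4", fps=7)
```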
It's a big improvement. I think it would make more sense if the AI were to create a video from a series of photos, such as a WAM photoshoot with some continuity between shots. That way it could almost use the old morphing technique, but with AI adding more movement and realism, and producing longer videos because a string of images is being fed in. With just one image, it's hard to imagine what the AI could come up with other than a few seconds of basic movement.
Things are definitely moving at a fast pace... I was reading about Sora earlier. I can only imagine how crippled it will be with censorship when it's released, but I'm not sure that's a bad thing...
There is no comparison, Sora is so much better - check it out!
It looks very interesting, but I suspect the current blocking algorithms in Bing etc. will seem like a soft touch compared to what they'll implement for Sora!
When you consider where AI was 10 years ago, then 5 years ago, then just 1 year ago, it's clear that things are improving at an exponential rate. I'm really interested to see where this technology will be in just a few years as it can only get better and better over time.
Sora looks cool and is showing the next steps with AI video. For our needs, though, we will need to wait for open-source technology to catch up. But it's exciting to see that we will eventually be able to make detailed videos of a decent length.
They haven't released Sora yet, and I suspect it will be a while before it trickles down to normal users. Even when it does, I can't imagine the filters will let anything NSFW through.
Have you tried taking the outputs of the img2vid and then re-stylising them with e.g. Deforum afterwards? I figure it might fix some of the "vectory" nature of the outputs.
messy_ai said: Have you tried taking the outputs of the img2vid and then re-stylising them with e.g. Deforum afterwards? I figure it might fix some of the "vectory" nature of the outputs.
I've not played with Deforum in a while. Does it have any more frame-to-frame consistency these days? I always recall the videos having quite large variance between frames.
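As a rough way to test the re-stylising idea without committing to Deforum, you could push each exported frame through a plain diffusers img2img pass with a fixed seed and low strength. This is only a stand-in sketch, not Deforum's actual workflow, and the base model, style prompt, and file layout are all placeholder assumptions:

```python
import os
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

# Placeholder base model -- swap in whatever checkpoint suits the target style.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Hypothetical layout: frames exported from img2vid as frames/frame_000.png onwards.
os.makedirs("styled", exist_ok=True)
for i in range(25):
    frame = load_image(f"frames/frame_{i:03d}.png")
    # Re-seeding identically per frame and keeping strength low limits the
    # frame-to-frame variance that made older Deforum outputs look jumpy.
    styled = pipe(
        prompt="detailed photographic style",  # placeholder style prompt
        image=frame,
        strength=0.35,
        generator=torch.manual_seed(42),
    ).images[0]
    styled.save(f"styled/frame_{i:03d}.png")
```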
I'm just in the process of generating an AnimateDiff video at the moment; that one is looking quite cool.
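For anyone curious, a minimal AnimateDiff generation via the diffusers pipeline looks roughly like the sketch below. The motion adapter and base checkpoint are public example IDs from the diffusers documentation, while the prompt, seed, and output name are arbitrary placeholders:

```python
import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif

# Motion adapter provides the temporal layers; the base model supplies the image style.
adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16
)
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(
    model_id, motion_adapter=adapter, torch_dtype=torch.float16
)
# AnimateDiff is sensitive to the scheduler config; this matches the documented setup.
pipe.scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    clip_sample=False,
    timestep_spacing="linspace",
    beta_schedule="linear",
    steps_offset=1,
)
pipe.to("cuda")

output = pipe(
    prompt="a woman walking through a rainstorm, photorealistic",  # placeholder prompt
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.manual_seed(42),
)
export_to_gif(output.frames[0], "animation.gif")
```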
I'm not sure OpenAI is going to have much that's useful for what we want to create. Of course, Sora looks absolutely stunning and is a complete revolution in video generation. But what we are waiting for is the democratization of this technology, when open-source solutions become available that match it. Sora, when it releases in a few months or so, is going to come with massive censorship of prompts, and using it will be an exercise in trying to squeeze a carefully worded analogy past the filter to get something maybe semi-desirable from a WAM perspective. But then this is also my opinion on things like Bing: a million weird images for a handful of loosely WAM-related ones.
It is a very exciting day all the same though with Sora being announced. But my excitement is mostly for what other tools will come after and what will be possible in the near future.
Agreed. I think what a lot of people are missing is that this isn't going to be available to run locally any time in the near future; the processing power needed is astronomical. Technically it is stunning, and some techniques will bleed out into open source, but Sora is going to be subscription-only and I doubt that will be cheap. Add to that OAI's filter policies, and this isn't going to make your WAM dreams come true. What I do want to see is SD adopt techniques learned from OAI. The image captioning on SD 1.5 and SDXL is horrendous; the whole dataset needs to be scrapped and reworked with descriptive captioning. Add to that a prompt-inference layer that enhances and aligns your prompt to the dataset, and we will see big improvements.
Y'all keep missing the point. It's being done now, sooner rather than later. It won't be long before it's open source, and all you have to do is give the thing a script, or just a story, and tell it what kind of voice to give the characters.
DuncanEdwards said: Y'all keep missing the point. It's being done now, sooner rather than later. It won't be long before it's open source, and all you have to do is give the thing a script, or just a story, and tell it what kind of voice to give the characters.
To what end? If you think this is going to work any time soon on a RunPod cluster, let alone locally on a 4090 or whatever is coming in the next 5 years, you're mistaken. This is no doubt an amazing technology, but it will be run as SaaS for the foreseeable future, and with that comes content restrictions and filtering. Industry content creators certainly have a lot to fear; their roles will change, no doubt. Stock photography, animation, and SFX teams will all be impacted. Of course there will be a downstream impact on local modelling, and I'm all for the improvements that will come from this, but I'm realistic about what to expect.
From what I've seen uncovered already, I don't think the stock image companies have much to worry about - Sora videos seem to rely heavily on stock videos scraped from Shutterstock! Some of the show-off video prompts look so good because they asked it to output content they knew had been scraped, so they were virtually assured of a great-looking AI video.