It's still pretty hard to make AI get the concept of a thrown pie ... but it's getting somewhat usable now. The physics of the cream impact and its effect on hair and surroundings are already pretty good. I'm looking forward to the next iterations of tools and models.
Oh, and because I get the question a lot: there isn't one tool or model I use to generate these. It's a combination of tools, and every sequence featured in a video takes several steps to generate. My work also builds on archives of my own material and on generative AI output that has been refined over several iterations of the underlying tools. I assume most AI video creation tools these days can produce decent results, but they either require a lot of computing power if run locally or you need to pay for credits. Either way, producing this costs money. I haven't yet had a chance to use Veo3, as it's not available on my side of the big pond, but I'm looking forward to testing it.
Outstanding results. I know how much effort goes into preparing the datasets, training the models, and refining them. It's a thankless task much of the time. I'm curious about your workflow: are you creating start/end frames in Flux using your LoRAs and then using Kling etc. to generate the videos before upscaling, or have you gone straight to training Wan2.1 and using Txt2Vid?
I've got my hands on a new RTX Pro 6000 (96 GB), which I'm using locally to train HiDream. Their full FP16 model is outstanding and trains far better than Flux, as far as my testing goes. I haven't got round to training Wan yet; that's on my list of things to try soon, but work keeps getting in the way.
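For anyone curious what the keyframe route mentioned above looks like in practice, here is a minimal sketch of generating start/end frames with Flux plus a character LoRA via diffusers. The base checkpoint, LoRA path, prompts, and settings are illustrative assumptions, not anyone's actual setup in this thread.

```python
# Sketch: render a start frame and an end frame with Flux + a character LoRA,
# to hand off to an image-to-video tool as first/last frames.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",   # assumed base checkpoint
    torch_dtype=torch.bfloat16,
).to("cuda")
pipe.load_lora_weights("path/to/character_lora.safetensors")  # placeholder LoRA file

prompts = {
    "start_frame": "the character standing in a bright kitchen, holding a cream pie",
    "end_frame": "the same character, face covered in cream, pie tin sliding off",
}
for name, prompt in prompts.items():
    image = pipe(
        prompt,
        height=768, width=1344,
        guidance_scale=3.5,
        num_inference_steps=28,
    ).images[0]
    image.save(f"{name}.png")
```

The two saved frames would then go into a video generator that accepts first/last frames (Kling and Runway both offer variants of this) before any upscaling pass.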
Thanks for the feedback. I think you are already much more sophisticated than me. I put a lot of time into preparing my base material and still images; these are usually my starting point. I have created characters and refined them for almost two years now, using locally trained models as well as various online services, as basic as Bing or ChatGPT. I use ChatGPT to create scenarios and prompts and to help refine my scenes. For video I use Runway, Luma, or Kling; Midjourney will also become an option. I haven't yet started thinking about running video creation locally. I only have one fast GPU, and I mostly use it for upscaling and refining. Obviously Veo3, with the option of adding audio, will be of interest to me, but it's not available here yet and I don't want to go through the hassle of a VPN and a US Google account.
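As a rough illustration of the "ChatGPT drafts the scenario, then I refine it" step, a sketch using the OpenAI Python client could look like the following. The model name, system prompt, and scene wording are assumptions for the example, not the actual prompts used in this workflow.

```python
# Sketch: ask a chat model to draft start/end frame prompts for one scene.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model choice
    messages=[
        {"role": "system",
         "content": "You write concise shot descriptions for image and video generation."},
        {"role": "user",
         "content": "Scene: a formal dinner party ends in a pie fight. "
                    "Write one prompt for the start frame and one for the end frame, "
                    "describing lighting, camera angle, and how the cream lands."},
    ],
)
print(response.choices[0].message.content)
```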
Very impressive. There are some fantastic subtleties, not just in the pie hits but in the way the models move and interact with the environment around them.
I think this round of vids is really fantastic, definitely some of the most realistic, high-definition stuff I've seen, which is obviously a function of your excellent photography being used as the data source!
I was just wondering, though: when I look at them, the main nitpicky "detractor / uncanny valley" stuff I'm still seeing looks like the work of the upscaler. How do they look prior to upscaling? When I've messed around with e.g. Topaz before, I usually end up abandoning the results because it creates effects that detract from the realism of the substance. Are you finding a tradeoff too?
Thanks for the feedback, and it's ok to be nitpicky ... I am that myself, too much. I do use Topaz for upscaling to 4K, and it can "amplify" artifacts and patterns sometimes, but those usually already exist in the AI video output. I mainly upscale to 4K because the YouTube algorithm, among many other things, seems to be sensitive to resolution as well. E.g. you get better encoding from the get-go if you upload in 4K. Smaller channels, at least in the past, got the less effective encoding if they uploaded in 1080p, because it cost YouTube fewer encoding resources.
Topaz offers various models for upscaling; some are better and more resource-consuming than others. I usually use the Rhea model because it strikes a good balance between processing time and my available render resources. And yes, my photography helped me a lot in getting good source material.
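For context, the "deliver 4K so YouTube gives you the better encoding" step can be approximated without Topaz at all. The sketch below uses a plain ffmpeg Lanczos upscale called from Python as a generic stand-in for the Rhea model; it only shows the shape of the step, not the quality Topaz's learned models produce, and the filenames are placeholders.

```python
# Sketch: upscale a 1080p render to 3840x2160 before uploading to YouTube.
import subprocess

def upscale_to_4k(src: str, dst: str) -> None:
    """Spatially upscale a video to UHD with ffmpeg's Lanczos scaler."""
    subprocess.run([
        "ffmpeg", "-i", src,
        "-vf", "scale=3840:2160:flags=lanczos",   # simple spatial upscale
        "-c:v", "libx264", "-crf", "18", "-preset", "slow",
        "-c:a", "copy",                            # keep the original audio track
        dst,
    ], check=True)

upscale_to_4k("pie_scene_1080p.mp4", "pie_scene_2160p.mp4")
```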