You've probably seen that OpenAI released a new image generation model yesterday (it's not clear whether it's an evolution of DALL-E or an entirely new model).
You can use this model via the Sora interface if you have a Plus subscription.
Early testing shows a massive improvement in handling complex prompts and, even better, in image-to-image consistency, which lets you reuse the exact same setting across images.
The image model is completely separate from DALL-E. Previously, GPT passed the revised/filtered prompt to DALL-E. 4o is a multimodal model with image recognition, image generation and speech. They've held back releasing the image generation part of it for nearly a year for red teaming. The filters are definitely relaxed for now, but it also has a far wider range of understanding and will filter content it interprets as fetish content, etc.
API access will be more flexible, and I suspect it'll be rolled out via Azure, Bing, etc. in the near term.
kortanklein said: My bad, I didn't realize there was already a thread about it. I'll switch to the other thread, there is a lot to say about this new model.
It really is the best image model available right now.
So, playing around a little more: a lot of my old prompts from the early DALL-E/Bing days work pretty well without editing. The same prompt engineering works as before for the most part.
Haven't had time to really stretch the model with fresh prompts yet.
After having a good play around with GPT and the new model, I'm going out on a limb to say this is materially as much of an improvement to image generation as DALL-E 3 was over 2.5. The level of understanding and creativity is insane. It's most noticeable when playing around with SFW concepts and images. If this truly allowed NSFW images and concepts, it'd be mind-blowingly good. It uses a different method of generation than previous diffusion models and, as a result, it can maintain temporal knowledge. In effect, you can create coherent sequences of photos. I genuinely didn't think this would happen so soon. It's not perfect, and obviously for WAM/messy imaging you will be fighting against filters for anything too NSFW, but it's a real glimpse of how quickly this tech is moving.
Local models are producing better-quality images without constraints, but the bar has been set in terms of flexibility.
messg said: Ohh well, it seems the more relaxed approach to censorship didn't particularly last long. Woke up to an immediate ban this morning.
If using this, I would be careful about skirting the line with NSFW or fetish-related content.
Wow, you got banned? I thought it was pretty chill with the filters. Sorry to read that.
I find the tool really incredible:
- you can upload an image and ask to render it from a different angle or perspective, perfect for POV;
- you can upload two images and merge them, basically put two moments and say "show me a pie fight between the two" and it will render it with insane consistency;
- you can do more complex actions than ever before (still testing this);
- you can describe the content of a pie and it will try to render the result accordingly;
- you can make a sequence of images by remixing each version.
It's straight witchcraft.
Indeed, I was able to test quite a bit before the ban. The level of flexibility is insane. On the surface, people think this is just another DALL-E or diffusion model, but the changes go much deeper than that, and even for SFW imaging the outputs are miles ahead of the others.
On my ban: I was using the Sora front end, and it looks like a number of complete refusals will trigger a ban. I wasn't banned immediately. I got an email about 6 hours later, and it seems a number of others got one around the same time. To be honest, it's completely my own fault. I was testing a number of prompt engineering techniques to see where the model's limits were. Nothing was particularly NSFW or pornographic. My ban reason was "Non-consensual Intimate Content". I had been testing prompts with the subject in stocks and with wrists tied, so it's fully on me. I'm mostly annoyed that I used my main GPT pro account, but I suppose it gives me a good reason not to renew. I can always create a new account.
I think as long as you keep the prompts lighthearted and safe, you should be fine. It's a shame, but as with DALL-E, the model is way more capable than what is released. I've no doubt the unrestricted version would be able to do pretty much all NSFW.