Image models are generally reaching a high standard now, but the main thing holding back a lot of prompts and the images they produce is wasted pixels. Most models run at a relatively low resolution, so when you generate, any part of the image that isn't the main subject is spending pixels on something that isn't needed. Sometimes backgrounds are important, but proportionally I see a lot of images being generated where 70% of the picture is background. That kills the generation, because the subject is left with so few pixels that detail is lost and deformities creep in.
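To put rough numbers on the pixel budget (the figures below are illustrative assumptions, not measurements from any particular model), here is a quick back-of-the-envelope calculation of how many pixels actually describe the subject at different crops:

```python
# Back-of-the-envelope sketch of the "wasted pixels" point above.
# Resolutions and coverage fractions are illustrative, not from a real model.

def subject_pixels(width, height, subject_fraction):
    """Pixels actually spent on the subject for a given frame coverage."""
    return int(width * height * subject_fraction)

full_frame = subject_pixels(1024, 1024, 0.30)  # subject fills ~30% of the frame
cropped    = subject_pixels(1024, 1024, 0.80)  # tighter crop, ~80% coverage

print(full_frame)            # 314572  -> roughly a 560x560 patch for the subject
print(cropped)               # 838860  -> roughly a 915x915 patch for the subject
print(cropped / full_frame)  # ~2.67x more pixels describing the same subject
```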
The two examples show what I mean: the more cropped and centred image produces a much higher level of detail.
Of course, backgrounds are important as they set the scene, so there are a few ways of avoiding the loss of detail:
1) Increase the overall resolution. More pixels = more detail, even if some of them go to the background. This is probably the only direct solution for public models, and it may only be offered to pro users.
2) Infill/expand generation. Some tools let you take a cropped image and infill or expand the background afterwards. This lets you focus the initial generation on the key subject and then add the rest of the background around it.
3) For ComfyUI users, whether it's Flux, Qwen or something else: you can use tiled VAE generation, which effectively chunks the generation down into equal parts, generating 4 (or more) pieces that are then stitched back together. A rough sketch of the stitching idea is below.
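For item 3, here's a framework-agnostic sketch of the tiling idea in plain NumPy. The tile size, overlap and feathered blending are my own illustrative choices, not ComfyUI's actual implementation; the point is just that each chunk is processed on its own and the overlaps are blended so the seams don't show:

```python
import numpy as np

def tile_positions(size, tile, step):
    """Top-left offsets so overlapping tiles cover the whole axis."""
    pos = list(range(0, size - tile + 1, step))
    if pos[-1] != size - tile:
        pos.append(size - tile)  # make sure the final tile reaches the edge
    return pos

def tiled_process(img, tile=512, overlap=64, process=lambda t: t):
    """Chunk an HxWxC image into overlapping tiles, run `process` on each
    (a stand-in for the per-tile decode/upscale step), and blend the results
    back together with feathered weights so the seams average out."""
    h, w, _ = img.shape
    out = np.zeros_like(img, dtype=np.float64)
    weight = np.zeros((h, w, 1), dtype=np.float64)
    # Triangle-shaped weights: heaviest in the tile centre, light at the edges.
    ramp = np.minimum(np.arange(tile) + 1, np.arange(tile)[::-1] + 1)
    mask = np.minimum.outer(ramp, ramp)[..., None].astype(np.float64)
    step = tile - overlap
    for y in tile_positions(h, tile, step):
        for x in tile_positions(w, tile, step):
            out[y:y + tile, x:x + tile] += process(img[y:y + tile, x:x + tile]) * mask
            weight[y:y + tile, x:x + tile] += mask
    return out / weight

# Example: a 1024x1024 image handled as overlapping 512x512 chunks.
img = np.random.rand(1024, 1024, 3)
stitched = tiled_process(img)
print(np.allclose(stitched, img))  # True: the identity `process` round-trips cleanly
```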
Overall though, composition is the main thing to consider: use your pixels as efficiently as possible.
Interestingly, I think the left image is more appealing to me; it shows more. This seems to be a limitation of AI right now: as the "camera" gets closer, like you said, you lose the context of the scene, and the angles themselves just don't do it for me. In real life, when you're close to a person you can still see them.