I'm not sure this is the right forum for this but I'll answer as an AI guy.
AI can spell: ChatGPT and other LLMs are the obvious proof of this. As a matter of fact, most spell checkers you use probably rely on some form of AI.
I'm guessing you're referring to image generation AI, which cannot spell (well, some can, but it's a complicated explanation). Text-to-image (TTI) AI tools are very different from LLMs. They are designed to create images from text.
(1) Unlike LLMs, they are not built to understand concepts like grammar, linguistic structure, or syllabic composition. TTI tools are built to understand things like color, perspective, space, and how objects interact. They are trying to achieve very different goals from LLM tools. It's like asking why a house doesn't have wheels. A house is not meant to be a vehicle. Likewise, a vehicle is not meant to be a living space.
(2) TTI tools are trained by taking in literally billions of images with highly detailed descriptions that teach the AI what is going on. Among those images, very few (relatively speaking) are going to contain text. Of those that do, the text is going to be in hundreds of languages. And each piece of text in an image will use a different font face and typography, with many of them obstructed, blurry, incomplete, or otherwise illegible. There is absolutely no reason to believe that a TTI AI should or would be able to figure this out. As a matter of fact, no human would in this situation either.
(3) The tech is in its toddler era. AI has been developing for a while, but it's only now getting to the point where it's becoming reasonably capable. If you remember AI failing to generate realistic hands, that was a huge hurdle to cross. And yet, once attention was drawn to it, AI got a LOT better at rendering hands (still not perfect, depending on which model you're using, but nonetheless). TTI will, sooner rather than later, begin to understand text, but when you consider the complications of languages, font faces, perspectives, and so on, you have to realize just how much you're actually asking for.
As mentioned above, there are TTI models that already produce some pretty great text, but that's their focus, so rendering something like, oh I don't know... a girl getting pied? That will not be viable in said model.
TL;DR: AI is complicated, and human language is more complicated. And the written form of language is infinitely complicated.
AI takes information from the prompt and also uses alphanumeric characters it has 'data-mined' from other sources. It has no intelligence, only the ability to piece things together. I was once finding cameras in a lot of my images, then realized I had put 'photo-realistic' in the wrong place in my prompt, so it was turning the people in the scene into photographers. I also notice things spelled wrong a lot of the time.
Here is an example where I clearly wrote 'snack bar' in the prompt but the image left out the 'r'.