As far as I'm aware, individual AI videos are currently limited to 8 seconds' duration (unless you're an extremely clever person who can seamlessly splice multiple videos together!), which gives the video creator very little time to convey the story they're trying to tell.
Bearing that in mind, please consider which of the following options most closely matches your view regarding generated characters speaking during AI videos. It's possible either to make characters say specific sentences or to give the AI engine a general idea of what's being said and let it come up with the actual words.
It's assumed that non-spoken vocalisations like laughing etc. are accepted.
"talking or not" Most of the time, real or AI, I have the sound off. I prefer static images, or progressions, because I can make up my own stories, often multiple ones for the same images.
I don't give them direction on what to say, but they sometimes say something cute that makes me like the video more. I get some version of "I can't believe I'm doing this!" or "This is so fun!" quite often with my videos, so that gets kinda repetitive. And sometimes I get something completely ridiculous, which makes me laugh. One of my favorites was a video of a woman happily playing in the pool in a ballgown and Veo made her say "I have no idea why I'm dressed like this" while smiling and giggling. I didn't expect that, and it made me laugh really hard.
Sometimes I wish the videos would have the subjects questioning why their creator put them in that situation or question my sanity or whatever, but I can't bring myself to prompt for that. I just don't think it'd be as funny if I know it's coming.
Kabe22 said: I don't give them direction on what to say, but they sometimes say something cute that makes me like the video more. I get some version of "I can't believe I'm doing this!" or "This is so fun!" quite often with my videos, so that gets kinda repetitive. And sometimes I get something completely ridiculous, which makes me laugh. One of my favorites was a video of a woman happily playing in the pool in a ballgown and Veo made her say "I have no idea why I'm dressed like this" while smiling and giggling. I didn't expect that, and it made me laugh really hard.
It is fun to let Veo come up with the actual words having given it some steers. I don't want to give it too much free rein for fear of wasting too many credits! There are enough things that can (and do) go wrong as it is!
Kabe22 said: Sometimes I wish the videos would have the subjects questioning why their creator put them in that situation or question my sanity or whatever, but I can't bring myself to prompt for that. I just don't think it'd be as funny if I know it's coming.
If it started doing that, I'm not sure I could trust it to have its "heart" in generating good quality, realistic WAM images!
VEO3 is limited to 8 seconds, although I believe 3.1 can extend. Sora2 does 10 seconds, 15 seconds and up to 25 seconds using the storyboard. As you're generating SFW videos, I'd recommend playing around with Sora2.
Regardless of whether it's VEO3 or Sora2, I'd highly recommend trying JSON or YAML prompt formatting, which will give you way greater control over the output.
messg said: I'd highly recommend trying JSON or YAML prompt formatting, which will give you way greater control over the output.
Intrigued by this, I'll give it a google - does it jailbreak anything or is it just a more precise way of prompting?
I've made just a small handful of videos with Sora for the gameshow style stuff I make, and I didn't like how it created either a British male narrator with a local radio DJ type voice, or an American woman with a somewhat shrill and chirpy voice - I felt like I had no control over the voiceover, and it added little to the video. Sounds were also slightly out of sync with the video and that was off-putting.
I'd potentially be interested in something that could take a longer video and apply a realistic soundtrack to it following a script, and I haven't found anything good for this yet. But someone mouthing "oh fuck!" is just as good as someone saying "oh fuck!". It's mainly about the visuals for me.
No, they're structured formatting syntaxes that most modern video and image models understand and can follow very well. They remove some randomness because of the specificity, but can be amended to bring it back. Sora2 and VEO3 follow these prompts very well, and Nanobanana (original and pro) and QWEN do too. I've not tested Wan2.2 enough, but YAML uses fewer tokens so may be better.
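As a rough illustration of the point about YAML being the lighter format, here is a minimal Python sketch that writes the same small prompt both ways and compares the sizes. The prompt keys are abbreviated and illustrative, `to_yaml_like` is a hypothetical helper rather than a real YAML library, and character count is only a loose stand-in for tokens, since each model tokenises differently.

```python
import json

# Hypothetical, abbreviated prompt; keys and values are illustrative only.
prompt = {
    "title": "Charity Gunge Vlog",
    "duration": "15s",
    "aspect_ratio": "9:16",
    "style": {
        "camera": "handheld phone vlog, front-facing",
        "mood": "candid, expressive, playful",
    },
}


def to_yaml_like(data, indent=0):
    """Emit nested dicts of strings as simple YAML-style text.

    No quoting or escaping is done, so a strict YAML parser may still
    want values such as 9:16 quoted. This is a sketch, not a YAML library.
    """
    pad = "  " * indent
    lines = []
    for key, value in data.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml_like(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


yaml_text = to_yaml_like(prompt)
json_text = json.dumps(prompt, indent=2)

# Character counts are only a rough proxy for token counts.
print(len(yaml_text), len(json_text))
```

The JSON version carries extra quotes, braces and commas, which is where the size difference comes from. Whether that changes the output quality of any given model is not something this sketch can show.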
A sora2 YAML prompt:
title: Charity Gunge Vlog - The School Challenge (Fully Covered Edition)
duration: 15s
aspect_ratio: 9:16
style:
  camera: handheld phone vlog, front-facing
  lighting: bright, even, warm stage lighting reflecting softly on the glass
  mood: candid, expressive, playful
  audio: faint laughter from off-camera, gentle dripping, light echo inside the tank
  tone: genuine, humorous, and relaxed - a natural moment of colourful charity fun
shots:
  - time: 0s-4s
    description: >
      The clip opens in a tight handheld frame filmed on a phone from inside a **clear glass gunge tank**. A woman with **loose wavy brunette hair** sits on a small stool, laughing softly beneath a complete covering of **thick yellow custard, bright green slime, and soft pink foam**. Her **school uniform - white blouse, striped tie, and pleated navy skirt with sheer black tights - is completely hidden** under the heavy, dripping mixture. Custard slides from her hair in slow, glossy streams, while green slime glides across her shoulders and face. She blinks through the mess, smiling faintly before letting out a quiet laugh. After a pause, she says, amused but resigned, "Well there's no getting out of this one."
  - time: 4s-9s
    description: >
      The camera shifts slightly as she moves, the thick mixture **oozing down her hair and sleeves in glossy waves**. Her features are almost fully obscured by layers of custard and slime, the colours swirling together as they continue to drip. She lifts one hand to wipe her eyes, leaving bright yellow streaks across her cheeks. Another blob of foam slides down from her fringe and lands squarely in her lap. She giggles, shaking her head with disbelief. "It's so heavy," she says lightly, pausing to catch her breath. "I can barely move."
  - time: 9s-15s
    description: >
      A close-up captures her completely drenched -- custard and slime still sliding from her head in slow, thick ribbons. The camera lingers on her expression as she exhales a small, breathy laugh, her eyelashes clumped with custard and streaks of pink foam tracing down her neck. She looks at the lens, smiling through the mess. "All for charity," she says softly, voice calm and good-natured. She gives a cheerful thumbs-up, then wipes her eyes again, smearing more foam across her forehead as another drip falls from her hair. The clip fades out on her quiet laughter and the steady sound of custard dripping to the floor.
style_and_tone_summary: >
  Realistic handheld vlog filmed inside a clear gunge tank under bright, warm lighting. The pacing is unhurried, with expressive close-ups and natural pauses between lines. The woman's entire outfit and hair are coated in thick custard, foam, and slime, giving a vivid sense of texture and weight. Her dialogue is brief, friendly, and genuine, with laughter breaking through between moments of disbelief. The overall tone is colourful, light-hearted, and humorous -- a cheerful, self-aware charity moment captured with warmth and playful realism.
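For anyone who would rather not hand-type the structure, the following is a minimal Python sketch of assembling the same kind of prompt from a list of shot boundaries and serialising it with the standard `json` module, which takes care of commas and quoting. The `build_prompt` helper, the shortened descriptions and the `0s-4s` style of time range are assumptions made for illustration, not a format that any of the video models is documented to require.

```python
import json


def build_prompt(title, boundaries, descriptions):
    """Assemble a structured prompt dict (hypothetical layout).

    Shot windows are taken from consecutive entries in `boundaries`,
    so three shots need four boundary values, in seconds.
    """
    if len(boundaries) != len(descriptions) + 1:
        raise ValueError("need exactly one more boundary than descriptions")
    shots = [
        {"time": f"{start}s-{end}s", "description": text}
        for start, end, text in zip(boundaries, boundaries[1:], descriptions)
    ]
    return {
        "title": title,
        "duration": f"{boundaries[-1]}s",
        "aspect_ratio": "9:16",
        "shots": shots,
    }


prompt = build_prompt(
    "Charity Gunge Vlog",
    [0, 4, 9, 15],
    [
        "Opening shot inside the tank.",
        "She wipes her eyes and laughs.",
        "Close-up and a thumbs-up.",
    ],
)

# json.dumps always emits valid JSON, so the result can be pasted as a prompt.
prompt_json = json.dumps(prompt, indent=2)
print(prompt_json)
```

Keeping the shot timings in one list means the total duration and the individual windows cannot drift out of step when a shot is lengthened or shortened.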
Cool, get the idea. And in JSON...
{
  "title": "Charity Gunge Vlog - The School Challenge (Fully Covered Edition)",
  "duration": "15s",
  "aspect ratio": "9:16",
  "style": {
    "camera": ["handheld phone vlog", "front-facing"],
    "lighting": ["bright", "even", "warm stage lighting reflecting softly on the glass"]