Poll: AI WAM videos by Wetmaxiskirts

Midlands, UK

As far as I'm aware, individual AI videos are currently limited to 8 seconds' duration (unless you're an extremely clever person who can seamlessly splice multiple videos together!), which gives the video creator very little time to convey the story they're trying to tell.

Bearing that in mind, please consider which of the following options most closely matches your view regarding generated characters speaking during AI videos. It's possible either to make characters say specific sentences or to give the AI engine a general idea of what's being said and let it come up with the actual words.

It's assumed that non-spoken vocalisations like laughing etc are accepted.

Sleazoid44 ✓

NJ USA

"talking or not" Most of the time, real or AI, I have the sound off. I prefer static images, or progressions, because I can make up my own stories, often multiple ones for the same images.

Kabe22

I don't give them direction on what to say, but they sometimes say something cute that makes me like the video more. I get some version of "I can't believe I'm doing this!" or "This is so fun!" quite often with my videos, so that gets kinda repetitive. And sometimes I get something completely ridiculous, which makes me laugh. One of my favorites was a video of a woman happily playing in the pool in a ballgown and Veo made her say "I have no idea why I'm dressed like this" while smiling and giggling. I didn't expect that, and it made me laugh really hard.

Sometimes I wish the videos would have the subjects questioning why their creator put them in that situation or question my sanity or whatever, but I can't bring myself to prompt for that. I just don't think it'd be as funny if I know it's coming.

Wetmaxiskirts

Midlands, UK

Kabe22 said: I don't give them direction on what to say, but they sometimes say something cute that makes me like the video more. I get some version of "I can't believe I'm doing this!" or "This is so fun!" quite often with my videos, so that gets kinda repetitive. And sometimes I get something completely ridiculous, which makes me laugh. One of my favorites was a video of a woman happily playing in the pool in a ballgown and Veo made her say "I have no idea why I'm dressed like this" while smiling and giggling. I didn't expect that, and it made me laugh really hard.

It is fun to let Veo come up with the actual words once I've given it some steers. I don't want to give it too much free rein for fear of wasting too many credits! There are enough things that can (and do) go wrong as it is!

Kabe22 said: Sometimes I wish the videos would have the subjects questioning why their creator put them in that situation or question my sanity or whatever, but I can't bring myself to prompt for that. I just don't think it'd be as funny if I know it's coming.

If it started doing that, I'm not sure I could trust it to have its "heart" in generating good-quality, realistic WAM images!

messg ✓

VEO3 is limited to 8 seconds, although I believe 3.1 can extend clips.
Sora2 offers 10 seconds, 15 seconds, and up to 25 seconds using the storyboard. As you're generating SFW videos, I'd recommend playing around with Sora2.

Regardless of whether it's VEO3 or Sora2, I'd highly recommend trying JSON or YAML prompt formatting, which will give you way greater control over the output.

thereald ✓

messg said: I'd highly recommend trying JSON or YAML prompt formatting, which will give you way greater control over the output.

Intrigued by this, I'll give it a google - does it jailbreak anything or is it just a more precise way of prompting?

I've made just a small handful of videos with Sora for the gameshow style stuff I make, and I didn't like how it created either a British male narrator with a local radio DJ type voice, or an American woman with a somewhat shrill and chirpy voice - I felt like I had no control over the voiceover, and it added little to the video. Sounds were also slightly out of sync with the video and that was off-putting.

I'd potentially be interested in something that could take a longer video and apply a realistic soundtrack to it following a script, and I haven't found anything good for this yet. But someone mouthing "oh fuck!" is just as good as someone saying "oh fuck!". It's mainly about the visuals for me.

messg ✓

thereald said:

messg said: I'd highly recommend trying JSON or YAML prompt formatting, which will give you way greater control over the output.

Intrigued by this, I'll give it a google - does it jailbreak anything or is it just a more precise way of prompting?

I've made just a small handful of videos with Sora for the gameshow style stuff I make, and I didn't like how it created either a British male narrator with a local radio DJ type voice, or an American woman with a somewhat shrill and chirpy voice - I felt like I had no control over the voiceover, and it added little to the video. Sounds were also slightly out of sync with the video and that was off-putting.

I'd potentially be interested in something that could take a longer video and apply a realistic soundtrack to it following a script, and I haven't found anything good for this yet. But someone mouthing "oh fuck!" is just as good as someone saying "oh fuck!". It's mainly about the visuals for me.

No, they're structured formats that most modern video and imaging models understand and can follow very well. They remove some randomness because of their specificity, but they can be amended to introduce it back. Sora2 and VEO3 follow these prompts very well, and Nanobanana (original and pro) and QWEN do too. I've not tested Wan2.2 enough, but YAML uses fewer tokens, so it may be the better option.

A sora2 YAML prompt:

title: Charity Gunge Vlog - The School Challenge (Fully Covered Edition)
duration: 15s
aspect_ratio: 9:16
style:
  camera: handheld phone vlog, front-facing
  lighting: bright, even, warm stage lighting reflecting softly on the glass
  mood: candid, expressive, playful
  audio: faint laughter from off-camera, gentle dripping, light echo inside the tank
  tone: genuine, humorous, and relaxed - a natural moment of colourful charity fun

shots:
  - time: 0s-4s
    description: >
      The clip opens in a tight handheld frame filmed on a phone from inside a **clear glass gunge tank**.
      A woman with **loose wavy brunette hair** sits on a small stool, laughing softly beneath a complete covering of **thick yellow custard, bright green slime, and soft pink foam**.
      Her **school uniform - white blouse, striped tie, and pleated navy skirt with sheer black tights - is completely hidden** under the heavy, dripping mixture.
      Custard slides from her hair in slow, glossy streams, while green slime glides across her shoulders and face.
      She blinks through the mess, smiling faintly before letting out a quiet laugh.
      After a pause, she says, amused but resigned, "Well there's no getting out of this one."

  - time: 4s-9s
    description: >
      The camera shifts slightly as she moves, the thick mixture **oozing down her hair and sleeves in glossy waves**.
      Her features are almost fully obscured by layers of custard and slime, the colours swirling together as they continue to drip.
      She lifts one hand to wipe her eyes, leaving bright yellow streaks across her cheeks.
      Another blob of foam slides down from her fringe and lands squarely in her lap.
      She giggles, shaking her head with disbelief.
      "It's so heavy," she says lightly, pausing to catch her breath. "I can barely move."

  - time: 9s-15s
    description: >
      A close-up captures her completely drenched -- custard and slime still sliding from her head in slow, thick ribbons.
      The camera lingers on her expression as she exhales a small, breathy laugh, her eyelashes clumped with custard and streaks of pink foam tracing down her neck.
      She looks at the lens, smiling through the mess.
      "All for charity," she says softly, voice calm and good-natured.
      She gives a cheerful thumbs-up, then wipes her eyes again, smearing more foam across her forehead as another drip falls from her hair.
      The clip fades out on her quiet laughter and the steady sound of custard dripping to the floor.

style_and_tone_summary: >
  Realistic handheld vlog filmed inside a clear gunge tank under bright, warm lighting.
  The pacing is unhurried, with expressive close-ups and natural pauses between lines.
  The woman's entire outfit and hair are coated in thick custard, foam, and slime, giving a vivid sense of texture and weight.
  Her dialogue is brief, friendly, and genuine, with laughter breaking through between moments of disbelief.
  The overall tone is colourful, light-hearted, and humorous -- a cheerful, self-aware charity moment captured with warmth and playful realism.
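
The shot timings in a prompt like this are meant to tile the clip (0 to 4, 4 to 9 and 9 to 15 seconds for a 15s duration). A minimal Python sketch of checking that before spending credits, assuming the ranges are written as "0s-4s"; the function names are hypothetical and nothing here talks to Sora2 or VEO3:

```python
import re


def parse_range(text: str) -> tuple[int, int]:
    """Parse a shot range such as '4s-9s' into (start, end) seconds."""
    match = re.fullmatch(r"(\d+)s\s*-\s*(\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time range: {text!r}")
    return int(match.group(1)), int(match.group(2))


def check_shots(ranges: list[str], duration: int) -> bool:
    """Return True if the shots run back to back from 0 to `duration`."""
    cursor = 0
    for text in ranges:
        start, end = parse_range(text)
        # Each shot must begin where the previous one ended and have length > 0.
        if start != cursor or end <= start:
            return False
        cursor = end
    return cursor == duration


print(check_shots(["0s-4s", "4s-9s", "9s-15s"], 15))
```

Whether a video model actually honours these ranges is a separate question; this only catches gaps or overlaps in the prompt itself.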

thereald ✓

messg said:

No, they're structured formats that most modern video and imaging models understand and can follow very well. They remove some randomness because of their specificity, but they can be amended to introduce it back. Sora2 and VEO3 follow these prompts very well, and Nanobanana (original and pro) and QWEN do too. I've not tested Wan2.2 enough, but YAML uses fewer tokens, so it may be the better option.

Cool, get the idea. And in JSON...

{
  "title": "Charity Gunge Vlog - The School Challenge (Fully Covered Edition)",
  "duration": "15s",
  "aspect_ratio": "9:16",
  "style": {
    "camera": ["handheld phone vlog", "front-facing"],
    "lighting": ["bright", "even", "warm stage lighting reflecting softly on the glass"],

etc...

Will try it out.
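
Hand-typed JSON is easy to break with a missing comma or quote. A minimal Python sketch, using only the standard json module, that builds the same fields as a dictionary and lets the serializer handle the punctuation; the key names simply mirror the YAML prompt in this thread, and whether any model treats them specially is an assumption:

```python
import json

# Hypothetical prompt structure mirroring the YAML example in this thread.
prompt = {
    "title": "Charity Gunge Vlog - The School Challenge (Fully Covered Edition)",
    "duration": "15s",
    "aspect_ratio": "9:16",
    "style": {
        "camera": ["handheld phone vlog", "front-facing"],
        "lighting": [
            "bright",
            "even",
            "warm stage lighting reflecting softly on the glass",
        ],
    },
}


def to_prompt_json(data: dict) -> str:
    """Serialize the prompt; json.dumps adds the commas and escapes quotes."""
    return json.dumps(data, indent=2, ensure_ascii=False)


prompt_json = to_prompt_json(prompt)
print(prompt_json)
```

The printed text can be pasted into the prompt box as-is; json.loads(prompt_json) is a quick way to confirm it is still valid after any manual edits.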

GoodPudding

Alicante, Spain

For me, adding voices to an AI video has been a game changer and renewed my focus on making them. I've experimented with making AI videos for a few of my fetishes and had reasonable success over the last 18 months or so, with one main exception. Having relatively authentic speech in various languages in my scenes has added so much to them; I've mostly featured English, British and American (both younger and older), but also French, Spanish, German and some Russian and Japanese, and generally the voices have come out quite authentic.

Having said that, specifically regarding messy scenes, voices aren't as critical to me as the right sound effects, but they will help where the scene requires.

messg said: Regardless of whether it's VEO3 or Sora2, I'd highly recommend trying JSON or YAML prompt formatting, which will give you way greater control over the output.

Reading this thread will get me looking more at YAML and/or JSON for my prompting. I'm still searching for sufficient realism in my pie hits, thrown and pushed, but it may also iron out some smaller details that have irked me in scenes or ideas.

I'm happy to share some of my non-messy AI scenes, ideas, prompts or prompting experiences privately, to remain on topic here. (Specifically, my work mostly involves clothing fetishes for fur and/or satin.)

uue404

thereald said:
I've made just a small handful of videos with Sora for the gameshow style stuff I make, and I didn't like how it created either a British male narrator with a local radio DJ type voice, or an American woman with a somewhat shrill and chirpy voice - I felt like I had no control over the voiceover, and it added little to the video. Sounds were also slightly out of sync with the video and that was off-putting.

Veo 3.1 seems to understand me if I tell it to use a particular accent. It also seems to have a grasp of dialect; in the attached, I asked for a Liverpool accent, and the words it came up with to go with it were "Will you stop laughing and help me? I'm proper stuck and me cozzie's absolutely ruined now".

At least with the current state of the art, the main issue with dialogue seems to be that it doesn't always match the action. Typically, something like a character saying "I can't move!" while moving with total freedom, or "I've tripped!" two seconds before they actually trip.

the real GoOfBaLL

Wetmaxiskirts said: As far as I'm aware, individual AI videos are currently limited to 8 seconds' duration (unless you're an extremely clever person who can seamlessly splice multiple videos together!), which gives the video creator very little time to convey the story they're trying to tell.

I was using a trial version of VEO on Google AI Studio, and it let me generate videos by uploading two images: one was the 'start frame' and the other was the 'end frame'. The video would just fill in the blanks. It would let you daisy-chain videos together relatively seamlessly by making the end frame of one clip the start frame of the next.

Unfortunately I have not used this in a while and don't know if it is still available.
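
The daisy-chaining idea above can be planned out before generating anything. A minimal Python sketch, assuming you already have an ordered list of keyframe images; the file names and the function name are hypothetical, and no VEO API is called here:

```python
def chain_clips(keyframes: list[str]) -> list[tuple[str, str]]:
    """Pair consecutive keyframes as (start_frame, end_frame) for each clip.

    The end frame of one clip is reused as the start frame of the next,
    which is what makes the joins line up.
    """
    return list(zip(keyframes, keyframes[1:]))


# Hypothetical keyframe files, in story order.
frames = ["frame_a.png", "frame_b.png", "frame_c.png", "frame_d.png"]

for number, (start, end) in enumerate(chain_clips(frames), start=1):
    print(f"clip {number}: start={start} end={end}")
```

With N keyframes this gives N - 1 clips, so four images would cover three 8-second clips, assuming the tool still accepts a start and end frame per clip.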