ochya0_0319.jpg (147.94 KB, 919x1080)
 >>/67010/
It's only using speech-to-video via the Wan 2.2 model (Chinese open source), which is pretty good for what it is but far, far behind Act 2's video-to-video performance; that's what I'll ultimately use.

All the voices are actually just very advanced text-to-speech, plus dozens of generations and some editing to get the acting right. I didn't even do multiple generations for the video, because it takes 30+ minutes for every 10 seconds of video.

I did all of the writing, speech, and video today.