Text/Image-to-Video

添码座 | Original | About 11 min read

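Compared with text-to-text and text-to-image, there are far fewer AI applications in the video space.

At present, the front-runners in text-to-video are Sora and Stable Video, along with the once little-known Runway. For some reason, Midjourney has gone quiet in this area.

Still, although Sora has made quite a stir, so far it has been all thunder and no rain: there are only the official samples, and no trial entry point has appeared. You can look, but you can't touch!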

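Vidu, hyped by the media as "the Tsinghua version of Sora", was actually built jointly by Tsinghua University and ShengShu Technology (生数科技).

As of this article's publication, Vidu still cannot be used. Does it really have to wait for Sora to come out first?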

Stable Video

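Here I take a prompt I previously expanded with AI, about a Chinese IT engineer caught in a frantic rat race, and have Stable Video create a video from it.

The AI-expanded prompt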

best quality,4k,8k,highres,masterpiece::1.5,ultra-detailed,realistic,photorealistic,photo-realistic:1.37,portrait,concept artist,China,IT engineer,crazy,inside,Chinese,computer,office,technology,chaotic,work,overwork,exhausted,coffee,keyboard,mouse,screen,code,program,deadline,pressure,stress,competition,ambition,hardworking,innovation,modern,urban,city,night,cityscape,neon lights,glow,reflection,window,desk,chair,tired,eye bags,coffee stain,fast-paced,energetic,high-tech,rapid development,innovative,creative,mind-blowing,competitive spirit,innovation,efficiency,modern lifestyle,asian,East Asian,traditional,modern clash,hectic pace,blurry motion,late night,shadow,artificial light,computer-generated,scene,man,2man,working,typing,scrolling,monitor,multitasking,sleepy,caffeine,technology overload,deadline-driven,tech-savvy,ambitious,competitive,hardworking,overworked,overwhelmed,stressful,fast-paced,highly competitive,modern office,night shift,modern life,fast-paced lifestyle,workaholic,innovator,IT industry,Chinese culture,modern China,software development,innovative spirit,competition-driven,crazy competition,technological advancement,urban environment,fast-growing,modern workplace,tech industry,nighttime hustle,deadline pressure,overwhelming workload,innovative mindset,asian work culture,rapid technological change
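The expanded prompt above repeats several tags (for example `innovation`, `hardworking`, and `fast-paced` each appear twice). Whether repeats help or hurt depends on the model, so the following is only a minimal sketch for tidying such a tag list before pasting it in. The function name `dedupe_tags` and the sample string are illustrative, not part of any tool.

```python
def dedupe_tags(prompt: str) -> str:
    """Drop repeated comma-separated tags, keeping the first occurrence of each."""
    seen = set()
    kept = []
    for tag in prompt.split(","):
        tag = tag.strip()
        key = tag.lower()  # compare case-insensitively
        if tag and key not in seen:
            seen.add(key)
            kept.append(tag)
    return ",".join(kept)


# A short sample in the same style as the prompt above.
sample = "innovative,creative,innovation,efficiency,innovation,fast-paced,Fast-Paced"
print(dedupe_tags(sample))
```

Only exact repeats are removed; near-synonyms such as `competitive` and `highly competitive` are left alone on purpose.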

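The process of creating a video with Stable Video is as follows:

In the text-to-video box, enter the AI-expanded prompt from above.

If you want the footage to feel more lively and dynamic, you need to carefully set the camera, camera position, motion path, zoom, and so on.

The video is being created.

Once it is done, preview the result.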

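A video was generated, but it is clear that Stable Video still has major flaws.

The preview shows that the character's face is visibly distorted.
Every video is only 4 seconds long, far short of what a normal video needs.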

The video I generated with Stable Video is here: click to watch.

This is an image-to-video clip I made with Stable Diffusion (I forgot to take screenshots 😢).

Runway

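Runway is said to be the technology company behind Stable Video.

However, its video generation model has been Gen-2 since March 2023 and has gone nearly a year without an upgrade.

Since Stable Video supposedly uses its technology, Runway is bound to share the problems Stable Video has when generating video.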

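Select the Gen-2 model.

Enter the prompt.

Sure enough, the two share a lineage: even the duration limit is the same.

It also has the same problems with distorted, jumbled frames.

Even after extending the duration, the result looks practically unchanged.

The video I generated with Runway is here: click to watch.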

ComfyUI

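Since Sora is not yet open and Stable Video and Runway both underperform, ComfyUI + SVD is a decent alternative.

Taking ComfyUI Web as an example: on entering, first clear all workflows, then get ready to install the custom nodes and models.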

Install the custom nodes
Install the SVD model

Select the custom nodes

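After that, download the workflow file (in JSON format) officially provided by ComfyUI, or another workflow file, to complete the image-to-video pipeline.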
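Before loading a downloaded workflow, it can help to see which node types it expects, so you know which custom nodes to install. The sketch below assumes two JSON layouts: a UI export with a top-level `nodes` list whose items carry a `type` field, and an API export that maps node ids to objects with a `class_type` field. Check these assumptions against your own file. The node names in the inline example are made up for illustration.

```python
import json


def node_types(workflow: dict) -> list[str]:
    """Return the sorted, de-duplicated node type names found in a workflow dict."""
    if isinstance(workflow.get("nodes"), list):
        # Assumed UI-export layout: {"nodes": [{"type": ...}, ...], "links": [...]}
        names = [n.get("type") for n in workflow["nodes"] if isinstance(n, dict)]
    else:
        # Assumed API-export layout: {"<node id>": {"class_type": ..., "inputs": {...}}}
        names = [v.get("class_type") for v in workflow.values() if isinstance(v, dict)]
    return sorted({name for name in names if name})


# Tiny inline stand-in for a downloaded file; the node names are illustrative only.
example = json.loads("""
{
  "nodes": [
    {"id": 1, "type": "LoadImage"},
    {"id": 2, "type": "ExampleVideoSampler"},
    {"id": 3, "type": "LoadImage"}
  ],
  "links": []
}
""")
print(node_types(example))
```

For a real file, read it with `json.load` and pass the resulting dict to `node_types`, then compare the list against the nodes you have installed.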

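From there, follow the ComfyUI Academy or ComfyUI Guide To Making Custom Nodes tutorials step by step to generate the video; I won't go into the details here.

One thing to note: a connection can only be drawn between endpoints of the same color, that is, endpoints of the same type.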
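The connection rule can be pictured as a simple type check. The following is a conceptual sketch only, not ComfyUI's own code: the `Socket` class, the `can_connect` function, and the endpoint names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Socket:
    node: str  # name of the node that owns this endpoint
    name: str  # endpoint label shown on the node
    type: str  # data type; the editor shows each type in its own color


def can_connect(output: Socket, input_: Socket) -> bool:
    """A link is allowed only when both endpoints carry the same data type."""
    return output.type == input_.type


# Hypothetical endpoints, named only for illustration.
image_out = Socket("load_image", "image", "IMAGE")
image_in = Socket("video_sampler", "init_image", "IMAGE")
model_in = Socket("video_sampler", "model", "MODEL")

print(can_connect(image_out, image_in))  # same type on both ends
print(can_connect(image_out, model_in))  # different types
```

If a wire refuses to attach in the editor, the usual cause is exactly this mismatch: the output produces one kind of data and the input expects another.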

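That said, if you are skilled enough with Stable Diffusion or ComfyUI Web, you can create stunning work like the pieces below (this does, of course, place demands on your computer's hardware).

Video generated with ComfyUI Web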

Pika

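Pika is another AI video generation tool. It can generate videos in 3D animation, anime, cartoon, and cinematic styles, and it offers powerful video editing features such as canvas expansion, local modification, and video length extension.

The full prompt

The part below was cut off by Pika because the prompt was too long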

, light blue, deep brown and other tones, it presents a fresh, natural, and lightweight feeling. Filled with a free, open-minded, and carefree atmosphere, it makes people feel comfortable, joyful, and relaxed. Using a close range side view, the beautiful lines and muscular lines of the goddess and horse are highlighted, enhancing the dynamic and aesthetic feel of the entire scene. By combining elements of realism and romanticism, a beautiful, elegant, and dreamy artistic style is created, allowing people to feel the nobility and elegance of goddesses and horses. --v 5.2 --s 100 --ar 1:1 --c 0 --q 1
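The truncated tail ends with `--v 5.2 --s 100 --ar 1:1 --c 0 --q 1`, which are parameters written for Midjourney, and the version of this prompt given to StreamingT2V further down omits them. When reusing one prompt across tools, a small helper can strip such trailing parameters and save some of the length budget. This is a minimal sketch: the function name and the regular expression are my own, and it only handles simple `--name value` pairs at the end of the text.

```python
import re

# Matches a trailing run of "--name value" pairs, e.g. "--v 5.2 --ar 1:1".
_TRAILING_PARAMS = re.compile(r"(?:\s*--[A-Za-z]+(?:\s+[^\s-][^\s]*)?)+\s*$")


def strip_trailing_params(prompt: str) -> str:
    """Remove Midjourney-style "--flag value" parameters from the end of a prompt."""
    return _TRAILING_PARAMS.sub("", prompt).rstrip()


tail = "nobility and elegance of goddesses and horses. --v 5.2 --s 100 --ar 1:1 --c 0 --q 1"
print(strip_trailing_params(tail))
```

Hyphenated words inside the prompt, such as `open-minded`, are untouched because the pattern only matches a double hyphen at the end of the text.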

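The video I generated with Pika is here: click to watch.

Judging by the results, it seems to use Runway's technology as well 😢.

By the way, I also ran the prompt above through Midjourney; two of the resulting images look like this.

Goddess by the sea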

StreamingT2V

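According to its GitHub page, StreamingT2V is an advanced autoregressive technique that can create long videos with rich motion dynamics and without any stagnation. Videos can reach 1200 frames, spanning 2 minutes, and can be extended to even longer durations.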
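The two figures quoted above, 1200 frames and 2 minutes, imply a frame rate, which in turn tells you how long a shorter clip would play. The sketch below assumes a constant frame rate; the actual playback rate of StreamingT2V output may differ, and the frame counts in the loop are just examples.

```python
def implied_fps(frames: int, seconds: float) -> float:
    """Frame rate implied by a clip's frame count and duration."""
    return frames / seconds


def clip_seconds(frames: int, fps: float) -> float:
    """Duration of a clip with the given frame count at the given frame rate."""
    return frames / fps


fps = implied_fps(1200, 2 * 60)  # 1200 frames over 2 minutes
print(fps)                       # 10.0
for frames in (24, 80, 240, 1200):
    print(frames, "frames ->", clip_seconds(frames, fps), "s")
```

At that implied 10 frames per second, a 24-frame clip lasts 2.4 seconds, which is in the same short-clip range as the 4-second limits seen earlier.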

This is a video it generated.

StreamingT2V offers three models (or modes) to choose from.

ModelscopeT2V: text-to-video.

Frames	Inference time for faster preview (256x256)	Inference time for final result (720x720)
24	40 s	165 s
56	75 s	360 s
80	110 s	525 s
240	340 s	1610 s
600	860 s	5128 s
1200	1710 s	10225 s

AnimateDiff: text-to-video.

Frames	Inference time for faster preview (256x256)	Inference time for final result (720x720)
24	50 s	180 s
56	85 s	370 s
80	120 s	535 s
240	350 s	1620 s
600	870 s	5138 s
1200	1720 s	10235 s

SVD: image-to-video.

Frames	Inference time for faster preview (256x256)	Inference time for final result (720x720)
24	80 s	210 s
56	115 s	400 s
80	150 s	565 s
240	380 s	1650 s
600	900 s	5168 s
1200	1750 s	10265 s
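To put the three tables in perspective, the final-result timings can be turned into an average cost per frame and a ratio of compute time to playback time. This is a rough sketch: the numbers are copied from the tables above, while the 10 fps playback rate is an assumption derived from the "1200 frames, 2 minutes" figure quoted earlier. The tables do not say what hardware the timings were measured on.

```python
# Final-result (720x720) inference times in seconds, copied from the tables above.
FINAL_SECONDS = {
    "ModelscopeT2V": {24: 165, 56: 360, 80: 525, 240: 1610, 600: 5128, 1200: 10225},
    "AnimateDiff": {24: 180, 56: 370, 80: 535, 240: 1620, 600: 5138, 1200: 10235},
    "SVD": {24: 210, 56: 400, 80: 565, 240: 1650, 600: 5168, 1200: 10265},
}

ASSUMED_FPS = 10  # assumption: 1200 frames spanning 2 minutes, as quoted earlier


def seconds_per_frame(model: str, frames: int) -> float:
    """Average inference time per generated frame."""
    return FINAL_SECONDS[model][frames] / frames


def slowdown_vs_playback(model: str, frames: int, fps: float = ASSUMED_FPS) -> float:
    """How many seconds of compute are spent per second of finished video."""
    return FINAL_SECONDS[model][frames] / (frames / fps)


for model in FINAL_SECONDS:
    print(
        f"{model}: {seconds_per_frame(model, 1200):.2f} s/frame, "
        f"{slowdown_vs_playback(model, 1200):.1f}x playback time at 1200 frames"
    )
```

For the 1200-frame ModelscopeT2V row this works out to about 8.5 seconds per frame, or roughly 85 seconds of compute for every second of finished video, so a 2-minute clip is close to a three-hour job.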

Now, let's take the prompt previously given to Pika and generate a video again with StreamingT2V.

Prompt

Unreal engine, 3D render, At dusk on the beach, the sea waves are surging, with pure white and soft sand on the beach and orange red clouds in the sky. The goddess wears a red dress and rides on a tall and muscular white horse, holding a reins in her hand, smiling and looking into the distance. The afterglow of the sunset shines on the goddess and horses, casting soft and warm light and shadow, making the entire scene more beautiful and romantic. With white as the main tone, combined with light yellow, light blue, deep brown and other tones, it presents a fresh, natural, and lightweight feeling. Filled with a free, open-minded, and carefree atmosphere, it makes people feel comfortable, joyful, and relaxed. Using a close range side view, the beautiful lines and muscular lines of the goddess and horse are highlighted, enhancing the dynamic and aesthetic feel of the entire scene. By combining elements of realism and romanticism, a beautiful, elegant, and dreamy artistic style is created, allowing people to feel the nobility and elegance of goddesses and horses.

You can check here whether StreamingT2V is really as powerful as it claims: StreamingT2V online demo