Lipsync
Transform images and ideas into stunning video content with perfect lipsync.
What is Lipsync?
AI lip sync technology turns still images into talking avatars by syncing facial movements to audio or text. With just a photo and either a text-to-speech script or a voiceover, you can create professional-grade videos with lip-synced avatars in minutes.

How to use Lipsync?
1. Select Lipsync from the side navigation.
2. Select Your Lip Sync Model: Choose the lip sync model that best fits your project. Models differ in the settings they offer (duration, resolution, and aspect ratio) and produce different results.
3. Upload Your Image: Upload an image of your avatar or select one from your existing creations. Use a clear image for the best results.
4. Generate Audio or Write Text: Write the script or upload an audio file. Some models let you generate speech from a variety of voices, while others use text-to-speech with a default voice.
5. Describe the Scene: Enter a prompt describing the scene or the action your avatar will perform, such as "avatar speaking in a classroom."
6. Click "Generate": Hit Generate to create your talking avatar. The AI syncs the avatar's facial and lip movements to the provided audio or text.
Support
Lipsync is supported by the following models:
Model                  | Aspect Ratio           | Duration | Resolution
-----------------------+------------------------+----------+-------------------
Kling 2.6 Pro          | Same as image provided | 5s, 10s  | 1080p
Google Veo 3.1 Fast    | 16:9, 9:16             | 8s       | 720p, 1080p
Google Veo 3.1         | 16:9, 9:16             | 8s       | 720p, 1080p
Wan 2.5 Speak          | 16:9, 9:16             | 5s, 10s  | 480p, 720p, 1080p
Kling Avatars 2.0 Pro  | Same as image provided | 5s, 10s  | 1080p
Infini Talk            | Same as image provided | 6s       | 720p, 1080p
OmniHuman - Bytedance  | Same as image provided | 5s, 10s  | 720p, 1080p
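
If you want to narrow down the model choice before opening the Lipsync page, the support table above can be encoded as data and filtered. The following is a minimal Python sketch, not part of the product: `ModelSupport`, `SUPPORT`, and `find_models` are hypothetical names introduced here for illustration, and the values are copied from the table as published, so they may go out of date. It assumes that a model listed as "Same as image provided" accepts whatever aspect ratio your image has.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSupport:
    """One row of the support table (hypothetical helper type)."""
    name: str
    aspect_ratios: tuple  # empty tuple means "same as image provided"
    durations_s: tuple    # supported durations in seconds
    resolutions: tuple    # supported output resolutions


# Values transcribed from the support table above.
SUPPORT = (
    ModelSupport("Kling 2.6 Pro", (), (5, 10), ("1080p",)),
    ModelSupport("Google Veo 3.1 Fast", ("16:9", "9:16"), (8,), ("720p", "1080p")),
    ModelSupport("Google Veo 3.1", ("16:9", "9:16"), (8,), ("720p", "1080p")),
    ModelSupport("Wan 2.5 Speak", ("16:9", "9:16"), (5, 10), ("480p", "720p", "1080p")),
    ModelSupport("Kling Avatars 2.0 Pro", (), (5, 10), ("1080p",)),
    ModelSupport("Infini Talk", (), (6,), ("720p", "1080p")),
    ModelSupport("OmniHuman - Bytedance", (), (5, 10), ("720p", "1080p")),
)


def find_models(duration_s=None, resolution=None, aspect_ratio=None):
    """Return the names of models matching every constraint that is given."""
    matches = []
    for model in SUPPORT:
        if duration_s is not None and duration_s not in model.durations_s:
            continue
        if resolution is not None and resolution not in model.resolutions:
            continue
        # Assumption: models that follow the image's aspect ratio
        # (empty tuple) are never excluded by an aspect-ratio filter.
        if (aspect_ratio is not None and model.aspect_ratios
                and aspect_ratio not in model.aspect_ratios):
            continue
        matches.append(model.name)
    return matches


if __name__ == "__main__":
    # Example: which models can produce a 10-second clip at 720p?
    print(find_models(duration_s=10, resolution="720p"))
```

Leaving a parameter as `None` skips that filter, so `find_models(duration_s=8)` lists every model that supports 8-second clips regardless of resolution.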

