Lipsync

Transform images and ideas into stunning video content with perfect lipsync.

What is Lipsync?

AI Lip Sync technology allows you to transform still images into dynamic, talking avatars by syncing facial movements to audio or text. With just a photo and either text-to-speech or a voiceover, you can create professional-grade videos with lip-syncing avatars in minutes.

How to use Lipsync?

1. Select Lipsync from the side navigation.

2. Select Your Lip Sync Model: Choose the lip sync model that best fits your project. Models differ in the durations, resolutions, and aspect ratios they support, and each produces distinct results.

3. Upload Your Image: Upload an image of your avatar or select one from your existing creations. Use a clear image for the best results.

4. Generate Audio or Write Text: Some models let you generate speech from a variety of voices, while others use text-to-speech with a default voice. Write the script or upload an audio file.

5. Describe the Scene: Enter a prompt describing the scene or the action your avatar will perform, such as "avatar speaking in a classroom."

6. Click "Generate": Hit Generate to create your talking avatar. The AI will sync the avatar's facial and lip movements to the provided audio or text.
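The inputs gathered in steps 2 to 5 can be pictured as a small checklist. The sketch below is purely illustrative: `LipsyncRequest` and its field names are hypothetical and do not correspond to any documented API. It also assumes that a model, an image, and either a script or an audio file are needed before generating, which is one reading of the steps above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LipsyncRequest:
    """Hypothetical container for the inputs collected in steps 2-5."""
    model: str                        # step 2: chosen lip sync model
    image_path: str                   # step 3: avatar image
    script: Optional[str] = None      # step 4: text for text-to-speech
    audio_path: Optional[str] = None  # step 4: uploaded voiceover
    prompt: str = ""                  # step 5: scene description

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means ready to generate."""
        issues = []
        if not self.model:
            issues.append("no model selected")
        if not self.image_path:
            issues.append("no image provided")
        # Assumption: at least one of script or audio must be supplied.
        if not self.script and not self.audio_path:
            issues.append("provide a script or an audio file")
        return issues


request = LipsyncRequest(
    model="Kling 2.6 Pro",
    image_path="avatar.png",
    script="Welcome to today's lesson.",
    prompt="avatar speaking in a classroom",
)
print(request.problems())
```

Running the check before clicking Generate mirrors the order of the steps: a missing image or missing speech source is reported before any generation is attempted.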

Support

Lipsync is supported by the following models:

| Model | Aspect Ratio | Duration | Resolution |
| --- | --- | --- | --- |
| Kling 2.6 Pro | Same as image provided | 5s, 10s | 1080p |
| Google Veo 3.1 Fast | 16:9, 9:16 | 8s | 720p, 1080p |
| Google Veo 3.1 | 16:9, 9:16 | 8s | 720p, 1080p |
| Wan 2.5 Speak | 16:9, 9:16 | 5s, 10s | 480p, 720p, 1080p |
| Kling Avatars 2.0 Pro | Same as image provided | 5s, 10s | 1080p |
| Infini Talk | Same as image provided | 6s | 720p, 1080p |
| OmniHuman - Bytedance | Same as image provided | 5s, 10s | 720p, 1080p |
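To check a planned combination of settings against the supported values listed above, the same data can be written as a lookup. This is a minimal sketch, not part of the product: the `SUPPORT` dictionary and the `check_settings` helper are hypothetical names, and `None` is used here as a stand-in for "Same as image provided."

```python
# Supported settings per model, copied from the list above.
# aspect_ratios=None means the output follows the uploaded image.
SUPPORT = {
    "Kling 2.6 Pro": {"aspect_ratios": None, "durations": [5, 10], "resolutions": ["1080p"]},
    "Google Veo 3.1 Fast": {"aspect_ratios": ["16:9", "9:16"], "durations": [8], "resolutions": ["720p", "1080p"]},
    "Google Veo 3.1": {"aspect_ratios": ["16:9", "9:16"], "durations": [8], "resolutions": ["720p", "1080p"]},
    "Wan 2.5 Speak": {"aspect_ratios": ["16:9", "9:16"], "durations": [5, 10], "resolutions": ["480p", "720p", "1080p"]},
    "Kling Avatars 2.0 Pro": {"aspect_ratios": None, "durations": [5, 10], "resolutions": ["1080p"]},
    "Infini Talk": {"aspect_ratios": None, "durations": [6], "resolutions": ["720p", "1080p"]},
    "OmniHuman - Bytedance": {"aspect_ratios": None, "durations": [5, 10], "resolutions": ["720p", "1080p"]},
}


def check_settings(model, duration_s, resolution, aspect_ratio=None):
    """Return True if the listed model supports this combination of settings."""
    caps = SUPPORT.get(model)
    if caps is None:
        return False  # model is not in the list
    if duration_s not in caps["durations"]:
        return False
    if resolution not in caps["resolutions"]:
        return False
    if caps["aspect_ratios"] is None:
        return True  # ratio follows the image, so any requested ratio is ignored
    return aspect_ratio in caps["aspect_ratios"]


print(check_settings("Google Veo 3.1", 8, "1080p", "16:9"))
print(check_settings("Infini Talk", 5, "720p"))
```

The first call uses values that appear in the Google Veo 3.1 row; the second asks Infini Talk for a 5s clip, which is not among its listed durations.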
