
How to create Lip-sync videos on Layer

Using AI video and audio models on Layer to create fully lip-synced video assets

Updated this week

Layer now supports Lip-sync video generation - a fast and intuitive way to animate any character image using voice audio. By pairing a still image with a spoken audio file, you can generate a short video where your character speaks in sync with the dialogue. And with Layer’s built-in voice generation tools and growing library of character voices, it’s easier than ever to bring everything together - all on one platform.

Whether you’re building game cutscenes, pitching a character idea, or making stylized content for social media, Lip-sync gives you an effortless way to generate expressive, animated dialogue - no animation skills needed.


What You Need

To create a Lip-sync video, you’ll need two files:

  • An audio file with speech
    You can upload your own, or generate one in Layer using the Audio Generation tab. Select Voice (not Sound Effects) to create natural-sounding speech using one of our AI voice models.

  • A character image
    You can forge a character in Layer or upload your own image. For best results, we recommend using a full-body or medium shot - close-up images can lead to less stable animation.


How to Use Lip-sync

  1. Select a Lip-sync model
    In the Video model selection dropdown, choose AI Avatar or OmniHuman. These models are designed specifically for Lip-sync.

  2. Set your image as the first frame
    Drag your character image into the first frame drop zone in the Forge panel.

  3. Add your voice audio
    Drag in your audio file. You’ll see a waveform appear in the timeline.
    (Optional: You can also enter a prompt, but in most cases it has little impact on Lip-sync videos - results are driven primarily by the image and audio. A prompt is not required when using OmniHuman.)

  4. Click Forge
    Layer will generate a short video with your character animated to speak the provided audio.


Tips for Best Results

  • Use a full-body or medium shot for your character image. Extreme close-ups may produce less consistent animation.

  • Stick to clean voice audio without overlapping sound effects.

  • Adjust the duration of your video to match the full length of your audio clip.

  • Adding a prompt is optional - Lip-sync generation mainly follows the image and voice.
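
If you want to know how long your audio clip is before setting the video duration, you can measure it locally. The following is a minimal sketch, not part of Layer: it uses Python's standard `wave` module, so it only works for uncompressed WAV files, and the function names are hypothetical. Rounding up to whole seconds is an assumption about how you might choose a duration, not a documented Layer requirement.

```python
import math
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the playback length of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        frame_count = wav_file.getnframes()   # total audio frames in the file
        frame_rate = wav_file.getframerate()  # frames per second (sample rate)
    return frame_count / frame_rate


def suggested_video_duration(path: str) -> int:
    """Round the audio length up to whole seconds (an assumption, see above)."""
    return math.ceil(wav_duration_seconds(path))
```

For example, calling `suggested_video_duration("voice_line.wav")` on a clip of your own (the file name here is a placeholder) gives a duration that covers the full audio.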


Use Cases

  • Game character dialogue previews

  • AI-generated cutscenes or trailers

  • Character storytelling with voiceovers

  • Marketing or social videos featuring animated characters

  • Talking avatars for presentations or pitch decks


Lip-sync makes it simple to animate your characters and give them a voice - all without leaving the Layer platform. Try it now with any forged image and voice line to bring your creations to life in just a few clicks.
