How to train a Characters Model to generate new characters | Layer

You can generate new characters in two ways: you may be searching for a new character concept and want the AI to have creative freedom, or you may want to generate new character assets in your exact format, poses, outfits, etc.

Start with Your Assets

To create a diverse character training, your training images should include a variety of shots and expressions:

	Headshots and closeups
	Full body poses
	Varying facial expressions
	Different angles

This gives the model a broader understanding of different characters' visual identity: how they look, move, and emote. The more variation your data set includes, the more flexible your custom model will be in creating brand new concepts.

This kind of input ensures your characters stay visually consistent, even as you change outfits, expressions, or settings.

If you're looking for more details on the types of images you should be using to train your model, check out this video. Remember, following correct formatting is critical to getting the best results with AI.

Why Captions Matter (and Why Auto-Captions Aren’t Enough)

When you upload images for model training, Layer will automatically caption them, but for character trainings, this isn’t ideal. The auto-captioning might miss key details or fail to structure descriptions consistently - which makes it harder for the AI to learn who your character really is.

That’s where LLMs like OpenAI’s ChatGPT or Google’s Gemini come in. They can help you write detailed, structured descriptions and keep things consistent. (They are also MUCH faster than doing it manually.)

Use an LLM to Write Descriptions

You’ll want to kick off your LLM session with a clear prompt that defines the task, the format, the tone, and the length.

Here’s the full starting prompt you can use:

LLM Starting Prompt (Captioning for Set of Characters)

I’m working on training a LoRA for a set of characters. I will send you images and I need you to describe them to me. I need detailed descriptions that follow a consistent format for each image.

The descriptions should follow this format: [character’s gender], [physical appearance], [pose], [expression], [shot type], and the [overall art style]. It’s important to maintain the same format and language across all images for these characters. The text should be under 1024 characters, but aim for around 900.

Here are two examples of what a good description looks like:

Example 1:

“a slender woman with pale skin and long, flowing dark hair. She has striking blue eyes and a sharp jawline, wearing a sleek, black bodysuit with silver accents. She stands confidently with one hand on her hip and the other holding a glowing orb of light. Her expression is calm yet determined, with a slight smirk. Full body shot. Semi-realistic art style with detailed shading, emphasizing sharp contrasts between light and dark tones.”

Example 2:

“Fred is a stout man with fair skin, a round, slightly jowly face, and reddish-brown hair styled with a side part that often appears neatly combed. He consistently wears distinctive pink glasses, which frame his friendly, green eyes. His build is generally stocky, and he has a cheerful, expressive demeanor. Wearing a straw hat, Fred strikes a playful, crouched pose with a joyful smile. He has brown suspenders with a plain white shirt and light blue pants. His energetic gesture makes him look like he is dancing! The art style is a 3D animated style, characterized by smooth, almost plasticky surfaces, bright and saturated colors, and soft, diffused lighting. The overall aesthetic is cartoonish and friendly, with exaggerated proportions and expressive facial features that contribute to a lighthearted, family-friendly appeal. There’s a notable absence of harsh lines or shadows, enhancing the gentle and approachable nature of the visuals.”
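The length guidance in the starting prompt (under 1024 characters, aiming for about 900) is easy to check before you paste captions into a training. The following is a minimal sketch, not a Layer feature: the file names, the `check_caption_length` helper, and the "short" threshold are all assumptions made for illustration.

```python
# Hypothetical helper: flags captions that break the length guidance
# from the starting prompt (under 1024 characters, target about 900).
MAX_CHARS = 1024
TARGET_CHARS = 900


def check_caption_length(caption: str) -> str:
    """Return 'too long', 'short', or 'ok' for a single caption."""
    length = len(caption.strip())
    if length >= MAX_CHARS:
        return "too long"
    # Assumption: anything under half the target is worth a second look.
    if length < TARGET_CHARS // 2:
        return "short"
    return "ok"


# Hypothetical file names and captions, for illustration only.
captions = {
    "fred_01.png": "a stout man with fair skin and a round face. Full body shot.",
    "fred_02.png": "x" * 1100,
}

for filename, caption in captions.items():
    print(filename, len(caption), check_caption_length(caption))
```

Captions flagged as "too long" can be sent back to the same LLM session with a request to shorten them while keeping the format.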

Comparison: Auto-Caption vs LLM-Enhanced

Let’s compare what Layer’s auto-caption might give you vs. what you can get with a few minutes of help from an LLM.

Auto-caption (Layer):

“Cheerful man with blond hair, wearing a bright pink suit, yellow accents, and glasses, holding a blue box with yellow items; standing pose with a wide smile, 3D cartoon style, full body shot.”

LLM-enhanced caption:

“a stout man with fair skin, a round face, and neatly combed reddish-brown hair. He wears distinctive pink glasses, framing his friendly, green eyes. He’s dressed in a bright pink suit with a yellow collar and a pink tie, over a blue vest. He sports yellow shoes with pink trim. Standing upright, Fred gestures playfully with his fingers while smiling cheerfully. Full body shot. The art style is 3D animated, with smooth, plasticky surfaces, bright and saturated colors, and soft lighting. The overall aesthetic is cartoonish and friendly, with exaggerated proportions and expressive features.”

That extra attention to detail and structure helps the model capture the style much more accurately.

Yes, It’s a Bit Tedious (But Worth It)

We know this process can be slow — manually writing or refining captions, even with AI help, takes time. We’re actively working on product updates to support single character training better (as of April 2025). But right now, this is the best method to get great results.

Also keep in mind: even LLMs make mistakes. Sometimes they won’t follow your formatting exactly. You may need to lightly edit or re-prompt to stay consistent.
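One way to spot captions that drifted from the format is a quick keyword check before you start editing by hand. This is a rough sketch under stated assumptions: the keyword lists and the `missing_parts` helper are hypothetical, and a keyword match only hints that a section is present, so it does not replace reading the caption.

```python
# Assumed keyword lists for this sketch; adjust them to the wording
# your own captions use.
SHOT_TYPES = ("full body shot", "headshot", "closeup", "close-up")
STYLE_MARKERS = ("art style", "cartoon style", "animated style")


def missing_parts(caption: str) -> list[str]:
    """List the format elements that no keyword was found for."""
    text = caption.lower()
    missing = []
    if not any(shot in text for shot in SHOT_TYPES):
        missing.append("shot type")
    if not any(marker in text for marker in STYLE_MARKERS):
        missing.append("art style")
    return missing


complete = (
    "a slender woman with pale skin and long, flowing dark hair. "
    "Full body shot. Semi-realistic art style with detailed shading."
)
incomplete = "a stout man with fair skin, smiling cheerfully."

print(missing_parts(complete))
print(missing_parts(incomplete))
```

Any caption that returns a non-empty list is a candidate for a light edit or a re-prompt.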

Example Prompts

Once you’ve finished your captions, you'll see the auto-generated example prompts. If you are not happy with them because they are not representative enough, you can reuse the same LLM session to generate new ones. Here’s what to say:

“I’m at the final stage of a LoRA training, and I need to generate 5 new ideas for [insert asset type here—e.g., characters, in-game items, backgrounds, etc.]. These new ideas should be similar to the assets already in the training set, but they should introduce some variation, like different characters. The descriptions must follow the same format and consistency we’ve used throughout the project, with each description around 900 characters, not exceeding 1024.”

Once the LLM gives you 5 new ideas, copy and paste your favorite 3–5 as your example prompts, and you’re good to go.

While It’s Training: Set Up Prompt Prefix + Suffix

While your model is training, take a few minutes to set up your Prompt Prefix + Suffix.

This guides how the model behaves when you generate assets by keeping parts of the prompt permanent when forging. For example:

Always inserting what you want to keep consistent about the character at the start, such as "full body shot of"
Always appending a style description like “3D cartoon style with soft lighting”

Prefixes and suffixes are powerful ways to lock in tone, naming, and consistency once the model is ready to use.
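Conceptually, a prefix and suffix wrap whatever you type into the prompt field. The sketch below only illustrates that idea: how Layer actually combines the pieces is not described here, and `build_prompt` is a hypothetical helper, not part of any Layer API.

```python
def build_prompt(prompt: str, prefix: str = "", suffix: str = "") -> str:
    """Join prefix, prompt, and suffix with spaces, skipping empty parts."""
    parts = [prefix.strip(), prompt.strip(), suffix.strip()]
    return " ".join(part for part in parts if part)


# The prefix and suffix values are the examples from the list above.
full_prompt = build_prompt(
    "Fred waving at the camera,",
    prefix="full body shot of",
    suffix="3D cartoon style with soft lighting",
)
print(full_prompt)
# prints: full body shot of Fred waving at the camera, 3D cartoon style with soft lighting
```

Writing out a combined prompt like this can help you check that the prefix and suffix read naturally around a typical prompt before you save them.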

We’ll explore prefix/suffix best practices more in a follow-up article.
