How Do I Write Good Captions? | Layer AI

One of the biggest culprits behind poor forging results is low-quality captions. Captioning your source images is just as important as the images themselves, if not more so: captions tell Layer what to expect when you type in a prompt.

When captioning an image, you want to be as descriptive as possible, as if you are describing the image to a blind person.

Use commas to separate descriptions. If any of the characters or objects have proper names, include those too, so that later during generation you can refer to them by name and the model will know what you are looking to create.
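If you prepare captions in a script before uploading them, the comma-separated format above can be sketched as follows. This is only an illustration: `build_caption` is a hypothetical helper, not part of Layer, and the sample phrases are taken from the first example caption in this article.

```python
def build_caption(name, details):
    """Join an optional proper name and descriptive phrases into one comma-separated caption.

    Hypothetical helper for illustration; not a Layer API.
    """
    phrases = ([name] if name else []) + list(details)
    parts = []
    seen = set()
    for phrase in phrases:
        # Collapse stray whitespace and drop leading/trailing commas or spaces.
        cleaned = " ".join(phrase.split()).strip(" ,")
        key = cleaned.lower()
        # Skip empty phrases and case-insensitive repeats.
        if cleaned and key not in seen:
            seen.add(key)
            parts.append(cleaned)
    return ", ".join(parts)


caption = build_caption(
    "Axe master",
    [
        "big muscular man wearing a gray executioner hood",
        "shirtless",
        "pale skin ",
        "shirtless",
        "standing menacingly",
    ],
)
print(caption)
```

Putting the proper name first keeps it easy to spot when you review captions, and it is the same name you would later type in a prompt.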

Caption the Image as If You Are Describing It to a Blind Person

The descriptions of your assets need to be detailed. Write the caption as if you were describing the image to a blind person. Include things like the item, gender, age, colors, outfits, and unique details.

You Can Use Auto-Caption, But Always Review!

Auto-caption is a great assistant for writing image captions, but do not rely on it fully. Use it as a starting point instead.

Here are some examples:

Axe master, big muscular man wearing a gray executioner hood with diagonal belt straps coming down from it, white eyes with red face paint underneath, shirtless, pale skin, well-defined muscles, small scars on chest, strong nose, sharp jawline, gray cloth covering chiseled six pack abs, black leather skirt and pants with golden buckle and silver metal center, gray fingerless gloves with black leather bracers adorned with golden metal accents, black leather boots with belts around the ankle, standing menacingly

UI Dialog box made of wood, cartoon style, rich brown wooden frame, lighter tan wood planks, green grassy frame on the top and the bottom, adorned with fruit and cacti on the top, small banana showing from behind left side of the wooden frame

Isometric background of a modern victorian kitchen, blue painted cabinets and counters with tan wood countertops, center kitsch island, fume vent hood over two door oven with pots on the range, various pots and plants on counter tops, cozy sunlight coming in from windows, hanging cooking utensils on walls, wooden tables with shelves filled with supplies and potted plants on top, homey feeling, white floor

Captions and Their Weights

When an AI model generates an image, an encoder processes the words of the prompt together, and the result is then combined with image patches via an “attention” mechanism. In that respect, all words in the prompt should be weighted equally, unless we specify a weight change for a particular part of the prompt (with a +/- sign or numerical values).

In practice, however, some studies have found that words later in the prompt have less effect on the generated image. That is why putting the most important details about the image first is critical to writing strong captions.
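One way to apply this when assembling captions in a script is to tag each phrase with a priority and sort before joining. The sketch below is a minimal illustration under assumptions: `prioritized_caption` and the priority numbers are hypothetical, not a Layer feature, and it only reorders phrases; it does not use any prompt weighting syntax.

```python
def prioritized_caption(details):
    """Order (phrase, priority) pairs so the most important phrases come first.

    Priority 1 is the most important. Hypothetical helper for illustration.
    """
    # sorted() is stable, so phrases with equal priority keep their input order.
    ordered = sorted(details, key=lambda item: item[1])
    return ", ".join(phrase for phrase, _ in ordered)


details = [
    ("white floor", 3),
    ("isometric background of a modern victorian kitchen", 1),
    ("blue painted cabinets and counters", 2),
    ("homey feeling", 3),
]
print(prioritized_caption(details))
```

Here the subject of the image leads the caption, and minor details such as the floor color fall to the end, where they matter least if later words do carry less influence.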
