Preparing the assets for style training
Do the training images have the same size?
💡 During style training, all input images are resized to 512x512, the default size for AI learning. If your images are 1024x1024, they will all be resized down to 512x512, so it's ideal to upload images at 512x512 by default. Larger images are also fine for AI learning; what matters most is that ALL images are uploaded at the same size, so they are all resized by the same amount and the proportions of the assets are preserved.
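As a quick way to put a local folder of assets on equal footing before uploading, a small Pillow sketch like the one below can resize everything to 512x512. The function name and target size here are illustrative; this is not Layer's actual preprocessing code.

```python
from PIL import Image

TARGET = 512  # Layer's default training resolution, per the tip above


def prepare(img: Image.Image, target: int = TARGET) -> Image.Image:
    """Resize an image to target x target so every asset is scaled the same way."""
    return img.resize((target, target))


# Example: a 1024x1024 asset ends up at 512x512
asset = Image.new("RGB", (1024, 1024), "white")
print(prepare(asset).size)  # (512, 512)
```

Running this over every asset before upload guarantees they all arrive at the same size.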
Are assets centered the same way across all the input images?
💡 Let's say you're training a set of furniture pieces that share the same proportions. If the uploaded assets are not centered the same way (for example, some are zoomed in and some are zoomed out), the AI may have a hard time generating outputs with consistent proportions. Here is a good example of how to format the assets while paying attention to asset positions:
Do the training images have the same aspect ratio?
💡 As mentioned in the first point, the AI resizes images to 512x512, an aspect ratio of 1:1. If any image's aspect ratio differs from the others, Layer pads that image to a square before resizing, and this padding is not content-aware. This can mean losing important pixels or adding new ones, which may cause distortion in certain areas, so it's important to make sure all assets going into training have the same aspect ratio. This artifact is particularly obvious when the image contains objects we humans are very familiar with, e.g., faces or human bodies. For example, if you are training backgrounds and you upload all images as portraits, they will all be padded to squares by the same amount, so training quality may not be affected enough to bother you.
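To see why mixed aspect ratios cause trouble, here is a rough sketch of the kind of non-content-aware padding described above. The fill color and centered placement are assumptions for illustration; Layer's actual preprocessing may differ.

```python
from PIL import Image


def pad_to_square(img: Image.Image, fill=(255, 255, 255)) -> Image.Image:
    """Pad the shorter side with a flat color so the canvas becomes square.
    This is not content-aware: the added pixels carry no image information."""
    side = max(img.size)
    canvas = Image.new("RGB", (side, side), fill)
    # Center the original image on the square canvas
    canvas.paste(img, ((side - img.width) // 2, (side - img.height) // 2))
    return canvas


portrait = Image.new("RGB", (400, 600), "gray")
print(pad_to_square(portrait).size)  # (600, 600)
```

A 400x600 portrait gains 200 columns of blank pixels; a 600x400 landscape in the same set would instead gain 200 blank rows, so the two would be distorted differently after the final resize. That is why uniform aspect ratios matter.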
Are the dimensions of the training samples close to Layer’s suggested resolution (e.g., 512 pixels in width and 512 pixels in height for square)?
💡 Default learning happens at 512x512, so for the best results upload images at this resolution. The second best option is for all images to at least share the same resolution.
Does the training set provide clear enough samples for those details you’d like to see in the forged image?
💡 When low-quality images, such as blurry or low-resolution ones, are uploaded as training images, the AI will learn those artifacts as part of the target style and will then generate suboptimal results.
Have you tried upscaling? When an unsatisfactory detail of the forged image is regional (e.g., the face of a character) rather than global, try upscaling and re-forging the selected region (with the preferred style).
💡 Upscaling adds new pixels to the AI-generated image, which can fix distortions or blurriness in the image.
Do the training images contain sufficient diversity/variation?
💡 AI learns more effectively from variation and diversity. For example, if you are training a set of characters, you need enough variety of characters in the training set; otherwise the AI will assume the characters' features are constant. Here is an example showing how characters with different ages, genders, and facial expressions are used for a rich training dataset.
Have the captions captured all changing attributes of the subject?
💡If one attribute/accessory of a character is always present in the training set, it’s easily recognized as an inseparable part of the target style. For example, if only smiling faces of a character are provided, smiling could be learned as your character’s default facial expression; sad faces later generated for your character can be different from what you wanted. Similarly, if the training set contains multiple similar/identical images, that dominant mode can be learned as a feature of your style while the remaining training samples’ impact gets suppressed.
Number of images
Did you have more images for training but haven't uploaded them all?
💡 We recommend uploading 25 images if you have them. More images allow a richer learning experience for the AI, so uploading as many images as the upper limit allows is ideal.
Spelling and wording
Do the captions capture changing details between training samples?
Have you removed all typos in the image captions?
Are words properly capitalized when needed? For example, if the character is called Max, the prompts referring to that character should use "Max" with a capital M rather than "max", which could confuse the AI since it might also mean "maximum".
Can ambiguous words in the captions be avoided/elaborated? E.g., trunk -> tree stem; max -> Maxwell
Have you specified the color of background or foreground in the training?
Place important attributes close to the subject in your prompt, making sure they are well captured as relevant keywords by the neural network.
💡 The rationale is that the prompt words describing the subject are all translated into 'codes' in a high-dimensional space that the neural network uses to guide the image-generation process. In this translation, words that appear close together are more likely to be associated. So if the subject is mentioned at the beginning of the prompt, which is usually the case, important attributes should also be mentioned earlier rather than later.
During the forge
Have you given it a second try if the first one doesn't look good?
Have you upscaled the outputs to see if it helps fix distortion?
💡 AI painting inherently carries randomness. We encourage you to take advantage of your unlimited plans on Layer.ai to generate many different variations and select the best one!
Lay out a scene with a few kinds of elements that have details roughly at the same scales.
[O] A kitchen with a dining table and a few chairs.
[O] A restaurant sitting at the corner of a busy street.
[X] A restaurant sitting at the corner of a busy street on a sunny day. There is a dining table and a few chairs in its kitchen, where the table has 4 legs, and each leg has a flower pattern near the bottom.
If you want to dramatically change a specific portion in a busy scene, have you tried canvas?
💡 AI can draw many items at the same time in a single image; it can also draw many details of one specific item; it's just challenging to do both at once! To create a complex image with great detail, you can try [top down]: generate the first image with a rough layout, enlarge it, and edit each component one by one; or [bottom up]: generate assets one by one in separate images, each with great detail, and compose them on a big canvas. Note that some information is simply hard to describe in language; consider using a reference image to guide your forge. Please see below.
Using reference image/sketch/depth/pose
Did you specify number of objects in the reference image/sketch?
Did you specify the positions of objects precisely in the reference image/sketch?
Did you specify the pose of the character on the reference image?
Coloring a scene that is already modeled in Blender? -> Reference depth
Is your sketch black and white? If you use a Sketch reference, please upload black-and-white sketches. We will convert the uploaded image for you anyway before sending it to the neural networks, but doing this conversion before uploading gives you full confidence that the guidance sketch looks exactly the way you want it to.
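If you want to do the conversion yourself before uploading, a small Pillow sketch along these lines works. The threshold value here is an illustrative choice, not Layer's exact setting; pick whatever keeps your linework crisp.

```python
from PIL import Image


def to_black_and_white(sketch: Image.Image, threshold: int = 128) -> Image.Image:
    """Convert a sketch to pure black and white before uploading.
    The threshold is an illustrative choice, not Layer's exact setting."""
    gray = sketch.convert("L")  # grayscale first
    # Map every pixel to pure white (>= threshold) or pure black (< threshold)
    return gray.point(lambda p: 255 if p >= threshold else 0, mode="1")


sketch = Image.open_path = Image.new("RGB", (512, 512), (200, 180, 160))
bw = to_black_and_white(sketch)
print(bw.mode)  # "1": a one-bit, black-and-white image
```

Previewing the converted file locally lets you confirm no faint lines were dropped by the threshold.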
💡 While text prompt is handy to describe an image in your mind, finer guidance may need to be specified by reference, because other modalities can more easily carry information in some cases. One prominent example is handling human pose. Try letting the reference skeleton guide the character pose, even if you don’t know how to describe some tricky posture in language. Another example is AI is not good at counting and physics. If you want exactly 8 flowers, not 7, nor 9, on the ground, a reference sketch will give you a higher chance to get it.
Does the reference image have the same aspect ratio as the training samples? Choose the same aspect ratio for the forged image as that of the training samples.
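A quick sanity check on aspect ratios before forging can be done in a couple of lines; the comparison tolerance below is an arbitrary choice for illustration.

```python
import math


def same_aspect_ratio(size_a, size_b, tol=1e-3) -> bool:
    """Compare width/height ratios of two (width, height) pairs."""
    return math.isclose(size_a[0] / size_a[1], size_b[0] / size_b[1], rel_tol=tol)


print(same_aspect_ratio((512, 512), (1024, 1024)))  # True: both 1:1
print(same_aspect_ratio((512, 512), (512, 768)))    # False: 1:1 vs 2:3
```

Running this against your reference image and one of your training samples confirms the two will be padded and resized the same way.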