Highlights

  • The next fun step is getting your image sizes in order. Common training images for SD are 512x512 pixels, which means all your images must be resized, or at least that was the common wisdom when I started training. I still do so and get good results, but most people resort to bucketing, which lets the LoRA training script auto-sort the images into size buckets. The common consensus is also that too many buckets can cause poor quality in the training. My suggestion? Either resize everything to your desired training resolution, or choose a couple of bucket sizes and resize every image to its closest appropriate bucket, either manually or by allowing upscaling in the training script. A minimal sketch of the resize-to-bucket approach follows below.
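A minimal sketch of that approach, assuming Pillow is installed; the folder names and the bucket list are placeholders you would pick yourself:

```python
from pathlib import Path
from PIL import Image

# Hand-picked bucket resolutions (w, h); placeholders, choose your own few.
BUCKETS = [(512, 512), (448, 576), (576, 448), (512, 640), (640, 512)]

def nearest_bucket(w, h):
    """Pick the bucket whose aspect ratio is closest to the image's."""
    ar = w / h
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - ar))

src, dst = Path("raw_images"), Path("dataset")
dst.mkdir(exist_ok=True)
for f in src.glob("*.png"):
    img = Image.open(f)
    bw, bh = nearest_bucket(*img.size)
    # Scale so the image fully covers the bucket, then center-crop to it.
    scale = max(bw / img.width, bh / img.height)
    img = img.resize((round(img.width * scale), round(img.height * scale)), Image.LANCZOS)
    left, top = (img.width - bw) // 2, (img.height - bh) // 2
    img.crop((left, top, left + bw, top + bh)).save(dst / f.name)
```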
  • The most important settings are the Dim, the optimizer, the learning rate, and the number of epochs and repetitions. Everyone has their own recipe for fine-tuning; some are better, some are worse. Mine is as generic as it can be, and it normally gives good results when generating at around 0.7 weight. I use Dim 32 and AdamW with a learning rate of 0.0001, and I strive for 8 epochs of 1000 steps per epoch per sub-concept. Alternatively, use Dim 32 and Prodigy with a learning rate of 1 and 500 steps per epoch per sub-concept.
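As a sketch, here is how that recipe might map onto kohya-ss sd-scripts flags, assuming that trainer; the model and folder paths are placeholders, and flag names should be checked against your installed version:

```python
import subprocess

# Sketch of the AdamW recipe above via sd-scripts' train_network.py.
subprocess.run([
    "accelerate", "launch", "train_network.py",
    "--pretrained_model_name_or_path", "model.safetensors",  # placeholder
    "--train_data_dir", "dataset",                           # placeholder
    "--output_dir", "output",                                # placeholder
    "--resolution", "512,512",
    "--network_module", "networks.lora",
    "--network_dim", "32",
    "--network_alpha", "16",
    "--optimizer_type", "AdamW",
    "--learning_rate", "1e-4",
    "--lr_scheduler", "cosine_with_restarts",
    "--max_train_epochs", "8",
], check=True)
```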
  • Remember you are also training the main concept when doing this; in the case above this results in the character being trained for 3000 steps, so be careful not to overcook it. The more overlapping concepts you add, the higher the risk of overcooking the LoRA. This can be mitigated by removing the relation between the character and the outfit. Take for example outfit1 from above: I could take 50 of the images, remove the character tag, and replace it with the original description tags (hair color, eye color, etc.), so that while outfit1 is being trained the character is not (see the sketch below). Another alternative that somewhat works is scale normalization, which “flattens” values that shoot too high beyond the rest, limiting overcooking a bit. The final method to keep overcooking under control is the Prodigy optimizer, which should make things less prone to overcooking, but I am still testing it.
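A sketch of that decoupling pass, assuming comma-separated .txt captions sitting next to the images; the tag names and folder are placeholders for your own dataset:

```python
import random
from pathlib import Path

# In ~50 of the outfit1 images, swap the character trigger for her base
# appearance tags so those steps stop reinforcing the character.
CHARACTER = "mona_taco"                            # placeholder trigger
APPEARANCE = "brown hair, ponytail, green eyes"    # placeholder traits

captions = list(Path("dataset/outfit1").glob("*.txt"))
for f in random.sample(captions, min(50, len(captions))):
    tags = [t.strip() for t in f.read_text().split(",")]
    if CHARACTER in tags:
        tags[tags.index(CHARACTER)] = APPEARANCE
        f.write_text(", ".join(tags))
```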
  • Captioning can be as shallow as a puddle or as deep as the Mariana Trench. Captioning (adding tags) and pruning (deleting tags) are how we create triggers: a trigger is a custom tag which has absorbed (for lack of a better word) the concepts of the pruned tags. For anime characters it is recommended to use the WD1.4 ViT-v2 tagger, which uses danbooru-style tagging. The best way I have found is to use stable-diffusion-webui-dataset-tag-editor for A1111 (look for it under extensions), which includes a tag manager and the Waifu Diffusion tagger.
  • Caption cleaning: before starting trigger selection it is best to do some tag cleaning (make sure to ignore tags that will be folded into triggers, as those will likely be pruned). Superfluous tags are best handled in the following ways:
  • The proper way is to add the trigger tag to all images and then prune all intrinsic characteristics of the character, like eye color, hair style (ponytail, bangs, etc.), skin color, notable physical characteristics, and maybe some hair ornaments or tattoos. The benefit is that the character will appear pretty much as expected when using the trigger; on the other hand, it will fight the user if they want to change the hair or eye color. A sketch of this pass follows below.
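A sketch of the add-trigger-then-prune pass, under the same assumption of comma-separated .txt captions; the trigger and the pruned tag set are placeholders for your character:

```python
from pathlib import Path

TRIGGER = "mona_taco"                                        # placeholder
PRUNE = {"green eyes", "brown hair", "ponytail", "dark skin"}  # placeholder

# Prepend the trigger to every caption and drop the intrinsic traits
# the trigger is supposed to absorb.
for f in Path("dataset").rglob("*.txt"):
    tags = [t.strip() for t in f.read_text().split(",")]
    tags = [t for t in tags if t not in PRUNE and t != TRIGGER]
    f.write_text(", ".join([TRIGGER] + tags))
```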
  • Regularization images are like negative prompts, but not really: they pull your model back to a “neutral” state. Either way, unless you really need them, ignore them. They are normally not used in LoRAs, as there is no need to restore the model: you can simply lower the LoRA weight or deactivate it. There are some theoretical uses; below are a couple of examples.
  • Mitigating bleed-over from your trigger. Suppose I want to train a character called Mona_Taco; the result will be contaminated with images of the Mona Lisa and tacos. So you can go to A1111, generate a bunch of images with the prompts Taco and Mona, and dump them into your regularization folder with appropriate captions. Now your LoRA will know that Mona_Taco has nothing to do with the Mona Lisa or tacos. Alternatively, simply use a different tag or concatenate it: MonaTaco will probably work fine by itself without the extra steps. I would still recommend simply using a meaningless word that returns noise.
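If you want to script that generation, here is a sketch using A1111's txt2img API, assuming the web UI is running with --api; the URL, output folder, and prompts are placeholders:

```python
import base64
from pathlib import Path
import requests

URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"  # default local A1111 API
out = Path("reg_images")                         # placeholder folder
out.mkdir(exist_ok=True)

for prompt in ("taco", "mona lisa"):             # placeholder prompts
    for i in range(50):
        r = requests.post(URL, json={"prompt": prompt, "steps": 20}).json()
        stem = out / f"{prompt.replace(' ', '_')}_{i:03}"
        # Response carries base64-encoded images; save one plus its caption.
        stem.with_suffix(".png").write_bytes(base64.b64decode(r["images"][0]))
        stem.with_suffix(".txt").write_text(prompt)
```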
  • Another use is style neutralization. Suppose you trained a character LoRA on a thousand images, all including the tag 1girl; now whenever you run your LoRA with 1girl, it will always display your character. To prevent this, you would add a thousand different images of different girls, all tagged 1girl, to balance out your training and remove your LoRA's influence from the 1girl concept. Of course, you have to do this for as many affected tags as you can.
  • LoCon (LyCORIS): it picks up more detail, which may be a good thing for intricate objects, but keep in mind the quality of your dataset, as it will also pick up noise and artifacts more strongly. It has a slight edge on multi-outfit LoRAs, as the extra detail helps it differentiate the outfits, limiting bleed-over a bit (a very slight improvement).
  • Recommended dim/alpha settings by network type:
      • LoRA: Dim 32, alpha 16.
      • LoCon: either Dim 32, alpha 16, conv dim 32, conv alpha 16, OR Dim 32, alpha 16, conv dim 16, conv alpha 8. Don't go over conv dim 64 and conv alpha 32.
      • LoHa: Dim 32, alpha 16 should work? Don't go higher than dim 32.
      • LoKr: very similar to LoHa; Dim 32, alpha 16 should work? Don't go higher than dim 32. According to the repos it might need some tweaking of the learning rate, so try between 5e-5 and 8e-5 (0.00005 to 0.00008).
      • IA3: Dim 32, alpha 16 should work. Needs a higher learning rate; currently recommended is 5e-3 to 1e-2 (0.005 to 0.01) with AdamW. Prodigy works fine at LR=1 (tested).
      • DyLoRA: the higher the better (dims should always be divisible by 4), but higher dims also increase the training time, so dim 64, alpha 32 seems like a good compromise speed-wise. The step size is configurable in the DyLoRA unit value; the common value is 4 dim/alpha, so after training you could extract 64/32, 60/32, 64/28…4/4. Obviously DyLoRAs take a lot longer to train, or everyone would be using them for the extra flexibility.
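As a sketch, here is the LoCon row expressed as sd-scripts flags via the LyCORIS module, assuming the lycoris package is installed; paths are placeholders, and other algorithms (loha, lokr, ia3) swap the algo value:

```python
import subprocess

subprocess.run([
    "accelerate", "launch", "train_network.py",
    "--pretrained_model_name_or_path", "model.safetensors",  # placeholder
    "--train_data_dir", "dataset",                           # placeholder
    "--output_dir", "output",                                # placeholder
    "--network_module", "lycoris.kohya",
    "--network_dim", "32", "--network_alpha", "16",
    # conv_dim/conv_alpha cover the convolution layers LoCon also trains.
    "--network_args", "conv_dim=32", "conv_alpha=16", "algo=locon",
], check=True)
```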
  • Optimizer: I currently recommend Prodigy, AdamW, or AdamW8bit. If your LoRA is in no risk of burning, stick with AdamW. If on the other hand you are getting borderline results due to dataset issues, Prodigy is the way to go to limit overbaking. For Prodigy I would recommend keeping total outfit repetitions to a maximum of 500 steps per epoch (i.e. 50 images at 10 repetitions, or 100 images at 5 repetitions), as it uses a more aggressive learning rate.
  • Learning rate schedulers: technically important, as they manipulate the learning rate over time. In practice? Just select “cosine with restarts” for AdamW; for Prodigy, “annealed cosine with warmup restarts” gave me good results. A sketch of the Prodigy setup follows below.
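A sketch of that Prodigy setup, again assuming kohya-ss sd-scripts. Note that “annealed cosine with warmup restarts” comes from other trainer front-ends, so the closest stock equivalent here is cosine_with_restarts plus a warmup; the optimizer args are passed straight to Prodigy's constructor and should be checked against your installed prodigyopt version:

```python
import subprocess

subprocess.run([
    "accelerate", "launch", "train_network.py",
    "--optimizer_type", "Prodigy",
    "--learning_rate", "1.0",
    # Forwarded to the Prodigy constructor; common starting values.
    "--optimizer_args", "decouple=True", "weight_decay=0.01", "d_coef=1.0",
    "--lr_scheduler", "cosine_with_restarts",
    "--lr_scheduler_num_cycles", "4",
    "--lr_warmup_steps", "100",
    # ...plus the usual model/dataset/network flags from the earlier sketch.
], check=True)
```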
  • Bucketing: as far as I know, the fewer buckets the better. For example, with a minimum of 256, a maximum of 1024, and 64-pixel steps in between, you can have a maximum of 12 buckets ((1024-256)/64 = 12) per side, paired with the complementary side sizes whose total pixel count does not exceed that of the training resolution, 262144 (512*512), resulting in 47 potential buckets in total.
  • It is highly recommended to choose 4 or 5 buckets and resize your images to those resolutions, as having too many buckets has been linked to blurry images. The snippet below shows how quickly the bucket count grows if you let the script generate them all.
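A quick way to see the bucket explosion is to enumerate the candidates directly. This naive count ignores the aspect-ratio deduplication real trainers do, so the exact total will differ from the figure above:

```python
# Enumerate side lengths between the min and max in 64px steps, keeping
# (w, h) pairs whose area fits within the 512x512 pixel budget.
MIN_SIDE, MAX_SIDE, STEP = 256, 1024, 64
BUDGET = 512 * 512  # 262144 pixels

sides = range(MIN_SIDE, MAX_SIDE + 1, STEP)
buckets = [(w, h) for w in sides for h in sides if w * h <= BUDGET]
print(len(buckets), "candidate buckets, e.g.", buckets[:5])
```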