Automate Textual Inversion Training with Your Own Image Dataset (Faces)
--
Textual inversion teaches a Stable Diffusion model a new custom token by learning an embedding for it from example images. This lets you create consistent images of someone (e.g. yourself) or something using Stable Diffusion! There are many articles online on how to do this; I used the process explained in this video. You can even do this online in Google Colab.
If you want to quickly create a training dataset for yourself (e.g. face images of yourself), you will realise it's quite a cumbersome process. To make this task easier, I created a Python script that crawls a directory of images, finds the faces within them, groups similar faces together (i.e. faces of the same person), and creates a cropped training image for each face found.
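To make those steps concrete, here is a minimal sketch of the crawl, detect and crop stages. It is not the script's actual code: the helper names (`find_images`, `to_crop_box`, `extract_faces`), the list of file extensions and the `margin` value are all assumptions made for this illustration. The one detail worth noting is that `face_recognition.face_locations` returns boxes as (top, right, bottom, left), while PIL's `crop` expects (left, upper, right, lower).

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed set of file types


def find_images(root):
    """Recursively collect image files under root, sorted for stable output."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def to_crop_box(location, image_size, margin=0.25):
    """Convert a (top, right, bottom, left) face box into a PIL crop box.

    The box is grown by `margin` (a fraction of its size, an assumed value)
    on every side and clamped to the image bounds.
    """
    top, right, bottom, left = location
    width, height = image_size
    pad_x = int((right - left) * margin)
    pad_y = int((bottom - top) * margin)
    return (
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(width, right + pad_x),
        min(height, bottom + pad_y),
    )


def extract_faces(image_path):
    """Return (cropped PIL image, 128-d encoding) pairs for one image file."""
    # Imported here so the helpers above work without these packages.
    import face_recognition
    from PIL import Image

    pixels = face_recognition.load_image_file(str(image_path))
    locations = face_recognition.face_locations(pixels)
    encodings = face_recognition.face_encodings(pixels, known_face_locations=locations)
    picture = Image.fromarray(pixels)
    return [
        (picture.crop(to_crop_box(loc, picture.size)), enc)
        for loc, enc in zip(locations, encodings)
    ]
```

Calling `extract_faces` on every path returned by `find_images` would give the crops and encodings that a grouping step could then work on.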
The process is quite straightforward:
Using the script
Your Python environment must have the following packages installed. They are available via both pip and conda:
numpy
PIL (provided by the Pillow package)
tqdm
face_recognition
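If you are unsure whether your environment has everything, a quick check like the one below can help. It is only a convenience sketch and not part of the script; the `missing_modules` helper is a name made up for this example, and it uses the standard library's `importlib.util.find_spec` to test whether each import name can be found.

```python
import importlib.util

# Import names of the packages listed above (PIL is the import name of Pillow).
REQUIRED_MODULES = ["numpy", "PIL", "tqdm", "face_recognition"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```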
Then locate or create a folder of images you wish to process. For this example, I took snapshots from the TV show Scrubs (season 8, episode 1) instead of using my own images for privacy reasons.
Download the script from here: https://gist.github.com/thezaza101/2b69f049e533356c15cb813e14f1a3dc
Then simply point the script at the image library:
python createFaceExtracts.py -i [path to image library]
For example:
python createFaceExtracts.py -i D:\Downloads\ScrubsSnapshots
You can also adjust how similar faces have to be to be grouped together using the -s parameter; the default is 0.6. Lower numbers mean that faces need to be more similar to be grouped together (resulting in fewer false positives but more groups), while higher numbers mean that less similar faces can be grouped together (resulting in more false positives but fewer groups).
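To see why the threshold behaves this way, consider the sketch below. It assumes the -s value acts like the `tolerance` in `face_recognition.compare_faces`, where two face encodings match when their Euclidean distance is at most the tolerance (0.6 by default). The greedy grouping strategy and the `group_faces` name are assumptions for illustration; the script may group faces differently.

```python
import math


def group_faces(encodings, tolerance=0.6):
    """Greedily group encodings; returns a list of groups of input indices.

    A face joins the first group whose representative (its first member) is
    within `tolerance` Euclidean distance; otherwise it starts a new group.
    """
    groups = []
    for index, encoding in enumerate(encodings):
        for group in groups:
            representative = encodings[group[0]]
            if math.dist(representative, encoding) <= tolerance:
                group.append(index)
                break
        else:
            groups.append([index])
    return groups


# Toy 2-d "encodings" (real ones have 128 dimensions).
toy = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (3.0, 0.0)]
print(group_faces(toy, tolerance=0.6))  # [[0, 1], [2], [3]]
print(group_faces(toy, tolerance=0.3))  # [[0], [1], [2], [3]]
print(group_faces(toy, tolerance=1.2))  # [[0, 1, 2], [3]]
```

With the toy data, tightening the threshold splits the faces into more groups, and loosening it merges them into fewer, which mirrors the trade-off described above.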
Using the default settings, the script found 12 unique faces in the 58 input images. The image below is of the first group: