Automate Textual Inversion Training with Your Own Image Dataset (Faces)

Zahid Parvez
3 min read · Jan 28, 2023

Textual inversion teaches Stable Diffusion a custom token by learning a new embedding for it from a handful of example images, without retraining the model itself. This lets you generate consistent images of a specific person (e.g. yourself) or object. There are many articles online on how to do this. I used the process explained in this video, and you can even run it online in Google Colab.

If you want to quickly create a training dataset (e.g. face images of yourself), you will realise it is quite a cumbersome process. To make this task easier, I created a Python script that crawls a directory of images, finds the faces within them, groups similar faces together (i.e. faces of the same person), and saves a cropped training image for every face found.
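To make the grouping and cropping steps concrete, here is a minimal sketch using only the standard library. It is not the script's actual code. It assumes each detected face has already been turned into a numeric embedding vector by some face-recognition library, and that faces of the same person lie close together in that space. The names `group_faces` and `padded_crop_box`, the `0.6` distance threshold and the `0.25` padding are all hypothetical choices for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_faces(embeddings, threshold=0.6):
    """Greedily group embeddings (assumed threshold, tune for your encoder).

    Each face joins the first group whose reference (first) member is within
    `threshold`; otherwise it starts a new group.
    Returns a list of groups, each a list of indices into `embeddings`.
    """
    groups = []  # each entry: (reference_embedding, [member indices])
    for idx, emb in enumerate(embeddings):
        for ref, members in groups:
            if euclidean(ref, emb) <= threshold:
                members.append(idx)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append((emb, [idx]))
    return [members for _, members in groups]


def padded_crop_box(box, image_size, padding=0.25):
    """Grow a (left, top, right, bottom) face box by `padding` of its size.

    The result is clamped to the image bounds given as (width, height).
    """
    left, top, right, bottom = box
    width, height = image_size
    pad_x = int((right - left) * padding)
    pad_y = int((bottom - top) * padding)
    return (
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(width, right + pad_x),
        min(height, bottom + pad_y),
    )
```

Greedy grouping is order-dependent and simpler than a proper clustering algorithm, but it is usually enough to separate a few clearly different people. The padding keeps some hair and background around the face, which tends to make more natural training crops than a tight box.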

The process is quite straightforward:

Using the script

Your Python environment must have the following packages installed. They are available via both pip and conda.


Then locate or create a folder of images you wish to process. For this example, I took snapshots from the TV show Scrubs (season 8, episode 1) instead of using my own images for privacy reasons.
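The crawling step over such a folder can be sketched as below. This is an illustration, not the script's own implementation: the function name `find_images` and the set of accepted file extensions are assumptions.

```python
from pathlib import Path

# Assumed set of extensions to treat as images; adjust to your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def find_images(root):
    """Return all image files under `root` (recursively), in sorted order."""
    root = Path(root)
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Calling `find_images("snapshots")` on a hypothetical `snapshots` folder would return a list of `Path` objects, one per image, including those in subfolders. Sorting keeps the order stable between runs, which makes the resulting face groups easier to compare.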


