Stable Diffusion tutorial: How to make videos with Stable Diffusion? - Interpolation

Thursday, September 22, 2022 by jakub.misilo

🧐 What is Stable Diffusion?

Stable Diffusion is an open-source latent text-to-image diffusion model. You can find out more here, or try it yourself - the code is available here.

🎯 What is our goal and how will we achieve it?

Our goal is to make a video using an interpolation process. We will use the Stable Diffusion model to generate images and then use them to make a video. Fortunately, we don't have to write the code for interpolating between latent representations ourselves - we will use the stable_diffusion_videos library. If you want to know exactly how it works under the hood, feel free to explore the code on GitHub. If you need help, ask a question on the channel dedicated to this guide - you will find it on our Discord!
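To build some intuition for what happens between two prompts, here is a minimal sketch of spherical linear interpolation (slerp) between two latent noise tensors - conceptually how intermediate frames get their starting noise. The function and shapes below are illustrative, not the library's actual API:

import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor) -> torch.Tensor:
    """Interpolate a fraction t of the way from v0 to v1 along the arc between them."""
    dot = torch.dot(v0.flatten() / v0.norm(), v1.flatten() / v1.norm()).clamp(-1.0, 1.0)
    theta_0 = torch.acos(dot)            # total angle between the two latents
    if torch.sin(theta_0).abs() < 1e-6:  # nearly parallel: plain lerp is fine
        return (1.0 - t) * v0 + t * v1
    theta_t = theta_0 * t                # angle to travel for this frame
    s0 = torch.sin(theta_0 - theta_t) / torch.sin(theta_0)
    s1 = torch.sin(theta_t) / torch.sin(theta_0)
    return s0 * v0 + s1 * v1

# Two random starting latents (the shape Stable Diffusion uses for 512x512 images)
a, b = torch.randn(1, 4, 64, 64), torch.randn(1, 4, 64, 64)
frame_latent = slerp(0.5, a, b)          # noise for a frame halfway between prompts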

To run this tutorial, we will use Google Colab and Google Drive.

⚙️ Preparing dependencies

First of all, you need to install all the dependencies and connect your Google Drive to Colab, so the video and frames can be saved. You can do this by running:

from google.colab import drive

# Mount Google Drive at /content/drive so outputs persist after the session ends
drive.mount('/content/drive')

and then:

!pip install stable_diffusion_videos

The next step is authenticating with Hugging Face. You can find your token here.

!huggingface-cli login
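If the CLI prompt is inconvenient inside a notebook, huggingface_hub also provides a widget-based login that does the same thing (this is an alternative, not required by the guide):

from huggingface_hub import notebook_login

notebook_login()  # paste your Hugging Face access token into the widget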

🎥 Generating images/video

To generate a video, we need to define the prompts between which the model will interpolate. We will use a dictionary mapping each prompt to a random seed:

prompts = {
    'blonde woman': 897,    # prompt: seed used for its starting noise
    'brunette woman': 897
}

Now we are able to generate images/video using:

from stable_diffusion_videos import walk

walk(
    prompts=list(prompts.keys()),
    seeds=list(prompts.values()),
    output_dir='./drive/MyDrive/dir1',     # Where images/video will be saved
    name='subdir',                         # Subdirectory of output_dir where images/video will be saved
    guidance_scale=6.25,                   # Higher adheres to prompt more, lower lets model take the wheel
    num_steps=100,                         # Number of generated frames
    num_inference_steps=50,                # Number of steps by model to generate one image
    scheduler='klms',                      # Sampler used for the diffusion process
    disable_tqdm=False,                    # Keep the progress bar visible
    make_video=True,                       # If false, just save images
    use_lerp_for_text=True,                # Linearly interpolate the text embeddings
    do_loop=False,                         # Don't interpolate back to the first prompt
)

This process may take a long time, depending on the parameters you pass.

The comments above describe the parameters, but if you want to know more, check the stable_diffusion_videos code! I use 100 steps between prompts, but you can use more for better results. You can also modify num_inference_steps and the other parameters. Feel free to experiment! After running this code, you will find the video in your Google Drive. You can download it, watch it, and share it with your friends!
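If you set make_video=False and want to stitch the saved frames into a video yourself, something like the following works on Colab, which ships with ffmpeg. This is a sketch under the assumption that the frames land as sequential .png files in the output directory shown above - adjust the path and pattern to match what you actually get:

import subprocess

subprocess.run([
    'ffmpeg',
    '-framerate', '24',                         # playback speed in frames per second
    '-pattern_type', 'glob',
    '-i', './drive/MyDrive/dir1/subdir/*.png',  # assumed frame location and naming
    '-c:v', 'libx264',                          # H.264: widely playable
    '-pix_fmt', 'yuv420p',                      # pixel format most players expect
    'interpolation.mp4',
], check=True)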

If you want to reproduce my results, just copy and paste the code above, but I recommend using your own prompts and experimenting with the model - it's worth it!

➕ Bonus

You can use more than two prompts - the model will interpolate through them in order. Example:

prompts = {
    'blonde woman': 897,
    'brunette woman': 897,
    'ball': 10
}
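And since do_loop appeared in the parameters above: setting it to True makes the interpolation return from the last prompt back to the first, so the resulting video loops seamlessly. A sketch reusing the earlier call (the subdirectory name here is just an example, and omitted parameters keep the library's defaults):

walk(
    prompts=list(prompts.keys()),
    seeds=list(prompts.values()),
    output_dir='./drive/MyDrive/dir1',
    name='subdir_loop',                 # example name for a separate subdirectory
    make_video=True,
    do_loop=True,                       # interpolate from 'ball' back to 'blonde woman'
)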

Thanks for reading! Stay tuned for the next tutorials!

Jakub Misiło - Junior Data Scientist at New Native