Generate Images with Stable Diffusion XL
Local Stable Diffusion XL on Mac
I recently tried out running Stability AI's current biggest open source model - Stable Diffusion XL - locally. With surprisingly few lines of code and a few packages installed, I got it up and running. This quick guide is only meant as a starting point; the topic of running and eventually finetuning diffusion models for image generation really deserves a deep dive at some point.
With the recently introduced Metal support for PyTorch, we can easily utilize Apple's GPU. One aspect that makes Macs so interesting for running larger models is the unified memory approach: CPU and GPU share the same memory pool, so the GPU can access far more memory than most consumer graphics cards offer. For now though, support and documentation for how to utilize a Mac for deep learning is a bit sad.
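Before going further, you can quickly verify that the Metal backend is actually available in your PyTorch install - a minimal sanity check you can run in any Python shell or notebook:

import torch
print(torch.backends.mps.is_available())  # True if the Metal (MPS) backend can be used on this machine
print(torch.backends.mps.is_built())      # True if this PyTorch build was compiled with MPS support

If either of these prints False, update PyTorch or check that you are on Apple Silicon before continuing.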
Install packages
I am using poetry for package management in Python, as mentioned here. I recommend using Python 3.10 for this, as I ran into some issues with newer versions. You can of course use conda (conda install) or pip (pip install) instead.
With poetry, after initialising a new project with poetry new my-proj-name, we add the following packages:
poetry add diffusers transformers invisible-watermark accelerate safetensors torch jupyter
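If you skip poetry, the rough pip equivalent (not tested in this exact setup) would be:

pip install diffusers transformers invisible-watermark accelerate safetensors torch jupyter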
Write the Model Code
Create a new notebook and name it appropriately - I called mine stable_diff_xl.ipynb.
First, import the packages and tell torch to use the Metal backend like this:
from diffusers import DiffusionPipeline, AutoencoderKL
import torch
mps_device = torch.device("mps")
Then add the code to download and initialize the base diffusion model. As of writing this blog post, there were some issues with fp16, and one mitigation was using the vae below. The download of all the required model parts might take some time, so be patient. The last line tells torch to use our mps_device for the base pipeline.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
base = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, use_safetensors=True, vae=vae, variant="fp16")
base.to(mps_device)
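If you run into memory pressure on a Mac with less RAM, diffusers also offers attention slicing, which trades a bit of speed for lower memory use. This is optional and not needed for the basic setup:

base.enable_attention_slicing()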
Stable Diffusion XL comes with an optional refiner model that is supposed to increase the quality of images after generation. However, when trying it out, it simply made the image less detailed and kind of blurry - no matter the parameters used.
If you want to try adding the refiner model too, take a look at this introduction on huggingface.
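For reference, a sketch of what that could look like with the two-stage approach described there - untested in my setup, and reusing the base pipeline from above plus the prompt and n_steps defined in the next section:

refiner = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, use_safetensors=True, variant="fp16", text_encoder_2=base.text_encoder_2, vae=base.vae)
refiner.to(mps_device)
# the base model denoises the first 80% of the steps and hands its latents to the refiner
latents = base(prompt=prompt, num_inference_steps=n_steps, denoising_end=0.8, output_type="latent").images
image = refiner(prompt=prompt, num_inference_steps=n_steps, denoising_start=0.8, image=latents).images[0]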
Generate Images
That's all we need to get started with generating images! A basic call of our pipeline takes two important parameters to get going:
prompt: The actual, textual prompt we use to tell our model what to generate and in what style
n_steps: The number of steps the diffusion model uses to refine the generated image. Usually a value between 30 and 50 makes sense here.
Of course there are a lot more parameters you can play with, such as negative prompts (what you want your model NOT to generate) - see the example after the basic generation code below. The code for generation looks like this:
n_steps = 30
prompt = "minimalistic web graphic, vector graphic,A Diffusion Model running on a Macbook, purple and mint accents, modern "
image = base(prompt=prompt,num_inference_steps=n_steps).images[0]
image
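To give one example of those extra parameters, a negative prompt can be passed alongside the regular prompt (the negative prompt text here is just an illustration), and the resulting image can be saved to disk:

negative_prompt = "blurry, low quality, distorted"  # things we do NOT want in the image
image = base(prompt=prompt, negative_prompt=negative_prompt, num_inference_steps=n_steps).images[0]
image.save("stable_diff_xl_output.png")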
The image generated by this prompt was used as the header image of this blog post btw. :j