CS180 Project 5: Fun with Diffusion Models
Matthew Sulistijowadi
Overview:
In Part A, we experiment with pretrained diffusion models, writing sampling loops used for inpainting and for creating optical illusions. In Part B, we train our own diffusion model on MNIST.
Part A: The Power of Diffusion Models!
Part 0: Setup
Random seed used is 47
Generated an image for each of the three given prompts, using both 5 and 20 inference steps.
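A rough sketch of what this comparison looks like in code, assuming the DeepFloyd IF stage-1 pipeline from diffusers (the starter code may pass precomputed prompt embeddings instead, so the exact arguments here are assumptions):

```python
import torch
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
stage_1 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
).to(device)

prompt = "an oil painting of a snowy mountain village"   # e.g. one of the given prompts
for num_steps in (5, 20):
    generator = torch.manual_seed(47)                     # the project-wide seed
    image = stage_1(prompt, num_inference_steps=num_steps,
                    generator=generator).images[0]        # 64x64 stage-1 output
    image.save(f"sample_{num_steps}_steps.png")
```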
1.1 Forward Process
Implemented the forward process of diffusion: at each timestep, appropriately scaled Gaussian noise is added to the clean image, with visuals for t = 250, 500, and 750.
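A minimal sketch of the forward step, assuming `alphas_cumprod` holds the cumulative alpha products of the DDPM noise schedule:

```python
# x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I)
import torch

def forward(x0: torch.Tensor, t: int, alphas_cumprod: torch.Tensor) -> torch.Tensor:
    alpha_bar = alphas_cumprod[t]
    eps = torch.randn_like(x0)
    return alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * eps
```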
1.2 Classical Denoising
Used Gaussian blur filtering to try to remove the noise from the images above.
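A minimal sketch of this classical baseline using torchvision's Gaussian blur (the kernel size and sigma here are placeholder values, not necessarily the ones used for the results):

```python
import torchvision.transforms.functional as TF

def classical_denoise(noisy, kernel_size=5, sigma=2.0):
    # blur away high-frequency noise; detail is lost along with it
    return TF.gaussian_blur(noisy, kernel_size=kernel_size, sigma=sigma)
```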
1.3 One-Step Denoising
Used the pretrained diffusion model's UNet to denoise the images: the UNet estimates the Gaussian noise in each image, which we then remove to get something closer to the original image.
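A minimal sketch of the one-step estimate, inverting the forward equation with the UNet's predicted noise (`unet` and its calling convention are stand-ins for the pretrained model):

```python
import torch

def one_step_denoise(x_t, t, unet, alphas_cumprod):
    alpha_bar = alphas_cumprod[t]
    eps_hat = unet(x_t, t)                                           # predicted noise
    x0_hat = (x_t - (1 - alpha_bar).sqrt() * eps_hat) / alpha_bar.sqrt()
    return x0_hat
```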
1.4 Iterative Denoising
Iteratively cleans up a noisy image: starting from a heavily distorted image, the noise level is decreased step by step, producing progressively cleaner, higher-quality results.
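A minimal sketch of one iterative-denoising pass, following the standard DDPM update over strided timesteps (`unet` and `add_variance` are stand-ins for the pretrained model and the provided variance term):

```python
import torch

def iterative_denoise(x, strided_timesteps, unet, alphas_cumprod, add_variance):
    # strided_timesteps runs from high noise down to low noise
    for i in range(len(strided_timesteps) - 1):
        t, t_next = strided_timesteps[i], strided_timesteps[i + 1]
        abar_t, abar_next = alphas_cumprod[t], alphas_cumprod[t_next]
        alpha = abar_t / abar_next                                   # per-stride alpha
        beta = 1 - alpha

        eps_hat = unet(x, t)                                         # predicted noise
        x0_hat = (x - (1 - abar_t).sqrt() * eps_hat) / abar_t.sqrt() # clean estimate

        # DDPM posterior mean over the strided step, plus the variance term
        x = (abar_next.sqrt() * beta / (1 - abar_t)) * x0_hat \
            + (alpha.sqrt() * (1 - abar_next) / (1 - abar_t)) * x \
            + add_variance(x, t)
    return x
```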
1.5 Diffusion Model Sampling
Generated images from scratch by iteratively denoising pure random noise, using the prompt “a high quality photo”.
1.6 Classifier-free Guidance (CFG)
Images generated with CFG are greatly enhanced and of higher quality: we combine the noise estimate conditioned on a prompt with an unconditional estimate, extrapolating past the conditional one. Compared to the previous samples, these images look much better.
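The CFG noise estimate itself is a one-liner (gamma here is an assumed guidance scale; values around 7 are typical):

```python
def cfg_noise_estimate(eps_uncond, eps_cond, gamma=7.0):
    # gamma > 1 pushes the estimate past the conditional one
    return eps_uncond + gamma * (eps_cond - eps_uncond)
```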
1.7 Image-to-image Translation (with CFG from now on)
Added varying amounts of noise to the original image and then denoised; larger starting indices (less added noise) produce results closer to the original image.
1.7.1 Editing Hand-Drawn and Web Images
Applied the same process to non-realistic web images, using the diffusion model to push them toward more natural-looking images, and did the same with hand-drawn sketches.
1.7.2 Inpainting
Inpainting lets us alter selected parts of an image without changing the rest: new content is generated wherever the mask is set to 1, while the rest is forced back to the original at each step. Applied to the Campanile and other previous images.
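A minimal sketch of the inpainting trick, applied after every denoising step (`forward` is the noising function from 1.1):

```python
def inpaint_step(x_t, t, x_orig, mask, forward):
    # keep generated content where mask == 1, re-noise the original everywhere else
    return mask * x_t + (1 - mask) * forward(x_orig, t)
```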
1.7.3 Text-Conditioned Image-to-image Translation
Guided the output with a text prompt, giving some control over the result (e.g., the Campanile with the prompt “a rocket ship”).
“a cheese block”
“a wall of fire”
1.8 Visual Anagrams
Created optical illusions with diffusion models: the image looks like one thing right-side up, but like something else when flipped upside down.
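A minimal sketch of the anagram noise estimate (the `unet(x, t, emb)` calling convention is an assumption):

```python
import torch

def anagram_noise_estimate(x_t, t, unet, emb_up, emb_down):
    eps_up = unet(x_t, t, emb_up)                                    # estimate for the upright prompt
    eps_down = torch.flip(unet(torch.flip(x_t, dims=[-2]), t, emb_down),
                          dims=[-2])                                 # flip, estimate, flip back
    return (eps_up + eps_down) / 2                                   # average the two estimates
```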
1.9 Hybrid Images
Hybrid images change appearance with viewing distance: one image is visible up close, another from far away.
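A minimal sketch of the hybrid noise estimate, combining low frequencies from one prompt with high frequencies from another (the `unet(x, t, emb)` convention and the blur parameters are assumptions):

```python
import torchvision.transforms.functional as TF

def hybrid_noise_estimate(x_t, t, unet, emb_far, emb_near, kernel_size=33, sigma=2.0):
    eps_low = TF.gaussian_blur(unet(x_t, t, emb_far), kernel_size, sigma)   # seen from far away
    eps_near = unet(x_t, t, emb_near)
    eps_high = eps_near - TF.gaussian_blur(eps_near, kernel_size, sigma)    # seen up close
    return eps_low + eps_high
```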
Part B: Diffusion Models from Scratch!
Trained our own diffusion model on MNIST, with the first part building a UNet architecture for image denoising.
The UNet contains downsampling and upsampling paths with skip connections; we implemented its standard operations and composed them into the Unconditional UNet.
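As a rough illustration, one of the simple building blocks might look like this (channel counts and the normalization choice are assumptions following common practice):

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two 3x3 convolutions, each followed by normalization and a GELU."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.GELU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.GELU(),
        )

    def forward(self, x):
        return self.net(x)
```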
Part 1: Training a Single-Step Denoising UNet
Given a noisy image z, we train the denoiser so that its output corresponds to the clean image; below, you can see the noise in each image gradually being reduced.
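A minimal sketch of the training loop, assuming an L2 loss between D(x + sigma*eps) and the clean x (sigma, learning rate, and epoch count here are placeholder values):

```python
import torch
import torch.nn.functional as F

def train_denoiser(denoiser, dataloader, epochs=5, sigma=0.5, lr=1e-4, device="cuda"):
    opt = torch.optim.Adam(denoiser.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in dataloader:                      # MNIST labels unused here
            x = x.to(device)
            z = x + sigma * torch.randn_like(x)      # noisy input
            loss = F.mse_loss(denoiser(z), x)        # match the clean image
            opt.zero_grad()
            loss.backward()
            opt.step()
```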
Part 2: Training a Diffusion Model
Started by training a time-conditioned UNet, which gives somewhat reasonable results, then added class-conditioning, which dramatically improves quality and consistency and produces numbers that look fairly normal. The new UNet was built by tweaking the previously created UNet, injecting the timestep using FCBlocks.
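A minimal sketch of the time-conditioning injection via FCBlocks (exactly where the embedding is added is an assumption following the usual recipe):

```python
import torch
import torch.nn as nn

class FCBlock(nn.Module):
    """Maps the normalized timestep t/T to a per-channel conditioning vector."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_ch, out_ch),
            nn.GELU(),
            nn.Linear(out_ch, out_ch),
        )

    def forward(self, t):
        return self.net(t)

# Inside the UNet's forward pass (sketch):
#   t_emb = self.fc_t(t.view(-1, 1) / num_timesteps)    # shape (B, C)
#   unflat = unflat + t_emb.view(-1, C, 1, 1)            # broadcast over H, W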
2.3 Sampling from UNet
Sampling from the time-conditioned UNet produces digit-like images, although a noticeable majority are deformed, or it is unclear which digit they are (if any). We therefore use class conditioning with classifier-free guidance to increase accuracy, producing digits that actually look like numbers someone would write.
2.4 + 2.5 Adding and Sampling from the Class-Conditioned UNet
Training loss for the class-conditioned UNet.
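A minimal sketch of the class-conditioned training step, with the label one-hot encoded and dropped about 10% of the time so the model also learns the unconditional estimate for classifier-free guidance (the `unet(x_t, t, c)` convention is an assumption):

```python
import torch
import torch.nn.functional as F

def class_conditioned_step(unet, x0, labels, alphas_cumprod, num_classes=10, p_uncond=0.1):
    B = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (B,), device=x0.device)
    abar = alphas_cumprod[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = abar.sqrt() * x0 + (1 - abar).sqrt() * eps           # forward process

    c = F.one_hot(labels, num_classes).float()
    drop = (torch.rand(B, device=x0.device) < p_uncond).float().unsqueeze(1)
    c = c * (1 - drop)                                          # zero out ~10% of labels

    return F.mse_loss(unet(x_t, t, c), eps)                     # predict the added noise
```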