CS180 Project 5: Fun with Diffusion Models

Matthew Sulistijowadi

Overview:

Played with diffusion models, including sampling loops used for inpainting and the creation of optical illusions. Part B involved training our own diffusion model on MNIST.

Part A: The Power of Diffusion Models!

Part 0: Setup

The random seed used throughout is 47.

Using the three given prompts, I generated images for each of them with 5 and 20 inference steps.

Num Inference Steps = 5
Num Inference Steps = 20

1.1 Forward Process

Implemented the forward process of diffusion: at different timesteps, scaled noise is added to the clean image. Results are shown for t = 250, 500, and 750, with a minimal sketch of the noising step after the images.

Noisy 250
Noisy 500
Noisy 750
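The noising step itself is essentially one line. A minimal sketch, assuming `alphas_cumprod` holds the scheduler's cumulative alpha products (ᾱ) and `x0` is the clean image tensor:

```python
import torch

def forward_noise(x0, t, alphas_cumprod):
    """DDPM forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps,
    with eps ~ N(0, I). Returns both the noisy image and the noise used."""
    abar_t = alphas_cumprod[t]
    eps = torch.randn_like(x0)
    x_t = abar_t.sqrt() * x0 + (1 - abar_t).sqrt() * eps
    return x_t, eps

# e.g. noisy_250, _ = forward_noise(campanile, 250, alphas_cumprod)
# (assuming `campanile` is a [C, H, W] image tensor)
```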

1.2 Classical Denoising

Used Gaussian blur filtering to try to remove the noise from the images above (a small sketch follows the results).

De-Noisy 250

De-Noisy 500

De-Noisy 750
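The classical baseline is just a low-pass filter. A minimal sketch using torchvision's `gaussian_blur`; the kernel size and sigma below are illustrative, not the exact values used for the images above:

```python
import torchvision.transforms.functional as TF

def blur_denoise(x_noisy, kernel_size=5, sigma=2.0):
    """Classical 'denoising': low-pass the noisy image with a Gaussian blur.
    This removes high-frequency noise but also blurs away image detail."""
    return TF.gaussian_blur(x_noisy, kernel_size=kernel_size, sigma=sigma)
```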

1.3 One-Step Denoising

Used the pretrained diffusion model's UNet to denoise the images. The UNet estimates the Gaussian noise in the image, which is then removed to recover something closer to the original; a sketch of this estimate follows the images below.

Original (blurry because the source image is small)
Noisy 250
Noisy 500
Noisy 750
Est. Clean Image 250
Est. Clean Image 500
Est. Clean Image 750
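The one-step estimate simply inverts the forward-process equation using the UNet's noise prediction. A sketch, assuming a hypothetical `unet(x_t, t)` callable that returns the predicted noise (the real model also takes the text embedding):

```python
def one_step_denoise(unet, x_t, t, alphas_cumprod):
    """Estimate the clean image from x_t in a single step by inverting
    the forward process: x0_hat = (x_t - sqrt(1 - abar_t) * eps_hat) / sqrt(abar_t)."""
    abar_t = alphas_cumprod[t]
    eps_hat = unet(x_t, t)  # hypothetical noise-prediction call
    x0_hat = (x_t - (1 - abar_t).sqrt() * eps_hat) / abar_t.sqrt()
    return x0_hat
```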

1.4 Iterative Denoising

Iterative denoising cleans up the image step by step: starting from a heavily distorted image, the noise level is decreased over successive steps, producing progressively higher-quality results (a sketch of the update follows the comparison images).

5th Loop Denoise
10th Loop Denoise
15th Loop Denoise
20th Loop Denoise
25th Loop Denoise

Clean Iterative Denoise
Clean One-Step Denoise
Clean Gaussian Blur
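A sketch of the iterative loop, reusing the one-step clean-image estimate from above and moving x toward it with the standard DDPM posterior mean at each strided timestep. The added-variance term is omitted for brevity, and `unet` is again a hypothetical noise-prediction callable:

```python
def iterative_denoise(unet, x_T, strided_timesteps, alphas_cumprod):
    """Denoise from high noise to clean over a strided, decreasing list of
    timesteps (e.g. 990, 960, ..., 0). At each step, form the clean-image
    estimate x0_hat and interpolate toward it using the DDPM posterior mean."""
    x = x_T
    for i in range(len(strided_timesteps) - 1):
        t, t_prev = strided_timesteps[i], strided_timesteps[i + 1]
        abar_t, abar_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
        alpha = abar_t / abar_prev
        beta = 1 - alpha
        eps_hat = unet(x, t)  # hypothetical noise-prediction call
        x0_hat = (x - (1 - abar_t).sqrt() * eps_hat) / abar_t.sqrt()
        x = (abar_prev.sqrt() * beta / (1 - abar_t)) * x0_hat \
            + (alpha.sqrt() * (1 - abar_prev) / (1 - abar_t)) * x
    return x
```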

1.5 Diffusion Model Sampling

Generated images from scratch by running the iterative denoiser from pure random noise, using the prompt “a high quality photo”.

1.6 Classifier-free Guidance (CFG)

Images generated with CFG are of noticeably higher quality. CFG combines the noise estimate conditioned on the prompt with an unconditional estimate, pushing the result past the conditional one. Compared to the previous section, these images look much better.
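The combination itself is one line. A sketch, assuming a hypothetical `unet(x_t, t, embedding)` noise-prediction callable; the guidance scale of 7 is illustrative:

```python
def cfg_noise_estimate(unet, x_t, t, cond, uncond, scale=7.0):
    """Classifier-free guidance: run the UNet once with the text conditioning
    and once with the null/unconditional embedding, then push the combined
    estimate past the conditional one by `scale` (gamma > 1)."""
    eps_cond = unet(x_t, t, cond)      # hypothetical conditioned call
    eps_uncond = unet(x_t, t, uncond)  # hypothetical unconditioned call
    return eps_uncond + scale * (eps_cond - eps_uncond)
```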

1.7 Image-to-image Translation (with CFG from now on)

Starts the denoising loop at different noise levels; higher levels add less noise, so the result stays closer to the original image (a sketch follows the results below).

Campanile: Noise Levels 1, 3, 5, 7, 10, 20
Japanese Temple: Noise Levels 1, 3, 5, 7, 10, 20
Parliament: Noise Levels 1, 3, 5, 7, 10, 20
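A sketch of the image-to-image (SDEdit) procedure, reusing `forward_noise` and `iterative_denoise` from the sketches above; `strided_timesteps` is the same decreasing timestep list used for iterative denoising:

```python
def sdedit(unet, x_orig, i_start, strided_timesteps, alphas_cumprod):
    """Image-to-image translation: noise the original image to the timestep
    at index i_start, then run the usual iterative (CFG) denoising from there.
    A larger i_start adds less noise, so the result stays closer to x_orig."""
    t_start = strided_timesteps[i_start]
    x_t, _ = forward_noise(x_orig, t_start, alphas_cumprod)
    return iterative_denoise(unet, x_t, strided_timesteps[i_start:], alphas_cumprod)
```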

1.7.1 Editing Hand-Drawn and Web Images

Starting from non-realistic images (web images and hand-drawn sketches), the same procedure projects them onto the natural-image manifold so they look more realistic.

Avocado Grid: noise levels 1, 3, 5, 7, 10, 20
Drawing of Duck: noise levels 1, 3, 5, 7, 10, 20 (funny result at 20: a punk look with a sign saying LOL)
Drawing of Flower: noise levels 1, 3, 5, 7, 10, 20

1.7.2 Inpainting

Inpainting alters desired parts of an image without changing the rest: new content is generated wherever the mask is 1, while the rest of the image is held fixed. Applied to the Campanile and the other previous images; a sketch of the masking step is below.
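A sketch of the extra step that turns denoising into inpainting, reusing `forward_noise` from above; it is applied after every denoising update:

```python
def inpaint_step(x, x_orig, mask, t, alphas_cumprod):
    """Force the pixels outside the mask back to the (appropriately re-noised)
    original image, so only the masked region (mask == 1) is ever changed."""
    x_orig_t, _ = forward_noise(x_orig, t, alphas_cumprod)
    return mask * x + (1 - mask) * x_orig_t
```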

1.7.3 Text-Conditioned Image-to-image Translation

Guide the projected output with a text prompt, giving some control over the result (e.g., the Campanile with the prompt “a rocket ship”).

“a rocket ship”: noise levels 1, 3, 5, 7, 10, 20

“a cheese block”

“a wall of fire”

1.8 Visual Anagrams

Created optical illusions with diffusion models: the image looks like one thing, but when flipped upside down it looks like something else (a sketch of the combined noise estimate follows the examples).

‘an oil painting of people around a campfire’
‘an oil painting of an old man’
“a woman holding beans”
“a photo of a water buffalo” (the woman’s arms become the horns and her dress forms the buffalo’s head)
“an oil painting of noodles”

“an oil painting of an orange”
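A sketch of the combined noise estimate used for the anagrams, again assuming a hypothetical `unet(x_t, t, prompt_embedding)` noise-prediction callable:

```python
import torch

def anagram_noise_estimate(unet, x_t, t, prompt_a, prompt_b):
    """Visual anagrams: estimate noise for prompt A on the image as-is and for
    prompt B on the image flipped upside down, un-flip the second estimate,
    and average. Denoising with this estimate gives an image that reads as A
    upright and as B when flipped."""
    eps_a = unet(x_t, t, prompt_a)  # hypothetical call
    eps_b = torch.flip(unet(torch.flip(x_t, dims=[-2]), t, prompt_b), dims=[-2])
    return 0.5 * (eps_a + eps_b)
```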

1.9 Hybrid Images

Depending on viewing distance, the perceived image changes: the low-frequency content dominates from far away, and the high-frequency content dominates up close (a sketch of the combined estimate follows the examples).

Waterfall close up, Skull further away
Goose close up, Statue of Liberty further away
Lake close up, Car further away
Bear close up, Egg further away
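A sketch of the hybrid noise estimate; the Gaussian blur acts as the low-pass filter, its kernel size and sigma are illustrative, and `unet` is again a hypothetical noise-prediction callable:

```python
import torchvision.transforms.functional as TF

def hybrid_noise_estimate(unet, x_t, t, prompt_far, prompt_near,
                          kernel_size=33, sigma=2.0):
    """Hybrid images: combine the low frequencies of one prompt's noise
    estimate (seen from far away) with the high frequencies of another's
    (seen up close)."""
    eps_far = unet(x_t, t, prompt_far)    # hypothetical call
    eps_near = unet(x_t, t, prompt_near)
    low = TF.gaussian_blur(eps_far, kernel_size=kernel_size, sigma=sigma)
    high = eps_near - TF.gaussian_blur(eps_near, kernel_size=kernel_size, sigma=sigma)
    return low + high
```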

Part B: Diffusion Models from Scratch!

Trained my own diffusion model on MNIST, with the first part building a UNet architecture for image denoising.

The UNet consists of downsampling and upsampling blocks with skip connections; I constructed the Unconditional UNet and its standard operations (a simplified sketch appears below).

Tensor operations and UNet structure implemented
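A stripped-down sketch of the idea (one down/up level and a single skip connection); the actual project UNet has more blocks and a larger hidden dimension:

```python
import torch
import torch.nn as nn

class SimpleUnconditionalUNet(nn.Module):
    """Minimal UNet sketch: encode, downsample, upsample, then concatenate
    the encoder features (skip connection) before projecting back to the
    input channels."""
    def __init__(self, in_ch=1, hidden=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, hidden, 3, padding=1), nn.GELU())
        self.down = nn.Sequential(nn.Conv2d(hidden, hidden, 3, stride=2, padding=1), nn.GELU())
        self.up = nn.Sequential(nn.ConvTranspose2d(hidden, hidden, 4, stride=2, padding=1), nn.GELU())
        self.dec = nn.Conv2d(2 * hidden, in_ch, 3, padding=1)  # skip-concat, then project back

    def forward(self, x):
        e = self.enc(x)
        d = self.up(self.down(e))
        return self.dec(torch.cat([e, d], dim=1))  # skip connection
```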

Part 1: Training a Single-Step Denoising UNet

Given a noisy image z, we want to train the denoiser so its output corresponds to the clean image. Below, the noising process is visualized as the noise level gradually increases, followed by the training results (a sketch of the training loop comes after them).

Visualization of the noising process with σ ∈ {0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0}
Training loss curve at the start of training, before it settles
Full training loss curve (quite erratic, but the loss settles at a very small value, under 1%)
Test set results from 1 Epoch
Test set results from 5 Epochs
Testing on noise levels not seen during training: denoising digits at out-of-distribution sigmas. Matching the project website, σ = 0.8 and 1.0 give worse-looking results while the other levels look fine.
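A sketch of the training loop for the single-step denoiser; the noise level sigma = 0.5, the Adam hyperparameters, and the device are assumptions, not values stated above:

```python
import torch
import torch.nn.functional as F

def train_denoiser(unet, loader, sigma=0.5, epochs=5, lr=1e-4, device="cuda"):
    """Single-step denoiser training: noise each clean MNIST batch with
    z = x + sigma * eps and regress the UNet output back to x with an L2 loss."""
    opt = torch.optim.Adam(unet.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in loader:                      # labels unused here
            x = x.to(device)
            z = x + sigma * torch.randn_like(x)  # noisy input
            loss = F.mse_loss(unet(z), x)
            opt.zero_grad()
            loss.backward()
            opt.step()
```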

Part 2: Training a Diffusion Model

Started by training a model with a time-conditioned UNet to get reasonable results, then added class conditioning to dramatically improve quality and consistency, producing digits that look fairly normal. The new UNet tweaks the previously built UNet to inject the timestep using FCBlocks (a sketch follows the loss plot below).

Added Time conditioning using FCBlocks (red boxed section)

Training Loss after 20 epochs
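A sketch of the FCBlock and of one time-conditioned training step. The number of timesteps (300), the normalized-timestep input format, and the exact injection points (shown only as comments) are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FCBlock(nn.Module):
    """Small MLP that embeds the normalized timestep; its output is added
    (broadcast over H and W) to intermediate UNet feature maps, e.g.:
        x = self.up1(x) + self.t_fc(t)[..., None, None]"""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, out_dim), nn.GELU(),
                                 nn.Linear(out_dim, out_dim))

    def forward(self, t):
        return self.net(t)

def time_conditioned_step(unet, x0, alphas_cumprod, num_timesteps=300):
    """One training step: pick a random t per sample, noise the clean batch
    with the forward process, and regress the predicted noise against the
    true noise."""
    t = torch.randint(0, num_timesteps, (x0.shape[0],), device=x0.device)
    abar = alphas_cumprod[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = abar.sqrt() * x0 + (1 - abar).sqrt() * eps
    eps_hat = unet(x_t, t.float().view(-1, 1) / num_timesteps)  # normalized t
    return F.mse_loss(eps_hat, eps)
```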

2.3 Sampling from UNet

The time-conditioned model denoises noise into digit-like outputs, although a noticeable number are deformed or unclear as to which digit they are (if any). Class conditioning with classifier-free guidance is therefore used to improve accuracy, producing digits that actually look like numbers someone would write (a sampling sketch follows the results).

Sampling UNet Time conditioning with 5 epochs
Sampling UNet Time conditioning with 20 epochs
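A sketch of the time-conditioned sampling loop (standard DDPM sampling); the batch shape, device, and normalized-timestep input format are assumptions:

```python
import torch

@torch.no_grad()
def sample_time_conditioned(unet, betas, shape=(40, 1, 28, 28), device="cuda"):
    """Start from pure noise and walk t from T-1 down to 0, removing the
    predicted noise at each step and adding fresh noise at every step
    except the last."""
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)
    T = len(betas)
    x = torch.randn(shape, device=device)
    for t in range(T - 1, -1, -1):
        t_batch = torch.full((shape[0], 1), t / T, device=device)  # normalized t
        eps_hat = unet(x, t_batch)
        z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = (x - (1 - alphas[t]) / (1 - abar[t]).sqrt() * eps_hat) / alphas[t].sqrt() \
            + betas[t].sqrt() * z
    return x
```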

2.4 + 2.5 Adding and Sampling the Class-Conditioned UNet

Training loss for the class-conditioned UNet (a sketch of the class-conditioned training step and CFG sampling follows the results).

5 Epochs Class Conditioning (most digits already look good, though with thicker strokes than at 20 epochs, indicating training is not yet complete)
20 Epochs Class Conditioning
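A sketch of the class-conditioned training step, where the conditioning vector is randomly dropped so the model also learns an unconditional estimate for CFG at sampling time. The drop probability 0.1 and the CFG scale gamma = 5 mentioned in the comment are assumptions, not values stated above:

```python
import torch
import torch.nn.functional as F

def class_conditioned_step(unet, x0, labels, alphas_cumprod,
                           num_classes=10, num_timesteps=300, p_uncond=0.1):
    """Same as the time-conditioned step, but the UNet also receives a
    one-hot class vector, zeroed out with probability p_uncond so the model
    learns both conditional and unconditional noise estimates."""
    c = F.one_hot(labels, num_classes).float()
    c = c * (torch.rand(c.shape[0], 1, device=c.device) > p_uncond)  # drop conditioning
    t = torch.randint(0, num_timesteps, (x0.shape[0],), device=x0.device)
    abar = alphas_cumprod[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = abar.sqrt() * x0 + (1 - abar).sqrt() * eps
    eps_hat = unet(x_t, t.float().view(-1, 1) / num_timesteps, c)
    return F.mse_loss(eps_hat, eps)

# At sampling time, CFG mixes the class-conditional and unconditional estimates:
#   eps = eps_uncond + gamma * (eps_cond - eps_uncond), with e.g. gamma = 5
```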