# Chapter 10: Project: Image Generation with Diffusion Models

## 10.2 Model Creation

Creating a diffusion model for image generation involves designing and implementing a neural network architecture capable of learning the denoising process. In this section, we will build a diffusion model step-by-step, including the noise addition layer, denoising network, and step encoding. We will also compile the model with an appropriate optimizer and loss function.

### 10.2.1 Noise Addition Layer

The noise addition layer simulates the forward diffusion process by adding Gaussian noise to the input images. As implemented below, the layer adds noise only when it is called with `training=True`; otherwise it passes its inputs through unchanged.

**Example: Noise Addition Layer**

```python
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.layers import Layer


class NoiseAddition(Layer):
    """Adds zero-mean Gaussian noise to its inputs when training=True."""

    def __init__(self, noise_scale=0.1, **kwargs):
        super().__init__(**kwargs)
        self.noise_scale = noise_scale

    def call(self, inputs, training=None):
        if training:
            noise = tf.random.normal(
                shape=tf.shape(inputs), mean=0.0, stddev=self.noise_scale, dtype=tf.float32
            )
            return inputs + noise
        return inputs


# Example usage with a batch of images.
# train_images is the array prepared earlier; the plotting code below
# assumes its values are scaled to [-1, 1].
noise_layer = NoiseAddition(noise_scale=0.1)
noisy_images = noise_layer(train_images[:10], training=True)

# Plot original (top row) and noisy (bottom row) images for comparison
plt.figure(figsize=(12, 4))
for i in range(10):
    plt.subplot(2, 10, i + 1)
    plt.imshow((train_images[i] * 0.5) + 0.5)
    plt.axis('off')
    plt.subplot(2, 10, i + 11)
    # Clip to [0, 1]: the added noise can push pixels outside the displayable range
    plt.imshow(tf.clip_by_value((noisy_images[i] * 0.5) + 0.5, 0.0, 1.0).numpy())
    plt.axis('off')
plt.show()
```

This code uses TensorFlow to define a custom layer class called `NoiseAddition`. The class adds random noise to its input, but only in training mode. The noise is normally distributed with a mean of 0 and a standard deviation given by `noise_scale`. The `call` method checks whether the layer is in training mode and, if so, adds the noise to the input.

The code then demonstrates the `NoiseAddition` layer by creating an instance, applying it to a batch of training images, and storing the noisy result. Finally, it plots the original and noisy images for comparison using `matplotlib`.
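The layer above uses a single fixed `noise_scale`. In the forward diffusion process, the amount of noise usually grows with the step index. The following is a minimal NumPy sketch of that idea, assuming a linear variance schedule of the kind used in DDPM-style models. The function names `make_linear_schedule` and `add_noise_at_step`, the schedule endpoints, and the number of steps are illustrative assumptions and are not part of this chapter's model.

```python
import numpy as np


def make_linear_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return per-step noise variances (betas) and the cumulative signal fraction."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas_cumprod = np.cumprod(1.0 - betas)
    return betas, alphas_cumprod


def add_noise_at_step(x0, t, alphas_cumprod, rng):
    """Sample x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps with eps ~ N(0, I)."""
    a_bar = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return x_t, eps


rng = np.random.default_rng(0)
betas, alphas_cumprod = make_linear_schedule()

# Stand-in for a batch of images scaled to [-1, 1] (assumed, not real data)
x0 = rng.uniform(-1.0, 1.0, size=(4, 32, 32, 3))

x_early, _ = add_noise_at_step(x0, 10, alphas_cumprod, rng)    # still close to x0
x_late, _ = add_noise_at_step(x0, 999, alphas_cumprod, rng)    # almost pure noise
```

Under this schedule an early step leaves the image nearly intact, while the last step retains almost none of the original signal, which is the progressive transformation into noise described at the start of this subsection.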

### 10.2.2 Denoising Network

The denoising network is the core component of the diffusion model. It predicts and removes the noise added to the images at each step. We will use a Convolutional Neural Network (CNN) for this purpose, as CNNs are well-suited for image processing tasks.

**Example: Denoising Network**

```python
from tensorflow.keras.layers import (
    BatchNormalization, Conv2D, Input, LeakyReLU, UpSampling2D
)
from tensorflow.keras.models import Model


def build_denoising_network(input_shape):
    """
    Builds a denoising network using a Convolutional Neural Network (CNN).

    Parameters:
    - input_shape: Shape of the input images.

    Returns:
    - A Keras model for denoising.
    """
    inputs = Input(shape=input_shape)

    # Encoder: the strided convolution halves the spatial resolution
    x = Conv2D(64, (3, 3), padding='same')(inputs)
    x = BatchNormalization()(x)
    x = LeakyReLU()(x)
    x = Conv2D(128, (3, 3), padding='same', strides=2)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU()(x)

    # Bottleneck
    x = Conv2D(256, (3, 3), padding='same')(x)
    x = BatchNormalization()(x)
    x = LeakyReLU()(x)

    # Decoder: upsampling restores the original spatial resolution
    x = UpSampling2D()(x)
    x = Conv2D(128, (3, 3), padding='same')(x)
    x = BatchNormalization()(x)
    x = LeakyReLU()(x)
    x = Conv2D(64, (3, 3), padding='same')(x)
    x = BatchNormalization()(x)
    x = LeakyReLU()(x)

    # Three output channels with tanh keep the output in [-1, 1]
    outputs = Conv2D(3, (3, 3), padding='same', activation='tanh')(x)
    return Model(inputs, outputs)


# Example usage with CIFAR-10 image shape
input_shape = (32, 32, 3)
denoising_network = build_denoising_network(input_shape)
denoising_network.summary()
```

This code defines a function that builds a Convolutional Neural Network (CNN) for denoising images. It uses Keras, a machine learning library in Python.

The network is divided into three parts: encoder, bottleneck, and decoder.

The encoder halves the spatial dimensions with a strided convolution while increasing the channel depth. The bottleneck is the deepest layer, where the image is held in its most compressed form. The decoder upsamples back to the input resolution and reconstructs the image from this representation, aiming to remove the noise while retaining the original information.

The function is then used to build a denoising network for images of shape (32, 32, 3), which is the shape of images in the CIFAR-10 dataset, and the structure of the built network is printed out.
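To make the encoder, bottleneck, and decoder description concrete, the sketch below traces the tensor shape through the layers listed above without building a Keras model. It assumes the standard rule that a convolution with `padding='same'` and stride `s` produces `ceil(size / s)` outputs per spatial dimension; the helper names `conv_same_size` and `trace_denoiser_shapes` are hypothetical.

```python
import math


def conv_same_size(size, stride=1):
    """Spatial size after a convolution with padding='same'."""
    return math.ceil(size / stride)


def trace_denoiser_shapes(height, width):
    """Follow (height, width, channels) through the layers described above."""
    shapes = []
    h, w = conv_same_size(height), conv_same_size(width)
    shapes.append(("encoder conv 64", (h, w, 64)))
    h, w = conv_same_size(h, 2), conv_same_size(w, 2)
    shapes.append(("encoder conv 128, stride 2", (h, w, 128)))
    shapes.append(("bottleneck conv 256", (h, w, 256)))
    h, w = h * 2, w * 2  # UpSampling2D doubles each spatial dimension by default
    shapes.append(("upsampling x2", (h, w, 256)))
    shapes.append(("decoder conv 128", (h, w, 128)))
    shapes.append(("decoder conv 64", (h, w, 64)))
    shapes.append(("output conv 3", (h, w, 3)))
    return shapes


for name, shape in trace_denoiser_shapes(32, 32):
    print(f"{name:28s} {shape}")
```

For a 32 x 32 input the output is again 32 x 32 x 3. With an odd input size such as 33, the strided convolution yields 17 and the upsampling yields 34, so the output would no longer match the input size; even input dimensions avoid this.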

### 10.2.3 Step Encoding

Step encoding is used to provide the denoising network with information about the current time step of the diffusion process. This helps the network understand the level of noise in the input images and make accurate predictions. We will use sinusoidal encoding for this purpose.

**Example: Step Encoding**

```python
import numpy as np


def sinusoidal_step_encoding(t, d_model):
    """
    Computes sinusoidal step encoding.

    Parameters:
    - t: Time steps as a column array of shape (num_steps, 1).
    - d_model: Dimensionality of the model.

    Returns:
    - Array of shape (num_steps, d_model) holding the step encodings.
    """
    # One frequency per pair of dimensions: indices 2i and 2i + 1 share a rate
    angle_rates = 1 / np.power(10000, (2 * (np.arange(d_model) // 2)) / np.float32(d_model))
    angle_rads = t * angle_rates

    # Sine on even indices, cosine on odd indices
    angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])
    angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])
    return angle_rads


# Example usage with time steps 0..9 and a model dimensionality of 128
t = np.arange(10).reshape(-1, 1)
d_model = 128
step_encoding = sinusoidal_step_encoding(t, d_model)

# Print the step encoding
print(step_encoding)
```

This code defines a function called `sinusoidal_step_encoding`, which calculates a sinusoidal step encoding. This is a technique often used in natural language processing to encode the position of words in a sentence.

The function takes two parameters: `t` (the current time step) and `d_model` (the dimensionality of the model).

It then computes `angle_rates` and `angle_rads`, applying sine to even indices and cosine to odd indices in the `angle_rads` array. This creates a pattern of sine and cosine waves that provides unique encodings for different positions in a sequence.

The bottom part of the code provides an example of how to use this function. It creates a NumPy array `t` with a range from 0 to 9 (reshaped into a column vector), sets `d_model` to 128, uses these values to compute the step encoding, and then prints the result.
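A few properties of this encoding can be checked directly. The sketch below uses a non-mutating variant of the function, here called `step_encoding_checked` (a hypothetical name), and inspects the shape, the value range, and the row for step 0.

```python
import numpy as np


def step_encoding_checked(t, d_model):
    """Non-mutating variant of the sinusoidal encoding described above."""
    t = np.asarray(t, dtype=np.float64).reshape(-1, 1)
    rates = 1.0 / np.power(10000.0, (2 * (np.arange(d_model) // 2)) / d_model)
    angles = t * rates
    enc = np.empty_like(angles)
    enc[:, 0::2] = np.sin(angles[:, 0::2])  # even indices: sine
    enc[:, 1::2] = np.cos(angles[:, 1::2])  # odd indices: cosine
    return enc


enc = step_encoding_checked(np.arange(10), 128)
print(enc.shape)                      # (10, 128): one row per step
print(enc[0, :4])                     # step 0 alternates sin(0) = 0 and cos(0) = 1
print(enc.min() >= -1.0, enc.max() <= 1.0)
```

Every value lies in [-1, 1], and each of the ten steps receives a different row, which is what lets the denoising network tell the steps apart.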

### 10.2.4 Full Diffusion Model

Combining the noise addition layer, denoising network, and step encoding, we can construct the full diffusion model. This model will iteratively denoise the input images, guided by the step encoding and the loss function.

**Example: Full Diffusion Model**

```python
import numpy as np
from tensorflow.keras.layers import (
    BatchNormalization, Concatenate, Conv2D, Dense, Input, LeakyReLU, Reshape
)
from tensorflow.keras.models import Model


def build_full_diffusion_model(input_shape, d_model):
    """
    Builds the full diffusion model.

    Parameters:
    - input_shape: Shape of the input images.
    - d_model: Dimensionality of the model.

    Returns:
    - A Keras model for the full diffusion process.
    """
    # Input layers for images and step encoding
    image_input = Input(shape=input_shape)
    step_input = Input(shape=(d_model,))

    # Apply noise addition layer (active only in training mode)
    noisy_images = NoiseAddition()(image_input)

    # Extract image features
    num_features = 64
    x = Conv2D(num_features, (3, 3), padding='same')(noisy_images)
    x = BatchNormalization()(x)
    x = LeakyReLU()(x)

    # Project the step encoding to image shape and concatenate along the channel axis
    step_embedding = Dense(int(np.prod(input_shape)))(step_input)
    step_embedding = Reshape(input_shape)(step_embedding)
    x = Concatenate()([x, step_embedding])

    # The concatenated tensor has num_features + input_shape[-1] channels,
    # so the denoising network must be built for that shape, not input_shape
    combined_shape = tuple(input_shape[:-1]) + (num_features + input_shape[-1],)
    denoised_images = build_denoising_network(combined_shape)(x)
    return Model([image_input, step_input], denoised_images)


# Example usage with CIFAR-10 image shape
input_shape = (32, 32, 3)
d_model = 128
diffusion_model = build_full_diffusion_model(input_shape, d_model)
diffusion_model.summary()
```

This snippet defines a function that builds the full diffusion model using Keras. The function takes the shape of the input images and the dimensionality of the step encoding as arguments. It first creates input layers for the images and the step encoding.

Then it adds noise to the images and passes them through a first convolutional block. The step encoding is projected with a `Dense` layer, reshaped to the image shape, and concatenated with the convolutional features along the channel axis. The denoising network is applied to this combined tensor, and the function returns the resulting model.
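The way the step encoding is combined with the image features can be illustrated with plain NumPy arrays. The sketch below mimics the `Dense`, `Reshape`, and `Concatenate` steps using random stand-in data; the weights are random placeholders and not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, d_model = 2, 128
image_shape = (32, 32, 3)
num_elements = int(np.prod(image_shape))  # 32 * 32 * 3 = 3072

# Stand-ins for the first convolution's output and for a batch of step encodings
features = rng.standard_normal((batch, 32, 32, 64))
step_vectors = rng.standard_normal((batch, d_model))

# Dense projection: maps each d_model-sized vector to one value per image element
weights = rng.standard_normal((d_model, num_elements)) * 0.01
bias = np.zeros(num_elements)
step_maps = (step_vectors @ weights + bias).reshape((batch,) + image_shape)

# Channel-wise concatenation: 64 feature channels + 3 step channels
combined = np.concatenate([features, step_maps], axis=-1)
print(combined.shape)  # (2, 32, 32, 67)
```

The combined tensor has 67 channels rather than 3, so whatever network consumes it needs an input shape of `(32, 32, 67)`.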

### 10.2.5 Compiling the Model

To compile the diffusion model, we need to specify an optimizer and a loss function. The mean squared error (MSE) loss function is commonly used for training diffusion models, as it measures the difference between the predicted noise and the actual noise.

**Example: Compiling the Model**

```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import MeanSquaredError

# Compile the diffusion model
diffusion_model.compile(optimizer=Adam(learning_rate=1e-4), loss=MeanSquaredError())

# Print the model summary
diffusion_model.summary()
```

The code uses TensorFlow and Keras to compile the model called `diffusion_model`. The Adam optimization algorithm is selected with a learning rate of 0.0001. The loss function, which measures how well the model is performing, is set to Mean Squared Error (MSE). After the model is compiled, a summary of its architecture is printed.
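As a reminder of what the MSE loss computes, here is a small NumPy sketch with made-up numbers. The function `mean_squared_error` below is written by hand for illustration and is not the Keras implementation, although for unweighted batches of equally sized samples it gives the same overall average.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Average of the squared element-wise differences."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return float(np.mean((y_true - y_pred) ** 2))


target = np.array([[0.0, 1.0], [-1.0, 0.5]])
prediction = np.array([[0.5, 1.0], [-1.0, -0.5]])

# Squared differences are 0.25, 0, 0, and 1.0, so the mean is 1.25 / 4
print(mean_squared_error(target, prediction))  # 0.3125
```

Because the differences are squared, large errors dominate the loss, and a perfect prediction gives a loss of exactly 0.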

**Summary**

In this section, we successfully created the diffusion model for our image generation project. We started by implementing the noise addition layer, which simulates the forward diffusion process. Next, we built a denoising network using a Convolutional Neural Network (CNN) to predict and remove noise from the images. We also implemented step encoding to provide temporal information to the denoising network.

Combining these components, we constructed the full diffusion model, which iteratively denoises the input images. Finally, we compiled the model with an appropriate optimizer and loss function, preparing it for training.

With our model ready, we can now move on to the next step: training the diffusion model on the prepared data. In the following sections, we will train the model, generate images, and evaluate its performance, providing a comprehensive understanding of how to apply diffusion models to real-world image generation tasks.

## 10.2 Model Creation

Creating a diffusion model for image generation involves designing and implementing a neural network architecture capable of learning the denoising process. In this section, we will build a diffusion model step-by-step, including the noise addition layer, denoising network, and step encoding. We will also compile the model with an appropriate optimizer and loss function.

### 10.2.1 Noise Addition Layer

The noise addition layer simulates the forward diffusion process by adding Gaussian noise to the input images at each step. This layer will be used during both training and inference to progressively transform the images into a noise distribution.

**Example: Noise Addition Layer**

`import tensorflow as tf`

from tensorflow.keras.layers import Layer

class NoiseAddition(Layer):

def __init__(self, noise_scale=0.1, **kwargs):

super(NoiseAddition, self).__init__(**kwargs)

self.noise_scale = noise_scale

def call(self, inputs, training=None):

if training:

noise = tf.random.normal(shape=tf.shape(inputs), mean=0.0, stddev=self.noise_scale, dtype=tf.float32)

return inputs + noise

return inputs

# Example usage with a batch of images

noise_layer = NoiseAddition(noise_scale=0.1)

noisy_images = noise_layer(train_images[:10], training=True)

# Plot original and noisy images for comparison

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))

for i in range(10):

plt.subplot(2, 10, i + 1)

plt.imshow((train_images[i] * 0.5) + 0.5)

plt.axis('off')

plt.subplot(2, 10, i + 11)

plt.imshow((noisy_images[i] * 0.5) + 0.5)

plt.axis('off')

plt.show()

This code uses the TensorFlow library to define a custom layer class called `NoiseAddition`

. This class adds random noise to its input data, but only when it's in training mode. The noise is normally distributed with a mean of 0 and a standard deviation specified by `noise_scale`

. The `call`

method checks if the layer is in training mode and if so, adds the noise to the input data.

The code then demonstrates how to use the `NoiseAddition`

layer by creating an instance of it, applying it to a batch of training images, and storing the noisy images. It then plots the original and noisy images for comparison using the `matplotlib`

library.

### 10.2.2 Denoising Network

The denoising network is the core component of the diffusion model. It predicts and removes the noise added to the images at each step. We will use a Convolutional Neural Network (CNN) for this purpose, as CNNs are well-suited for image processing tasks.

**Example: Denoising Network**

`from tensorflow.keras.layers import Conv2D, BatchNormalization, LeakyReLU, UpSampling2D`

def build_denoising_network(input_shape):

"""

Builds a denoising network using a Convolutional Neural Network (CNN).

Parameters:

- input_shape: Shape of the input images.

Returns:

- A Keras model for denoising.

"""

inputs = Input(shape=input_shape)

# Encoder

x = Conv2D(64, (3, 3), padding='same')(inputs)

x = BatchNormalization()(x)

x = LeakyReLU()(x)

x = Conv2D(128, (3, 3), padding='same', strides=2)(x)

x = BatchNormalization()(x)

x = LeakyReLU()(x)

# Bottleneck

x = Conv2D(256, (3, 3), padding='same')(x)

x = BatchNormalization()(x)

x = LeakyReLU()(x)

# Decoder

x = UpSampling2D()(x)

x = Conv2D(128, (3, 3), padding='same')(x)

x = BatchNormalization()(x)

x = LeakyReLU()(x)

x = Conv2D(64, (3, 3), padding='same')(x)

x = BatchNormalization()(x)

x = LeakyReLU()(x)

outputs = Conv2D(3, (3, 3), padding='same', activation='tanh')(x)

return Model(inputs, outputs)

# Example usage with CIFAR-10 image shape

input_shape = (32, 32, 3)

denoising_network = build_denoising_network(input_shape)

denoising_network.summary()

This code defines a function that builds a Convolutional Neural Network (CNN) for denoising images. It uses Keras, a machine learning library in Python.

The network is divided into three parts: encoder, bottleneck, and decoder.

The encoder reduces the spatial dimensions of the input while increasing the depth. The bottleneck is the deepest layer, where the image is compressed. The decoder then reconstructs the image from the compressed representation, aiming to remove the noise while retaining the original information.

The function is then used to build a denoising network for images of shape (32, 32, 3), which is the shape of images in the CIFAR-10 dataset, and the structure of the built network is printed out.

### 10.2.3 Step Encoding

Step encoding is used to provide the denoising network with information about the current time step of the diffusion process. This helps the network understand the level of noise in the input images and make accurate predictions. We will use sinusoidal encoding for this purpose.

**Example: Step Encoding**

`def sinusoidal_step_encoding(t, d_model):`

"""

Computes sinusoidal step encoding.

Parameters:

- t: Current time step.

- d_model: Dimensionality of the model.

Returns:

- Sinusoidal step encoding vector.

"""

angle_rates = 1 / np.power(10000, (2 * (np.arange(d_model) // 2)) / np.float32(d_model))

angle_rads = t * angle_rates

angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])

angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])

return angle_rads

# Example usage with a specific time step and model dimensionality

t = np.arange(10).reshape(-1, 1)

d_model = 128

step_encoding = sinusoidal_step_encoding(t, d_model)

# Print the step encoding

print(step_encoding)

This code defines a function called `sinusoidal_step_encoding`

, which calculates a sinusoidal step encoding. This is a technique often used in natural language processing to encode the position of words in a sentence.

The function takes two parameters:

`t`

(the current time step),`d_model`

(the dimensionality of the model).

It then computes `angle_rates`

and `angle_rads`

, applying sine to even indices and cosine to odd indices in the `angle_rads`

array. This creates a pattern of sine and cosine waves that provides unique encodings for different positions in a sequence.

The bottom part of the code provides an example of how to use this function. It creates a numpy array `t`

with a range from 0 to 9 (reshaped into a column vector), sets `d_model`

to 128, uses these values to compute the step encoding, and then prints the result.

### 10.2.4 Full Diffusion Model

Combining the noise addition layer, denoising network, and step encoding, we can construct the full diffusion model. This model will iteratively denoise the input images, guided by the step encoding and the loss function.

**Example: Full Diffusion Model**

`from tensorflow.keras.layers import Input, Concatenate`

def build_full_diffusion_model(input_shape, d_model):

"""

Builds the full diffusion model.

Parameters:

- input_shape: Shape of the input images.

- d_model: Dimensionality of the model.

Returns:

- A Keras model for the full diffusion process.

"""

# Input layers for images and step encoding

image_input = Input(shape=input_shape)

step_input = Input(shape=(d_model,))

# Apply noise addition layer

noisy_images = NoiseAddition()(image_input)

# Flatten and concatenate inputs

x = Conv2D(64, (3, 3), padding='same')(noisy_images)

x = BatchNormalization()(x)

x = LeakyReLU()(x)

step_embedding = Dense(np.prod(input_shape))(step_input)

step_embedding = Reshape(input_shape)(step_embedding)

x = Concatenate()([x, step_embedding])

# Apply denoising network

denoised_images = build_denoising_network(input_shape)(x)

return Model([image_input, step_input], denoised_images)

# Example usage with CIFAR-10 image shape

input_shape = (32, 32, 3)

d_model = 128

diffusion_model = build_full_diffusion_model(input_shape, d_model)

diffusion_model.summary()

This snippet of code defines a function to build the full diffusion model using Keras. This model is used in machine learning for tasks such as image denoising. The function takes the shape of the input images and the dimensionality of the model as arguments. It first creates input layers for images and step encoding.

Then, it adds noise to the images and flattens and concatenates the inputs. The denoising network is then applied to the noisy images. The function returns the built model.

### 10.2.5 Compiling the Model

To compile the diffusion model, we need to specify an optimizer and a loss function. The mean squared error (MSE) loss function is commonly used for training diffusion models, as it measures the difference between the predicted noise and the actual noise.

**Example: Compiling the Model**

`from tensorflow.keras.optimizers import Adam`

from tensorflow.keras.losses import MeanSquaredError

# Compile the diffusion model

diffusion_model.compile(optimizer=Adam(learning_rate=1e-4), loss=MeanSquaredError())

# Print the model summary

diffusion_model.summary()

The code uses the Tensorflow and Keras libraries. It's used to compile a machine learning model called `diffusion_model`

with specific configurations. The Adam optimization algorithm is selected with a learning rate of 0.0001. The loss function, which measures how well the model is performing, is set to Mean Squared Error (MSE). After setting these configurations, the model is compiled and the summary of the model's architecture is printed.

**Summary**

In this section, we successfully created the diffusion model for our image generation project. We started by implementing the noise addition layer, which simulates the forward diffusion process. Next, we built a denoising network using a Convolutional Neural Network (CNN) to predict and remove noise from the images. We also implemented step encoding to provide temporal information to the denoising network.

Combining these components, we constructed the full diffusion model, which iteratively denoises the input images. Finally, we compiled the model with an appropriate optimizer and loss function, preparing it for training.

With our model ready, we can now move on to the next step: training the diffusion model on the prepared data. In the following sections, we will train the model, generate images, and evaluate its performance, providing a comprehensive understanding of how to apply diffusion models to real-world image generation tasks.

## 10.2 Model Creation

Creating a diffusion model for image generation involves designing and implementing a neural network architecture capable of learning the denoising process. In this section, we will build a diffusion model step-by-step, including the noise addition layer, denoising network, and step encoding. We will also compile the model with an appropriate optimizer and loss function.

### 10.2.1 Noise Addition Layer

The noise addition layer simulates the forward diffusion process by adding Gaussian noise to the input images at each step. This layer will be used during both training and inference to progressively transform the images into a noise distribution.

**Example: Noise Addition Layer**

`import tensorflow as tf`

from tensorflow.keras.layers import Layer

class NoiseAddition(Layer):

def __init__(self, noise_scale=0.1, **kwargs):

super(NoiseAddition, self).__init__(**kwargs)

self.noise_scale = noise_scale

def call(self, inputs, training=None):

if training:

noise = tf.random.normal(shape=tf.shape(inputs), mean=0.0, stddev=self.noise_scale, dtype=tf.float32)

return inputs + noise

return inputs

# Example usage with a batch of images

noise_layer = NoiseAddition(noise_scale=0.1)

noisy_images = noise_layer(train_images[:10], training=True)

# Plot original and noisy images for comparison

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))

for i in range(10):

plt.subplot(2, 10, i + 1)

plt.imshow((train_images[i] * 0.5) + 0.5)

plt.axis('off')

plt.subplot(2, 10, i + 11)

plt.imshow((noisy_images[i] * 0.5) + 0.5)

plt.axis('off')

plt.show()

This code uses the TensorFlow library to define a custom layer class called `NoiseAddition`

. This class adds random noise to its input data, but only when it's in training mode. The noise is normally distributed with a mean of 0 and a standard deviation specified by `noise_scale`

. The `call`

method checks if the layer is in training mode and if so, adds the noise to the input data.

The code then demonstrates how to use the `NoiseAddition`

layer by creating an instance of it, applying it to a batch of training images, and storing the noisy images. It then plots the original and noisy images for comparison using the `matplotlib`

library.

### 10.2.2 Denoising Network

The denoising network is the core component of the diffusion model. It predicts and removes the noise added to the images at each step. We will use a Convolutional Neural Network (CNN) for this purpose, as CNNs are well-suited for image processing tasks.

**Example: Denoising Network**

`from tensorflow.keras.layers import Conv2D, BatchNormalization, LeakyReLU, UpSampling2D`

def build_denoising_network(input_shape):

"""

Builds a denoising network using a Convolutional Neural Network (CNN).

Parameters:

- input_shape: Shape of the input images.

Returns:

- A Keras model for denoising.

"""

inputs = Input(shape=input_shape)

# Encoder

x = Conv2D(64, (3, 3), padding='same')(inputs)

x = BatchNormalization()(x)

x = LeakyReLU()(x)

x = Conv2D(128, (3, 3), padding='same', strides=2)(x)

x = BatchNormalization()(x)

x = LeakyReLU()(x)

# Bottleneck

x = Conv2D(256, (3, 3), padding='same')(x)

x = BatchNormalization()(x)

x = LeakyReLU()(x)

# Decoder

x = UpSampling2D()(x)

x = Conv2D(128, (3, 3), padding='same')(x)

x = BatchNormalization()(x)

x = LeakyReLU()(x)

x = Conv2D(64, (3, 3), padding='same')(x)

x = BatchNormalization()(x)

x = LeakyReLU()(x)

outputs = Conv2D(3, (3, 3), padding='same', activation='tanh')(x)

return Model(inputs, outputs)

# Example usage with CIFAR-10 image shape

input_shape = (32, 32, 3)

denoising_network = build_denoising_network(input_shape)

denoising_network.summary()

This code defines a function that builds a Convolutional Neural Network (CNN) for denoising images. It uses Keras, a machine learning library in Python.

The network is divided into three parts: encoder, bottleneck, and decoder.

The encoder reduces the spatial dimensions of the input while increasing the depth. The bottleneck is the deepest layer, where the image is compressed. The decoder then reconstructs the image from the compressed representation, aiming to remove the noise while retaining the original information.

The function is then used to build a denoising network for images of shape (32, 32, 3), which is the shape of images in the CIFAR-10 dataset, and the structure of the built network is printed out.

### 10.2.3 Step Encoding

Step encoding is used to provide the denoising network with information about the current time step of the diffusion process. This helps the network understand the level of noise in the input images and make accurate predictions. We will use sinusoidal encoding for this purpose.

**Example: Step Encoding**

`def sinusoidal_step_encoding(t, d_model):`

"""

Computes sinusoidal step encoding.

Parameters:

- t: Current time step.

- d_model: Dimensionality of the model.

Returns:

- Sinusoidal step encoding vector.

"""

angle_rates = 1 / np.power(10000, (2 * (np.arange(d_model) // 2)) / np.float32(d_model))

angle_rads = t * angle_rates

angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])

angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])

return angle_rads

# Example usage with a specific time step and model dimensionality

t = np.arange(10).reshape(-1, 1)

d_model = 128

step_encoding = sinusoidal_step_encoding(t, d_model)

# Print the step encoding

print(step_encoding)

This code defines a function called `sinusoidal_step_encoding`

, which calculates a sinusoidal step encoding. This is a technique often used in natural language processing to encode the position of words in a sentence.

The function takes two parameters:

`t`

(the current time step),`d_model`

(the dimensionality of the model).

It then computes `angle_rates`

and `angle_rads`

, applying sine to even indices and cosine to odd indices in the `angle_rads`

array. This creates a pattern of sine and cosine waves that provides unique encodings for different positions in a sequence.

The bottom part of the code provides an example of how to use this function. It creates a numpy array `t`

with a range from 0 to 9 (reshaped into a column vector), sets `d_model`

to 128, uses these values to compute the step encoding, and then prints the result.

### 10.2.4 Full Diffusion Model

Combining the noise addition layer, denoising network, and step encoding, we can construct the full diffusion model. This model will iteratively denoise the input images, guided by the step encoding and the loss function.

**Example: Full Diffusion Model**

`from tensorflow.keras.layers import Input, Concatenate`

def build_full_diffusion_model(input_shape, d_model):

"""

Builds the full diffusion model.

Parameters:

- input_shape: Shape of the input images.

- d_model: Dimensionality of the model.

Returns:

- A Keras model for the full diffusion process.

"""

# Input layers for images and step encoding

image_input = Input(shape=input_shape)

step_input = Input(shape=(d_model,))

# Apply noise addition layer

noisy_images = NoiseAddition()(image_input)

# Flatten and concatenate inputs

x = Conv2D(64, (3, 3), padding='same')(noisy_images)

x = BatchNormalization()(x)

x = LeakyReLU()(x)

step_embedding = Dense(np.prod(input_shape))(step_input)

step_embedding = Reshape(input_shape)(step_embedding)

x = Concatenate()([x, step_embedding])

# Apply denoising network

denoised_images = build_denoising_network(input_shape)(x)

return Model([image_input, step_input], denoised_images)

# Example usage with CIFAR-10 image shape

input_shape = (32, 32, 3)

d_model = 128

diffusion_model = build_full_diffusion_model(input_shape, d_model)

diffusion_model.summary()

This snippet of code defines a function to build the full diffusion model using Keras. This model is used in machine learning for tasks such as image denoising. The function takes the shape of the input images and the dimensionality of the model as arguments. It first creates input layers for images and step encoding.

Then, it adds noise to the images and flattens and concatenates the inputs. The denoising network is then applied to the noisy images. The function returns the built model.

### 10.2.5 Compiling the Model

To compile the diffusion model, we need to specify an optimizer and a loss function. The mean squared error (MSE) loss function is commonly used for training diffusion models, as it measures the difference between the predicted noise and the actual noise.

**Example: Compiling the Model**

`from tensorflow.keras.optimizers import Adam`

from tensorflow.keras.losses import MeanSquaredError

# Compile the diffusion model

diffusion_model.compile(optimizer=Adam(learning_rate=1e-4), loss=MeanSquaredError())

# Print the model summary

diffusion_model.summary()

The code uses the Tensorflow and Keras libraries. It's used to compile a machine learning model called `diffusion_model`

with specific configurations. The Adam optimization algorithm is selected with a learning rate of 0.0001. The loss function, which measures how well the model is performing, is set to Mean Squared Error (MSE). After setting these configurations, the model is compiled and the summary of the model's architecture is printed.

**Summary**

In this section, we successfully created the diffusion model for our image generation project. We started by implementing the noise addition layer, which simulates the forward diffusion process. Next, we built a denoising network using a Convolutional Neural Network (CNN) to predict and remove noise from the images. We also implemented step encoding to provide temporal information to the denoising network.

Combining these components, we constructed the full diffusion model, which iteratively denoises the input images. Finally, we compiled the model with an appropriate optimizer and loss function, preparing it for training.

With our model ready, we can now move on to the next step: training the diffusion model on the prepared data. In the following sections, we will train the model, generate images, and evaluate its performance, providing a comprehensive understanding of how to apply diffusion models to real-world image generation tasks.

## 10.2 Model Creation

### 10.2.1 Noise Addition Layer

**Example: Noise Addition Layer**

`import tensorflow as tf`

from tensorflow.keras.layers import Layer

class NoiseAddition(Layer):

def __init__(self, noise_scale=0.1, **kwargs):

super(NoiseAddition, self).__init__(**kwargs)

self.noise_scale = noise_scale

def call(self, inputs, training=None):

if training:

noise = tf.random.normal(shape=tf.shape(inputs), mean=0.0, stddev=self.noise_scale, dtype=tf.float32)

return inputs + noise

return inputs

# Example usage with a batch of images

noise_layer = NoiseAddition(noise_scale=0.1)

noisy_images = noise_layer(train_images[:10], training=True)

# Plot original and noisy images for comparison

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))

for i in range(10):

plt.subplot(2, 10, i + 1)

plt.imshow((train_images[i] * 0.5) + 0.5)

plt.axis('off')

plt.subplot(2, 10, i + 11)

plt.imshow((noisy_images[i] * 0.5) + 0.5)

plt.axis('off')

plt.show()

This code uses the TensorFlow library to define a custom layer class called `NoiseAddition`. This class adds random noise to its input data, but only when it's in training mode. The noise is normally distributed with a mean of 0 and a standard deviation specified by `noise_scale`. The `call` method checks if the layer is in training mode and, if so, adds the noise to the input data.

The example usage demonstrates the `NoiseAddition` layer by creating an instance of it, applying it to a batch of training images, and storing the noisy images. It then plots the original and noisy images for comparison using the `matplotlib` library.
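To see numerically what the layer does, here is a minimal NumPy sketch that mirrors its logic outside of Keras. The `add_noise` helper and the random stand-in batch are assumptions made for illustration and are not part of the project code. The batch is only shaped like ten CIFAR-10 images scaled to [-1, 1].

```python
import numpy as np

def add_noise(images, noise_scale=0.1, training=False, rng=None):
    """NumPy stand-in for the NoiseAddition layer (illustrative helper)."""
    if not training:
        # Inference mode: inputs pass through unchanged
        return images
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=0.0, scale=noise_scale, size=images.shape)
    return images + noise

rng = np.random.default_rng(0)
# Stand-in batch shaped like 10 CIFAR-10 images scaled to [-1, 1]
batch = rng.uniform(-1.0, 1.0, size=(10, 32, 32, 3))

clean = add_noise(batch, training=False)
noisy = add_noise(batch, noise_scale=0.1, training=True, rng=rng)

# The difference between noisy and original images is the added noise
residual = noisy - batch
print("unchanged at inference:", np.array_equal(clean, batch))
print("residual mean:", round(float(residual.mean()), 3))
print("residual std:", round(float(residual.std()), 3))
```

The residual's standard deviation should land close to `noise_scale`, and its mean close to zero, which is what the Keras layer adds when `training=True`.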

### 10.2.2 Denoising Network

**Example: Denoising Network**

```python
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, LeakyReLU, UpSampling2D
from tensorflow.keras.models import Model

def build_denoising_network(input_shape):
    """
    Builds a denoising network using a Convolutional Neural Network (CNN).

    Parameters:
    - input_shape: Shape of the input images.

    Returns:
    - A Keras model for denoising.
    """
    inputs = Input(shape=input_shape)

    # Encoder
    x = Conv2D(64, (3, 3), padding='same')(inputs)
    x = BatchNormalization()(x)
    x = LeakyReLU()(x)
    # The stride-2 convolution halves the spatial resolution
    x = Conv2D(128, (3, 3), padding='same', strides=2)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU()(x)

    # Bottleneck
    x = Conv2D(256, (3, 3), padding='same')(x)
    x = BatchNormalization()(x)
    x = LeakyReLU()(x)

    # Decoder
    # Upsampling restores the original spatial resolution
    x = UpSampling2D()(x)
    x = Conv2D(128, (3, 3), padding='same')(x)
    x = BatchNormalization()(x)
    x = LeakyReLU()(x)
    x = Conv2D(64, (3, 3), padding='same')(x)
    x = BatchNormalization()(x)
    x = LeakyReLU()(x)

    # tanh keeps the 3-channel output in [-1, 1]
    outputs = Conv2D(3, (3, 3), padding='same', activation='tanh')(x)

    return Model(inputs, outputs)

# Example usage with CIFAR-10 image shape
input_shape = (32, 32, 3)
denoising_network = build_denoising_network(input_shape)
denoising_network.summary()
```

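The network is divided into three parts: encoder, bottleneck, and decoder. The encoder extracts features and halves the spatial resolution with a stride-2 convolution. The bottleneck processes the downsampled feature maps with 256 filters. The decoder upsamples back to the input resolution and produces a 3-channel image through a `tanh` activation.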
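The spatial sizes through these three parts can be traced by hand. The sketch below assumes `padding='same'` for every convolution, as in the network code, in which case a convolution's output size is `ceil(size / stride)`. The helper names `same_conv_size` and `trace_denoiser_shapes` are hypothetical and exist only for this illustration.

```python
import math

def same_conv_size(size, stride=1):
    """Output size of a convolution with padding='same'."""
    return math.ceil(size / stride)

def trace_denoiser_shapes(height, width):
    """Return (stage, height, width, channels) for each stage of the network."""
    stages = []
    h, w = same_conv_size(height), same_conv_size(width)
    stages.append(("encoder conv 1", h, w, 64))
    h, w = same_conv_size(h, stride=2), same_conv_size(w, stride=2)
    stages.append(("encoder conv 2 (stride 2)", h, w, 128))
    stages.append(("bottleneck conv", h, w, 256))
    h, w = h * 2, w * 2  # UpSampling2D doubles height and width by default
    stages.append(("decoder upsample + conv", h, w, 128))
    stages.append(("decoder conv", h, w, 64))
    stages.append(("output conv (tanh)", h, w, 3))
    return stages

for name, h, w, c in trace_denoiser_shapes(32, 32):
    print(f"{name:28s} -> ({h}, {w}, {c})")
```

For 32x32 inputs the output returns to 32x32. With an odd input size such as 33, the stride-2 convolution yields 17 and the upsampling yields 34, so the output would no longer match the input. This design therefore implicitly assumes even image dimensions.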

### 10.2.3 Step Encoding

**Example: Step Encoding**

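```python
import numpy as np

def sinusoidal_step_encoding(t, d_model):
    """
    Computes sinusoidal step encoding.

    Parameters:
    - t: Time step(s): a scalar, a 1-D sequence, or a column vector of shape (n, 1).
    - d_model: Dimensionality of the model.

    Returns:
    - Array of shape (n, d_model) with one sinusoidal step encoding per time step.
    """
    # Coerce t to a float column vector so the 2-D indexing below always works
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    # Each sine/cosine pair (indices 2k and 2k + 1) shares one frequency
    angle_rates = 1 / np.power(10000, (2 * (np.arange(d_model) // 2)) / np.float32(d_model))
    angle_rads = t * angle_rates
    # Sine on even indices, cosine on odd indices
    angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])
    angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])
    return angle_rads

# Example usage with a range of time steps and a model dimensionality
t = np.arange(10).reshape(-1, 1)
d_model = 128
step_encoding = sinusoidal_step_encoding(t, d_model)

# Print the step encoding
print(step_encoding)
```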

This code defines a function called `sinusoidal_step_encoding`, which calculates a sinusoidal step encoding. This is the same technique that Transformer models in natural language processing use to encode the position of words in a sentence.

The function takes two parameters: `t` (the current time step) and `d_model` (the dimensionality of the model).

The function computes `angle_rates` and `angle_rads`, applying sine to even indices and cosine to odd indices in the `angle_rads` array. This creates a pattern of sine and cosine waves that provides unique encodings for different positions in a sequence.

The example usage creates `t` with a range from 0 to 9 (reshaped into a column vector), sets `d_model` to 128, uses these values to compute the step encoding, and then prints the result.
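A few properties of this encoding are easy to check directly. The sketch below re-declares the function so that it runs on its own, then inspects the output shape, the value range, the encoding of step 0, and the fact that each sine/cosine pair shares a frequency. The specific checks are chosen here for illustration.

```python
import numpy as np

def sinusoidal_step_encoding(t, d_model):
    # Same computation as the function in this section
    angle_rates = 1 / np.power(10000, (2 * (np.arange(d_model) // 2)) / np.float32(d_model))
    angle_rads = t * angle_rates
    angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])
    angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])
    return angle_rads

t = np.arange(10).reshape(-1, 1)
enc = sinusoidal_step_encoding(t, 128)

# One row per time step, d_model columns
print(enc.shape)
# All values are sines or cosines, so they stay within [-1, 1]
print(enc.min() >= -1.0, enc.max() <= 1.0)
# Step 0 encodes as sin(0) = 0 on even indices and cos(0) = 1 on odd indices
print(enc[0, :4])
# Each (sin, cos) pair shares a frequency, so their squares sum to 1
pair_norm = enc[:, 0::2] ** 2 + enc[:, 1::2] ** 2
print(np.allclose(pair_norm, 1.0))
# Every time step gets a distinct encoding vector
print(len(np.unique(enc.round(6), axis=0)) == len(t))
```

Because the values are bounded and each step maps to a distinct vector, the encoding can be fed to a network directly without further normalization.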

### 10.2.4 Full Diffusion Model

**Example: Full Diffusion Model**

```python
import numpy as np
from tensorflow.keras.layers import (Input, Concatenate, Conv2D, BatchNormalization,
                                     LeakyReLU, Dense, Reshape)
from tensorflow.keras.models import Model

def build_full_diffusion_model(input_shape, d_model):
    """
    Builds the full diffusion model.

    Parameters:
    - input_shape: Shape of the input images.
    - d_model: Dimensionality of the model.

    Returns:
    - A Keras model for the full diffusion process.
    """
    # Input layers for images and step encoding
    image_input = Input(shape=input_shape)
    step_input = Input(shape=(d_model,))

    # Apply noise addition layer (NoiseAddition is defined in Section 10.2.1)
    noisy_images = NoiseAddition()(image_input)

    # Extract initial features from the noisy images
    x = Conv2D(64, (3, 3), padding='same')(noisy_images)
    x = BatchNormalization()(x)
    x = LeakyReLU()(x)

    # Project the step encoding to the image shape and concatenate along the channel axis
    step_embedding = Dense(int(np.prod(input_shape)))(step_input)
    step_embedding = Reshape(input_shape)(step_embedding)
    x = Concatenate()([x, step_embedding])

    # Apply denoising network; its input has the 64 feature channels
    # plus the channels of the reshaped step embedding
    denoiser_input_shape = tuple(input_shape[:-1]) + (64 + input_shape[-1],)
    denoised_images = build_denoising_network(denoiser_input_shape)(x)

    return Model([image_input, step_input], denoised_images)

# Example usage with CIFAR-10 image shape
input_shape = (32, 32, 3)
d_model = 128
diffusion_model = build_full_diffusion_model(input_shape, d_model)
diffusion_model.summary()
```
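The way the step encoding is merged with the image features can be followed with plain array shapes. The NumPy sketch below uses random stand-ins for the `Conv2D` feature maps and for the `Dense` layer's weights (both are assumptions for illustration, not trained values) to show the projection, the reshape, and the channel-wise concatenation.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, d_model = 4, 128
input_shape = (32, 32, 3)
flat_size = int(np.prod(input_shape))

# Stand-in for the 64-channel feature maps produced by the first Conv2D block
features = rng.normal(size=(batch, 32, 32, 64))
# Stand-in for a batch of sinusoidal step encodings
step_encoding = rng.normal(size=(batch, d_model))

# Dense projection: d_model -> 32 * 32 * 3 values per sample (random weights here)
weights = rng.normal(size=(d_model, flat_size))
bias = np.zeros(flat_size)
step_embedding = step_encoding @ weights + bias

# Reshape the flat projection into an image-shaped tensor
step_embedding = step_embedding.reshape((batch,) + input_shape)

# Concatenate along the last (channel) axis, as Keras' Concatenate does by default
combined = np.concatenate([features, step_embedding], axis=-1)

print(step_embedding.shape)  # (4, 32, 32, 3)
print(combined.shape)        # (4, 32, 32, 67)
```

The concatenated tensor has 64 + 3 = 67 channels, so whichever network consumes it has to be built for a 67-channel input rather than for the raw image shape.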

### 10.2.5 Compiling the Model

**Example: Compiling the Model**

```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import MeanSquaredError

# Compile the diffusion model
diffusion_model.compile(optimizer=Adam(learning_rate=1e-4), loss=MeanSquaredError())

# Print the model summary
diffusion_model.summary()
```

This code compiles `diffusion_model` with specific configurations. The Adam optimization algorithm is selected with a learning rate of 0.0001. The loss function, which measures how far the model's output is from the target, is set to Mean Squared Error (MSE). After the model is compiled, the summary of its architecture is printed.
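As a concrete illustration of the loss, the sketch below computes MSE by hand with NumPy on two tiny arrays. The values are invented for the example. For the diffusion model, the target and prediction would be batches of images. The helper name `mean_squared_error` is chosen here and is not the Keras class.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared element-wise differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

target = np.array([[0.0, 1.0], [2.0, 3.0]])
prediction = np.array([[0.0, 2.0], [2.0, 1.0]])

# Squared errors are [0, 1, 0, 4], so the mean is 5 / 4
print(mean_squared_error(target, prediction))  # 1.25
```

A perfect prediction gives a loss of 0, and larger pixel-wise deviations are penalized quadratically, which is why MSE is a common choice for denoising objectives.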

**Summary**