Generative Deep Learning Updated Edition

Chapter 6: Project: Handwritten Digit Generation with VAEs

6.2 Model Creation

In this section, we will focus on creating the Variational Autoencoder (VAE) model for generating handwritten digits. The model consists of two main components: the encoder and the decoder. The encoder maps input images to a latent space, while the decoder reconstructs images from the latent space. We will also implement the reparameterization trick to ensure that the model can be trained effectively using gradient descent.

6.2.1 Defining the Encoder

The encoder compresses the input data into a lower-dimensional latent space. It outputs the parameters of the latent distribution, typically the mean and the log variance.

Key Components:

  • Input Layer: Receives the original image data.
  • Dense Layers: Process the input data.
  • Latent Variables: Outputs the mean and log variance of the latent distribution.

Example: Encoder Implementation

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Layer
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K

# Define the sampling layer using the reparameterization trick
class Sampling(Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = K.random_normal(shape=(batch, dim))
        return z_mean + K.exp(0.5 * z_log_var) * epsilon

# Build the encoder network
def build_encoder(input_shape, latent_dim):
    inputs = Input(shape=input_shape)
    x = Dense(512, activation='relu')(inputs)
    x = Dense(256, activation='relu')(x)
    z_mean = Dense(latent_dim, name='z_mean')(x)
    z_log_var = Dense(latent_dim, name='z_log_var')(x)
    z = Sampling()([z_mean, z_log_var])
    return Model(inputs, [z_mean, z_log_var, z], name='encoder')

# Define the input shape and latent dimension
input_shape = (784,)
latent_dim = 2

# Build the encoder
encoder = build_encoder(input_shape, latent_dim)
encoder.summary()

This example uses TensorFlow's Keras API to build the encoder of a Variational Autoencoder (VAE). It defines a Sampling layer that applies the reparameterization trick: instead of sampling z directly, it draws standard normal noise epsilon and computes z = z_mean + exp(0.5 * z_log_var) * epsilon, so gradients can flow through z_mean and z_log_var during backpropagation.

The encoder network consists of two dense layers followed by two parallel dense heads, z_mean and z_log_var, which hold the parameters of the latent distribution. The Sampling layer uses these parameters to sample a point z in the latent space, and the encoder model returns all three tensors: z_mean, z_log_var, and z. Finally, the encoder is built for flattened 28x28 images (784 values) and a 2-dimensional latent space.
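The reparameterization step can also be checked outside of Keras. The following is a minimal NumPy sketch, not part of the model itself: the function name sample_latent, the seed, and the chosen mean and variance values are illustrative assumptions. It shows that samples produced this way have the mean and standard deviation encoded by z_mean and z_log_var.

```python
import numpy as np

def sample_latent(z_mean, z_log_var, rng):
    """Reparameterized sample: z = mean + std * epsilon, with epsilon ~ N(0, I)."""
    epsilon = rng.standard_normal(z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * epsilon

rng = np.random.default_rng(0)                   # fixed seed, only for repeatability
z_mean = np.full((10000, 2), 1.5)                # hypothetical encoder means
z_log_var = np.full((10000, 2), np.log(0.25))    # log variance, so std = 0.5

z = sample_latent(z_mean, z_log_var, rng)
print(z.shape)           # (10000, 2)
print(z.mean(axis=0))    # close to 1.5 in each dimension
print(z.std(axis=0))     # close to 0.5 in each dimension
```

Because the randomness enters only through epsilon, z is a deterministic, differentiable function of z_mean and z_log_var for any fixed draw of epsilon.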

6.2.2 Defining the Decoder

The decoder reconstructs the input data from the latent variables. It maps the latent space back to the data space, generating new images that resemble the original input.

Key Components:

  • Latent Input: Receives the sampled latent variables.
  • Dense Layers: Transform the latent variables into the output data.
  • Output Layer: Outputs the reconstructed images, typically using a sigmoid activation for pixel values in [0, 1].

Example: Decoder Implementation

# Build the decoder network
def build_decoder(latent_dim, output_shape):
    latent_inputs = Input(shape=(latent_dim,))
    x = Dense(256, activation='relu')(latent_inputs)
    x = Dense(512, activation='relu')(x)
    outputs = Dense(output_shape, activation='sigmoid')(x)
    return Model(latent_inputs, outputs, name='decoder')

# Build the decoder
decoder = build_decoder(latent_dim, input_shape[0])
decoder.summary()

The decoder network is built using the Keras functional API. It starts with an input layer that takes a latent vector of shape (latent_dim,). This is followed by two dense (fully connected) layers with 256 and 512 units respectively, each using the ReLU (Rectified Linear Unit) activation function, mirroring the encoder in reverse. The final dense layer has output_shape units (784 here) and uses the sigmoid activation function, so each output pixel lies between 0 and 1.

After defining this decoder network structure in the build_decoder function, an instance of the decoder is built and its summary (a concise overview of the network's layers and parameters) is printed.
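As a sanity check on the summary() output, the parameter counts can be worked out by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The sketch below is plain arithmetic for the layer sizes used in this section; the helper name dense_params is a hypothetical name introduced only for illustration.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units

input_dim, latent_dim = 784, 2

encoder_params = (
    dense_params(input_dim, 512)
    + dense_params(512, 256)
    + 2 * dense_params(256, latent_dim)   # z_mean and z_log_var heads
)
decoder_params = (
    dense_params(latent_dim, 256)
    + dense_params(256, 512)
    + dense_params(512, input_dim)
)

print(encoder_params)  # 534276
print(decoder_params)  # 534544
```

The Sampling layer contributes no trainable parameters, since it only combines its inputs with random noise.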

6.2.3 Combining the Encoder and Decoder

Next, we will combine the encoder and decoder to create the VAE model. The VAE takes an input image, encodes it into the latent space, and then decodes it back into an image. The VAE is trained to minimize the reconstruction loss and the KL divergence.

VAE Architecture:

  • Inputs: Original image data.
  • Encoder: Compresses the input data into latent variables.
  • Decoder: Reconstructs the input data from the latent variables.
  • Outputs: Reconstructed images.

Example: VAE Model Implementation

# Define the VAE model
inputs = Input(shape=input_shape)
z_mean, z_log_var, z = encoder(inputs)
outputs = decoder(z)
vae = Model(inputs, outputs, name='vae')
vae.summary()

This code first defines a symbolic input with the shape of a flattened image. It then passes the input through the encoder built earlier, which returns the mean, the log variance, and a sampled latent vector 'z'. The decoder takes 'z' and produces the reconstruction. Wrapping the input and the reconstruction in a Model yields the overall VAE, and the last line prints its summary.

6.2.4 Defining the Loss Function

The loss function for VAEs combines the reconstruction loss and the KL divergence. The reconstruction loss measures how well the decoder can reconstruct the input data, while the KL divergence measures the difference between the learned latent distribution and the prior distribution (usually a standard normal distribution).

Loss Function:

VAE Loss = Reconstruction Loss + KL Divergence

Reconstruction Loss:
Often measured using Binary Cross-Entropy (BCE) when the input data is normalized to [0, 1].

KL Divergence:
Measures the difference between the learned distribution and the prior distribution.
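Written out for a single image, the two terms take the following form. This is a sketch under the usual assumptions of a diagonal Gaussian latent distribution and a standard normal prior; the symbols are notation introduced here, not taken from the text: x is the flattened input with D = 784 pixels, x-hat is the decoder output, d is the latent dimension, mu corresponds to z_mean, and log sigma^2 corresponds to z_log_var.

```latex
\mathcal{L}(x) =
\underbrace{-\sum_{i=1}^{D} \left[ x_i \log \hat{x}_i + (1 - x_i) \log\left(1 - \hat{x}_i\right) \right]}_{\text{reconstruction loss (BCE)}}
\;+\;
\underbrace{-\frac{1}{2} \sum_{j=1}^{d} \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right)}_{\text{KL divergence}}
```

The KL term is zero exactly when every mu_j is 0 and every sigma_j^2 is 1, that is, when the learned distribution coincides with the prior.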

Example: Loss Function Implementation

# Define the VAE loss function
def vae_loss(inputs, outputs, z_mean, z_log_var):
    reconstruction_loss = tf.keras.losses.binary_crossentropy(inputs, outputs)
    reconstruction_loss *= input_shape[0]

    kl_loss = 1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)
    kl_loss = K.sum(kl_loss, axis=-1)
    kl_loss *= -0.5

    return K.mean(reconstruction_loss + kl_loss)

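# Attach the VAE loss to the model and compile it.
# z_mean and z_log_var are symbolic tensors inside the functional graph, so the
# loss is registered with add_loss instead of being passed to compile() as a
# closure, which typically fails under eager execution in TensorFlow 2.x.
# Assumption: TF 2.x with the Keras 2 implementation of tf.keras.
vae.add_loss(vae_loss(inputs, outputs, z_mean, z_log_var))
vae.compile(optimizer='adam')

The 'vae_loss' function calculates both the reconstruction loss and the KL divergence loss.

  • The 'reconstruction loss' measures how well the VAE can reproduce the input data after encoding and decoding it. It uses binary cross-entropy between the original and reconstructed inputs, multiplied by the number of pixels (784) so that the per-pixel average returned by Keras becomes a sum over the image.
  • The 'KL divergence loss' measures how much the learned latent variable distribution deviates from the prior distribution (which is a standard normal distribution in this case).

Because the loss depends on z_mean and z_log_var, which are intermediate tensors of the model rather than targets, it is attached to the model with add_loss. The VAE model is then compiled with the Adam optimizer.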
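To make the two terms concrete, here is a minimal NumPy sketch of the same computation, evaluated on hand-picked values. It is an illustration, not the training code: the function name vae_loss_np, the clipping constant, and the random stand-in images are assumptions made for this example.

```python
import numpy as np

def vae_loss_np(x, x_hat, z_mean, z_log_var, eps=1e-7):
    """Return (batch-averaged total loss, batch-averaged KL term)."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)   # avoid log(0)
    # Binary cross-entropy summed over the pixels of each image
    bce = -(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat)).sum(axis=-1)
    # Closed-form KL divergence between N(mean, var) and N(0, I)
    kl = -0.5 * (1.0 + z_log_var - z_mean**2 - np.exp(z_log_var)).sum(axis=-1)
    return float(np.mean(bce + kl)), float(np.mean(kl))

rng = np.random.default_rng(0)
x = rng.random((4, 784))          # stand-in for four flattened images in [0, 1]
x_hat = np.full_like(x, 0.5)      # an uninformative reconstruction

# Case 1: latent distribution equals the prior, so the KL term vanishes
total, kl = vae_loss_np(x, x_hat, np.zeros((4, 2)), np.zeros((4, 2)))
print(round(total, 2))            # 543.43, i.e. 784 * ln(2)

# Case 2: mean shifted to 1 in both latent dimensions adds 0.5 per dimension
total_shifted, kl_shifted = vae_loss_np(x, x_hat, np.ones((4, 2)), np.zeros((4, 2)))
print(kl_shifted)                 # 1.0
```

The first case shows the loss of a model that has learned nothing about the images, which is a useful baseline when reading the training logs.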

6.2.5 Training the VAE

Training the VAE involves minimizing the combined loss function using gradient descent. We will use the MNIST dataset to train the VAE, and monitor the training process to ensure the model learns effectively.
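The training call below expects x_train and x_test to be flattened images scaled to [0, 1], matching the 784-unit input layer and the sigmoid output. This section does not show that preparation, so the following is a minimal sketch of one way to do it. The function name preprocess is hypothetical, and a small synthetic array stands in for MNIST so the snippet runs without a download.

```python
import numpy as np

def preprocess(images):
    """Scale uint8 images to [0, 1] and flatten each 28x28 image to 784 values."""
    images = images.astype("float32") / 255.0
    return images.reshape(len(images), -1)

# Synthetic stand-in with MNIST's shape and dtype. The real arrays could come from
# tf.keras.datasets.mnist.load_data(), which returns
# (x_train, y_train), (x_test, y_test) with images of shape (N, 28, 28).
rng = np.random.default_rng(0)
raw_train = rng.integers(0, 256, size=(32, 28, 28), dtype=np.uint8)

x_train = preprocess(raw_train)
print(x_train.shape, x_train.dtype)   # (32, 784) float32
```

The same function would be applied to the test images to obtain x_test.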

Example: Training the VAE

# Train the VAE model
vae.fit(x_train, x_train, epochs=50, batch_size=128, validation_data=(x_test, x_test))

The fit call trains the model for 50 epochs (full passes over the training set) with a batch size of 128 (the number of samples per gradient update). The same array serves as both the input and the target, which is typical for autoencoders. After each epoch, the loss is also evaluated on the held-out test set passed as validation_data.

Summary

In this section, we successfully created the Variational Autoencoder (VAE) model for generating handwritten digits. We defined the encoder and decoder networks, combined them to form the VAE, and implemented the reparameterization trick. We also defined the VAE loss function, which combines the reconstruction loss and the KL divergence, and trained the model using the MNIST dataset.

With the VAE model trained, we are ready to move on to the next step: generating new handwritten digits. 
