Chapter 5: Exploring Variational Autoencoders (VAEs)

5.3 Training VAEs

As we previously touched upon in section 5.1, the process of training a Variational Autoencoder (VAE), a type of generative model, involves several essential and carefully sequenced steps. These steps are dataset preparation, defining the architecture of the model, implementing the loss function, and performing model optimization.

In this section, we plan to explore each of these steps in greater depth, with the aim of providing you with a more comprehensive understanding of the training process. Firstly, we'll look at how to prepare the dataset, ensuring it's in the correct format and split into appropriate subsets for training and validation.

Next, we'll move onto the task of defining the model architecture. This step is all about designing the neural network structure, which includes deciding the number of layers, the types of layers (convolutional, fully connected, etc.), and the connections between them.

Following this, we will turn our attention to the implementation of the loss function. This step involves deciding on the right loss function that can accurately measure the discrepancy between the model's predictions and the actual data.

Finally, we'll dive into the intricacies of model optimization. This involves tuning the model parameters to minimize the loss function, a task often achieved through methods such as stochastic gradient descent or Adam optimization.

By the end of this section, our goal is for you to not only understand each step involved in training a VAE but to also have the necessary knowledge and code snippets to effectively train a VAE on any suitable dataset of your choice.

5.3.1 Preparing the Dataset

The very first and most critical phase in the complex process of training a Variational Autoencoder (VAE), revolves around the meticulous preparation of the dataset. The dataset, in essence, forms the very backbone of the training process. It is the raw material from which the model learns and develops its ability to perform tasks. For the purpose of illustrating this process in a practical context, we will be employing the use of the highly respected and widely recognized MNIST dataset.

The MNIST dataset is a comprehensive and extensive library of handwritten digits. It has, over time, gained substantial recognition and popularity within the machine learning community, particularly for its application in training systems that are geared towards the processing of images.

The MNIST dataset stands out due to its reliability, effectiveness, and the sheer volume of data it encompasses. These qualities make it an invaluable resource not only in the realm of machine learning but also in the broader field of image recognition, artificial intelligence, and computer vision.

Detailed Steps:

Begin by loading the dataset into your environment. This is the first step that will allow you to interact with the data.
Proceed to normalize the pixel values contained in the dataset. This step involves converting the pixel values so that they all fall within a specified range, in this case, between 0 and 1. Normalization is a crucial step as it helps to standardize the data, making it easier for the model to process.
Finally, reshape the data to ensure it aligns with the input requirements of the VAE. This step involves altering the structure of the dataset to ensure it can be effectively ingested by the VAE during the training process.

Example: Preparing the MNIST Dataset

import numpy as np
import tensorflow as tf

# Load the MNIST dataset
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()

# Normalize the pixel values to the range [0, 1]
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

# Reshape the data to (num_samples, num_features)
x_train = x_train.reshape((x_train.shape[0], np.prod(x_train.shape[1:])))
x_test = x_test.reshape((x_test.shape[0], np.prod(x_test.shape[1:])))

print(f"Training data shape: {x_train.shape}")
print(f"Test data shape: {x_test.shape}")

In this example:

Firstly, the necessary libraries are imported. Numpy, a fundamental package for scientific computing with Python, is imported for numerical operations. TensorFlow, a powerful open-source library for machine learning and numerical computation, is also imported.

The next step is to load the MNIST dataset. The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits that is widely used for training and testing in the field of machine learning. The dataset is loaded using the tf.keras.datasets.mnist.load_data() function. This function returns two tuples: one for the training dataset and the other for the test dataset. Each tuple contains a set of images and their corresponding labels. However, since we are only interested in the images (as VAEs are unsupervised learning models), the labels (denoted by underscores '_') are ignored.

Once the MNIST dataset is loaded, the pixel values of the images need to be normalized. Machine learning models often perform better on normalized data. Normalization is a scaling technique where values are shifted and rescaled so they end up ranging between 0 and 1. To normalize the pixel values to the range [0, 1], the code first converts the datatype of the image arrays to 'float32'. This is necessary as the original images are stored as 8-bit integers to save space, which allows pixel values between 0 and 255. By converting the datatype to 'float32', fractional values can be accommodated. The pixel values are then divided by 255 (the maximum possible value for an 8-bit integer), bringing all values within the range [0, 1].

The data is then reshaped. The original MNIST images are 28x28 pixels. However, the VAE expects input in the form of a 1-dimensional array. Therefore, the 2-dimensional images need to be reshaped (or "flattened") into a 1-dimensional array. So, the 28x28 images are reshaped into arrays of length 784.

Finally, the shapes of the training and test datasets are printed out using Python’s built-in print function. This is a useful step to verify that the data has been correctly loaded and preprocessed. It outputs the number of samples and the number of features for each dataset, which is important information to be aware of before training the model.

5.3.2 Defining the VAE Model Architecture

In the subsequent step, we proceed to define the structure of the Variational Autoencoder (VAE), which is predominantly made up of two essential parts - the encoder and the decoder networks. These two networks play crucial roles in the functioning of the VAE.

The encoder network takes in the input data and transforms it into a set of parameters in a latent space. This latent space is unique in that it represents the data not as discrete points, but as a probability distribution.

Following this, the decoder network acts on these parameters, reconstructing the original input data from the encoded form. The entire process allows for a compact and efficient representation of complex data.

Encoder: Compresses the input data into a latent space, producing the mean and log variance of the latent variables.

Decoder: Reconstructs the input data from the latent variables, generating data samples that resemble the original input.

Example: Defining the Encoder and Decoder

from tensorflow.keras.layers import Input, Dense, Lambda, Layer
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K

# Sampling layer using the reparameterization trick
class Sampling(Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = K.random_normal(shape=(batch, dim))
        return z_mean + K.exp(0.5 * z_log_var) * epsilon

# Encoder network
def build_encoder(input_shape, latent_dim):
    inputs = Input(shape=input_shape)
    x = Dense(512, activation='relu')(inputs)
    x = Dense(256, activation='relu')(x)
    z_mean = Dense(latent_dim, name='z_mean')(x)
    z_log_var = Dense(latent_dim, name='z_log_var')(x)
    z = Sampling()([z_mean, z_log_var])
    return Model(inputs, [z_mean, z_log_var, z], name='encoder')

# Decoder network
def build_decoder(latent_dim, output_shape):
    latent_inputs = Input(shape=(latent_dim,))
    x = Dense(256, activation='relu')(latent_inputs)
    x = Dense(512, activation='relu')(x)
    outputs = Dense(output_shape, activation='sigmoid')(x)
    return Model(latent_inputs, outputs, name='decoder')

# Define the input shape and latent dimension
input_shape = (784,)
latent_dim = 2

# Build the encoder and decoder
encoder = build_encoder(input_shape, latent_dim)
decoder = build_decoder(latent_dim, input_shape[0])

# Define the VAE model
inputs = Input(shape=input_shape)
z_mean, z_log_var, z = encoder(inputs)
outputs = decoder(z)
vae = Model(inputs, outputs, name='vae')
vae.summary()

In this example:

The code starts by importing the necessary modules from TensorFlow, Keras, and Keras backend.

The next section of the code defines a custom Keras Layer class called Sampling. The purpose of this class is to generate a sample from the latent space using the reparameterization trick, a technique used to allow backpropagation to pass through the random sampling process in VAEs.

The Sampling class defines a call method, which is a core method in Keras Layer classes. This method takes as input the mean and log variance of the latent space (represented as z_mean and z_log_var), generates a random tensor epsilon with the same shape as z_mean using Keras' random_normal function, and returns a sample from the latent space distribution using the reparameterization trick formula: z_mean + exp(0.5 * z_log_var) * epsilon.

Following the definition of the Sampling class, the code defines two functions: build_encoder and build_decoder.

The build_encoder function constructs the encoder part of the VAE. The encoder takes an input tensor of a given shape and maps it to a latent space. It consists of two fully connected (Dense) layers with ReLU activation, followed by two Dense layers without activation to output z_mean and z_log_var. These two outputs are then passed to a Sampling layer to generate a sample from the latent space.

In a similar fashion, the build_decoder function builds the decoder part of the VAE. The decoder takes a sample from the latent space and maps it back to the original input space. It consists of two fully connected (Dense) layers with ReLU activation, followed by a Dense layer with sigmoid activation to output the reconstructed input.

Once the Sampling class and the build_encoder and build_decoder functions are defined, the code sets the input shape and latent dimension, constructs the encoder and decoder using these parameters, and then combines them to form the complete VAE.

The VAE model takes an input tensor, passes it through the encoder to get z_mean, z_log_var, and a sample from the latent space (represented as z). This sample z is then passed through the decoder to get the reconstructed input. The entire VAE model is encapsulated as a Keras Model and its structure is printed out using the summary() method.

5.3.3 Implementing the VAE Loss Function

The loss function for Variational Autoencoders (VAEs), is a combination of two distinct components: the reconstruction loss and the Kullback-Leibler (KL) divergence. Each of these components plays a crucial role in the functioning of the VAE.

The reconstruction loss is responsible for measuring the capability of the decoder in reconstructing the original input data from the encoded latent space representation. Essentially, it quantifies the quality of the reconstructed data compared to the original input.

On the other hand, the KL divergence serves as a measure of the difference between the learned latent distribution, which is derived from the input data, and the prior distribution. The prior distribution is typically a standard normal distribution, which is a common choice due to its mathematical tractability and symmetry.

This part of the loss function encourages the learned latent distribution to resemble the prior distribution, which aids in ensuring a well-structured and continuous latent space.

Example: VAE Loss Function

# Define the VAE loss function
def vae_loss(inputs, outputs, z_mean, z_log_var):
    # Reconstruction loss
    reconstruction_loss = tf.keras.losses.binary_crossentropy(inputs, outputs)
    reconstruction_loss *= input_shape[0]

    # KL divergence
    kl_loss = 1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)
    kl_loss = K.sum(kl_loss, axis=-1)
    kl_loss *= -0.5

    # Combine the reconstruction loss and the KL divergence
    return K.mean(reconstruction_loss + kl_loss)

# Compile the VAE model
vae.compile(optimizer='adam', loss=lambda x, y: vae_loss(x, y, z_mean, z_log_var))

In this example:

The VAE loss function defined here, vae_loss, consists of two main parts: the reconstruction_loss, and the kl_loss.

The reconstruction_loss is designed to evaluate how well the VAE's decoder can recreate the original input data. This part of the loss function uses binary cross-entropy as the metric for comparison between the original inputs and the outputs produced by the decoder. Binary cross-entropy is a popular loss function for tasks that involve binary classification, and in this context, it measures the difference between the original input and the reconstruction. The reconstruction loss is then scaled by the size of the input shape, which is represented by input_shape[0].

The kl_loss, on the other hand, represents the Kullback-Leibler divergence, a measure of how one probability distribution diverges from a second, expected probability distribution. In the context of VAEs, the KL divergence measures the difference between the learned latent distribution and the prior distribution, which is typically a standard normal distribution. The KL divergence is computed using the mean (z_mean) and log variance (z_log_var) of the latent distribution and is then scaled by -0.5.

The overall VAE loss is then calculated as the sum of the reconstruction loss and the KL divergence. This combined loss function ensures that the VAE learns to encode the input data in such a way that the decoder can accurately reconstruct the original data, while also ensuring that the learned latent distribution closely matches the prior distribution.

After the loss function definition, the VAE model is compiled using the Adam optimizer and the custom VAE loss function. The Adam optimizer is a popular choice for training deep learning models, known for its efficiency and low memory requirements. The use of a lambda function in the loss argument allows the model to use the custom VAE loss function that requires additional parameters beyond the default (y_true, y_pred) that Keras typically uses for its loss functions.

5.3.4 Training the VAE Model

Having diligently prepared our dataset, defined our model with precision, and meticulously implemented our loss function, we stand on the brink of training our Variational Autoencoder (VAE). This significant step in our process will be undertaken with the utmost care.

Our carefully curated training data will be employed to optimize the parameters of both the encoder and the decoder. This optimization is a crucial step, as it directly influences the performance of our model.

By minimizing the combined loss function, which we have implemented previously, we can ensure the most accurate possible representation of our data. This is the ultimate goal of our training process, and we are now ready to embark on this journey.

Example: Training the VAE

# Train the VAE model
vae.fit(x_train, x_train, epochs=50, batch_size=128, validation_data=(x_test, x_test))

In this example:

The 'fit' method is used to train the model for a specified number of epochs (iterations over the entire dataset), which is 50 in this case. The model is trained using 'x_train' as both the input data and the target output - this is typical for autoencoders, which are trying to reconstruct their input data. The batch size is set to 128, meaning the model weights will be updated after 128 samples. The validation data, used to evaluate the model's performance at the end of each epoch, is 'x_test'.

5.3.5 Monitoring Training Progress

Keeping a close eye on the progress of training is an essential step in the process of developing a machine learning model. By monitoring this, we can gain a clear understanding of how effectively the model is learning from the data and assimilating the patterns it's supposed to.

Not only does this give us insights into the model's current performance, but it also provides us with the information necessary to make any adjustments that might be required to improve its learning process. Among the tools that we can utilize to track the progress of the training are TensorBoard and other visualization tools.

These tools offer a visual representation of the training and validation losses over time, thus providing a more tangible and easy-to-understand overview of the model's learning progress. It's through this careful monitoring and adjustment process that we can ensure our model achieves the best performance possible.

Example: Using TensorBoard for Monitoring

import tensorflow as tf

# Define TensorBoard callback
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir='./logs')

# Train the VAE model with TensorBoard callback
vae.fit(x_train, x_train, epochs=50, batch_size=128, validation_data=(x_test, x_test), callbacks=[tensorboard_callback])

In this example:

The script begins by importing TensorFlow, a powerful library for numerical computation, particularly suited for machine learning and deep learning tasks.

Next, the script defines a TensorBoard callback. TensorBoard is a tool provided with TensorFlow that allows users to visualize the training process of their models. It can display metrics such as loss and accuracy, as well as more complex visualizations like model graphs or histograms of weights and biases. The callback is defined with a log directory of './logs', meaning that TensorBoard will write metrics and other data to this directory during training.

The vae.fit function call is where the actual training of the VAE model takes place. The arguments to this function specify the details of the training process:

x_train: This is the training data that the model will learn from. In a VAE, the same data is used as both inputs and targets because the model is trying to learn to reconstruct its input data.
epochs=50: This specifies that the training process will consist of 50 epochs. An epoch is one complete pass through the entire training dataset.
batch_size=128: This sets the number of training examples used in one iteration of model weight updates. After the model has seen 128 examples, it will update its weights.
validation_data=(x_test, x_test): This is the data that the model will be evaluated on after each epoch. It's used to monitor the model's performance on data it hasn't been trained on.
callbacks=[tensorboard_callback]: This adds the TensorBoard callback to the training process. With this callback, TensorBoard will record metrics and other data during training, which can be visualized in the TensorBoard interface.

The output of this script will be a trained VAE model that has been monitored using TensorBoard. By using TensorBoard, the user can visualize how the model's loss (and potentially other metrics) changed over the course of training, which can be useful for understanding the model's learning process and diagnosing any potential issues.

5.3.6 Generating New Samples

Once the Variational Autoencoder (VAE) has been successfully trained, it then becomes possible to utilize the decoder component of the VAE to generate completely new samples. This is achieved by performing a sampling operation from the latent space, which is a key component of the VAE's structure.

These samples, which are gathered from the latent space, are then passed through the decoder. The decoder then acts on these samples to produce new, unique outputs. This process thus opens up a wide array of possibilities for generating new data based on the original input.

Example: Generating New Samples

import matplotlib.pyplot as plt
import numpy as np

# Function to generate new samples from the latent space
def generate_samples(decoder, latent_dim, n_samples=10):
    random_latent_vectors = np.random.normal(size=(n_samples, latent_dim))
    generated_images = decoder.predict(random_latent_vectors)
    generated_images = generated_images.reshape((n_samples, 28, 28))
    return generated_images

# Generate and plot new samples
generated_images = generate_samples(decoder, latent_dim)
plt.figure(figsize=(10, 2))
for i in range(generated_images.shape[0]):
    plt.subplot(1, generated_images.shape[0], i + 1)
    plt.imshow(generated_images[i], cmap='gray')
    plt.axis('off')
plt.show()

In this example:

The function 'generate_samples' in the code takes three parameters: a decoder, a latent dimension size, and an optional number of samples to generate (which defaults to 10 if not specified). The latent dimension refers to the abstract space in which the VAE represents the input data, and it is a crucial component of how VAEs function.

The function begins by generating a set of random latent vectors. This is done by drawing from a normal (Gaussian) distribution, using the NumPy function 'np.random.normal'. The size of the generated array is determined by the number of samples and the size of the latent dimension.

These random latent vectors are then passed through the decoder, which has been trained to transform points in the latent space back into images. This is done using the 'predict' function of the decoder. The decoder's output is an array of pixel values, which represent the generated images.

However, the generated images need to be reshaped into a 2D format to be properly displayed as images. This is done using the 'reshape' function from NumPy, transforming the 1D array of pixel values into a 2D array with dimensions 28x28 (the standard size for MNIST dataset images).

Finally, the generated images are displayed using Matplotlib. A figure is created, and for each generated image, a new subplot is added to the figure. The image is displayed in grayscale (as indicated by the 'cmap' parameter set to 'gray'), and the axes are turned off for a cleaner image display.

This code provides a clear example of how VAEs can be used to generate new data that resembles the data they were trained on. It demonstrates the process of sampling from the latent space and how the decoder transforms these samples back into interpretable data. As such, it provides a practical application of VAEs in the field of generative modeling.

Summary

Training Variational Autoencoders (VAEs) involves a series of steps, including preparing the dataset, defining the model architecture, implementing the loss function, and optimizing the model. By carefully following these steps, you can train a VAE to learn meaningful latent representations of data and generate new samples.

The process involves balancing the reconstruction loss and the KL divergence to ensure that the learned latent space is both useful and aligned with the prior distribution. Monitoring the training progress and fine-tuning the model as needed helps in achieving the best possible results.

With the knowledge and skills gained in this section, you are well-equipped to train VAEs on various datasets, unlocking the potential of generative modeling in your projects.

5.3 Training VAEs

As we previously touched upon in section 5.1, the process of training a Variational Autoencoder (VAE), a type of generative model, involves several essential and carefully sequenced steps. These steps are dataset preparation, defining the architecture of the model, implementing the loss function, and performing model optimization.

In this section, we plan to explore each of these steps in greater depth, with the aim of providing you with a more comprehensive understanding of the training process. Firstly, we'll look at how to prepare the dataset, ensuring it's in the correct format and split into appropriate subsets for training and validation.

Next, we'll move onto the task of defining the model architecture. This step is all about designing the neural network structure, which includes deciding the number of layers, the types of layers (convolutional, fully connected, etc.), and the connections between them.

Following this, we will turn our attention to the implementation of the loss function. This step involves deciding on the right loss function that can accurately measure the discrepancy between the model's predictions and the actual data.

Finally, we'll dive into the intricacies of model optimization. This involves tuning the model parameters to minimize the loss function, a task often achieved through methods such as stochastic gradient descent or Adam optimization.

By the end of this section, our goal is for you to not only understand each step involved in training a VAE but to also have the necessary knowledge and code snippets to effectively train a VAE on any suitable dataset of your choice.

5.3.1 Preparing the Dataset

The very first and most critical phase in the complex process of training a Variational Autoencoder (VAE), revolves around the meticulous preparation of the dataset. The dataset, in essence, forms the very backbone of the training process. It is the raw material from which the model learns and develops its ability to perform tasks. For the purpose of illustrating this process in a practical context, we will be employing the use of the highly respected and widely recognized MNIST dataset.

The MNIST dataset is a comprehensive and extensive library of handwritten digits. It has, over time, gained substantial recognition and popularity within the machine learning community, particularly for its application in training systems that are geared towards the processing of images.

The MNIST dataset stands out due to its reliability, effectiveness, and the sheer volume of data it encompasses. These qualities make it an invaluable resource not only in the realm of machine learning but also in the broader field of image recognition, artificial intelligence, and computer vision.

Detailed Steps:

Begin by loading the dataset into your environment. This is the first step that will allow you to interact with the data.
Proceed to normalize the pixel values contained in the dataset. This step involves converting the pixel values so that they all fall within a specified range, in this case, between 0 and 1. Normalization is a crucial step as it helps to standardize the data, making it easier for the model to process.
Finally, reshape the data to ensure it aligns with the input requirements of the VAE. This step involves altering the structure of the dataset to ensure it can be effectively ingested by the VAE during the training process.

Example: Preparing the MNIST Dataset

import numpy as np
import tensorflow as tf

# Load the MNIST dataset
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()

# Normalize the pixel values to the range [0, 1]
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

# Reshape the data to (num_samples, num_features)
x_train = x_train.reshape((x_train.shape[0], np.prod(x_train.shape[1:])))
x_test = x_test.reshape((x_test.shape[0], np.prod(x_test.shape[1:])))

print(f"Training data shape: {x_train.shape}")
print(f"Test data shape: {x_test.shape}")

In this example:

Firstly, the necessary libraries are imported. Numpy, a fundamental package for scientific computing with Python, is imported for numerical operations. TensorFlow, a powerful open-source library for machine learning and numerical computation, is also imported.

The next step is to load the MNIST dataset. The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits that is widely used for training and testing in the field of machine learning. The dataset is loaded using the tf.keras.datasets.mnist.load_data() function. This function returns two tuples: one for the training dataset and the other for the test dataset. Each tuple contains a set of images and their corresponding labels. However, since we are only interested in the images (as VAEs are unsupervised learning models), the labels (denoted by underscores '_') are ignored.

Once the MNIST dataset is loaded, the pixel values of the images need to be normalized. Machine learning models often perform better on normalized data. Normalization is a scaling technique where values are shifted and rescaled so they end up ranging between 0 and 1. To normalize the pixel values to the range [0, 1], the code first converts the datatype of the image arrays to 'float32'. This is necessary as the original images are stored as 8-bit integers to save space, which allows pixel values between 0 and 255. By converting the datatype to 'float32', fractional values can be accommodated. The pixel values are then divided by 255 (the maximum possible value for an 8-bit integer), bringing all values within the range [0, 1].

The data is then reshaped. The original MNIST images are 28x28 pixels. However, the VAE expects input in the form of a 1-dimensional array. Therefore, the 2-dimensional images need to be reshaped (or "flattened") into a 1-dimensional array. So, the 28x28 images are reshaped into arrays of length 784.

Finally, the shapes of the training and test datasets are printed out using Python’s built-in print function. This is a useful step to verify that the data has been correctly loaded and preprocessed. It outputs the number of samples and the number of features for each dataset, which is important information to be aware of before training the model.

5.3.2 Defining the VAE Model Architecture

In the subsequent step, we proceed to define the structure of the Variational Autoencoder (VAE), which is predominantly made up of two essential parts - the encoder and the decoder networks. These two networks play crucial roles in the functioning of the VAE.

The encoder network takes in the input data and transforms it into a set of parameters in a latent space. This latent space is unique in that it represents the data not as discrete points, but as a probability distribution.

Following this, the decoder network acts on these parameters, reconstructing the original input data from the encoded form. The entire process allows for a compact and efficient representation of complex data.

Encoder: Compresses the input data into a latent space, producing the mean and log variance of the latent variables.

Decoder: Reconstructs the input data from the latent variables, generating data samples that resemble the original input.

Example: Defining the Encoder and Decoder

from tensorflow.keras.layers import Input, Dense, Lambda, Layer
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K

# Sampling layer using the reparameterization trick
class Sampling(Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = K.random_normal(shape=(batch, dim))
        return z_mean + K.exp(0.5 * z_log_var) * epsilon

# Encoder network
def build_encoder(input_shape, latent_dim):
    inputs = Input(shape=input_shape)
    x = Dense(512, activation='relu')(inputs)
    x = Dense(256, activation='relu')(x)
    z_mean = Dense(latent_dim, name='z_mean')(x)
    z_log_var = Dense(latent_dim, name='z_log_var')(x)
    z = Sampling()([z_mean, z_log_var])
    return Model(inputs, [z_mean, z_log_var, z], name='encoder')

# Decoder network
def build_decoder(latent_dim, output_shape):
    latent_inputs = Input(shape=(latent_dim,))
    x = Dense(256, activation='relu')(latent_inputs)
    x = Dense(512, activation='relu')(x)
    outputs = Dense(output_shape, activation='sigmoid')(x)
    return Model(latent_inputs, outputs, name='decoder')

# Define the input shape and latent dimension
input_shape = (784,)
latent_dim = 2

# Build the encoder and decoder
encoder = build_encoder(input_shape, latent_dim)
decoder = build_decoder(latent_dim, input_shape[0])

# Define the VAE model
inputs = Input(shape=input_shape)
z_mean, z_log_var, z = encoder(inputs)
outputs = decoder(z)
vae = Model(inputs, outputs, name='vae')
vae.summary()