Generative Deep Learning Updated Edition

Chapter 6: Project: Handwritten Digit Generation with VAEs

6.1 Data Collection and Preprocessing

In this chapter, we will work through a practical project: generating handwritten digits with Variational Autoencoders (VAEs). The project gives you hands-on experience with the entire VAE workflow, from data collection and preprocessing to building, training, and evaluating the model. By the end of the chapter, you should understand how to apply a VAE to real image data and how to sample new digit images from it.

Our project uses the MNIST dataset, a standard machine learning benchmark of handwritten digits. MNIST contains 70,000 grayscale images of the digits 0-9, each 28x28 pixels. We will train a VAE to learn the underlying distribution of these images and then sample from it to generate new, realistic digits.

We will cover the following topics in this chapter:

  1. Data Collection and Preprocessing
  2. Model Creation
  3. Training the VAE
  4. Generating New Handwritten Digits
  5. Evaluating the Model

Let's begin with the first step of our project: data collection and preprocessing.

Data collection and preprocessing are critical steps in any machine learning project: well-prepared data helps the model learn effectively and generalize to new data. In this section, we collect the MNIST dataset and preprocess it so that it is suitable for training our VAE.

6.1.1 Collecting the MNIST Dataset

The MNIST dataset is readily available in many machine learning libraries, including TensorFlow and Keras. We will use TensorFlow to download and load the dataset. The dataset is divided into a training set of 60,000 images and a test set of 10,000 images.

Example: Loading the MNIST Dataset

import tensorflow as tf

# Load the MNIST dataset (downloaded on the first call, then read from the local cache)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Print the shape of the datasets
print(f"Training data shape: {x_train.shape}")  # (60000, 28, 28)
print(f"Test data shape: {x_test.shape}")       # (10000, 28, 28)

This code imports TensorFlow, loads MNIST with tf.keras.datasets.mnist.load_data(), and prints the shapes of the training and test arrays. The images come back as uint8 NumPy arrays with pixel values from 0 to 255: x_train has shape (60000, 28, 28) and x_test has shape (10000, 28, 28). The labels y_train and y_test are integer arrays holding the digit class (0-9) of each image.
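Because every later step assumes this exact layout, a quick sanity check right after loading can catch a truncated or corrupted download early. The sketch below is one way to do it in plain NumPy; `check_mnist_arrays` and its `expected_count` parameter are hypothetical names, and the checks simply encode the layout described above (uint8 images of shape (n, 28, 28), labels 0-9).

```python
import numpy as np


def check_mnist_arrays(x, y, expected_count=None):
    """Sanity-check image and label arrays laid out like raw MNIST.

    Hypothetical helper: assumes x is uint8 with shape (n, 28, 28) and
    y holds integer labels 0-9, as returned by mnist.load_data().
    Returns the number of images per digit class.
    """
    if x.ndim != 3 or x.shape[1:] != (28, 28):
        raise ValueError(f"expected images of shape (n, 28, 28), got {x.shape}")
    if x.dtype != np.uint8:
        raise ValueError(f"expected uint8 pixels, got {x.dtype}")
    if len(x) != len(y):
        raise ValueError(f"{len(x)} images but {len(y)} labels")
    if expected_count is not None and len(x) != expected_count:
        raise ValueError(f"expected {expected_count} samples, got {len(x)}")
    if y.size and (y.min() < 0 or y.max() > 9):
        raise ValueError("labels must lie in the range 0-9")
    # Per-class counts make a badly skewed or partial download easy to spot.
    return np.bincount(y, minlength=10)
```

With the arrays from the loading example, `check_mnist_arrays(x_train, y_train, expected_count=60000)` should return ten per-digit counts that sum to 60,000.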

6.1.2 Preprocessing the Data

Preprocessing the data involves several steps:

  1. Normalizing the pixel values.
  2. Reshaping the data to fit the input requirements of the VAE.
  3. (Optional) Applying data augmentation techniques.

Normalization:

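Normalization scales the pixel values from the integer range 0-255 to the range [0, 1]. Inputs with small, consistent magnitudes tend to make gradient-based training more stable and faster to converge, and the [0, 1] range also matches the output of a sigmoid activation, a common choice for a decoder's final layer.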
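One concrete reason the [0, 1] range matters: if, as is common for MNIST VAEs, the decoder ends in a sigmoid and the reconstruction term is binary cross-entropy (an assumption here, since the model is built later in the chapter), the targets must lie in [0, 1] for that loss to be meaningful. The NumPy sketch below uses hypothetical helpers `normalize_pixels` and `binary_cross_entropy` to compare a normalized target with a raw one.

```python
import numpy as np


def normalize_pixels(x):
    """Map uint8 pixel intensities 0-255 onto float32 values in [0, 1]."""
    return x.astype(np.float32) / 255.0


def binary_cross_entropy(target, pred, eps=1e-7):
    """Element-wise binary cross-entropy for a sigmoid-style prediction."""
    pred = np.clip(pred, eps, 1.0 - eps)  # keep log() away from 0
    return -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))


raw = np.array([0, 128, 255], dtype=np.uint8)
scaled = normalize_pixels(raw)
print(scaled)  # approximately [0.0, 0.502, 1.0]

# Suppose the decoder predicts 0.9 for a fully "on" pixel.
print(binary_cross_entropy(scaled[2], 0.9))      # about 0.105: a sensible loss
print(binary_cross_entropy(float(raw[2]), 0.9))  # about -558: negative, so not a valid loss
```

With unnormalized targets, the loss can go negative and no longer reaches its minimum when the prediction matches the target, which is why the division by 255 comes before training.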

Reshaping:

The VAE in this project takes each image as a flat vector rather than as a 2D grid. Each MNIST image is 28x28 pixels, so we flatten it into a vector of length 784 (28 * 28).
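Flattening loses no information. NumPy's default row-major order lays the 28 rows end to end, so pixel (row, col) lands at index row * 28 + col, and reshaping back recovers the image exactly. The small sketch below illustrates this on a stand-in array; `flat_index` is a hypothetical helper, not part of NumPy.

```python
import numpy as np

HEIGHT, WIDTH = 28, 28


def flat_index(row, col, width=WIDTH):
    """Position of pixel (row, col) in a row-major flattened image."""
    return row * width + col


# Stand-in batch of two "images" with a distinct value at every pixel.
images = np.arange(2 * HEIGHT * WIDTH).reshape(2, HEIGHT, WIDTH)

# Same call pattern as the preprocessing code: keep the batch axis, flatten the rest.
flat = images.reshape(images.shape[0], -1)
print(flat.shape)  # (2, 784)

# Pixel (3, 5) of the first image sits at position 3 * 28 + 5 = 89.
assert flat[0, flat_index(3, 5)] == images[0, 3, 5]

# Reshaping back restores the original layout exactly.
restored = flat.reshape(-1, HEIGHT, WIDTH)
assert np.array_equal(restored, images)
```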

Data Augmentation:

Data augmentation can enhance the dataset by creating modified versions of the existing images, such as rotated or shifted images. This step is optional but can improve the model's robustness.

Example: Preprocessing the Data

import numpy as np

# Normalize the pixel values to the range [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Reshape the data to (num_samples, num_features); -1 infers 28 * 28 = 784
x_train = x_train.reshape((x_train.shape[0], -1))
x_test = x_test.reshape((x_test.shape[0], -1))

# Print the shape of the reshaped datasets
print(f"Reshaped training data shape: {x_train.shape}")  # (60000, 784)
print(f"Reshaped test data shape: {x_test.shape}")       # (10000, 784)

The code first imports NumPy, the library used for numerical array operations. It then converts x_train and x_test from uint8 to float32 and divides by 255, which maps every pixel value into the range [0, 1]. This is a standard preprocessing step for image data before it is fed into a machine learning model.

Next, it reshapes each dataset into two dimensions, (number of samples, number of features), which is the input shape our VAE expects; passing -1 lets NumPy infer the feature count of 784. Finally, it prints the new shapes: (60000, 784) for the training set and (10000, 784) for the test set.
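Before training, it can be worth confirming that a flattened row still decodes to a digit. The sketch below is a dependency-light alternative to plotting: a hypothetical `ascii_digit` helper that reshapes one 784-element vector back to 28x28 and prints bright pixels as `#` and dark ones as `.`, assuming values already normalized to [0, 1]. A synthetic vertical stroke stands in for a real MNIST image.

```python
import numpy as np


def ascii_digit(vector, side=28, threshold=0.5):
    """Render a flattened, [0, 1]-normalized image as ASCII art.

    Pixels at or above `threshold` print as '#', all others as '.'.
    """
    image = np.asarray(vector).reshape(side, side)
    rows = ["".join("#" if v >= threshold else "." for v in row) for row in image]
    return "\n".join(rows)


# Synthetic stand-in for a preprocessed sample: a vertical stroke in columns 13-14.
fake = np.zeros((28, 28), dtype=np.float32)
fake[4:24, 13:15] = 1.0
print(ascii_digit(fake.reshape(-1)))
```

With the preprocessed data, `print(ascii_digit(x_train[0]))` should show a recognizable handwritten digit; a scrambled pattern would point to a reshape bug.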

6.1.3 Data Augmentation (Optional)

Data augmentation creates new training samples by applying random transformations to existing images. It is optional, but it can improve model performance, especially when data is limited. In this project, we stick to the basic preprocessing steps above and do not use augmentation; the example below is included for reference.

Example: Data Augmentation (Optional)

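import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Create an image data generator with random rotations (up to ±10 degrees)
# and random shifts (up to 10% of the width/height, about 3 pixels)
datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1
)

# The images were flattened above, so restore the (28, 28, 1) layout the generator expects
x_train_images = x_train.reshape(-1, 28, 28, 1)

# fit() only computes dataset statistics for featurewise options such as
# featurewise_center or zca_whitening; it is harmless but unnecessary here
datagen.fit(x_train_images)

# flow() yields augmented batches indefinitely, so take the first batch and stop
for x_batch, y_batch in datagen.flow(x_train_images, y_train, batch_size=32):
    # Visualize nine of the augmented images in a 3x3 grid
    for i in range(9):
        plt.subplot(3, 3, i + 1)
        plt.imshow(x_batch[i].reshape(28, 28), cmap='gray')
        plt.axis('off')
    plt.show()
    break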

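In this code, the ImageDataGenerator is configured to apply random rotations of up to 10 degrees in either direction and random horizontal and vertical shifts of up to 10% of the image size (about 3 pixels for a 28x28 image). A new random transformation is drawn each time an image is served, so the model rarely sees exactly the same input twice, which can help it generalize better.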

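The datagen.fit call does not apply any augmentation. It computes dataset-wide statistics, which the generator needs only when featurewise options such as featurewise_center, featurewise_std_normalization, or zca_whitening are enabled; with rotations and shifts alone it is unnecessary. The transformations themselves are applied on the fly as batches are drawn from datagen.flow().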

The last part of the code shows how to consume the generator. datagen.flow() returns an iterator that yields (images, labels) batches of 32 augmented samples and cycles through the data indefinitely, so the loop plots the first nine images of the first batch in a 3x3 grid and then breaks.
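Recent TensorFlow documentation marks ImageDataGenerator as not recommended for new code. If you want simple augmentation without it, random shifting is easy to express directly. The following is a minimal NumPy sketch, not a drop-in replacement: it performs no rotation, `random_shift` and its `max_shift` parameter are hypothetical names, and vacated pixels are filled with 0 (MNIST's black background) rather than with ImageDataGenerator's default nearest-pixel fill.

```python
import numpy as np


def random_shift(images, max_shift=3, rng=None):
    """Shift each image by a random integer offset in [-max_shift, max_shift].

    images: array of shape (n, height, width). Pixels shifted in from outside
    the frame are set to 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.zeros_like(images)
    n, h, w = images.shape
    for i in range(n):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        # Matching source and destination windows that stay inside the frame.
        src_y = slice(max(0, -dy), min(h, h - dy))
        dst_y = slice(max(0, dy), min(h, h + dy))
        src_x = slice(max(0, -dx), min(w, w - dx))
        dst_x = slice(max(0, dx), min(w, w + dx))
        out[i, dst_y, dst_x] = images[i, src_y, src_x]
    return out
```

To use it on the flattened training set, reshape first and flatten again afterward, for example `random_shift(x_train.reshape(-1, 28, 28), rng=np.random.default_rng(0)).reshape(len(x_train), -1)`. A fixed seed makes the augmented batch reproducible.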

Summary

In this section, we collected and preprocessed the MNIST dataset. We normalized the pixel values to the range [0, 1] and flattened each image into a 784-element vector to match the VAE's input. We also discussed optional data augmentation, which can help improve the model's robustness.

With our data prepared, we are ready to move on to the next step: creating the VAE model. 
