Generative Deep Learning with Python

Chapter 6: Project: Handwritten Digit Generation with VAEs

6.1 Data Collection and Preprocessing

Welcome to Chapter 6! In this chapter, we will put theory into practice by working on a project to generate handwritten digits using Variational Autoencoders (VAEs). This project serves as a hands-on reinforcement of the concepts covered in the previous chapter.

Creating a model capable of generating realistic handwritten digits has applications in a number of areas, from data augmentation for improving the performance of other machine learning models, to solving CAPTCHA challenges, to simply demonstrating the power of generative models in a visually impressive way.

The chapter will proceed as follows:

  1. Data Collection and Preprocessing
  2. Model Creation
  3. Training the VAE
  4. Generating New Digits
  5. Evaluation and Conclusion

Let's get started!

6.1.1 Dataset Selection

For the purpose of this project, we will use the MNIST (Modified National Institute of Standards and Technology) dataset. This is a widely used dataset for machine learning projects and consists of 70,000 28x28 grayscale images of the ten digits (60,000 for training and 10,000 for testing), along with their corresponding labels.

The MNIST dataset is easily accessible through several deep-learning libraries such as TensorFlow and PyTorch.

Here's how to load it using TensorFlow:

from tensorflow.keras.datasets import mnist

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

6.1.2 Data Preprocessing

The images in the MNIST dataset are grayscale, with pixel values ranging from 0 to 255. Before feeding them into our VAE, we need to normalize these values to a range between 0 and 1. This can easily be achieved by dividing the images by 255.

Additionally, as we will be working with a VAE, we do not need the labels that come with the dataset, so we can safely ignore them.

# Normalize pixel values from [0, 255] to [0, 1]
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
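As a side note, casting to `float32` before dividing is deliberate: NumPy would otherwise promote the `uint8` pixels to `float64`, doubling the memory footprint. A quick sketch of the difference, using a tiny made-up 3x3 "image" in place of real MNIST data:

```python
import numpy as np

# A tiny synthetic "image" with the same uint8 dtype as MNIST pixels
img = np.array([[0, 128, 255],
                [64, 32, 16],
                [200, 100, 50]], dtype='uint8')

# Without an explicit cast, NumPy promotes the result to float64
print((img / 255.).dtype)          # float64

# Casting first keeps the data in float32, halving memory use
scaled = img.astype('float32') / 255.
print(scaled.dtype)                # float32
print(scaled.min(), scaled.max())  # 0.0 1.0
```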

Since the VAE architecture we'll implement takes inputs in a flat, vectorized format, we also need to reshape each image from its original 2D form (28x28 pixels) to a 1D form (784 pixels).

import numpy as np

x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

After this preprocessing, our data is ready to be fed into a VAE. In the next sections, we will look into how to design the VAE architecture, train the model, and finally, generate new handwritten digits. 
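If you want to sanity-check the whole preprocessing pipeline without downloading the dataset, the steps above can be rehearsed on a synthetic batch shaped like MNIST (the random array below is a stand-in for `x_train`, not real data):

```python
import numpy as np

# Synthetic stand-in for MNIST: 100 random 28x28 uint8 "images"
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)

# Normalize to [0, 1] as float32
x = x.astype('float32') / 255.

# Flatten each 28x28 image into a 784-dimensional vector
x = x.reshape((len(x), np.prod(x.shape[1:])))

print(x.shape)  # (100, 784)
print(x.dtype)  # float32
print(x.min() >= 0.0 and x.max() <= 1.0)  # True
```

Running the same checks on the real `x_train` should give shape `(60000, 784)` and values in `[0, 1]`.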
