Generative Deep Learning with Python

Chapter 6: Project: Handwritten Digit Generation with VAEs

6.5 Evaluating the Model

Evaluating generative models like Variational Autoencoders (VAEs) is not as straightforward as evaluating other types of models such as classifiers. With classifiers, we can use metrics like accuracy, precision, and recall because we have a clear target to compare our model's predictions with. But with generative models, we usually don't have a clear target to compare our generated data to.

However, we can still evaluate our model qualitatively and quantitatively. 

6.5.1 Qualitative Evaluation

The easiest way to evaluate our model is by visually inspecting the images it generates. We can generate a few samples and see if they look like plausible handwritten digits.

import matplotlib.pyplot as plt

# Generate 25 random digits
digits = generate_digits(decoder, latent_dim, 25)

# Plot the generated digits
fig, axes = plt.subplots(5, 5, figsize=(10,10))

for i, ax in enumerate(axes.flat):
    ax.imshow(digits[i].reshape(28, 28), cmap='gray')
    ax.axis('off')

plt.show()

The above code generates 25 random handwritten digits using our model and then plots them using matplotlib. This gives us a quick and easy way to inspect our generated digits and see if they look realistic.
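Generating a digit with a VAE amounts to drawing a latent vector from the standard normal prior and passing it through the decoder. The sketch below illustrates that idea only. The names sample_digits and DummyDecoder are hypothetical stand-ins introduced here so the example runs without a trained model; they are not the generate_digits helper or the decoder used in this chapter.

```python
import numpy as np

def sample_digits(decoder, latent_dim, n_samples, seed=None):
    """Hypothetical helper: decode latent vectors drawn from N(0, I)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, latent_dim))
    return decoder.predict(z)

class DummyDecoder:
    """Toy decoder (assumption): a random linear map followed by a sigmoid."""
    def __init__(self, latent_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.standard_normal((latent_dim, 28 * 28))

    def predict(self, z):
        # The sigmoid keeps every pixel value between 0 and 1
        return 1.0 / (1.0 + np.exp(-(z @ self.weights)))

digits = sample_digits(DummyDecoder(latent_dim=2), latent_dim=2, n_samples=25, seed=42)
print(digits.shape)  # one flattened 28x28 image per row
```

With a trained decoder in place of the toy one, each row can be reshaped to 28x28 and plotted exactly as in the grid above.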

6.5.2 Quantitative Evaluation

For a more objective evaluation, we can use metrics such as the Fréchet Inception Distance (FID). The FID compares the distribution of the generated images with the distribution of the real images, not pixel by pixel but in the feature space of a pretrained InceptionV3 network. A lower FID indicates that the two distributions are closer, which means our generated images are statistically more similar to the real images.
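Written out, the FID fits a Gaussian to the Inception activations of each image set and measures the Fréchet distance between the two Gaussians. The symbols below are notation chosen for this illustration: the subscript r marks statistics of the real images and g those of the generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
             + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

The first term penalizes a difference between the mean activations, and the trace term penalizes a difference between the covariance matrices. The score is zero when both means and both covariances coincide.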

Here's a simple implementation of FID:

import numpy as np
from scipy.linalg import sqrtm
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.datasets import mnist
from skimage.transform import resize
from numpy import cov, trace, iscomplexobj, asarray

# scale an array of images to a new size using bilinear interpolation
def scale_images(images, new_shape):
    images_list = list()
    for image in images:
        # resize with bilinear interpolation
        new_image = resize(image, new_shape, order=1)
        # store
        images_list.append(new_image)
    return asarray(images_list)

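# calculate frechet inception distance
# model: a feature extractor, e.g. InceptionV3(include_top=False, pooling='avg', input_shape=(299, 299, 3))
# images1, images2: float arrays of shape (n, 299, 299, 3) with pixel values in [0, 255]
def calculate_fid(model, images1, images2):
    # preprocess images (rescales pixel values to the range InceptionV3 expects)
    images1 = preprocess_input(images1)
    images2 = preprocess_input(images2)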
    
    # calculate activations
    act1 = model.predict(images1)
    act2 = model.predict(images2)
    # calculate mean and covariance statistics
    mu1, sigma1 = act1.mean(axis=0), cov(act1, rowvar=False)
    mu2, sigma2 = act2.mean(axis=0), cov(act2, rowvar=False)
    # calculate sum squared difference between means
    ssdiff = np.sum((mu1 - mu2)**2.0)
    # calculate sqrt of product between cov
    covmean = sqrtm(sigma1.dot(sigma2))
    # check and correct imaginary numbers from sqrt
    if iscomplexobj(covmean):
        covmean = covmean.real
    # calculate score
    fid = ssdiff + trace(sigma1 + sigma2 - 2.0 * covmean)
    return fid

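In the above code, calculate_fid is the main function that calculates the FID between two sets of images. It first calculates the activations of the images using the InceptionV3 model. Then, it calculates the mean and covariance of the activations for each set and combines them into the final score: the squared distance between the two means plus a trace term that compares the two covariance matrices.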
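To check the arithmetic without downloading InceptionV3, the same statistics can be computed on synthetic feature vectors. The following is a minimal sketch under that assumption: frechet_distance is a hypothetical name for the statistical part of calculate_fid, and the random 4-dimensional vectors stand in for real Inception activations.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(act1, act2):
    """Frechet distance between Gaussians fitted to two activation sets."""
    mu1, sigma1 = act1.mean(axis=0), np.cov(act1, rowvar=False)
    mu2, sigma2 = act2.mean(axis=0), np.cov(act2, rowvar=False)
    # Squared distance between the two mean vectors
    ssdiff = np.sum((mu1 - mu2) ** 2.0)
    # Matrix square root of the product of the covariances
    covmean = sqrtm(sigma1.dot(sigma2))
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(ssdiff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

rng = np.random.default_rng(0)
features = rng.standard_normal((500, 4))  # stand-in for activations

same = frechet_distance(features, features)           # identical sets
shifted = frechet_distance(features, features + 2.0)  # same shape, shifted mean
print(same, shifted)
```

Identical sets give a score of approximately 0. Shifting every feature by 2.0 leaves the covariance unchanged and moves the mean by 2.0 in each of the 4 dimensions, so the score is approximately 4 * 2.0**2 = 16, all of it coming from the mean term.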
