Chapter 6: Project: Handwritten Digit Generation with VAEs
6.5 Evaluating the Model
Evaluating generative models like Variational Autoencoders (VAEs) is not as straightforward as evaluating other types of models such as classifiers. With classifiers, we can use metrics like accuracy, precision, and recall because we have a clear target to compare our model's predictions with. But with generative models, we usually don't have a clear target to compare our generated data to.
However, we can still evaluate our model qualitatively and quantitatively.
6.5.1 Qualitative Evaluation
The easiest way to evaluate our model is by visually inspecting the images it generates. We can generate a few samples and see if they look like plausible handwritten digits.
import matplotlib.pyplot as plt

# Generate 25 random digits
digits = generate_digits(decoder, latent_dim, 25)

# Plot the generated digits in a 5x5 grid
fig, axes = plt.subplots(5, 5, figsize=(10, 10))
for i, ax in enumerate(axes.flat):
    ax.imshow(digits[i].reshape(28, 28), cmap='gray')
    ax.axis('off')
plt.show()
The above code generates 25 random handwritten digits using our model and then plots them using matplotlib. This gives us a quick and easy way to inspect our generated digits and see if they look realistic.
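The definition of generate_digits is not shown in this section. As a rough sketch of what such a helper typically does, assuming the VAE uses a standard normal prior and the decoder exposes a predict method, generation amounts to drawing latent vectors and decoding them. The names sample_digits and ToyDecoder below are hypothetical stand-ins introduced for illustration, not the chapter's actual code.

```python
import numpy as np


class ToyDecoder:
    """Hypothetical stand-in for a trained decoder: one random linear layer
    followed by a sigmoid, so the example runs without a trained model."""

    def __init__(self, latent_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.standard_normal((latent_dim, 28 * 28))

    def predict(self, z):
        # Map latent vectors to 784 pixel intensities in (0, 1)
        return 1.0 / (1.0 + np.exp(-(z @ self.weights)))


def sample_digits(decoder, latent_dim, n_samples, seed=None):
    """Draw latent vectors from N(0, I) and decode them into images."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, latent_dim))
    return decoder.predict(z)


toy_decoder = ToyDecoder(latent_dim=2)
samples = sample_digits(toy_decoder, latent_dim=2, n_samples=25, seed=42)
print(samples.shape)
```

Passing a fixed seed makes the sampled grid reproducible, which helps when comparing the output of two training runs side by side.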
6.5.2 Quantitative Evaluation
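For a more objective evaluation, we can use metrics such as the Fréchet Inception Distance (FID). The FID does not compare images pixel by pixel. Instead, it passes the real and the generated images through a pretrained InceptionV3 network, summarizes each set of feature activations by its mean and covariance, and measures the Fréchet distance between the two resulting Gaussian distributions. A lower FID indicates that the two distributions are closer, which means our generated images are more similar to the real images.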
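The FID is usually written in terms of the mean and covariance of the Inception activations for each image set. The symbols below are notation chosen here to mirror the variable names mu1, sigma1, mu2 and sigma2 in the implementation: one pair of statistics per image set.

```latex
\mathrm{FID} = \lVert \mu_1 - \mu_2 \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_1 + \Sigma_2 - 2\,(\Sigma_1 \Sigma_2)^{1/2} \right)
```

The first term penalizes a shift between the average activations, and the trace term penalizes a mismatch in how the activations vary around those averages.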
Here's a simple implementation of FID:
import numpy as np
from scipy.linalg import sqrtm
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.datasets import mnist
from skimage.transform import resize
from numpy import cov, trace, iscomplexobj, asarray

# scale an array of images to a new size using bilinear interpolation
def scale_images(images, new_shape):
    images_list = list()
    for image in images:
        # resize with bilinear interpolation
        new_image = resize(image, new_shape, order=1)
        # store
        images_list.append(new_image)
    return asarray(images_list)

# calculate frechet inception distance
def calculate_fid(model, images1, images2):
    # preprocess images
    images1 = preprocess_input(images1)
    images2 = preprocess_input(images2)
    # calculate activations
    act1 = model.predict(images1)
    act2 = model.predict(images2)
    # calculate mean and covariance statistics
    mu1, sigma1 = act1.mean(axis=0), cov(act1, rowvar=False)
    mu2, sigma2 = act2.mean(axis=0), cov(act2, rowvar=False)
    # calculate sum squared difference between means
    ssdiff = np.sum((mu1 - mu2)**2.0)
    # calculate sqrt of product between cov
    covmean = sqrtm(sigma1.dot(sigma2))
    # check and correct imaginary numbers from sqrt
    if iscomplexobj(covmean):
        covmean = covmean.real
    # calculate score
    fid = ssdiff + trace(sigma1 + sigma2 - 2.0 * covmean)
    return fid
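In the above code, calculate_fid is the main function that calculates the FID between two sets of images. It first calculates the activations of the images using the InceptionV3 model. Then, it calculates the mean and covariance of the activations for each set. Finally, it adds the squared difference between the two means to the trace term built from the two covariances and the square root of their product, and returns the result as the FID score. The helper scale_images resizes each image with bilinear interpolation so that it matches the input shape the model expects.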
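Before running the full pipeline, it can help to sanity-check the formula on small synthetic activation matrices, where the expected answer is known. The sketch below is not the chapter's calculate_fid: it skips InceptionV3 entirely and works directly on activations, and it swaps scipy's sqrtm for an eigenvalue-based shortcut (the trace of the matrix square root equals the sum of the square roots of the eigenvalues of the covariance product). The function name fid_from_activations is a hypothetical helper introduced here.

```python
import numpy as np


def fid_from_activations(act1, act2):
    """FID between two activation matrices of shape (n_samples, n_features)."""
    mu1, sigma1 = act1.mean(axis=0), np.cov(act1, rowvar=False)
    mu2, sigma2 = act2.mean(axis=0), np.cov(act2, rowvar=False)
    # Squared Euclidean distance between the two means
    ssdiff = np.sum((mu1 - mu2) ** 2)
    # Trace of sqrt(sigma1 @ sigma2) via eigenvalues; tiny negative or
    # imaginary parts caused by rounding are discarded
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_covmean = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    return float(ssdiff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


rng = np.random.default_rng(0)
features = rng.standard_normal((500, 4))

# Identical sets: the score should be approximately 0
same = fid_from_activations(features, features)
# Every feature shifted by 2: covariances match, so only the mean term
# remains, approximately 4 features * 2**2 = 16
shifted = fid_from_activations(features, features + 2.0)
print(same, shifted)
```

If a real implementation returns a clearly nonzero score when both arguments are the same image set, that usually points to a preprocessing or shape problem rather than to the model being evaluated.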