# Chapter 6: Project: Handwritten Digit Generation with VAEs

## 6.5 Evaluating the Model

Evaluating generative models like Variational Autoencoders (VAEs) is not as straightforward as evaluating other types of models such as classifiers. With classifiers, we can use metrics like accuracy, precision, and recall because we have a clear target to compare our model's predictions with. But with generative models, we usually don't have a clear target to compare our generated data to.

However, we can still evaluate our model qualitatively and quantitatively.

**6.5.1 Qualitative Evaluation**

The easiest way to evaluate our model is by visually inspecting the images it generates. We can generate a few samples and see if they look like plausible handwritten digits.

```python
import matplotlib.pyplot as plt

# Generate 25 random digits
digits = generate_digits(decoder, latent_dim, 25)

# Plot the generated digits in a 5x5 grid
fig, axes = plt.subplots(5, 5, figsize=(10, 10))
for i, ax in enumerate(axes.flat):
    ax.imshow(digits[i].reshape(28, 28), cmap='gray')
    ax.axis('off')
plt.show()
```
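
This snippet relies on the `generate_digits` helper built earlier in the chapter. If you are working through this section on its own, a minimal sketch might look like the following, assuming the decoder maps latent vectors back to (flattened) 28×28 images:

```python
import numpy as np

def generate_digits(decoder, latent_dim, n_samples):
    # Sample latent vectors from the standard normal prior p(z) = N(0, I)
    z = np.random.normal(size=(n_samples, latent_dim))
    # Decode the latent vectors into images
    return decoder.predict(z)
```

Sampling from a standard normal is the natural choice here because the VAE's KL divergence term pushes the encoder's latent distribution toward exactly this prior during training.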

The above code generates 25 random handwritten digits using our model and then plots them using **matplotlib**. This gives us a quick and easy way to inspect our generated digits and see if they look realistic.

**6.5.2 Quantitative Evaluation**

For a more objective evaluation, we can use metrics such as the Fréchet Inception Distance (FID). The FID measures the distance between the distribution of the generated images and the distribution of the real images. A lower FID indicates that the two distributions are closer, which means our generated images are more similar to the real images.
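
Concretely, FID models each set of InceptionV3 activations as a multivariate Gaussian, with means $\mu_1, \mu_2$ and covariances $\Sigma_1, \Sigma_2$, and computes

$$\mathrm{FID} = \lVert \mu_1 - \mu_2 \rVert^2 + \operatorname{Tr}\left(\Sigma_1 + \Sigma_2 - 2\,(\Sigma_1 \Sigma_2)^{1/2}\right)$$

The first term measures how far apart the centers of the two distributions are; the trace term compares their spread.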

Here's a simple implementation of FID:

```python
import numpy as np
from scipy.linalg import sqrtm
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.datasets import mnist
from skimage.transform import resize
from numpy import cov, trace, iscomplexobj, asarray

# scale an array of images to a new size using bilinear interpolation
def scale_images(images, new_shape):
    images_list = list()
    for image in images:
        # resize with bilinear interpolation
        new_image = resize(image, new_shape, order=1)
        # store
        images_list.append(new_image)
    return asarray(images_list)

# calculate the Frechet Inception Distance between two sets of images
def calculate_fid(model, images1, images2):
    # preprocess images for InceptionV3
    images1 = preprocess_input(images1)
    images2 = preprocess_input(images2)
    # calculate activations
    act1 = model.predict(images1)
    act2 = model.predict(images2)
    # calculate mean and covariance statistics
    mu1, sigma1 = act1.mean(axis=0), cov(act1, rowvar=False)
    mu2, sigma2 = act2.mean(axis=0), cov(act2, rowvar=False)
    # calculate sum squared difference between means
    ssdiff = np.sum((mu1 - mu2) ** 2.0)
    # calculate sqrt of product between covariances
    covmean = sqrtm(sigma1.dot(sigma2))
    # discard imaginary components introduced by numerical error in sqrtm
    if iscomplexobj(covmean):
        covmean = covmean.real
    # calculate score
    fid = ssdiff + trace(sigma1 + sigma2 - 2.0 * covmean)
    return fid
```

In the above code, **calculate_fid** is the main function that calculates the FID between two sets of images. It first computes the activations of the images with the InceptionV3 model, then the mean and covariance of each set of activations, and finally combines them into the FID score, keeping only the real part of the matrix square root to guard against numerical round-off.
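
To put `calculate_fid` to work, we need real MNIST digits and generated digits in the shape InceptionV3 expects: three-channel images of at least 75×75 pixels (299×299 is the canonical input size). The following sketch shows one way to wire it up; the 1,000-image sample size and the assumption that the decoder outputs pixel values in [0, 1] are choices for illustration, not requirements:

```python
# Load real MNIST digits and generate an equal number of fake ones
(real_images, _), _ = mnist.load_data()
real = real_images[:1000].astype('float32')
# Assumes the decoder outputs values in [0, 1]; rescale to the 0-255 pixel range
fake = generate_digits(decoder, latent_dim, 1000).reshape(-1, 28, 28) * 255.0

# Build the InceptionV3 feature extractor (global-average-pooled activations)
model = InceptionV3(include_top=False, pooling='avg', input_shape=(299, 299, 3))

# Repeat the single grayscale channel three times, then resize to 299x299
real = scale_images(np.stack([real] * 3, axis=-1), (299, 299, 3))
fake = scale_images(np.stack([fake] * 3, axis=-1), (299, 299, 3))

fid = calculate_fid(model, fake, real)
print('FID: %.3f' % fid)
```

Bear in mind that FID estimates are biased at small sample sizes; published results typically use 10,000 or more images per set, so treat numbers computed from small batches as relative comparisons between training checkpoints rather than absolute scores.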
