Chapter 6: Project: Handwritten Digit Generation with VAEs
6.4 Evaluating the Model
Evaluating the performance of a Variational Autoencoder (VAE) is crucial to ensure that it has learned meaningful latent representations and can generate high-quality images. In this section, we will discuss various methods to evaluate our VAE, including quantitative metrics and qualitative assessments. We will also provide example code to demonstrate these evaluation techniques.
6.4.1 Quantitative Evaluation Metrics
Quantitative metrics provide objective measures of the model's performance. For VAEs, some common metrics include Reconstruction Loss, KL Divergence, Inception Score (IS), and Fréchet Inception Distance (FID).
Reconstruction Loss
Reconstruction loss measures how well the decoder can reconstruct the input images from the latent variables. A lower reconstruction loss indicates that the model is able to generate images that closely resemble the original input.
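For pixel values $x_i \in [0, 1]$ and reconstructions $\hat{x}_i$, the binary cross-entropy for a single image is

$$\mathcal{L}_{\text{rec}} = -\sum_i \left[ x_i \log \hat{x}_i + (1 - x_i) \log(1 - \hat{x}_i) \right],$$

summed over all pixels. The Keras binary_crossentropy helper returns this quantity averaged over the last axis of its inputs, so the code below reports a per-pixel mean.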
Example: Calculating Reconstruction Loss
import numpy as np
from tensorflow.keras.losses import binary_crossentropy

# Reconstruct the test images with the trained VAE
reconstructed_images = vae.predict(x_test)

# Mean per-pixel binary cross-entropy between inputs and reconstructions
# (x_test is assumed to hold pixel values scaled to [0, 1])
reconstruction_loss = np.mean(binary_crossentropy(x_test, reconstructed_images))
print(f"Reconstruction Loss: {reconstruction_loss:.4f}")
The script first loads the necessary libraries. Then, it uses the trained VAE model to create reconstructed images from the test dataset. The reconstruction loss, which measures the difference between the original and reconstructed images, is then calculated using the binary cross-entropy loss function. Finally, the reconstruction loss is printed out.
KL Divergence
KL Divergence measures the difference between the learned latent distribution and the prior distribution (usually a standard normal distribution). A lower KL divergence indicates that the latent distribution is closer to the desired prior distribution.
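For a diagonal Gaussian posterior $q(z \mid x) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$ and a standard normal prior, the KL term has the closed form

$$D_{KL} = -\frac{1}{2} \sum_{j=1}^{d} \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right),$$

which is exactly what the code below computes from z_mean and z_log_var.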
Example: Calculating KL Divergence
# Calculate the mean KL divergence over the test set
def calculate_kl_divergence(encoder, x_test):
    # The encoder returns the posterior parameters (and a sampled z, ignored here)
    z_mean, z_log_var, _ = encoder.predict(x_test)
    # Closed-form KL between N(z_mean, exp(z_log_var)) and N(0, I), per data point
    kl = 1 + z_log_var - np.square(z_mean) - np.exp(z_log_var)
    kl = -0.5 * np.sum(kl, axis=-1)
    return np.mean(kl)

kl_divergence = calculate_kl_divergence(encoder, x_test)
print(f"KL Divergence: {kl_divergence:.4f}")
The function calculate_kl_divergence takes an encoder and x_test as inputs. The encoder predicts the mean and log variance (z_mean and z_log_var) of the latent posterior, and these are used to compute the KL divergence for each data point in the test set; the mean across the entire set is returned. Finally, the KL divergence is computed with this function and printed to the console.
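Taken together, these two quantities approximate the negative evidence lower bound (ELBO), the objective the VAE minimizes during training. Below is a minimal sketch of how to combine them, assuming 28x28 inputs and the per-pixel mean reconstruction loss computed above:

# Combine the two metrics into an approximate negative ELBO per image.
# Assumes reconstruction_loss is a per-pixel mean over 28 x 28 = 784 pixels
# and kl_divergence is already summed over latent dimensions per image.
n_pixels = 28 * 28
negative_elbo = n_pixels * reconstruction_loss + kl_divergence
print(f"Approximate negative ELBO per image: {negative_elbo:.2f}")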
Inception Score (IS)
Inception Score evaluates the quality and diversity of generated images. It uses a pre-trained Inception network to classify the generated images and calculates the KL divergence between the conditional label distribution p(y|x) and the marginal label distribution p(y). Because the Inception network is trained on ImageNet rather than handwritten digits, IS should be treated as a rough proxy on MNIST.
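Formally,

$$\mathrm{IS} = \exp\left( \mathbb{E}_{x \sim p_g} \left[ D_{KL}\big( p(y \mid x) \,\|\, p(y) \big) \right] \right),$$

where $p(y \mid x)$ is the Inception network's label distribution for a generated image $x$ and $p(y)$ is the marginal label distribution over the generated set.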
Example: Calculating Inception Score
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from scipy.stats import entropy

# Function to calculate the Inception Score
def calculate_inception_score(images, n_split=10, eps=1E-16):
    # Use the full InceptionV3 classifier so predictions are class probabilities
    model = InceptionV3()
    # MNIST images are grayscale: replicate the channel, then resize to 299x299
    images_rgb = tf.image.grayscale_to_rgb(tf.convert_to_tensor(images, dtype=tf.float32))
    images_resized = tf.image.resize(images_rgb, (299, 299))
    # preprocess_input expects pixel values in [0, 255]
    images_preprocessed = preprocess_input(images_resized * 255.0)
    # Holding all resized images in memory is fine for 1,000 samples;
    # batch this step for larger sets
    preds = model.predict(images_preprocessed)
    split_scores = []
    for i in range(n_split):
        part = preds[i * preds.shape[0] // n_split: (i + 1) * preds.shape[0] // n_split]
        py = np.mean(part, axis=0)  # marginal label distribution for this split
        # KL(p(y|x) || p(y)) for each image; eps guards against log(0)
        scores = [entropy(p + eps, py + eps) for p in part]
        split_scores.append(np.exp(np.mean(scores)))
    return np.mean(split_scores), np.std(split_scores)

# Generate images for evaluation
n_samples = 1000
random_latent_vectors = np.random.normal(size=(n_samples, latent_dim))
generated_images = decoder.predict(random_latent_vectors)
generated_images = generated_images.reshape((n_samples, 28, 28, 1))

# Calculate the Inception Score
is_mean, is_std = calculate_inception_score(generated_images)
print(f"Inception Score: {is_mean:.3f} ± {is_std:.3f}")
The function calculate_inception_score takes a set of images, converts them to three channels, resizes them to the input size expected by InceptionV3, and preprocesses them. The InceptionV3 classifier then predicts class probabilities for the preprocessed images.
The function computes the Inception Score by splitting the predictions into parts, taking the KL divergence between each prediction and the marginal distribution of its part, and averaging the exponentiated scores across all parts.
Finally, the script samples random latent vectors, decodes them with the VAE's decoder, reshapes the resulting images, and calculates the Inception Score for the generated set. The mean and standard deviation of the score are then printed.
Fréchet Inception Distance (FID)
FID measures the distance between the distributions of real and generated images. Lower FID scores indicate that the generated images are more similar to the real images.
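Concretely, if $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the mean and covariance of the Inception activations for real and generated images, then

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert^2 + \operatorname{Tr}\left( \Sigma_r + \Sigma_g - 2 (\Sigma_r \Sigma_g)^{1/2} \right).$$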
Example: Calculating FID
from numpy import cov, trace, iscomplexobj
from scipy.linalg import sqrtm

# Function to calculate the FID between real and generated images
def calculate_fid(real_images, generated_images):
    # Pooled Inception features (2048-dimensional), not class probabilities
    model = InceptionV3(include_top=False, pooling='avg', input_shape=(299, 299, 3))

    def get_activations(images):
        # Replicate the grayscale channel, resize, and preprocess for Inception
        images_rgb = tf.image.grayscale_to_rgb(tf.convert_to_tensor(images, dtype=tf.float32))
        images_resized = tf.image.resize(images_rgb, (299, 299))
        return model.predict(preprocess_input(images_resized * 255.0))

    act1 = get_activations(real_images)
    act2 = get_activations(generated_images)

    # Fit a Gaussian to each set of activations
    mu1, sigma1 = act1.mean(axis=0), cov(act1, rowvar=False)
    mu2, sigma2 = act2.mean(axis=0), cov(act2, rowvar=False)

    # Fréchet distance between the two Gaussians
    ssdiff = np.sum((mu1 - mu2) ** 2.0)
    covmean = sqrtm(sigma1.dot(sigma2))
    if iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerical error
    fid = ssdiff + trace(sigma1 + sigma2 - 2.0 * covmean)
    return fid

# Sample real images
real_images = x_test[:n_samples].reshape((n_samples, 28, 28, 1))

# Calculate FID
fid_score = calculate_fid(real_images, generated_images)
print(f"FID Score: {fid_score:.3f}")
The function calculate_fid takes two parameters: real_images and generated_images. It converts each set to three channels, resizes the images to the 299x299 input expected by InceptionV3, preprocesses them, and feeds them through the model to obtain activations.
The mean and covariance of each set of activations are computed and used to calculate the FID score, a measure of similarity between the two sets of images; lower scores indicate that the generated images are closer to the real ones.
Finally, the FID between a sample of real images and the generated images is calculated and printed.
6.4.2 Qualitative Evaluation
Qualitative evaluation involves visually inspecting the generated images to assess their quality and diversity. This method is subjective but provides valuable insights into the model's performance.
Visual Inspection
Visual inspection involves generating a set of images and examining them for realism and diversity. This helps identify any obvious issues such as blurriness, artifacts, or mode collapse.
Example: Visualizing Generated Images
import matplotlib.pyplot as plt

# Function to visualize generated images
def visualize_generated_images(decoder, latent_dim, n_samples=10):
    # Sample latent vectors from the standard normal prior and decode them
    random_latent_vectors = np.random.normal(size=(n_samples, latent_dim))
    generated_images = decoder.predict(random_latent_vectors)
    generated_images = generated_images.reshape((n_samples, 28, 28))

    # Display the samples in a single row
    plt.figure(figsize=(10, 2))
    for i in range(n_samples):
        plt.subplot(1, n_samples, i + 1)
        plt.imshow(generated_images[i], cmap='gray')
        plt.axis('off')
    plt.show()

# Visualize generated images
visualize_generated_images(decoder, latent_dim)
This example defines a function, visualize_generated_images, and calls it. The function samples random latent vectors, decodes them into images with the given decoder model, reshapes the results, and displays them in a 1 x n_samples grid. The script then calls the function with the trained decoder and the latent dimension latent_dim.
Latent Space Traversal
Latent space traversal involves interpolating between points in the latent space and generating images at each step. This technique helps visualize how smoothly the VAE transitions between different data points and can reveal the structure of the latent space.
Example: Latent Space Traversal
# Function to perform latent space traversal
def latent_space_traversal(decoder, latent_dim, n_steps=10):
    # Pick two random points in the latent space
    start_point = np.random.normal(size=(latent_dim,))
    end_point = np.random.normal(size=(latent_dim,))

    # Linear interpolation between them; shape (n_steps, latent_dim)
    interpolation = np.linspace(start_point, end_point, n_steps)

    generated_images = decoder.predict(interpolation)
    generated_images = generated_images.reshape((n_steps, 28, 28))

    plt.figure(figsize=(15, 2))
    for i in range(n_steps):
        plt.subplot(1, n_steps, i + 1)
        plt.imshow(generated_images[i], cmap='gray')
        plt.axis('off')
    plt.show()

# Perform latent space traversal
latent_space_traversal(decoder, latent_dim)
The function latent_space_traversal takes three parameters: a decoder, the latent dimension, and an optional number of steps (10 by default). It samples two random points in the latent space as start and end points and linearly interpolates between them.
The interpolated latent vectors are passed through the decoder, reshaped into 28x28 images (the MNIST format), and displayed in a row. The final line calls the function.
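Linear interpolation can cut through low-probability regions of a Gaussian latent space, so spherical linear interpolation (slerp) is sometimes preferred. Below is a minimal sketch; the slerp helper is our own illustration, not part of any library used in this chapter:

# Spherical linear interpolation between two latent vectors; t in [0, 1]
def slerp(t, p0, p1):
    omega = np.arccos(np.clip(np.dot(p0 / np.linalg.norm(p0),
                                     p1 / np.linalg.norm(p1)), -1.0, 1.0))
    so = np.sin(omega)
    if so == 0:
        return (1.0 - t) * p0 + t * p1  # fall back to linear for parallel vectors
    return np.sin((1.0 - t) * omega) / so * p0 + np.sin(t * omega) / so * p1

# Decode a spherical path between two random latent points
start_point = np.random.normal(size=(latent_dim,))
end_point = np.random.normal(size=(latent_dim,))
path = np.stack([slerp(t, start_point, end_point)
                 for t in np.linspace(0.0, 1.0, 10)])
slerp_images = decoder.predict(path).reshape((10, 28, 28))

Plotting then proceeds exactly as in latent_space_traversal.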
6.4.3 Evaluating Specific Features
By exploring different regions of the latent space, we can generate digits with specific features and evaluate how well the VAE has learned to represent these features.
Example: Exploring Specific Latent Features
# Function to explore specific latent features
def explore_latent_features(decoder, latent_dim, feature_vector, variation_range=(-3, 3), n_variations=10):
    # Sweep the chosen feature across the given range
    feature_variations = np.linspace(variation_range[0], variation_range[1], n_variations)
    latent_vectors = np.zeros((n_variations, latent_dim))
    for i, variation in enumerate(feature_variations):
        latent_vectors[i] = feature_vector
        latent_vectors[i, 0] = variation  # Vary the first feature for demonstration

    generated_images = decoder.predict(latent_vectors)
    generated_images = generated_images.reshape((n_variations, 28, 28))

    plt.figure(figsize=(15, 2))
    for i in range(n_variations):
        plt.subplot(1, n_variations, i + 1)
        plt.imshow(generated_images[i], cmap='gray')
        plt.axis('off')
    plt.show()

# Example feature vector
example_feature_vector = np.random.normal(size=(latent_dim,))

# Explore specific latent features
explore_latent_features(decoder, latent_dim, example_feature_vector)
This example defines explore_latent_features, a function for visualizing the effect of varying a specific latent feature in a generative model such as a VAE. It takes the decoder model, the dimensionality of the latent space (latent_dim), a base feature vector (feature_vector), and parameters controlling the range and number of variations.
The function builds a set of latent vectors by copying the base vector and sweeping one coordinate across the range, decodes them into images, and reshapes the images for visualization.
It then plots the generated images in a row, showing the effect of varying that latent feature. A feature vector drawn from a normal distribution is used for demonstration.
In the example, the first feature (index 0) is varied. However, you can modify the index to explore other latent features.
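A natural extension is to expose the varied coordinate as a parameter. The sketch below adds a feature_index argument; this parameter and the function name are our own additions for illustration:

# A variant of explore_latent_features that accepts the index to vary.
# feature_index is a hypothetical parameter added for illustration.
def explore_latent_feature_at(decoder, latent_dim, feature_vector, feature_index,
                              variation_range=(-3, 3), n_variations=10):
    values = np.linspace(variation_range[0], variation_range[1], n_variations)
    latent_vectors = np.tile(feature_vector, (n_variations, 1))
    latent_vectors[:, feature_index] = values  # sweep only the chosen coordinate
    images = decoder.predict(latent_vectors).reshape((n_variations, 28, 28))
    plt.figure(figsize=(15, 2))
    for i in range(n_variations):
        plt.subplot(1, n_variations, i + 1)
        plt.imshow(images[i], cmap='gray')
        plt.axis('off')
    plt.show()

# Sweep the third latent coordinate instead of the first
explore_latent_feature_at(decoder, latent_dim, example_feature_vector, feature_index=2)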