Chapter 3: Deep Learning with Keras

3.1 Introduction to Keras API in TensorFlow 2.x

Keras has emerged as a cornerstone in the machine learning community, garnering widespread adoption among researchers and developers for its intuitive interface and user-friendly design. As an integral component of the TensorFlow 2.x ecosystem, Keras provides a highly efficient API that simplifies the process of constructing and training neural networks.

By abstracting away complex low-level operations such as tensor management and computational graph handling, Keras enables users to focus on the higher-level architecture of their models. This abstraction not only accelerates the prototyping phase of machine learning projects but also facilitates seamless deployment in production environments, making Keras an invaluable tool for both experimental and real-world applications.

In this comprehensive chapter, we will delve into the core functionalities of the Keras API, exploring its robust suite of layers and model-building tools. You will gain hands-on experience in constructing, training, and evaluating deep learning models using Keras. Furthermore, we will cover advanced techniques for fine-tuning these models to achieve optimal performance across various tasks and datasets. By the end of this chapter, you will have a solid foundation in utilizing Keras for a wide range of deep learning applications, from basic neural networks to sophisticated architectures.

The Keras API in TensorFlow 2.x offers a highly intuitive and user-friendly interface for constructing neural networks. By abstracting away the intricate details of model construction, training, and evaluation, Keras allows developers to concentrate on the higher-level aspects of their network architecture and performance optimization. This abstraction significantly reduces the learning curve for newcomers to deep learning while providing advanced practitioners with powerful tools to build sophisticated models efficiently.

Keras supports two main approaches for model building: the Sequential API and the Functional API. The Sequential API is ideal for straightforward, linear stack of layers, making it perfect for beginners or simple model architectures. On the other hand, the Functional API offers greater flexibility, enabling the creation of complex model topologies with multiple inputs, outputs, or branching layers. This versatility allows developers to implement a wide range of advanced architectures, from basic feedforward networks to sophisticated models like ResNet or Inception.

In TensorFlow 2.x, Keras has been deeply integrated as the default high-level API for deep learning. This integration brings several advantages, including seamless compatibility with TensorFlow's core features. For instance, eager execution in TensorFlow 2.x allows for immediate operation evaluation, making debugging and prototyping significantly easier.

The built-in model saving and loading capabilities ensure that trained models can be easily persisted and reused across different environments. Furthermore, Keras in TensorFlow 2.x supports distributed training out of the box, enabling developers to leverage multiple GPUs or even TPUs for accelerated model training without the need for extensive low-level coding.

Whether you're a novice taking your first steps in machine learning or an experienced data scientist working on cutting-edge projects, Keras simplifies the process of building robust and scalable machine learning models. Its intuitive design philosophy, coupled with the powerful backing of TensorFlow, makes it an invaluable tool in the modern deep learning ecosystem.

By providing a high-level interface without sacrificing flexibility or performance, Keras empowers developers to rapidly prototype ideas, experiment with different architectures, and deploy production-ready models with confidence.

3.1.1 Key Features of Keras API

Ease of use: Keras provides a clear and intuitive API that simplifies the process of building neural networks. Its user-friendly syntax allows developers to quickly prototype and experiment with different model architectures, making it accessible for both beginners and experienced practitioners. The straightforward way of defining layers and connecting them reduces the learning curve and accelerates the development process.
Modularity: Keras embraces a modular design philosophy, allowing models to be constructed as either a sequence of layers or a more complex graph of interconnected components. This flexibility enables the creation of a wide range of architectures, from simple feedforward networks to sophisticated models with multiple inputs and outputs. Each layer and component in Keras is fully customizable, giving developers fine-grained control over their model's behavior and structure.
Support for multiple backends: While Keras is now tightly integrated with TensorFlow, it was originally designed to be backend-agnostic. This means it can run on different computational backends, including Theano and CNTK. This flexibility allows developers to choose the backend that best suits their needs, whether for performance reasons or compatibility with existing infrastructure. Although TensorFlow is now the primary backend, the multi-backend support showcases Keras' versatility and adaptability.
Extensibility: Keras provides a rich set of built-in layers, loss functions, and optimizers, but it also allows for extensive customization. Developers can create custom layers to implement novel architectures or specialized operations not available in the standard library. Similarly, custom loss functions can be defined to optimize models for specific tasks or metrics. The ability to create custom optimizers enables fine-tuning of the learning process. This extensibility makes Keras suitable for cutting-edge research and unique application requirements.
Built-in support for multiple GPU/TPU training: Keras seamlessly integrates with TensorFlow's distributed training capabilities, allowing models to be trained across multiple GPUs or TPUs without requiring significant code changes. This feature is crucial for scaling up to large datasets and complex models, significantly reducing training times. The built-in support simplifies the process of leveraging parallel computing resources, making it accessible to developers who may not have expertise in distributed systems.

3.1.2 Keras Model Types: Sequential vs. Functional API

Keras offers two primary approaches for constructing neural network models, each with its own strengths and use cases:

Sequential API: This API is designed for building straightforward, linear models where layers are stacked sequentially. It's ideal for:
- Beginners who are just starting with deep learning
- Simple feedforward neural networks
- Models where each layer has exactly one input tensor and one output tensor
- Quick prototyping of basic architectures
Functional API: This more advanced API provides greater flexibility and power, allowing for the creation of complex model architectures. It's suitable for:
- Experienced developers working on sophisticated neural network designs
- Models with multiple inputs or outputs
- Models with shared layers (where a single layer is used at multiple points in the network)
- Models with non-linear topology (e.g., residual connections, concatenations)
- Implementing advanced architectures like inception networks or siamese networks

The choice between these APIs depends on the complexity of your model and your specific requirements. While the Sequential API is more beginner-friendly and sufficient for many common tasks, the Functional API opens up possibilities for creating highly customized and intricate neural network architectures.

Sequential API

The Sequential API is the most straightforward and intuitive way to build a neural network in Keras. This API allows you to construct models by stacking layers one by one in a linear sequence, which is ideal for the majority of basic machine learning tasks. The simplicity of the Sequential API makes it particularly well-suited for beginners who are just starting their journey in deep learning.

With the Sequential API, you create a model by instantiating a Sequential object and then adding layers to it in the order you want them to be executed. This approach mirrors the conceptual process of designing a neural network, where you typically think about the flow of data from the input layer through various hidden layers to the output layer.

The linear nature of the Sequential API makes it perfect for a wide range of common model architectures, including:

Simple feedforward neural networks
Convolutional Neural Networks (CNNs) for image processing tasks
Recurrent Neural Networks (RNNs) for sequence data
Basic autoencoders for dimensionality reduction

While the Sequential API is powerful enough for many applications, it's important to note that it has limitations when it comes to more complex model architectures. For instance, models with multiple inputs or outputs, shared layers, or non-linear topology (like residual connections) are better suited for the Functional API. However, for most basic models and many intermediate-level tasks, the Sequential API provides a clean, readable, and efficient way to define and train neural networks.

Example: Building a Neural Network with the Sequential API

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt

# Load and preprocess the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0  # Normalize pixel values to [0, 1]

# Define a simple feedforward neural network using the Sequential API
model = Sequential([
    Flatten(input_shape=(28, 28)),  # Flatten 28x28 images to a 1D vector of 784 elements
    Dense(128, activation='relu'),  # Hidden layer with 128 units and ReLU activation
    Dense(64, activation='relu'),   # Second hidden layer with 64 units and ReLU activation
    Dense(10, activation='softmax') # Output layer with 10 units for classification (0-9 digits)
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Display the model summary
model.summary()

# Train the model
history = model.fit(X_train, y_train, epochs=10, validation_split=0.2, batch_size=32, verbose=1)

# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_accuracy:.4f}")

# Plot training history
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()

# Make predictions on a few test images
predictions = model.predict(X_test[:5])
predicted_labels = tf.argmax(predictions, axis=1)

# Display the images and predictions
fig, axes = plt.subplots(1, 5, figsize=(15, 3))
for i, ax in enumerate(axes):
    ax.imshow(X_test[i], cmap='gray')
    ax.set_title(f"Predicted: {predicted_labels[i]}\nActual: {y_test[i]}")
    ax.axis('off')
plt.tight_layout()
plt.show()

Code Breakdown:

Importing Libraries:
- We import TensorFlow, Keras modules, and Matplotlib for visualization.
Loading and Preprocessing Data:
- The MNIST dataset is loaded using mnist.load_data().
- Input images are normalized by dividing by 255 to scale pixel values to [0, 1].
Model Architecture:
- We use the Sequential API to build a simple feedforward neural network.
- Flatten(input_shape=(28, 28)): Converts 28x28 images into 1D vectors of 784 elements.
- Dense(128, activation='relu'): First hidden layer with 128 neurons and ReLU activation.
- Dense(64, activation='relu'): Second hidden layer with 64 neurons and ReLU activation.
- Dense(10, activation='softmax'): Output layer with 10 neurons (one for each digit) and softmax activation for multi-class classification.
Model Compilation:
- Optimizer: 'adam' - An efficient stochastic gradient descent algorithm.
- Loss function: 'sparse_categorical_crossentropy' - Suitable for multi-class classification with integer labels.
- Metrics: 'accuracy' - To monitor the model's performance during training.
Model Summary:
- model.summary() displays a summary of the model architecture, including the number of parameters in each layer.
Model Training:
- model.fit() trains the model for 10 epochs.
- 20% of the training data is used for validation (validation_split=0.2).
- Batch size of 32 is used for mini-batch gradient descent.
Model Evaluation:
- The trained model is evaluated on the test set to assess its generalization performance.
Visualizing Training History:
- Two plots are created to visualize the training and validation accuracy and loss over epochs.
- This helps in identifying overfitting or underfitting issues.
Making Predictions:
- The model makes predictions on the first 5 test images.
- tf.argmax() is used to convert softmax probabilities to class labels.
Displaying Results:
- The first 5 test images are displayed along with their predicted and actual labels.
- This provides a visual confirmation of the model's performance on individual examples.

This example provides a comprehensive view of the entire machine learning workflow, from data preparation to model evaluation and result visualization, using the Keras Sequential API.

Functional API

The Functional API offers significantly more flexibility and power compared to the Sequential API. It enables developers to create sophisticated model architectures where layers can be connected in complex, non-linear ways. This flexibility is crucial for implementing advanced deep learning concepts such as:

Shared layers: The ability to use the same layer multiple times in a model, which can reduce the number of parameters and enforce feature sharing across different parts of the network.
Skip connections: Also known as shortcut connections, these allow information to bypass one or more layers, which can help mitigate the vanishing gradient problem in very deep networks.
Multi-input and multi-output models: The Functional API allows for models that can process multiple input sources or produce multiple outputs, which is essential for tasks that require integrating diverse data types or predicting multiple related targets.

The Functional API is indispensable for constructing state-of-the-art architectures such as:

ResNet (Residual Networks): These networks use skip connections to enable training of very deep networks, sometimes hundreds of layers deep, which was previously challenging due to the vanishing gradient problem.
Inception: This architecture uses parallel convolutional layers with different filter sizes, allowing the network to capture features at multiple scales simultaneously.
Siamese networks: These are twin networks that share weights and are used for tasks like similarity comparison or one-shot learning.

Moreover, the Functional API facilitates the creation of custom layers and the implementation of novel architectural ideas, making it an essential tool for researchers pushing the boundaries of deep learning. Its flexibility allows for rapid prototyping and experimentation with complex model designs, which is crucial for tackling challenging problems in computer vision, natural language processing, and other domains of artificial intelligence.

Example: Building a Neural Network with the Functional API

import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Flatten, Dropout
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

# Load and preprocess the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0  # Normalize pixel values to [0, 1]

# Convert labels to one-hot encoding
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Define the input layer
inputs = Input(shape=(28, 28))

# Add a Flatten layer and Dense layers with Dropout
x = Flatten()(inputs)
x = Dense(256, activation='relu')(x)
x = Dropout(0.3)(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.3)(x)
x = Dense(64, activation='relu')(x)

# Define the output layer
outputs = Dense(10, activation='softmax')(x)

# Create the model
model = Model(inputs=inputs, outputs=outputs)

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Display the model summary
model.summary()

# Train the model
history = model.fit(X_train, y_train, epochs=20, batch_size=128, validation_split=0.2, verbose=1)

# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_accuracy:.4f}")

# Plot training history
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()

# Make predictions on a few test images
predictions = model.predict(X_test[:5])
predicted_labels = tf.argmax(predictions, axis=1)

# Display the images and predictions
fig, axes = plt.subplots(1, 5, figsize=(15, 3))
for i, ax in enumerate(axes):
    ax.imshow(X_test[i].reshape(28, 28), cmap='gray')
    ax.set_title(f"Predicted: {predicted_labels[i]}\nActual: {tf.argmax(y_test[i])}")
    ax.axis('off')
plt.tight_layout()
plt.show()

Code Breakdown:

Importing Libraries: We import TensorFlow, Keras modules, and Matplotlib for visualization.
Loading and Preprocessing Data:
- The MNIST dataset is loaded using mnist.load_data().
- Input images are normalized by dividing by 255 to scale pixel values to [0, 1].
- Labels are converted to one-hot encoding using to_categorical().
Model Architecture:
- We use the Functional API to build a more complex neural network.
- Input(shape=(28, 28)): Defines the input shape for 28x28 images.
- Flatten(): Converts 28x28 images into 1D vectors of 784 elements.
- Three Dense layers with ReLU activation (256, 128, and 64 neurons).
- Dropout layers (with rate 0.3) are added after the first two Dense layers to prevent overfitting.
- Output layer: Dense(10, activation='softmax') for 10-class classification.
Model Compilation:
- Optimizer: 'adam' - An efficient stochastic gradient descent algorithm.
- Loss function: 'categorical_crossentropy' - Suitable for multi-class classification with one-hot encoded labels.
- Metrics: 'accuracy' - To monitor the model's performance during training.
Model Summary: model.summary() displays a summary of the model architecture, including the number of parameters in each layer.
Model Training:
- model.fit() trains the model for 20 epochs.
- 20% of the training data is used for validation (validation_split=0.2).
- Batch size of 128 is used for mini-batch gradient descent.
Model Evaluation: The trained model is evaluated on the test set to assess its generalization performance.
Visualizing Training History:
- Two plots are created to visualize the training and validation accuracy and loss over epochs.
- This helps in identifying overfitting or underfitting issues.
Making Predictions:
- The model makes predictions on the first 5 test images.
- tf.argmax() is used to convert one-hot encoded predictions and labels back to class indices.
Displaying Results:
- The first 5 test images are displayed along with their predicted and actual labels.
- This provides a visual confirmation of the model's performance on individual examples.

This comprehensive example demonstrates the full workflow of building, training, evaluating, and visualizing a neural network using the Keras Functional API. It includes additional features like dropout for regularization, visualization of training history, and display of model predictions, providing a more complete picture of the deep learning process.

3.1.3 Compiling and Training the Model

Once you've defined the architecture of your model, the next step is to compile it. This crucial step prepares your model for training by setting up the learning process. Compilation involves specifying three key components:

The optimizer: This crucial component governs the model's weight adjustment process during training. Popular choices include Adam, which adapts learning rates for each parameter; SGD (Stochastic Gradient Descent), known for its simplicity and effectiveness; and RMSprop, which excels in handling non-stationary objectives. The selection of an optimizer can significantly impact the model's convergence speed and final performance.
The loss function: This mathematical measure quantifies the disparity between predicted and actual values, serving as a compass for the model's performance. The choice of loss function is task-dependent: binary crossentropy is ideal for binary classification tasks, categorical crossentropy suits multi-class problems, while mean squared error is the go-to for regression scenarios. Selecting an appropriate loss function is crucial for guiding the model towards optimal performance.
The metrics: These evaluation tools provide tangible insights into the model's performance during both training and testing phases. While the loss function steers the learning process, metrics offer more interpretable measures of model efficacy. For classification tasks, accuracy is a common metric, while regression problems often employ mean absolute error or root mean squared error. These metrics help data scientists and stakeholders gauge the model's real-world applicability and track improvements over time.

After compiling the model, you can proceed to train it using the fit() function. This function is where the actual learning takes place. It takes in the training data and iteratively adjusts the model's parameters to minimize the loss function. The training process occurs over several epochs, where an epoch represents one complete pass through the entire training dataset.

During each epoch, the model makes predictions on the training data, calculates the loss, and updates its weights based on the chosen optimizer. The fit() function also allows you to specify various training parameters, such as batch size (the number of samples processed before the model is updated) and validation data (used to monitor the model's performance on unseen data during training).

Example: Compiling and Training a Model

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.metrics import SparseCategoricalAccuracy
import matplotlib.pyplot as plt

# Assume X_train, y_train, X_test, y_test are prepared

# Define the model
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model with Adam optimizer and sparse categorical crossentropy loss
model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss=SparseCategoricalCrossentropy(from_logits=False),
    metrics=[SparseCategoricalAccuracy()]
)

# Display model summary
model.summary()

# Train the model on training data
history = model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=32,
    validation_data=(X_test, y_test),
    verbose=1
)

# Evaluate the model on test data
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_accuracy:.4f}")

# Plot training history
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['sparse_categorical_accuracy'], label='Training Accuracy')
plt.plot(history.history['val_sparse_categorical_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()

Code Breakdown:

Imports: We import necessary modules from TensorFlow and Keras, including specific optimizer, loss, and metric classes. Matplotlib is imported for visualization.
Model Definition: A Sequential model is created with a Flatten layer and three Dense layers. The Flatten layer converts the 2D input (28x28 image) to a 1D array. The two hidden layers use ReLU activation, while the output layer uses softmax for multi-class classification.
Model Compilation:
- Optimizer: We use Adam optimizer with a specified learning rate of 0.001.
- Loss: SparseCategoricalCrossentropy is used as it's suitable for multi-class classification when labels are integers.
- Metrics: SparseCategoricalAccuracy is used to monitor the model's performance during training.
Model Summary: Displays a summary of the model architecture, including the number of parameters in each layer.
Model Training:
- The fit() method is called with training data (X_train, y_train).
- Training runs for 10 epochs with a batch size of 32.
- Validation data (X_test, y_test) is provided to monitor performance on unseen data.
- verbose=1 ensures that training progress is displayed.
Model Evaluation: After training, the model is evaluated on the test set to assess its generalization performance.
Visualization: Two plots are created to visualize the training history:
- The first plot shows training and validation accuracy over epochs.
- The second plot shows training and validation loss over epochs.
- These plots help in identifying overfitting or underfitting issues.

This example offers a comprehensive view of the model training process. It covers model definition, compilation with specific parameters, training with validation, evaluation, and visualization of training history. By showcasing these steps, it exemplifies best practices in deep learning model development and analysis.

3.1.4 Evaluating and Testing the Model

After training the model, it's crucial to assess its performance on unseen data to gauge its generalization capability. This evaluation is typically done using a separate test dataset that the model hasn't encountered during training. Keras provides a convenient evaluate() method for this purpose. This method takes the test data as input and returns two key metrics:

Loss: This value quantifies the model's prediction error on the test set. A lower loss indicates better performance.

Accuracy: This metric represents the proportion of correct predictions made by the model on the test set. It's expressed as a value between 0 and 1, where 1 indicates perfect accuracy.

By examining these metrics, you can gain valuable insights into how well your model is likely to perform on new, unseen data in real-world scenarios. This evaluation step is critical for assessing the model's practical utility and identifying potential issues like overfitting or underfitting.

Example: Evaluating the Model

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report

# Evaluate the model on test data
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")

# Make predictions on test data
y_pred = model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)
y_true = y_test  # Corrected: No need for np.argmax since y_test contains integer labels

# Compute confusion matrix
cm = confusion_matrix(y_true, y_pred_classes)

# Plot confusion matrix
plt.figure(figsize=(10, 8))
plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
plt.title('Confusion Matrix')
plt.colorbar()
tick_marks = np.arange(10)
plt.xticks(tick_marks, range(10))
plt.yticks(tick_marks, range(10))
plt.xlabel('Predicted Label')
plt.ylabel('True Label')

# Add text annotations to the confusion matrix
thresh = cm.max() / 2.
for i, j in np.ndindex(cm.shape):
    plt.text(j, i, format(cm[i, j], 'd'),
             horizontalalignment="center",
             color="white" if cm[i, j] > thresh else "black")

plt.tight_layout()
plt.show()

# Print classification report
print("\nClassification Report:")
print(classification_report(y_true, y_pred_classes))

# Visualize some predictions
n_samples = 5
sample_indices = np.random.choice(len(X_test), n_samples, replace=False)

plt.figure(figsize=(15, 3))
for i, idx in enumerate(sample_indices):
    plt.subplot(1, n_samples, i + 1)
    plt.imshow(X_test[idx].reshape(28, 28), cmap='gray')
    plt.title(f"True: {y_true[idx]}\nPred: {y_pred_classes[idx]}")
    plt.axis('off')

plt.tight_layout()
plt.show()

This code example provides a comprehensive evaluation of the model's performance.

Here's a breakdown of the additions:

Importing necessary libraries: We import numpy for numerical operations, matplotlib for plotting, and sklearn.metrics for evaluation metrics.
Model Evaluation: We use model.evaluate() to get the test loss and accuracy, printing both with more decimal places for precision.
Predictions: We generate predictions for the entire test set using model.predict() and convert both predictions and true labels from one-hot encoded form to class indices.
Confusion Matrix: We compute and visualize the confusion matrix using sklearn's confusion_matrix and matplotlib. This shows how well the model distinguishes between classes.
Classification Report: We print a detailed classification report using sklearn's classification_report, which provides precision, recall, and F1-score for each class.
Sample Predictions Visualization: We randomly select and display a few test images along with their true and predicted labels. This gives a qualitative sense of the model's performance.

This comprehensive evaluation provides both quantitative metrics (accuracy, precision, recall) and qualitative insights (confusion matrix, sample predictions) into the model's performance, allowing for a more thorough understanding of its strengths and weaknesses across different classes.