Chapter 5: Convolutional Neural Networks (CNNs)
5.2 Implementing CNNs with TensorFlow, Keras, and PyTorch
Convolutional Neural Networks (CNNs) can be implemented using various deep learning frameworks, with TensorFlow, Keras, and PyTorch being among the most popular and versatile options. Each framework offers unique advantages:
- TensorFlow provides a robust and highly scalable infrastructure for deep learning, making it suitable for large-scale deployments and production environments.
- Keras offers a user-friendly API that simplifies model development, making it an excellent choice for beginners and rapid prototyping.
- PyTorch stands out for its dynamic computation graph and Pythonic interface, offering greater flexibility and ease of debugging, which is particularly advantageous in research settings.
To illustrate the implementation of CNNs across these frameworks, we will build a model for the MNIST dataset. This classic dataset contains 70,000 grayscale images (60,000 for training and 10,000 for testing) of handwritten digits from 0 to 9, each 28x28 pixels, and serves as a standard benchmark for image classification. By building and training closely related network architectures in TensorFlow, Keras, and PyTorch, we can compare the syntax, workflow, and distinctive features of each framework.
This comparative approach will provide valuable insights into the strengths and characteristics of each platform, helping you choose the most suitable framework for your specific deep learning projects.
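Each raw MNIST image arrives as a 28x28 array of uint8 intensities (0-255). The TensorFlow and Keras examples in this section add a channel axis and rescale the values to [0, 1] before training. The NumPy sketch below applies the same two steps to randomly generated stand-in images so that it runs without downloading the dataset. `X_raw` is synthetic data, not MNIST.

```python
import numpy as np

# Synthetic stand-in for MNIST: 5 "images" of 28x28 uint8 pixels (0-255).
# This is random data; it only mimics the dtype and layout of the real arrays.
rng = np.random.default_rng(0)
X_raw = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Add an explicit channel axis, (N, 28, 28) -> (N, 28, 28, 1),
# then convert to float32 and scale pixel values into [0, 1].
X = X_raw.reshape(-1, 28, 28, 1).astype("float32") / 255.0

print(X.shape, X.dtype)  # (5, 28, 28, 1) float32
print(X.min(), X.max())
```

The channel axis matters because `Conv2D` expects inputs shaped (batch, height, width, channels) by default. Scaling to [0, 1] keeps the initial activations and gradients in a well-behaved range.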
5.2.1 Implementing CNN with TensorFlow
TensorFlow is a powerful and scalable deep learning framework that has gained widespread adoption in both research and production environments. Developed by Google, TensorFlow offers a comprehensive ecosystem for building and deploying machine learning models, with particular strengths in neural networks and deep learning.
Key features of TensorFlow include:
- Flexible architecture: TensorFlow supports both eager execution for immediate operation evaluation and graph-based execution for optimized performance.
- Scalability: It can run on various platforms, from mobile devices to large-scale distributed systems, making it suitable for a wide range of applications.
- Rich ecosystem: TensorFlow comes with a vast library of pre-built models, tools for visualization (TensorBoard), and extensions for specific domains like TensorFlow Lite for mobile and edge devices.
- Strong community support: With a large and active community, TensorFlow benefits from continuous improvements and a wealth of resources for developers.
Let's explore how to implement a Convolutional Neural Network (CNN) in TensorFlow. The example below uses tf.keras, TensorFlow's built-in high-level API, to define and train the model. TensorFlow's lower-level tools, such as tf.GradientTape for custom training loops, offer finer-grained control when a project needs it.
Example: CNN in TensorFlow
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = datasets.mnist.load_data()
# Preprocess the data (reshape and normalize)
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
# Define the CNN model
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),               # 28x28 grayscale images with one channel
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),                           # drops 50% of units, during training only
    layers.Dense(10, activation='softmax')         # one probability per digit class
])
# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Train the model
history = model.fit(X_train, y_train, epochs=10,
                    validation_data=(X_test, y_test),
                    batch_size=64)
# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
print(f"Test Accuracy: {test_acc:.4f}")
# Plot training history
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.tight_layout()
plt.show()
# Make predictions on test data
predictions = model.predict(X_test)
# Display some test images and their predictions
fig, axes = plt.subplots(3, 3, figsize=(12, 12))
for i, ax in enumerate(axes.flat):
    ax.imshow(X_test[i].reshape(28, 28), cmap='gray')
    ax.set_title(f"True: {y_test[i]}, Predicted: {predictions[i].argmax()}")
    ax.axis('off')
plt.tight_layout()
plt.show()
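The spatial size of each feature map in the model above follows from simple arithmetic. For a convolution, the output size is (size + 2 * padding - kernel) // stride + 1. For max pooling, whose stride defaults to the pool size in Keras, it is (size - pool) // stride + 1. The pure-Python sketch below assumes Keras's default 'valid' padding. It traces the sizes and confirms how many features `Flatten` hands to the first `Dense` layer. `conv2d_out` and `pool2d_out` are illustrative helpers, not library functions.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution ('valid' padding when padding=0)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool2d_out(size, pool, stride=None):
    """Spatial output size of max pooling; stride defaults to the pool size."""
    stride = stride or pool
    return (size - pool) // stride + 1

# Conv(3x3) -> Pool(2) -> Conv(3x3) -> Pool(2) -> Conv(3x3), starting from 28x28
size = 28
trace = []
for op, arg in [("conv", 3), ("pool", 2), ("conv", 3), ("pool", 2), ("conv", 3)]:
    size = conv2d_out(size, arg) if op == "conv" else pool2d_out(size, arg)
    trace.append(size)

flatten_units = size * size * 64  # the last Conv2D layer has 64 filters
print(trace)          # [26, 13, 11, 5, 3]
print(flatten_units)  # 576
```

Note the floor in the second pooling step: 11x11 maps shrink to 5x5, not 5.5x5.5.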
Breakdown of the CNN Implementation:
- Imports and Data Preparation:
- TensorFlow, Keras components, and Matplotlib are imported for model creation and visualization.
- The MNIST dataset is loaded, images are reshaped to (28, 28, 1), and pixel values are normalized to the [0, 1] range to improve training efficiency.
- CNN Model Definition:
- The model is defined using tf.keras.Sequential, which simplifies the layer stacking process.
- It consists of three convolutional layers (Conv2D), two max pooling layers (MaxPooling2D), a flattening layer, one dense layer with ReLU activation, a dropout layer to reduce overfitting, and a final dense layer with softmax for classification.
- Model Compilation:
- The Adam optimizer is used for efficient learning.
- Sparse categorical cross-entropy is chosen as the loss function because the labels are integers (0-9) rather than one-hot vectors.
- Accuracy is used as the evaluation metric.
- Model Training:
- The model is trained for 10 epochs with a batch size of 64.
- The validation_data parameter is set to evaluate the model on the test set after each epoch, allowing us to monitor potential overfitting. Note that this means the test set is no longer a fully independent holdout; in practice, a validation split carved from the training data is preferable.
- Model Evaluation:
- The trained model is evaluated on the test set using model.evaluate(), and the final test accuracy is printed.
- Visualization of Training History:
- Training and validation accuracy and loss are plotted over epochs.
- This helps analyze how well the model is learning and whether overfitting is occurring.
- Making Predictions and Visualizing Results:
- The trained model is used to make predictions on the test set.
- A 3x3 grid of test images is displayed with their true labels and predicted classes.
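Sparse categorical cross-entropy takes integer class labels directly. Categorical cross-entropy expects one-hot target vectors. For the same predictions, the two give the same value. The NumPy sketch below demonstrates this with made-up softmax outputs for three samples; the probabilities are illustrative, not results from the model.

```python
import numpy as np

# Hypothetical softmax outputs for 3 samples over 10 classes (each row sums to 1).
probs = np.full((3, 10), 0.05)
probs[0, 7] = 0.55   # sample 0: most confident in class 7
probs[1, 2] = 0.55   # sample 1: most confident in class 2
probs[2, 1] = 0.55   # sample 2: most confident in class 1
labels = np.array([7, 2, 4])   # integer labels, like MNIST's y_train / y_test

# Sparse form: take -log of the probability assigned to each true class.
sparse_ce = -np.log(probs[np.arange(len(labels)), labels]).mean()

# Categorical form: convert labels to one-hot vectors first, then sum over classes.
one_hot = np.eye(10)[labels]
categorical_ce = -(one_hot * np.log(probs)).sum(axis=1).mean()

print(round(sparse_ce, 4), round(categorical_ce, 4))  # both about 1.3971
```

The sparse form simply skips building the one-hot matrix. That is why it pairs naturally with MNIST's integer labels.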
This implementation provides a structured approach to training a CNN on MNIST, covering data preparation, model definition, training, evaluation, and visualization of results. The use of Sequential() simplifies model creation, and dropout is included to enhance generalization.
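The dropout layer in the TensorFlow model is active only during training. Keras documents its Dropout layer as setting a fraction `rate` of inputs to 0 during training and scaling the remaining inputs by 1 / (1 - rate). As a result, no rescaling is needed at inference time. The NumPy sketch below illustrates this idea, usually called inverted dropout, with a hypothetical `dropout` function; it is not Keras's actual implementation.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a `rate` fraction of units during training and
    rescale the survivors by 1 / (1 - rate) so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return x  # at inference time dropout is a no-op
    keep_mask = rng.random(x.shape) >= rate   # each unit kept with probability 1 - rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
activations = np.ones((1000, 64))   # stand-in for the outputs of Dense(64)
train_out = dropout(activations, 0.5, training=True, rng=rng)
eval_out = dropout(activations, 0.5, training=False, rng=rng)

print((train_out == 0).mean())                # fraction dropped, close to 0.5
print(train_out.mean())                       # close to 1.0: expectation preserved
print(np.array_equal(eval_out, activations))  # True
```

Because a different random subset of units is silenced on every training step, the network cannot rely on any single unit. This is the source of the regularizing effect.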
5.2.2 Implementing CNN with Keras
Keras is a high-level deep learning API that ships with TensorFlow as tf.keras; since Keras 3, it can also run on JAX and PyTorch backends. It simplifies defining, training, and deploying models by abstracting away many of the lower-level details involved in deep learning implementations.
Key features of Keras include:
- Intuitive API: Keras provides a clean and intuitive API that allows developers to quickly prototype and experiment with different model architectures.
- Sequential and Functional APIs: The Sequential API enables rapid model construction by stacking layers linearly, while the Functional API offers more flexibility for complex model architectures.
- Built-in layers and models: Keras comes with a wide range of pre-built layers (e.g., convolutional, recurrent, pooling) and complete models that can be easily customized.
- Automatic shape inference: Keras can automatically infer the shapes of tensors, reducing the need for manual shape calculations.
With its focus on ease of use and rapid development, Keras is particularly well-suited for:
- Beginners in deep learning who want to quickly grasp the fundamentals of building neural networks.
- Researchers who need to prototype and iterate on ideas rapidly.
- Industry practitioners looking to streamline the development process for production-ready models.
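By building on TensorFlow while providing a more accessible interface, Keras strikes a balance between simplicity and performance, making it a popular choice in the deep learning community. Because the TensorFlow example above already used tf.keras, the code below is nearly identical: it omits the Dropout layer and adds a call to model.summary().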
Example: CNN in Keras
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = datasets.mnist.load_data()
# Preprocess the data (reshape and normalize)
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
# Define the CNN model using Keras Sequential API
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),               # 28x28 grayscale images with one channel
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')         # one probability per digit class
])
# Display model summary
model.summary()
# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Train the model
history = model.fit(X_train, y_train, epochs=10,
                    validation_data=(X_test, y_test),
                    batch_size=64)
# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
print(f"Test Accuracy: {test_acc:.4f}")
# Plot training history
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.tight_layout()
plt.show()
# Make predictions on test data
predictions = model.predict(X_test)
# Display some test images and their predictions
fig, axes = plt.subplots(3, 3, figsize=(12, 12))
for i, ax in enumerate(axes.flat):
    ax.imshow(X_test[i].reshape(28, 28), cmap='gray')
    ax.set_title(f"True: {y_test[i]}, Predicted: {predictions[i].argmax()}")
    ax.axis('off')
plt.tight_layout()
plt.show()
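The parameter counts printed by `model.summary()` can be checked by hand. A Conv2D layer has kernel_h * kernel_w * in_channels weights per filter plus one bias per filter. A Dense layer has in_units * out_units weights plus one bias per output unit. Pooling and Flatten layers have no parameters. The pure-Python sketch below applies these formulas to the model above, assuming 'valid' padding so that Flatten yields 3 * 3 * 64 = 576 features. The helper names and row labels are illustrative; they are not the layer names Keras prints.

```python
def conv2d_params(in_ch, out_ch, k):
    """Weights (k * k * in_ch per filter) plus one bias per filter."""
    return k * k * in_ch * out_ch + out_ch

def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units

layers_spec = [
    ("Conv2D 3x3, 1 -> 32", conv2d_params(1, 32, 3)),
    ("Conv2D 3x3, 32 -> 64", conv2d_params(32, 64, 3)),
    ("Conv2D 3x3, 64 -> 64", conv2d_params(64, 64, 3)),
    ("Dense 576 -> 64", dense_params(3 * 3 * 64, 64)),
    ("Dense 64 -> 10", dense_params(64, 10)),
]
total = sum(count for _, count in layers_spec)

for name, count in layers_spec:
    print(f"{name:24s}{count:>8,}")
print(f"{'Total':24s}{total:>8,}")   # prints the total, 93,322
```

The total of 93,322 should match the "Total params" line of `model.summary()` for this model, and also for the TensorFlow model, since Dropout adds no parameters. Notice that the last convolution and the first dense layer hold most of the weights.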
Code Breakdown of the CNN Implementation:
- Imports and Data Preparation:
- We import TensorFlow, Keras components, and Matplotlib for visualization.
- The MNIST dataset is loaded using Keras datasets.
- Images are reshaped to (28, 28, 1) and normalized to the [0, 1] range.
- CNN Model Definition:
- We use the Keras Sequential API to define our model.
- The model consists of three Conv2D layers, two MaxPooling2D layers, a Flatten layer, and two Dense layers.
- We use ReLU activation for hidden layers and softmax for the output layer.
- Model Summary:
- model.summary() provides a detailed view of the model's architecture, including the number of parameters in each layer.
- Model Compilation:
- We use the Adam optimizer and sparse categorical cross-entropy loss.
- Accuracy is chosen as the evaluation metric.
- Model Training:
- The model is trained for 10 epochs with a batch size of 64.
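- We pass the test set as validation_data to monitor performance during training; holding out part of the training data instead would keep the test set fully independent for the final evaluation.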
- The training history is stored for later visualization.
- Model Evaluation:
- After training, we evaluate the model on the test set and print the test accuracy.
- Visualization of Training History:
- We plot the training and validation accuracy and loss over epochs.
- This helps in understanding the model's learning progress and identifying potential overfitting.
- Making Predictions and Visualizing Results:
- We use the trained model to make predictions on the test set.
- A 3x3 grid of test images is displayed along with their true labels and model predictions.
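The examples credit Adam with efficient learning. One reason is that Adam scales each parameter's step by a running estimate of that parameter's gradient magnitude. The NumPy sketch below performs a single update in its textbook form. It assumes the commonly documented defaults: learning rate 0.001, beta1 0.9, beta2 0.999, and eps 1e-8 as in PyTorch (Keras defaults to 1e-7). `adam_step` is a hypothetical helper, not a library function.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters `theta` at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([1.0, -2.0, 3.0])
grad = np.array([0.5, -100.0, 1e-3])   # gradients of very different magnitudes
m = np.zeros_like(theta)
v = np.zeros_like(theta)

theta_new, m, v = adam_step(theta, grad, m, v, t=1)
print(theta - theta_new)   # each step is close to 0.001 in magnitude
```

On the first step, every parameter moves by roughly the learning rate in the direction opposite its gradient, whether that gradient is 0.001 or -100. This per-parameter normalization makes Adam relatively forgiving of poorly scaled gradients.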
This implementation provides a comprehensive view of the CNN training process, including data preparation, model definition, training, evaluation, and result visualization. The added visualizations help in understanding the model's performance and its predictions on actual test data.
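Both the TensorFlow and Keras examples pass the test set as `validation_data`. As a result, the final test accuracy is not measured on data that stayed untouched during training. The NumPy sketch below holds out a random validation slice from the training arrays instead. `train_val_split` is a hypothetical helper, and the 10% fraction and fixed seed are arbitrary choices.

```python
import numpy as np

def train_val_split(X, y, val_fraction=0.1, seed=0):
    """Shuffle once, then hold out `val_fraction` of the samples for validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# Small synthetic stand-in for (X_train, y_train); real MNIST has 60,000 training images.
X_demo = np.zeros((100, 28, 28, 1), dtype="float32")
y_demo = np.arange(100) % 10

X_tr, y_tr, X_val, y_val = train_val_split(X_demo, y_demo, val_fraction=0.1)
print(X_tr.shape, X_val.shape)   # (90, 28, 28, 1) (10, 28, 28, 1)
```

With real data, the split arrays would go into `model.fit(X_tr, y_tr, validation_data=(X_val, y_val), ...)`, leaving `X_test` for a single final `model.evaluate()`. Keras's built-in `validation_split` argument to `fit()` is an alternative, though it takes the last fraction of the arrays before shuffling rather than a random subset.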
5.2.3 Implementing CNN with PyTorch
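PyTorch is renowned for its flexibility and user-friendly approach, making it a popular choice in research environments. It was designed around dynamic ("define-by-run") computation graphs, which are built as the Python code executes. TensorFlow 1.x, by contrast, required defining a static graph before running it. TensorFlow 2.x executes eagerly by default but still relies on traced static graphs (via tf.function) for optimized performance. The dynamic approach offers several advantages: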
- Greater control over the forward pass: Dynamic graphs allow researchers to modify network behavior on-the-fly, enabling more complex and adaptive architectures.
- Easier debugging: With PyTorch, you can use standard Python debugging tools to inspect your models at runtime, making it simpler to identify and fix issues.
- Intuitive coding: PyTorch's syntax closely resembles standard Python, reducing the learning curve for many developers.
- Better support for variable-length inputs: Dynamic graphs are particularly useful for tasks involving sequences of varying lengths, such as natural language processing.
- Immediate execution: Operations in PyTorch run as soon as the corresponding Python line executes, providing instant feedback and facilitating rapid prototyping.
These features make PyTorch an excellent choice for researchers exploring novel network architectures or working with complex, dynamic models. Its design philosophy prioritizes clarity and flexibility, allowing for more natural expression of deep learning algorithms.
Example: CNN in PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import numpy as np
# Define the CNN model in PyTorch
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)   # 28x28 -> 26x26
        self.pool = nn.MaxPool2d(2, 2)                  # halves height and width (floored)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)  # 13x13 -> 11x11
        self.fc1 = nn.Linear(64 * 5 * 5, 128)           # 64 maps of 5x5 after the second pool
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # (N, 64, 5, 5) -> (N, 1600)
        x = torch.relu(self.fc1(x))
        return self.fc2(x)  # raw logits; nn.CrossEntropyLoss applies log-softmax itself
# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Preprocess the data
transform = transforms.Compose([
    transforms.ToTensor(),                  # image -> float tensor in [0, 1], shape (1, 28, 28)
    transforms.Normalize((0.5,), (0.5,))    # (x - 0.5) / 0.5 -> values in [-1, 1]
])
# Load datasets
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
# Instantiate the model, define the loss function and optimizer
model = SimpleCNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Training loop
epochs = 10
train_losses = []
train_accuracies = []
test_accuracies = []
for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()
    train_loss = running_loss / len(train_loader)
    train_accuracy = 100. * correct / total
    train_losses.append(train_loss)
    train_accuracies.append(train_accuracy)
    # Evaluate on test set
    model.eval()
    test_correct = 0
    test_total = 0
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = outputs.max(1)
            test_total += labels.size(0)
            test_correct += predicted.eq(labels).sum().item()
    test_accuracy = 100. * test_correct / test_total
    test_accuracies.append(test_accuracy)
    print(f"Epoch {epoch+1}/{epochs}")
    print(f"Train Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.2f}%")
    print(f"Test Accuracy: {test_accuracy:.2f}%")
    print("-" * 50)
# Plot training history
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(train_losses, label='Train Loss')
plt.title('Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(train_accuracies, label='Train Accuracy')
plt.plot(test_accuracies, label='Test Accuracy')
plt.title('Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy (%)')
plt.legend()
plt.tight_layout()
plt.show()
# Evaluate the final model
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()
print(f'Final Test Accuracy: {100 * correct / total:.2f}%')
# Visualize some predictions
def imshow(img):
    img = img / 2 + 0.5  # unnormalize: invert Normalize((0.5,), (0.5,)) back to [0, 1]
    npimg = img.numpy()
    plt.imshow(np.squeeze(npimg), cmap="gray")  # (1, 28, 28) -> (28, 28)
    plt.axis('off')
dataiter = iter(test_loader)
images, labels = next(dataiter)
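# Get predictions (no gradient tracking is needed for inference)
with torch.no_grad():
    outputs = model(images.to(device))
_, predicted = torch.max(outputs, 1)
predicted = predicted.cpu()  # move to the CPU so it can be compared with the CPU labels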
# Plot images and predictions
fig = plt.figure(figsize=(12, 4))
for i in range(12):
    ax = fig.add_subplot(2, 6, i+1, xticks=[], yticks=[])
    imshow(images[i])
    ax.set_title(f"Pred: {predicted[i].item()} (True: {labels[i].item()})",
                 color=("green" if predicted[i] == labels[i] else "red"))
plt.tight_layout()
plt.show()
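The `64 * 5 * 5` input size of `fc1` is easy to get wrong. `MaxPool2d(2, 2)` floors odd sizes, so the 11x11 maps after `conv2` shrink to 5x5. The pure-Python sketch below traces the shapes and counts parameters with the standard formulas (weights plus one bias per output). `conv_out` and `pool_out` are illustrative helpers that assume stride 1 and no padding for the convolutions, matching the `nn.Conv2d` defaults used above.

```python
def conv_out(size, kernel):
    """Conv output size with stride 1 and no padding (the nn.Conv2d defaults)."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """MaxPool2d(pool, pool) output size; odd sizes are floored."""
    return (size - pool) // pool + 1

size = pool_out(conv_out(28, 3))    # conv1 + pool: 28 -> 26 -> 13
size = pool_out(conv_out(size, 3))  # conv2 + pool: 13 -> 11 -> 5
flat = 64 * size * size
print(size, flat)   # 5 1600

params = {
    "conv1": 3 * 3 * 1 * 32 + 32,
    "conv2": 3 * 3 * 32 * 64 + 64,
    "fc1": flat * 128 + 128,
    "fc2": 128 * 10 + 10,
}
print(params, sum(params.values()))
```

For the real model, `sum(p.numel() for p in model.parameters())` should report the same 225,034. `fc1` alone accounts for about 91% of them. This is why SimpleCNN is roughly 2.4 times larger than the three-convolution Keras model even though it has fewer convolutional layers.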
Code Breakdown of the CNN Implementation:
- Imports and Setup:
- We import the necessary PyTorch modules, including nn for defining neural network layers, optim for optimization algorithms, and torchvision for handling datasets and transformations. matplotlib and numpy are imported for visualizing training progress and model predictions.
- CNN Model Definition:
- The SimpleCNN class is defined, inheriting from nn.Module.
- It consists of two convolutional layers (conv1 and conv2), each followed by ReLU activation and max pooling to extract important features. This is one convolutional layer fewer than the Keras models, paired with a wider 128-unit dense layer.
- Two fully connected layers (fc1 and fc2) handle classification after feature extraction. fc2 returns raw scores (logits) because nn.CrossEntropyLoss applies the softmax internally.
- The forward method defines the flow of data through the layers.
- Device Configuration:
- The model is set up to use a GPU if available, allowing for faster training.
- Data Preprocessing and Loading:
- Transformations convert images into tensors scaled to [0, 1], then normalize them with mean 0.5 and standard deviation 0.5, mapping pixel values to [-1, 1].
- The MNIST dataset is loaded for both training and testing.
- DataLoader objects batch the data (32 images per batch) and reshuffle the training set every epoch.
- Model Instantiation and Training Setup:
- An instance of SimpleCNN is created and moved to the selected device.
- Cross-entropy loss is used for classification, and the Adam optimizer (learning rate 0.001) is chosen for efficient weight updates.
- Training Loop:
- The model is trained for 10 epochs.
- After each epoch, the average training loss and the training accuracy are recorded.
- The model is evaluated on the test set at the end of each epoch to track generalization.
- Visualization of Training Progress:
- Training loss is plotted over epochs to monitor learning trends.
- Training and test accuracy are plotted together to identify signs of overfitting or underfitting.
- Final Model Evaluation:
- The trained model is evaluated on the test set to determine its overall classification accuracy.
- Prediction Visualization:
- Twelve images from the first test batch are displayed alongside their predicted and actual labels.
- Correct predictions are shown in green and incorrect ones in red for easy interpretation.
This implementation covers the full workflow of training and evaluating a CNN, ensuring a structured and efficient approach to image classification. It enables easy modification of architecture, hyperparameters, and training settings for further experimentation.
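SimpleCNN's `forward` returns `fc2`'s raw scores (logits) without a softmax. This works because `nn.CrossEntropyLoss` combines log-softmax and negative log-likelihood in one step, averaging over the batch by default. The NumPy sketch below uses a hypothetical `cross_entropy_from_logits` helper to compute that quantity. It applies the usual max-subtraction trick for numerical stability and checks the result against the explicit softmax-then-log route that the Keras models take.

```python
import numpy as np

def cross_entropy_from_logits(logits, labels):
    """Mean cross-entropy from raw scores: log-softmax followed by negative log-likelihood."""
    # Subtracting the row max before exponentiating avoids overflow for large logits.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])
loss = cross_entropy_from_logits(logits, labels)

# Same value via an explicit softmax (softmax output layer + cross-entropy, as in Keras).
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(loss, -np.log(probs[[0, 1], labels]).mean())
```

Adding a softmax layer at the end of SimpleCNN would therefore apply softmax twice and weaken the training signal. Because softmax is monotonic, taking the argmax of the logits, as `outputs.max(1)` does in the training loop, yields the same predicted class as taking the argmax of the probabilities.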
5.2 Implementing CNNs with TensorFlow, Keras, and PyTorch
Convolutional Neural Networks (CNNs) can be implemented using various deep learning frameworks, with TensorFlow, Keras, and PyTorch being among the most popular and versatile options. Each framework offers unique advantages:
- TensorFlow provides a robust and highly scalable infrastructure for deep learning, making it suitable for large-scale deployments and production environments.
- Keras offers a user-friendly API that simplifies model development, making it an excellent choice for beginners and rapid prototyping.
- PyTorch stands out for its dynamic computation graph and Pythonic interface, offering greater flexibility and ease of debugging, which is particularly advantageous in research settings.
To illustrate the implementation of CNNs across these frameworks, we will focus on developing a model for the MNIST dataset. This classic dataset consists of handwritten digits ranging from 0 to 9, serving as an ideal benchmark for image classification tasks. By building and training the same network architecture using TensorFlow, Keras, and PyTorch, we can compare and contrast the syntax, workflow, and unique features of each framework.
This comparative approach will provide valuable insights into the strengths and characteristics of each platform, helping you choose the most suitable framework for your specific deep learning projects.
5.2.1 Implementing CNN with TensorFlow
TensorFlow is a powerful and scalable deep learning framework that has gained widespread adoption in both research and production environments. Developed by Google, TensorFlow offers a comprehensive ecosystem for building and deploying machine learning models, with particular strengths in neural networks and deep learning.
Key features of TensorFlow include:
- Flexible architecture: TensorFlow supports both eager execution for immediate operation evaluation and graph-based execution for optimized performance.
- Scalability: It can run on various platforms, from mobile devices to large-scale distributed systems, making it suitable for a wide range of applications.
- Rich ecosystem: TensorFlow comes with a vast library of pre-built models, tools for visualization (TensorBoard), and extensions for specific domains like TensorFlow Lite for mobile and edge devices.
- Strong community support: With a large and active community, TensorFlow benefits from continuous improvements and a wealth of resources for developers.
Let's explore how to implement a Convolutional Neural Network (CNN) using TensorFlow's low-level API. This approach provides greater control over the model architecture and training process, allowing for fine-grained customization and optimization.
Example: CNN in TensorFlow
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = datasets.mnist.load_data()
# Preprocess the data (reshape and normalize)
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
# Define the CNN model
model = models.Sequential([
layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(64, (3, 3), activation='relu'),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(64, (3, 3), activation='relu'),
layers.Flatten(),
layers.Dense(64, activation='relu'),
layers.Dropout(0.5),
layers.Dense(10, activation='softmax')
])
# Compile the model
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# Train the model
history = model.fit(X_train, y_train, epochs=10,
validation_data=(X_test, y_test),
batch_size=64)
# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
print(f"Test Accuracy: {test_acc:.4f}")
# Plot training history
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.tight_layout()
plt.show()
# Make predictions on test data
predictions = model.predict(X_test)
# Display some test images and their predictions
fig, axes = plt.subplots(3, 3, figsize=(12, 12))
for i, ax in enumerate(axes.flat):
ax.imshow(X_test[i].reshape(28, 28), cmap='gray')
ax.set_title(f"True: {y_test[i]}, Predicted: {predictions[i].argmax()}")
ax.axis('off')
plt.tight_layout()
plt.show()
Breakdown of the CNN Implementation:
- Imports and Data Preparation
- TensorFlow, Keras components, and Matplotlib are imported for model creation and visualization.
- The MNIST dataset is loaded, images are reshaped to
(28, 28, 1)
, and pixel values are normalized to the[0, 1]
range to improve training efficiency.
- CNN Model Definition
- The model is defined using
tf.keras.Sequential
, which simplifies the layer stacking process. - It consists of three convolutional layers (
Conv2D
), two max pooling layers (MaxPooling2D
), a flattening layer, one dense layer withReLU
activation, a dropout layer to prevent overfitting, and a final dense layer withsoftmax
for classification.
- The model is defined using
- Model Compilation
- The Adam optimizer is used for efficient learning.
- Sparse categorical cross-entropy is chosen as the loss function, since the labels are integers.
- Accuracy is used as the evaluation metric.
- Model Training
- The model is trained for 10 epochs with a batch size of 64.
- The
validation_data
parameter is set to evaluate the model on the test set during training, allowing us to monitor potential overfitting.
- Model Evaluation
- The trained model is evaluated on the test set using
model.evaluate()
, and the final test accuracy is printed.
- The trained model is evaluated on the test set using
- Visualization of Training History
- Training and validation accuracy and loss are plotted over epochs.
- This helps analyze how well the model is learning and if any overfitting is occurring.
- Making Predictions and Visualizing Results
- The trained model is used to make predictions on the test set.
- A 3x3 grid of test images is displayed with their true labels and predicted classes.
This implementation provides a structured approach to training a CNN on MNIST, covering data preparation, model definition, training, evaluation, and visualization of results. The use of Sequential()
simplifies model creation, and dropout is included to enhance generalization.
5.2.2 Implementing CNN with Keras
Keras is a high-level deep learning API that runs on top of TensorFlow, offering a user-friendly interface for building and training neural networks. It significantly simplifies the process of defining, training, and deploying models by abstracting many of the lower-level details that are typically involved in deep learning implementations.
Key features of Keras include:
- Intuitive API: Keras provides a clean and intuitive API that allows developers to quickly prototype and experiment with different model architectures.
- Sequential and Functional APIs: The Sequential API enables rapid model construction by stacking layers linearly, while the Functional API offers more flexibility for complex model architectures.
- Built-in layers and models: Keras comes with a wide range of pre-built layers (e.g., convolutional, recurrent, pooling) and complete models that can be easily customized.
- Automatic shape inference: Keras can automatically infer the shapes of tensors, reducing the need for manual shape calculations.
With its focus on ease of use and rapid development, Keras is particularly well-suited for:
- Beginners in deep learning who want to quickly grasp the fundamentals of building neural networks.
- Researchers who need to prototype and iterate on ideas rapidly.
- Industry practitioners looking to streamline the development process for production-ready models.
By leveraging the power of TensorFlow while providing a more accessible interface, Keras strikes a balance between simplicity and performance, making it a popular choice in the deep learning community.
Example: CNN in Keras
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = datasets.mnist.load_data()
# Preprocess the data (reshape and normalize)
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
# Define the CNN model using Keras Sequential API
model = models.Sequential([
layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(64, (3, 3), activation='relu'),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(64, (3, 3), activation='relu'),
layers.Flatten(),
layers.Dense(64, activation='relu'),
layers.Dense(10, activation='softmax')
])
# Display model summary
model.summary()
# Compile the model
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# Train the model
history = model.fit(X_train, y_train, epochs=10,
validation_data=(X_test, y_test),
batch_size=64)
# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
print(f"Test Accuracy: {test_acc:.4f}")
# Plot training history
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.tight_layout()
plt.show()
# Make predictions on test data
predictions = model.predict(X_test)
# Display some test images and their predictions
fig, axes = plt.subplots(3, 3, figsize=(12, 12))
for i, ax in enumerate(axes.flat):
ax.imshow(X_test[i].reshape(28, 28), cmap='gray')
ax.set_title(f"True: {y_test[i]}, Predicted: {predictions[i].argmax()}")
ax.axis('off')
plt.tight_layout()
plt.show()
Code Breakdown of the CNN Implementation:
- Imports and Data Preparation:
  - We import TensorFlow, Keras components, and Matplotlib for visualization.
  - The MNIST dataset is loaded with datasets.mnist.load_data().
  - Images are reshaped to (28, 28, 1), adding the single grayscale channel that Conv2D expects, and pixel values are scaled to the [0, 1] range by dividing by 255.
- CNN Model Definition:
  - We use the Keras Sequential API to define the model.
  - The model consists of three Conv2D layers, two MaxPooling2D layers, a Flatten layer, and two Dense layers.
  - ReLU activations are used in the hidden layers, and softmax in the output layer produces a probability for each of the 10 digit classes.
- Model Summary:
  - model.summary() prints each layer's output shape and its number of trainable parameters.
- Model Compilation:
  - We use the Adam optimizer and sparse categorical cross-entropy loss, which accepts integer labels (0 to 9) directly, without one-hot encoding.
  - Accuracy is tracked as the evaluation metric.
- Model Training:
  - The model is trained for 10 epochs with a batch size of 64.
  - validation_data=(X_test, y_test) reports test-set loss and accuracy after every epoch. This keeps the example short, but it lets test results influence choices made during training, such as the number of epochs; holding out a separate validation set from the training data is the safer practice.
  - The History object returned by fit() is stored for later visualization.
- Model Evaluation:
  - After training, we evaluate the model on the test set and print the test accuracy.
- Visualization of Training History:
  - We plot the training and validation accuracy and loss over epochs.
  - A widening gap between the training and validation curves is a typical sign of overfitting.
- Making Predictions and Visualizing Results:
  - model.predict() returns a (10000, 10) array of class probabilities, and argmax() picks the most likely digit for each image.
  - A 3x3 grid of the first nine test images is displayed along with their true labels and model predictions.
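The parameter counts that model.summary() reports can be checked by hand. The sketch below is plain Python (the helpers conv2d_params, dense_params, and summarize are hypothetical names) and assumes the layer settings of the Keras model above: 3x3 kernels with the default 'valid' padding and stride 1, and 2x2 max pooling. Pooling and Flatten layers have no trainable parameters, so they are not counted.

```python
# Plain-Python check of the parameter counts reported by model.summary()
# for the Keras model above (hypothetical helper names).

def conv2d_params(kernel, in_channels, filters):
    # Each filter has kernel * kernel * in_channels weights plus one bias.
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    # One weight per (input, unit) pair plus one bias per unit.
    return (inputs + 1) * units


def summarize():
    size, channels = 28, 1  # MNIST input: 28x28 pixels, 1 channel
    rows = []
    # (filters, followed by 2x2 max pooling?) for the three Conv2D layers
    for filters, pooled in [(32, True), (64, True), (64, False)]:
        rows.append(("Conv2D", conv2d_params(3, channels, filters)))
        size, channels = size - 2, filters  # a 3x3 'valid' convolution trims 2 pixels
        if pooled:
            size //= 2  # 2x2 max pooling halves the size, dropping any remainder
    flat = size * size * channels  # Flatten: 3 * 3 * 64
    rows.append(("Dense", dense_params(flat, 64)))
    rows.append(("Dense", dense_params(64, 10)))
    return rows, flat


rows, flat = summarize()
for name, count in rows:
    print(f"{name:<8}{count:>8,}")
print("Flatten size:", flat)                     # 576
print("Total params:", sum(c for _, c in rows))  # 93322
```

Most of the parameters sit in the last convolution and the first Dense layer, which is typical: fully connected layers grow with the size of the flattened feature map.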
This implementation covers the full CNN workflow: data preparation, model definition, training, evaluation, and result visualization. The training curves show how learning progressed, and the prediction grid shows the model's output on individual test images.
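One refinement to the workflow above is to monitor training on data held out from the training set instead of on the test set. The sketch below uses a hypothetical plain-Python helper, holdout_split, to draw a shuffled 10% validation split of the 60,000 MNIST training indices; the seed and the 10% fraction are arbitrary illustrative choices.

```python
import random


def holdout_split(n_samples, val_fraction=0.1, seed=42):
    """Hypothetical helper: shuffle range(n_samples) and return
    (train_indices, val_indices) with val_fraction of the indices held out."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_val = round(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = holdout_split(60_000)  # MNIST has 60,000 training images
print(len(train_idx), len(val_idx))         # 54000 6000
```

Because X_train and y_train are NumPy arrays, the index lists can be applied directly (for example X_train[val_idx]) and the resulting arrays passed to fit() as validation_data, leaving the test set untouched until the final evaluation. Keras's built-in validation_split=0.1 argument to fit() is a simpler alternative, but it takes the last 10% of the samples before shuffling rather than a random subset.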
5.2.3 Implementing CNN with PyTorch
PyTorch is known for its flexibility and user-friendly design, which have made it a popular choice in research. It builds its computation graph dynamically ("define-by-run"): the graph is recorded while operations execute, so it can differ from one forward pass to the next. TensorFlow 1.x, by contrast, required a static graph to be defined before it was run; TensorFlow 2.x executes eagerly by default and can compile functions into optimized static graphs with tf.function, which Keras's fit() uses under the hood. PyTorch's define-by-run design offers several advantages:
- Greater control over the forward pass: Ordinary Python control flow, such as loops and conditionals, can change the network's behavior from one input to the next, enabling more complex and adaptive architectures.
- Easier debugging: Standard Python debugging tools (breakpoints, print statements) can inspect tensors and intermediate values at runtime, making it simpler to identify and fix issues.
- Intuitive coding: PyTorch code reads much like standard Python, which eases the learning curve for many developers.
- Better support for variable-length inputs: Dynamic graphs are particularly convenient for inputs of varying length, such as sequences in natural language processing.
- Immediate execution: Operations run as soon as they are called, providing instant feedback and facilitating rapid prototyping.
These features make PyTorch an excellent choice for researchers exploring novel network architectures or working with complex, dynamic models. Its design philosophy prioritizes clarity and flexibility, allowing for more natural expression of deep learning algorithms.
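To make the define-by-run idea concrete, here is a minimal sketch, assuming PyTorch is installed. The module RepeatBlock and its steps argument are hypothetical names chosen for illustration: the number of times the shared layer is applied is decided by a plain Python loop at call time, and autograd still backpropagates through exactly the operations that ran.

```python
import torch
import torch.nn as nn


class RepeatBlock(nn.Module):
    """Hypothetical toy module: the number of times the shared layer is
    applied is chosen by ordinary Python code at call time."""

    def __init__(self, dim: int = 4):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, steps: int) -> torch.Tensor:
        # A plain Python loop; autograd records only the operations that actually run.
        for _ in range(steps):
            x = torch.relu(self.linear(x))
        return x


torch.manual_seed(0)
block = RepeatBlock()
x = torch.randn(2, 4)

for steps in (1, 3):
    out = block(x, steps)   # a different graph is recorded on each call
    block.zero_grad()
    out.sum().backward()    # backpropagate through however many steps ran
    print(steps, tuple(out.shape), tuple(block.linear.weight.grad.shape))
```

No separate graph-building step is needed: changing steps between calls simply records a longer or shorter graph.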
Example: CNN in PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import numpy as np
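# Define the CNN model in PyTorch
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)   # 1x28x28 -> 32x26x26
        self.pool = nn.MaxPool2d(2, 2)                 # halves height and width (rounding down)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)  # 32x13x13 -> 64x11x11
        self.fc1 = nn.Linear(64 * 5 * 5, 128)          # input is the 64x5x5 map after the second pool
        self.fc2 = nn.Linear(128, 10)                  # one output (logit) per digit class

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # keep the batch dimension, flatten the rest
        x = torch.relu(self.fc1(x))
        return self.fc2(x)       # raw logits; nn.CrossEntropyLoss applies log-softmax itself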
# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Preprocess the data
transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.5,), (0.5,))
])
# Load datasets
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
# Instantiate the model, define the loss function and optimizer
model = SimpleCNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Training loop
epochs = 10
train_losses = []
train_accuracies = []
test_accuracies = []
for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()
    train_loss = running_loss / len(train_loader)
    train_accuracy = 100. * correct / total
    train_losses.append(train_loss)
    train_accuracies.append(train_accuracy)
    # Evaluate on test set
    model.eval()
    test_correct = 0
    test_total = 0
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = outputs.max(1)
            test_total += labels.size(0)
            test_correct += predicted.eq(labels).sum().item()
    test_accuracy = 100. * test_correct / test_total
    test_accuracies.append(test_accuracy)
    print(f"Epoch {epoch+1}/{epochs}")
    print(f"Train Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.2f}%")
    print(f"Test Accuracy: {test_accuracy:.2f}%")
    print("-" * 50)
# Plot training history
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(train_losses, label='Train Loss')
plt.title('Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(train_accuracies, label='Train Accuracy')
plt.plot(test_accuracies, label='Test Accuracy')
plt.title('Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy (%)')
plt.legend()
plt.tight_layout()
plt.show()
# Evaluate the final model
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()
print(f'Final Test Accuracy: {100 * correct / total:.2f}%')
# Visualize some predictions
def imshow(img):
    img = img / 2 + 0.5  # unnormalize from [-1, 1] back to [0, 1]
    plt.imshow(np.squeeze(img.numpy()), cmap="gray")  # (1, 28, 28) -> (28, 28)
    plt.axis('off')
dataiter = iter(test_loader)
images, labels = next(dataiter)
# Get predictions (inference only, so no gradients are needed)
with torch.no_grad():
    outputs = model(images.to(device))
_, predicted = torch.max(outputs, 1)
predicted = predicted.cpu()  # move back to the CPU to compare with labels
# Plot images and predictions
fig = plt.figure(figsize=(12, 4))
for i in range(12):
    ax = fig.add_subplot(2, 6, i+1, xticks=[], yticks=[])
    imshow(images[i])
    ax.set_title(f"Pred: {predicted[i].item()} (True: {labels[i].item()})",
                 color=("green" if predicted[i] == labels[i] else "red"))
plt.tight_layout()
plt.show()
Code Breakdown of the CNN Implementation:
- Imports and Setup:
  - We import the core PyTorch modules: nn for defining neural network layers, optim for optimization algorithms, and torchvision for datasets and transformations. matplotlib and numpy are imported for visualizing training progress and model predictions.
- CNN Model Definition:
  - The SimpleCNN class is defined, inheriting from nn.Module.
  - It consists of two convolutional layers (conv1 and conv2), each followed by ReLU activation and 2x2 max pooling to extract increasingly abstract features.
  - Two fully connected layers (fc1 and fc2) handle classification after feature extraction. The input size of fc1, 64 * 5 * 5, is the shape of the feature maps after the second pooling layer.
  - The forward method defines the flow of data through the layers. It returns raw logits rather than probabilities, because nn.CrossEntropyLoss applies log-softmax internally.
  - This network is slightly smaller than the Keras model: it has two convolutional layers instead of three, and a 128-unit hidden layer instead of 64 units.
- Device Configuration:
  - The model runs on a CUDA GPU when one is available, which speeds up training, and falls back to the CPU otherwise.
- Data Preprocessing and Loading:
  - transforms.ToTensor() converts each image to a 1x28x28 float tensor with values in [0, 1], and transforms.Normalize((0.5,), (0.5,)) then rescales these values to [-1, 1].
  - The MNIST dataset is downloaded (if needed) and loaded for both training and testing.
  - DataLoader objects split the data into batches of 32 images and shuffle the training set every epoch.
- Model Instantiation and Training Setup:
  - An instance of SimpleCNN is created and moved to the selected device.
  - Cross-entropy loss is used for this multi-class classification task, and the Adam optimizer (learning rate 0.001) updates the weights.
- Training Loop:
  - The model is trained for 10 epochs. Each batch follows the standard sequence: zero the gradients, run the forward pass, compute the loss, backpropagate, and take an optimizer step.
  - After each epoch, the average training loss and the training accuracy are recorded.
  - The model is then switched to evaluation mode and evaluated on the test set under torch.no_grad() to track generalization.
- Visualization of Training Progress:
  - Training loss is plotted over epochs to monitor convergence.
  - Training and test accuracy are plotted together: a growing gap between them suggests overfitting, while low values for both suggest underfitting.
- Final Model Evaluation:
  - The trained model is evaluated once more on the test set. Because neither the model nor the data has changed since the last epoch, this should reproduce the final epoch's test accuracy.
- Prediction Visualization:
  - The first 12 images of the first test batch are unnormalized and displayed alongside their predicted and actual labels.
  - Correct predictions are shown in green and incorrect ones in red for easy interpretation.
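The preprocessing arithmetic can be checked with plain numbers. This sketch (hypothetical helper names, no PyTorch required) mirrors what ToTensor() and Normalize((0.5,), (0.5,)) do to a single grayscale pixel, and shows that the img / 2 + 0.5 step in the imshow() helper inverts the normalization.

```python
def to_tensor_scale(pixel):
    # ToTensor() maps 8-bit pixel values 0..255 to floats in [0, 1].
    return pixel / 255.0


def normalize(x, mean=0.5, std=0.5):
    # Normalize((0.5,), (0.5,)) computes (x - mean) / std, so [0, 1] becomes [-1, 1].
    return (x - mean) / std


def unnormalize(x):
    # The imshow() helper's img / 2 + 0.5 equals x * std + mean for mean = std = 0.5.
    return x / 2 + 0.5


for pixel in (0, 128, 255):
    scaled = to_tensor_scale(pixel)
    z = normalize(scaled)
    print(pixel, round(scaled, 4), round(z, 4), round(unnormalize(z), 4))
```

Black pixels (0) map to -1.0 and white pixels (255) to 1.0, which is why the display helper must undo the normalization before calling plt.imshow().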
This implementation covers the full workflow of training and evaluating a CNN in PyTorch. Because the model, the hyperparameters, and the training loop are all written out explicitly, each part can be modified independently for further experimentation.
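The value 64 * 5 * 5 in fc1 comes from tracking the feature-map size through each layer. The plain-Python sketch below applies the standard output-size formulas for nn.Conv2d and nn.MaxPool2d (dilation 1, no padding, as in SimpleCNN); the helper names conv_out and pool_out are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # nn.Conv2d output size (dilation 1): floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride=None):
    # nn.MaxPool2d uses stride = kernel_size by default and rounds down.
    stride = kernel if stride is None else stride
    return (size - kernel) // stride + 1


size = 28
size = pool_out(conv_out(size, 3), 2)  # conv1: 28 -> 26, pool: 26 -> 13
size = pool_out(conv_out(size, 3), 2)  # conv2: 13 -> 11, pool: 11 -> 5
print(size, 64 * size * size)          # 5 1600
```

Note that the second pooling step rounds 11 / 2 down to 5. If you change the input size, kernel sizes, or number of layers, this value must be recomputed; alternatively, nn.LazyLinear(128) infers its input size on the first forward pass.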
5.2 Implementing CNNs with TensorFlow, Keras, and PyTorch
Convolutional Neural Networks (CNNs) can be implemented using various deep learning frameworks, with TensorFlow, Keras, and PyTorch being among the most popular and versatile options. Each framework offers unique advantages:
- TensorFlow provides a robust and highly scalable infrastructure for deep learning, making it suitable for large-scale deployments and production environments.
- Keras offers a user-friendly API that simplifies model development, making it an excellent choice for beginners and rapid prototyping.
- PyTorch stands out for its dynamic computation graph and Pythonic interface, offering greater flexibility and ease of debugging, which is particularly advantageous in research settings.
To illustrate the implementation of CNNs across these frameworks, we will focus on developing a model for the MNIST dataset. This classic dataset consists of handwritten digits ranging from 0 to 9, serving as an ideal benchmark for image classification tasks. By building and training the same network architecture using TensorFlow, Keras, and PyTorch, we can compare and contrast the syntax, workflow, and unique features of each framework.
This comparative approach will provide valuable insights into the strengths and characteristics of each platform, helping you choose the most suitable framework for your specific deep learning projects.
5.2.1 Implementing CNN with TensorFlow
TensorFlow is a powerful and scalable deep learning framework that has gained widespread adoption in both research and production environments. Developed by Google, TensorFlow offers a comprehensive ecosystem for building and deploying machine learning models, with particular strengths in neural networks and deep learning.
Key features of TensorFlow include:
- Flexible architecture: TensorFlow supports both eager execution for immediate operation evaluation and graph-based execution for optimized performance.
- Scalability: It can run on various platforms, from mobile devices to large-scale distributed systems, making it suitable for a wide range of applications.
- Rich ecosystem: TensorFlow comes with a vast library of pre-built models, tools for visualization (TensorBoard), and extensions for specific domains like TensorFlow Lite for mobile and edge devices.
- Strong community support: With a large and active community, TensorFlow benefits from continuous improvements and a wealth of resources for developers.
Let's explore how to implement a Convolutional Neural Network (CNN) using TensorFlow's low-level API. This approach provides greater control over the model architecture and training process, allowing for fine-grained customization and optimization.
Example: CNN in TensorFlow
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = datasets.mnist.load_data()
# Preprocess the data (reshape and normalize)
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
# Define the CNN model
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])
# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Train the model
history = model.fit(X_train, y_train, epochs=10,
                    validation_data=(X_test, y_test),
                    batch_size=64)
# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
print(f"Test Accuracy: {test_acc:.4f}")
# Plot training history
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.tight_layout()
plt.show()
# Make predictions on test data
predictions = model.predict(X_test)
# Display some test images and their predictions
fig, axes = plt.subplots(3, 3, figsize=(12, 12))
for i, ax in enumerate(axes.flat):
    ax.imshow(X_test[i].reshape(28, 28), cmap='gray')
    ax.set_title(f"True: {y_test[i]}, Predicted: {predictions[i].argmax()}")
    ax.axis('off')
plt.tight_layout()
plt.show()
Breakdown of the CNN Implementation:
- Imports and Data Preparation
- TensorFlow, Keras components, and Matplotlib are imported for model creation and visualization.
- The MNIST dataset is loaded, images are reshaped to (28, 28, 1), and pixel values are normalized to the [0, 1] range to improve training efficiency.
- CNN Model Definition
- The model is defined using tf.keras.Sequential, which simplifies the layer stacking process.
- It consists of three convolutional layers (Conv2D) with ReLU activation, two max pooling layers (MaxPooling2D), a flattening layer, one hidden dense layer with ReLU activation, a dropout layer to reduce overfitting, and a final dense layer with softmax for classification.
- Model Compilation
- The Adam optimizer is used with its default learning rate of 0.001.
- Sparse categorical cross-entropy is chosen as the loss function, since the labels are integers.
- Accuracy is used as the evaluation metric.
- Model Training
- The model is trained for 10 epochs with a batch size of 64.
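- The validation_data parameter is set to the test set, so performance on unseen data is reported after every epoch. This lets us monitor potential overfitting. In practice, monitoring a separate validation set (for example via validation_split=0.1) and reserving the test set for the final evaluation gives a less biased accuracy estimate.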
- Model Evaluation
- The trained model is evaluated on the test set using model.evaluate(), and the final test accuracy is printed.
- Visualization of Training History
- Training and validation accuracy and loss are plotted over epochs.
- This helps analyze how well the model is learning and if any overfitting is occurring.
- Making Predictions and Visualizing Results
- The trained model is used to make predictions on the test set.
- A 3x3 grid of test images is displayed with their true labels and predicted classes.
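A short NumPy sketch shows why integer labels call for the sparse variant of the loss. The helper names are hypothetical. Sparse categorical cross-entropy simply picks out the predicted probability of the true class. It gives the same value as ordinary categorical cross-entropy computed on one-hot labels.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    # labels: integer class ids, shape (N,); probs: softmax outputs, shape (N, C)
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked))

def categorical_crossentropy(one_hot, probs):
    # one_hot: shape (N, C) with a single 1 per row
    return -np.mean(np.sum(one_hot * np.log(probs), axis=1))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])        # integer labels, like MNIST's y_train
one_hot = np.eye(3)[labels]      # the equivalent one-hot encoding

sparse = sparse_categorical_crossentropy(labels, probs)
dense = categorical_crossentropy(one_hot, probs)
print(sparse, dense)
```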
This implementation provides a structured approach to training a CNN on MNIST, covering data preparation, model definition, training, evaluation, and visualization of results. The Sequential API simplifies model creation, and the dropout layer is included to improve generalization.
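The following minimal NumPy sketch makes the role of the dropout layer concrete. It implements inverted dropout, the scheme Keras's Dropout layer follows. During training, each activation is zeroed with probability rate and the survivors are scaled by 1 / (1 - rate). At inference, the layer passes its input through unchanged. The dropout function below is an illustration written for this purpose, not the Keras source.

```python
import numpy as np

def dropout(x, rate, training, rng):
    # At inference time dropout is the identity
    if not training or rate == 0.0:
        return x
    # Keep each unit with probability (1 - rate)
    keep_mask = rng.random(x.shape) >= rate
    # Scale the survivors so the expected activation matches inference
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
activations = np.ones((1000, 64))
train_out = dropout(activations, rate=0.5, training=True, rng=rng)
eval_out = dropout(activations, rate=0.5, training=False, rng=rng)
print(train_out.mean(), eval_out.mean())
```

Because of the rescaling, the average activation stays close to 1.0 in both modes. This is why no extra correction is needed when the trained model is used for prediction.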
5.2.2 Implementing CNN with Keras
Keras is a high-level deep learning API for building and training neural networks. The tf.keras version used in this section runs on top of TensorFlow, and Keras 3 can also run on JAX or PyTorch backends. Keras significantly simplifies defining, training, and deploying models by abstracting away many of the lower-level details involved in deep learning implementations.
Key features of Keras include:
- Intuitive API: Keras provides a clean and intuitive API that allows developers to quickly prototype and experiment with different model architectures.
- Sequential and Functional APIs: The Sequential API enables rapid model construction by stacking layers linearly, while the Functional API offers more flexibility for complex model architectures.
- Built-in layers and models: Keras comes with a wide range of pre-built layers (e.g., convolutional, recurrent, pooling) and complete models that can be easily customized.
- Automatic shape inference: Keras infers each layer's input shape from the output of the layer before it. Only the model's input shape needs to be specified, and weight tensors are sized automatically.
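The Sequential model used in this section can also be written with the Functional API mentioned above. The following sketch assumes the same layer configuration as the Keras example below. Instead of passing a list, each layer is called on the output of the previous one, which is what makes branching and multiple inputs possible. The resulting model should report 93,322 parameters, the same as the Sequential version.

```python
from tensorflow.keras import layers, models

# Functional API: build the graph explicitly, starting from an Input tensor
inputs = layers.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, (3, 3), activation="relu")(inputs)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, (3, 3), activation="relu")(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, (3, 3), activation="relu")(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(10, activation="softmax")(x)

functional_model = models.Model(inputs=inputs, outputs=outputs)
functional_model.summary()
```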
With its focus on ease of use and rapid development, Keras is particularly well-suited for:
- Beginners in deep learning who want to quickly grasp the fundamentals of building neural networks.
- Researchers who need to prototype and iterate on ideas rapidly.
- Industry practitioners looking to streamline the development process for production-ready models.
By leveraging the power of TensorFlow while providing a more accessible interface, Keras strikes a balance between simplicity and performance, making it a popular choice in the deep learning community.
Example: CNN in Keras
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = datasets.mnist.load_data()
# Preprocess the data (reshape and normalize)
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
# Define the CNN model using Keras Sequential API
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])
# Display model summary
model.summary()
# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Train the model
history = model.fit(X_train, y_train, epochs=10,
                    validation_data=(X_test, y_test),
                    batch_size=64)
# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
print(f"Test Accuracy: {test_acc:.4f}")
# Plot training history
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.tight_layout()
plt.show()
# Make predictions on test data
predictions = model.predict(X_test)
# Display some test images and their predictions
fig, axes = plt.subplots(3, 3, figsize=(12, 12))
for i, ax in enumerate(axes.flat):
    ax.imshow(X_test[i].reshape(28, 28), cmap='gray')
    ax.set_title(f"True: {y_test[i]}, Predicted: {predictions[i].argmax()}")
    ax.axis('off')
plt.tight_layout()
plt.show()
Code Breakdown of the CNN Implementation:
- Imports and Data Preparation:
- We import TensorFlow, Keras components, and Matplotlib for visualization.
- The MNIST dataset is loaded using Keras datasets.
- Images are reshaped to (28, 28, 1) and normalized to the [0, 1] range.
- CNN Model Definition:
- We use the Keras Sequential API to define our model.
- The model consists of three Conv2D layers, two MaxPooling2D layers, a Flatten layer, and two Dense layers. Unlike the TensorFlow example in 5.2.1, it has no Dropout layer.
- We use ReLU activation for hidden layers and softmax for the output layer.
- Model Summary:
- model.summary() provides a detailed view of the model's architecture, including the number of parameters in each layer.
- Model Compilation:
- We use the Adam optimizer and sparse categorical cross-entropy loss.
- Accuracy is chosen as the evaluation metric.
- Model Training:
- The model is trained for 10 epochs with a batch size of 64.
- We use validation_data to monitor performance on the test set during training.
- The training history is stored for later visualization.
- Model Evaluation:
- After training, we evaluate the model on the test set and print the test accuracy.
- Visualization of Training History:
- We plot the training and validation accuracy and loss over epochs.
- This helps in understanding the model's learning progress and identifying potential overfitting.
- Making Predictions and Visualizing Results:
- We use the trained model to make predictions on the test set.
- A 3x3 grid of test images is displayed along with their true labels and model predictions.
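The parameter counts that model.summary() reports follow from simple arithmetic. A Conv2D layer has kernel_h * kernel_w * in_channels * filters weights plus one bias per filter. A Dense layer has in_features * units weights plus units biases. The short pure-Python sketch below (the helper names are hypothetical) reproduces the totals for the Keras model above. It assumes 'valid' padding, stride 1, and 2x2 pooling, as in the example.

```python
def conv_params(k, in_ch, filters):
    # k x k kernel for every (input channel, filter) pair, plus one bias per filter
    return k * k * in_ch * filters + filters

def dense_params(in_features, units):
    # Full weight matrix plus one bias per unit
    return in_features * units + units

# Spatial size with 'valid' padding and stride 1: out = in - k + 1
# MaxPooling2D((2, 2)) halves the size, rounding down
size = 28
size = size - 3 + 1      # Conv2D(32) -> 26
size //= 2               # MaxPooling2D -> 13
size = size - 3 + 1      # Conv2D(64) -> 11
size //= 2               # MaxPooling2D -> 5
size = size - 3 + 1      # Conv2D(64) -> 3
flat = size * size * 64  # Flatten -> 576

layer_params = [
    conv_params(3, 1, 32),    # 320
    conv_params(3, 32, 64),   # 18,496
    conv_params(3, 64, 64),   # 36,928
    dense_params(flat, 64),   # 36,928
    dense_params(64, 10),     # 650
]
print(flat, layer_params, sum(layer_params))
```

Pooling, Flatten, and Dropout layers have no trainable parameters. The TensorFlow model in 5.2.1 differs only by its Dropout layer, so it should report the same total of 93,322.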
This implementation provides a comprehensive view of the CNN training process, including data preparation, model definition, training, evaluation, and result visualization. The visualizations help in understanding the model's performance and its predictions on actual test data.
5.2.3 Implementing CNN with PyTorch
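PyTorch is renowned for its flexibility and user-friendly approach, making it a popular choice in research environments. PyTorch builds its computation graph dynamically ("define-by-run"): the graph is recorded as operations execute and is rebuilt on every forward pass. TensorFlow 1.x, by contrast, required a static graph to be defined before it could run. TensorFlow 2.x executes eagerly by default, but it still relies on compiled static graphs (via tf.function) for performance. PyTorch's define-by-run design offers several advantages: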
- Greater control over the forward pass: Because the graph is built as the code runs, the forward pass can use ordinary Python control flow, such as loops and conditionals that depend on the data. This enables more complex and adaptive architectures.
- Easier debugging: With PyTorch, you can use standard Python debugging tools to inspect your models at runtime, making it simpler to identify and fix issues.
- Intuitive coding: PyTorch's syntax closely resembles standard Python, reducing the learning curve for many developers.
- Better support for variable-length inputs: Dynamic graphs are particularly useful for tasks involving sequences of varying lengths, such as natural language processing.
- Immediate execution: Operations in PyTorch are executed as they're defined, providing instant feedback and facilitating rapid prototyping.
These features make PyTorch an excellent choice for researchers exploring novel network architectures or working with complex, dynamic models. Its design philosophy prioritizes clarity and flexibility, allowing for more natural expression of deep learning algorithms.
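The following hypothetical module is a small illustration of define-by-run execution and is not part of the MNIST example. Its forward method uses an ordinary Python while loop whose number of iterations depends on the input values. Autograd then differentiates whichever path was actually taken.

```python
import torch
import torch.nn as nn

class RepeatedBlock(nn.Module):
    """Hypothetical module: applies the same layer a data-dependent number of times."""
    def __init__(self, dim):
        super().__init__()
        self.layer = nn.Linear(dim, dim)

    def forward(self, x, max_steps=5):
        steps = 0
        # Plain Python loop whose length depends on the tensor values
        while x.norm() > 1.0 and steps < max_steps:
            x = torch.tanh(self.layer(x))
            steps += 1
        return x, steps

torch.manual_seed(0)
block = RepeatedBlock(4)
out, steps = block(torch.randn(2, 4) * 10)
out.sum().backward()  # gradients flow through every iteration that actually ran
print(steps, block.layer.weight.grad.shape)
```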
Example: CNN in PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import numpy as np
# Define the CNN model in PyTorch
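class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)   # 1x28x28 -> 32x26x26
        self.pool = nn.MaxPool2d(2, 2)                 # halves height and width
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)  # 32x13x13 -> 64x11x11
        self.fc1 = nn.Linear(64 * 5 * 5, 128)          # input is 64x5x5 after the second pooling
        self.fc2 = nn.Linear(128, 10)                  # one output per digit class
    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except the batch
        x = torch.relu(self.fc1(x))
        return self.fc2(x)  # raw logits; nn.CrossEntropyLoss applies log-softmax itself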
# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Preprocess the data
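transform = transforms.Compose([
    transforms.ToTensor(),                 # uint8 [0, 255] -> float tensor in [0, 1], shape (1, 28, 28)
    transforms.Normalize((0.5,), (0.5,))   # (x - 0.5) / 0.5 maps [0, 1] to [-1, 1]
])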
# Load datasets
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
# Instantiate the model, define the loss function and optimizer
model = SimpleCNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Training loop
epochs = 10
train_losses = []
train_accuracies = []
test_accuracies = []
for epoch in range(epochs):
    model.train()  # training mode (matters for layers such as dropout or batch norm)
    running_loss = 0.0
    correct = 0
    total = 0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()              # clear gradients from the previous batch
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()                    # backpropagate
        optimizer.step()                   # update the weights
        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()
    train_loss = running_loss / len(train_loader)
    train_accuracy = 100. * correct / total
    train_losses.append(train_loss)
    train_accuracies.append(train_accuracy)
    # Evaluate on test set
    model.eval()
    test_correct = 0
    test_total = 0
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = outputs.max(1)
            test_total += labels.size(0)
            test_correct += predicted.eq(labels).sum().item()
    test_accuracy = 100. * test_correct / test_total
    test_accuracies.append(test_accuracy)
    print(f"Epoch {epoch+1}/{epochs}")
    print(f"Train Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.2f}%")
    print(f"Test Accuracy: {test_accuracy:.2f}%")
    print("-" * 50)
# Plot training history
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(train_losses, label='Train Loss')
plt.title('Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(train_accuracies, label='Train Accuracy')
plt.plot(test_accuracies, label='Test Accuracy')
plt.title('Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy (%)')
plt.legend()
plt.tight_layout()
plt.show()
# Evaluate the final model
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()
print(f'Final Test Accuracy: {100 * correct / total:.2f}%')
# Visualize some predictions
def imshow(img):
    img = img / 2 + 0.5  # unnormalize: [-1, 1] -> [0, 1]
    npimg = np.squeeze(img.numpy())  # drop the channel axis: (1, 28, 28) -> (28, 28)
    plt.imshow(npimg, cmap="gray")
    plt.axis('off')
dataiter = iter(test_loader)
images, labels = next(dataiter)
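# Get predictions (no gradients needed; move results to the CPU for plotting)
with torch.no_grad():
    outputs = model(images.to(device))
_, predicted = torch.max(outputs, 1)
predicted = predicted.cpu()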
# Plot images and predictions
fig = plt.figure(figsize=(12, 4))
for i in range(12):
    ax = fig.add_subplot(2, 6, i+1, xticks=[], yticks=[])
    imshow(images[i])
    ax.set_title(f"Pred: {predicted[i].item()} (True: {labels[i].item()})",
                 color=("green" if predicted[i] == labels[i] else "red"))
plt.tight_layout()
plt.show()
Code Breakdown of the CNN Implementation:
- Imports and Setup:
- We import the necessary PyTorch modules, including nn for defining neural network layers, optim for optimization algorithms, and torchvision for handling datasets and transformations. Matplotlib and NumPy are imported for visualizing training progress and model predictions.
- CNN Model Definition:
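- The SimpleCNN class is defined, inheriting from nn.Module.
- It consists of two convolutional layers (conv1 and conv2), each followed by ReLU activation and max pooling to extract important features.
- Two fully connected layers (fc1 and fc2) handle classification after feature extraction. fc2 outputs raw logits because nn.CrossEntropyLoss applies log-softmax internally.
- The forward method defines the flow of data through the layers.
- This network is a slightly smaller variant of the Keras models above: it has two convolutional layers instead of three, a 128-unit hidden layer instead of 64 units, and no dropout.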
- Device Configuration:
- The model is set up to use a GPU if available, allowing for faster training.
- Data Preprocessing and Loading:
- Transformations are defined to convert images into tensors and normalize them for consistent model inputs. ToTensor() scales pixels to [0, 1], and Normalize((0.5,), (0.5,)) maps them to [-1, 1].
- The MNIST dataset is loaded for both training and testing.
- DataLoader objects batch the data (32 images per batch) and shuffle the training set every epoch.
- Model Instantiation and Training Setup:
- An instance of SimpleCNN is created and moved to the selected device.
- Cross-entropy loss is used for this classification task, and the Adam optimizer (learning rate 0.001) is chosen for efficient weight updates.
- Training Loop:
- The model is trained for 10 epochs.
- After each epoch, training loss and accuracy are recorded.
- The model is evaluated on the test set at the end of each epoch to track generalization.
- Visualization of Training Progress:
- Training loss and accuracy are plotted over epochs to monitor learning trends.
- Test accuracy is also plotted to identify signs of overfitting or underfitting.
- Final Model Evaluation:
- The trained model is evaluated on the test set to determine its overall classification accuracy.
- Prediction Visualization:
- A few test images are displayed alongside their predicted and actual labels.
- Correct predictions are shown in green, and incorrect ones in red for easy interpretation.
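The 64 * 5 * 5 input size of fc1 follows from the standard output-size formula for convolution and pooling: out = floor((in + 2 * padding - kernel) / stride) + 1. The sketch below assumes PyTorch's defaults, which are padding 0 and stride 1 for Conv2d, and a stride equal to the kernel size for MaxPool2d. The helper out_size is hypothetical.

```python
def out_size(size, kernel, stride=1, padding=0):
    # Integer floor division matches PyTorch's default rounding (ceil_mode=False)
    return (size + 2 * padding - kernel) // stride + 1

size = 28
size = out_size(size, kernel=3)            # conv1 -> 26
size = out_size(size, kernel=2, stride=2)  # pool  -> 13
size = out_size(size, kernel=3)            # conv2 -> 11
size = out_size(size, kernel=2, stride=2)  # pool  -> 5
fc1_in = 64 * size * size
print(size, fc1_in)
```

If you change the kernel sizes or add padding, rerun this arithmetic to get the new input size for fc1. Alternatively, pass a dummy tensor through the convolutional layers and check its shape.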
This implementation covers the full workflow of training and evaluating a CNN, ensuring a structured and efficient approach to image classification. It enables easy modification of architecture, hyperparameters, and training settings for further experimentation.
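The following NumPy sketch shows what the preprocessing and its inverse do to pixel values. The sample pixel values are illustrative. After ToTensor() scales pixels to [0, 1], transforms.Normalize((0.5,), (0.5,)) computes (x - mean) / std per channel, so the values end up in [-1, 1]. The img / 2 + 0.5 line in imshow undoes this for display.

```python
import numpy as np

mean, std = 0.5, 0.5
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

scaled = pixels.astype(np.float32) / 255.0   # what ToTensor() does to uint8 images
normalized = (scaled - mean) / std           # what Normalize((0.5,), (0.5,)) does
restored = normalized / 2 + 0.5              # the unnormalize step used in imshow

print(normalized.min(), normalized.max())
```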