Chapter 4: The Transformer Architecture
Practical Exercises for Chapter 4
These practical exercises are designed to reinforce your understanding of the core concepts discussed in Chapter 4, including the foundational principles of the Transformer architecture, its components, and comparisons with traditional architectures. Each exercise includes a worked solution with code so you can gain hands-on experience.
Exercise 1: Understanding Positional Encoding
Task: Write a Python function to generate positional encodings for a sequence of length n and embedding dimension d_model. Visualize the positional encoding values for a sequence length of 10 and an embedding dimension of 16.
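Recall the sinusoidal positional encoding introduced in the original Transformer paper, which the solution below implements: even embedding dimensions receive a sine term and odd dimensions a cosine term.

PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))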
Solution:
import numpy as np
import matplotlib.pyplot as plt
def positional_encoding(sequence_length, d_model):
    """
    Generate positional encoding for a sequence.
    sequence_length: Length of the sequence
    d_model: Dimensionality of embeddings
    """
    pos = np.arange(sequence_length)[:, np.newaxis]  # Positions
    i = np.arange(d_model)[np.newaxis, :]  # Embedding dimensions
    angle_rates = 1 / np.power(10000, (2 * (i // 2)) / d_model)
    angle_rads = pos * angle_rates
    # Apply sine to even indices, cosine to odd indices
    pos_encoding = np.zeros_like(angle_rads)
    pos_encoding[:, 0::2] = np.sin(angle_rads[:, 0::2])
    pos_encoding[:, 1::2] = np.cos(angle_rads[:, 1::2])
    return pos_encoding
# Generate positional encoding
sequence_length = 10
d_model = 16
pos_encoding = positional_encoding(sequence_length, d_model)
# Visualize the positional encoding
plt.figure(figsize=(10, 6))
plt.imshow(pos_encoding, cmap='viridis')
plt.colorbar(label='Encoding Value')
plt.title('Positional Encoding Visualization')
plt.xlabel('Embedding Dimension')
plt.ylabel('Token Position')
plt.show()
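As a brief follow-up (not part of the original exercise), the sketch below shows how these encodings are typically used: they are added element-wise to the token embeddings before the first Transformer layer. The random embedding matrix here is purely illustrative.

# Illustrative only: random values stand in for a learned token-embedding matrix.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(sequence_length, d_model))
model_input = token_embeddings + pos_encoding  # Element-wise sum, shape (10, 16)
print("Combined input shape:", model_input.shape)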
Exercise 2: Scaled Dot-Product Attention
Task: Implement a function for scaled dot-product attention and apply it to a small dataset. Print the attention weights and output.
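The solution follows the standard formulation of scaled dot-product attention,

Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

where d_k is the dimensionality of the keys; dividing by sqrt(d_k) keeps the dot products from growing too large before the softmax.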
Solution:
import numpy as np
def scaled_dot_product_attention(Q, K, V):
    """
    Compute scaled dot-product attention.
    Q: Queries
    K: Keys
    V: Values
    """
    d_k = Q.shape[-1]  # Dimension of keys
    scores = np.dot(Q, K.T) / np.sqrt(d_k)  # Scaled dot product
    scores = scores - np.max(scores, axis=-1, keepdims=True)  # Stabilize the softmax (does not change the result)
    weights = np.exp(scores) / np.sum(np.exp(scores), axis=-1, keepdims=True)  # Softmax
    output = np.dot(weights, V)  # Weighted sum of values
    return output, weights
# Example inputs
Q = np.array([[1, 0, 1]])
K = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
V = np.array([[0.5, 1.0], [0.2, 0.8], [0.9, 0.3]])
output, weights = scaled_dot_product_attention(Q, K, V)
print("Attention Weights:\n", weights)
print("Attention Output:\n", output)
Expected Output (values are approximate):
Attention Weights:
 [[0.5329 0.1679 0.2992]]
Attention Output:
 [[0.5693 0.7570]]
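To see where these numbers come from: Q K^T = [2, 0, 1], and dividing by sqrt(d_k) = sqrt(3) ≈ 1.732 gives scores ≈ [1.155, 0, 0.577]. The softmax of these scores is ≈ [0.533, 0.168, 0.299], and the output is this weighted combination of the rows of V.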
Exercise 3: Comparing RNN and Transformer Outputs
Task: Create a simple RNN and a Transformer model. Use both models to process the same input sequence and compare their outputs. For simplicity, use PyTorch.
Solution:
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer
# Define a simple RNN
class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNN, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.rnn(x)
        return self.fc(out[:, -1, :])
# RNN parameters
input_size = 10
hidden_size = 20
output_size = 10
sequence_length = 5
batch_size = 1
# Initialize and process input with RNN
rnn_model = SimpleRNN(input_size, hidden_size, output_size)
rnn_input = torch.randn(batch_size, sequence_length, input_size)
rnn_output = rnn_model(rnn_input)
print("RNN Output Shape:", rnn_output.shape)
# Transformer: Use pre-trained BERT
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert_model = BertModel.from_pretrained("bert-base-uncased")
# Input for Transformer
text = "The cat sat on the mat."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():  # Inference only; no gradients needed
    bert_output = bert_model(**inputs)
print("Transformer Output Shape:", bert_output.last_hidden_state.shape)
Exercise 4: Encoder-Decoder Interaction
Task: Simulate an encoder-decoder interaction by implementing simple encoder and decoder components. Pass data through both and print the final output.
Solution:
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super(Encoder, self).__init__()
        self.fc = nn.Linear(input_dim, hidden_dim)

    def forward(self, x):
        return torch.relu(self.fc(x))

class Decoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(Decoder, self).__init__()
        # Project the decoder input to the hidden dimension so it can be combined
        # with the encoder output (the two have different sizes otherwise).
        self.input_proj = nn.Linear(input_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x, encoder_output):
        combined = self.input_proj(x) + encoder_output  # Simple additive interaction
        return torch.sigmoid(self.fc(combined))
# Encoder-Decoder parameters
input_dim = 10
hidden_dim = 20
output_dim = 5
sequence_length = 6
# Initialize models
encoder = Encoder(input_dim, hidden_dim)
decoder = Decoder(input_dim, hidden_dim, output_dim)
# Dummy input
x = torch.randn(sequence_length, input_dim)
encoder_output = encoder(x)
decoder_output = decoder(x, encoder_output)
print("Encoder Output Shape:", encoder_output.shape)
print("Decoder Output Shape:", decoder_output.shape)
These exercises provide hands-on experience with the concepts covered in Chapter 4, such as positional encoding, attention mechanisms, and encoder-decoder interaction. By completing these tasks, you’ll gain a deeper understanding of the Transformer architecture and its advantages over traditional sequence models.