Chapter 3: Training and Fine-Tuning Transformers
3.4 Practical Exercises
This section provides practical exercises to strengthen your understanding of training and fine-tuning transformer models. These exercises cover data preprocessing, fine-tuning techniques, and evaluation metrics. Each exercise includes a solution with detailed code examples to guide your implementation.
Exercise 1: Data Preprocessing for Classification
Task: Prepare text data for binary classification with the BERT tokenizer, applying padding and truncation to a fixed sequence length.
Instructions:
- Use the BERT tokenizer to tokenize a list of text samples.
- Ensure all sequences are padded and truncated to a fixed length.
- Output the tokenized input IDs and attention masks.
Solution:
from transformers import BertTokenizer
# Initialize the BERT tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Sample text data
texts = ["Transformers are amazing!", "They are used in many NLP tasks."]
# Tokenize the text with padding and truncation
tokenized = tokenizer(texts, padding="max_length", truncation=True, max_length=10, return_tensors="pt")
# Display tokenized output
print("Input IDs:", tokenized["input_ids"])
print("Attention Masks:", tokenized["attention_mask"])
Expected Output:
Input IDs: tensor([[  101, 19081,  2024,  6429,   999,   102,     0,     0,     0,     0],
        [  101,  2027,  2024,  2109,  1999,  2116, 17953,  4703,  1012,   102]])
Attention Masks: tensor([[1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
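As a quick sanity check, you can map the input IDs back to their subword tokens. The sketch below reuses the tokenizer and tokenized objects from the solution above; convert_ids_to_tokens is the tokenizer's standard inverse lookup.
# Map each row of input IDs back to subword strings to verify
# that padding and truncation behaved as expected
for row in tokenized["input_ids"]:
    print(tokenizer.convert_ids_to_tokens(row.tolist()))
# First row: ['[CLS]', 'transformers', 'are', 'amazing', '!', '[SEP]', '[PAD]', '[PAD]', '[PAD]', '[PAD]']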
Exercise 2: Fine-Tune a Model Using LoRA
Task: Use LoRA to fine-tune a BERT model for sentiment analysis on the IMDB dataset.
Instructions:
- Install the required libraries.
- Load and preprocess the IMDB dataset.
- Apply LoRA to the BERT model.
- Fine-tune the model for two epochs.
Solution:
# Install the required libraries first:
#   pip install transformers datasets peft accelerate
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
from peft import get_peft_model, LoraConfig, TaskType

# Load and preprocess the dataset
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=256)

tokenized_datasets = dataset.map(preprocess_function, batched=True)

# Apply LoRA to the model
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # sequence classification; keeps the classifier head trainable
    r=8,                         # rank of the low-rank update matrices
    lora_alpha=32,               # scaling factor applied to the LoRA update
    lora_dropout=0.1             # dropout on the LoRA layers
)
lora_model = get_peft_model(model, lora_config)
lora_model.print_trainable_parameters()  # only a small fraction of weights are trained

# Define training arguments
training_args = TrainingArguments(
    output_dir="./lora_results",
    evaluation_strategy="epoch",  # named eval_strategy in newer transformers releases
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=2
)

# Fine-tune on small subsets to keep the run short
trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=tokenized_datasets["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized_datasets["test"].shuffle(seed=42).select(range(500))
)
trainer.train()
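As configured, the Trainer reports only the evaluation loss. To track accuracy as well, pass a compute_metrics callback; the following is a minimal sketch, assuming NumPy is installed and reusing the Trainer setup above.
import numpy as np

def compute_metrics(eval_pred):
    # The Trainer supplies (logits, labels) for the evaluation set
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}

# Add compute_metrics=compute_metrics to the Trainer(...) call above
# to report accuracy at the end of each epoch.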
Exercise 3: Evaluate a Model Using BLEU
Task: Evaluate a machine translation model’s output using the BLEU metric.
Instructions:
- Define a reference translation and a candidate translation.
- Calculate the BLEU score using NLTK.
Solution:
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
# Reference and candidate translations
reference = ["The cat is on the mat".split()]
candidate = "The cat is on the mat".split()
# Calculate BLEU score
bleu_score = sentence_bleu(reference, candidate, smoothing_function=SmoothingFunction().method1)
print(f"BLEU Score: {bleu_score:.2f}")
Expected Output:
BLEU Score: 1.00
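The score is a perfect 1.00 only because the candidate matches the reference word for word. To see how BLEU penalizes divergence, score a paraphrase against the same reference; this sketch reuses the imports and reference above, and the exact value depends on the smoothing method.
# A paraphrase shares most unigrams with the reference but fewer
# higher-order n-grams, so the score drops below 1.00
paraphrase = "The cat sat on the mat".split()
paraphrase_bleu = sentence_bleu(reference, paraphrase,
                                smoothing_function=SmoothingFunction().method1)
print(f"Paraphrase BLEU: {paraphrase_bleu:.2f}")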
Exercise 4: Evaluate a Summarization Model Using ROUGE
Task: Evaluate a summarization model’s output using the ROUGE metric.
Instructions:
- Define a reference summary and a candidate summary.
- Calculate ROUGE-1, ROUGE-2, and ROUGE-L scores.
Solution:
from rouge_score import rouge_scorer
# Reference and candidate summaries
reference = "The cat is on the mat."
candidate = "The cat lies on the mat."
# Initialize ROUGE scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
# Calculate ROUGE scores
scores = scorer.score(reference, candidate)
# Display results
print("ROUGE Scores:")
for key, value in scores.items():
    print(f"{key}: Precision: {value.precision:.3f}, Recall: {value.recall:.3f}, F1: {value.fmeasure:.3f}")
Expected Output:
ROUGE Scores:
rouge1: Precision: 0.833, Recall: 0.833, F1: 0.833
rouge2: Precision: 0.600, Recall: 0.600, F1: 0.600
rougeL: Precision: 0.833, Recall: 0.833, F1: 0.833
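A real evaluation aggregates ROUGE over many summary pairs rather than one. The sketch below averages ROUGE-L F1 across a small batch, reusing the scorer from the solution; the pairs are illustrative placeholders for actual model outputs.
# Illustrative (reference, candidate) pairs; substitute real model outputs
pairs = [
    ("The cat is on the mat.", "The cat lies on the mat."),
    ("The dog sleeps by the door.", "A dog is sleeping near the door."),
]
f1_scores = [scorer.score(ref, cand)["rougeL"].fmeasure for ref, cand in pairs]
print(f"Mean ROUGE-L F1: {sum(f1_scores) / len(f1_scores):.3f}")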
Exercise 5: Evaluate Text Generation Using BERTScore
Task: Evaluate the semantic similarity between generated text and a reference using BERTScore.
Instructions:
- Define a reference and candidate text.
- Compute BERTScore using a pretrained BERT model.
Solution:
from bert_score import score
# Reference and candidate texts
references = ["The cat is on the mat."]
candidates = ["The cat lies on the mat."]
# Compute BERTScore
P, R, F1 = score(candidates, references, lang="en", model_type="bert-base-uncased")
# Display results
print(f"BERTScore Precision: {P.mean():.3f}")
print(f"BERTScore Recall: {R.mean():.3f}")
print(f"BERTScore F1: {F1.mean():.3f}")
Expected Output (exact values vary with the model and bert_score version):
BERTScore Precision: 0.987
BERTScore Recall: 0.992
BERTScore F1: 0.989
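Raw BERTScore values cluster near the top of the scale, which makes small differences hard to read. The library supports baseline rescaling to spread scores out; below is a minimal sketch reusing the texts above. With lang="en" and no model_type, bert_score falls back to its default English model and downloads the matching baseline automatically.
# Rescaled scores are centered so that unrelated sentence pairs land near 0,
# making system-to-system comparisons easier to interpret
P_r, R_r, F1_r = score(candidates, references, lang="en",
                       rescale_with_baseline=True)
print(f"Rescaled BERTScore F1: {F1_r.mean():.3f}")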
These exercises demonstrate the key steps in data preprocessing, fine-tuning using LoRA, and evaluating transformer models with BLEU, ROUGE, and BERTScore metrics. Completing these exercises will provide practical experience and deepen your understanding of training and evaluation techniques for transformer-based NLP models.