Chapter 6: Core NLP Applications
Practical Exercises for Chapter 6
These practical exercises will help you solidify your understanding of sentiment analysis, named entity recognition (NER), and text classification. Each exercise includes a solution with detailed code examples for hands-on practice.
Exercise 1: Sentiment Analysis
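Task: Perform sentiment analysis on a set of product reviews using a pre-trained DistilBERT model (a smaller, distilled version of BERT) fine-tuned for sentiment.
Solution:
from transformers import pipeline
# Load a sentiment analysis pipeline. Pinning the model keeps results reproducible;
# this SST-2 checkpoint is the pipeline's default and predicts only POSITIVE or NEGATIVE.
sentiment_analyzer = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
# Product reviews
reviews = [
    "This product is amazing! Highly recommend.",
    "It was a complete waste of money.",
    "The product is okay, but not worth the price.",
]
# Analyze sentiment (a list input returns one {'label', 'score'} dict per review)
results = sentiment_analyzer(reviews)
# Display results
print("Sentiment Analysis Results:")
for review, result in zip(reviews, results):
    print(f"Review: {review}")
    print(f"Sentiment: {result['label']}, Score: {result['score']:.2f}\n")
Expected Output (scores are illustrative and may vary between model versions):
Sentiment Analysis Results:
Review: This product is amazing! Highly recommend.
Sentiment: POSITIVE, Score: 0.99
Review: It was a complete waste of money.
Sentiment: NEGATIVE, Score: 0.97
Review: The product is okay, but not worth the price.
Sentiment: NEGATIVE or POSITIVE (this model has no NEUTRAL class)
Note: The SST-2 checkpoint is a binary classifier, so mixed reviews such as the third one are forced into POSITIVE or NEGATIVE. To obtain a NEUTRAL label, use a model trained with three sentiment classes.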
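Because the default model only has two classes, one common heuristic is to treat low-confidence predictions as NEUTRAL. The sketch below assumes that a low score signals mixed sentiment; the `to_three_way` helper and its 0.8 threshold are hypothetical choices, not part of the transformers API, and the threshold would need tuning on labeled data.

```python
def to_three_way(result, threshold=0.8):
    """Map a binary pipeline result ({'label': ..., 'score': ...}) to three classes.

    Hypothetical helper: predictions whose confidence falls below `threshold`
    are relabeled NEUTRAL. The threshold is an uncalibrated assumption.
    """
    if result["score"] < threshold:
        return "NEUTRAL"
    return result["label"]


# Pipeline-style results; these scores are made-up inputs for illustration
sample_results = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.97},
    {"label": "NEGATIVE", "score": 0.62},
]
for r in sample_results:
    print(r["label"], "->", to_three_way(r))
```

This is only a heuristic: sentiment classifiers are often highly confident even on mixed text, so a model trained with an explicit neutral class is the more reliable fix.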
Exercise 2: Named Entity Recognition
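Task: Identify named entities in a sentence from a legal document using a pre-trained NER model.
Solution:
from transformers import pipeline
# Load an NER pipeline. aggregation_strategy="simple" merges sub-word tokens into whole
# entities; it replaces the deprecated grouped_entities=True argument.
ner_pipeline = pipeline(
    "ner",
    model="dbmdz/bert-large-cased-finetuned-conll03-english",
    aggregation_strategy="simple",
)
# Legal text
text = "The contract was signed by John Doe on January 15, 2023, in New York City."
# Perform NER
results = ner_pipeline(text)
# Display results
print("Named Entities:")
for entity in results:
    print(f"Entity: {entity['word']}, Type: {entity['entity_group']}, Score: {entity['score']:.2f}")
Expected Output (scores are illustrative):
Named Entities:
Entity: John Doe, Type: PER, Score: 0.99
Entity: New York City, Type: LOC, Score: 0.97
Note: This CoNLL-2003 model recognizes only persons (PER), organizations (ORG), locations (LOC), and miscellaneous entities (MISC), so the date "January 15, 2023" is not extracted. Extracting dates requires a model trained on a label set that includes them, such as OntoNotes.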
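Token-level NER models tag each token with a BIO label (B- begins an entity, I- continues it, O is outside any entity), and aggregation merges those tags into entity spans. The sketch below illustrates the idea in plain Python. It is a simplified stand-in, not the transformers implementation: the `group_bio` name and the (token, tag) input format are assumptions, and it ignores scores and sub-word pieces.

```python
def group_bio(tagged_tokens):
    """Merge (token, BIO tag) pairs into (entity_text, entity_type) spans."""
    entities = []
    current_tokens, current_type = [], None
    for token, tag in tagged_tokens:
        if tag == "O":
            prefix, ent_type = "O", None
        else:
            prefix, ent_type = tag.split("-", 1)
        # Extend the open entity only for an I- tag of the same type
        if prefix == "I" and ent_type == current_type:
            current_tokens.append(token)
            continue
        # Any other tag closes the open entity, if there is one
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))
        if ent_type is None:
            current_tokens, current_type = [], None
        else:
            # A B- tag, or an I- tag of a new type, starts a new entity
            current_tokens, current_type = [token], ent_type
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


tagged = [
    ("signed", "O"), ("by", "O"), ("John", "B-PER"), ("Doe", "I-PER"),
    ("in", "O"), ("New", "B-LOC"), ("York", "I-LOC"), ("City", "I-LOC"), (".", "O"),
]
print(group_bio(tagged))  # [('John Doe', 'PER'), ('New York City', 'LOC')]
```

Treating an I- tag with a new type as the start of an entity makes the sketch tolerate tag schemes where entities may begin with I-, as in the original CoNLL-2003 data.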
Exercise 3: Custom Text Classification
Task: Classify customer support queries into categories like "Billing", "Technical Issue", or "General Inquiry."
Solution:
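from transformers import pipeline
# distilbert-base-uncased has no trained classification head, so a plain
# "text-classification" pipeline would return meaningless LABEL_0/LABEL_1 predictions.
# A zero-shot pipeline instead scores each query against category names you supply.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
# Customer support queries and target categories
queries = [
    "I need help with my billing statement.",
    "The app crashes whenever I open it.",
    "Can you tell me about your subscription plans?",
]
categories = ["Billing", "Technical Issue", "General Inquiry"]
# Perform classification (one result per query; labels are sorted by score, highest first)
results = classifier(queries, candidate_labels=categories)
# Display the top-scoring category for each query
print("Classification Results:")
for query, result in zip(queries, results):
    print(f"Query: {query}")
    print(f"Label: {result['labels'][0]}, Score: {result['scores'][0]:.2f}\n")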
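An untuned checkpoint cannot assign named categories on its own. Zero-shot classification, the approach behind the transformers `zero-shot-classification` pipeline, turns each category into a natural-language-inference hypothesis (by default "This example is {}.") and scores how strongly the query entails it. In single-label mode, the entailment logits are normalized with a softmax across the candidate labels. The sketch below reproduces only that final scoring step; the logits are made-up numbers standing in for an NLI model's output, and `rank_labels` is a hypothetical helper.

```python
import math


def rank_labels(entailment_logits):
    """Softmax entailment logits across candidate labels and sort by score.

    `entailment_logits` maps each candidate label to the NLI model's
    entailment logit for the hypothesis built from that label.
    """
    # Subtract the largest logit before exponentiating for numerical stability
    max_logit = max(entailment_logits.values())
    exps = {label: math.exp(logit - max_logit) for label, logit in entailment_logits.items()}
    total = sum(exps.values())
    scores = {label: e / total for label, e in exps.items()}
    # Highest score first, matching the ordering of the pipeline's 'labels'/'scores'
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [label for label, _ in ranked], [score for _, score in ranked]


# Made-up entailment logits for "I need help with my billing statement."
labels, scores = rank_labels({"Billing": 3.1, "Technical Issue": -0.5, "General Inquiry": 0.8})
for label, score in zip(labels, scores):
    print(f"{label}: {score:.2f}")
```

Because the scores are normalized across the supplied labels, they always sum to 1, so the choice of candidate labels directly affects every score.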
Exercise 4: Fine-Tune a Transformer for Text Classification
Task: Fine-tune a BERT model to classify news articles into topics like "Politics", "Sports", and "Technology."
Solution:
import torch
from torch.utils.data import Dataset
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
# Define a custom dataset class. Trainer expects each item to be a dict of tensors
# that includes a "labels" entry, not an (inputs, label) tuple.
class NewsDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length=128):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length
    def __len__(self):
        return len(self.texts)
    def __getitem__(self, idx):
        encoding = self.tokenizer(
            self.texts[idx],
            truncation=True,
            padding="max_length",
            max_length=self.max_length,
            return_tensors="pt",
        )
        # Drop the batch dimension added by return_tensors="pt"
        item = {key: val.squeeze(0) for key, val in encoding.items()}
        item["labels"] = torch.tensor(self.labels[idx], dtype=torch.long)
        return item
# Example data (three examples only demonstrate the workflow; a realistic
# fine-tuning run needs far more labeled articles per topic)
texts = [
    "The government passed a new law today.",
    "The local team won the championship game!",
    "New advancements in AI are transforming technology.",
]
labels = [0, 1, 2]  # 0: Politics, 1: Sports, 2: Technology
# Load tokenizer and model (a new, randomly initialized 3-way classification head sits on top of BERT)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)
# Prepare dataset
dataset = NewsDataset(texts, labels, tokenizer)
# Define training arguments. No evaluation strategy is set because there is no
# eval_dataset; per-epoch evaluation (evaluation_strategy, renamed eval_strategy in
# newer transformers releases) fails without one.
training_args = TrainingArguments(
    output_dir="./news_results",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    weight_decay=0.01,
)
# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
)
# Fine-tune the model
trainer.train()
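When each batch includes `labels`, `BertForSequenceClassification` computes a cross-entropy loss for single-label classification, and `trainer.train()` minimizes it. The plain-Python sketch below shows what that loss measures for one example: the negative log of the softmax probability assigned to the correct class. The logits are made-up values, and `cross_entropy` is a hypothetical helper, not the PyTorch function the model actually calls.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class for one example."""
    # Log-sum-exp with the max subtracted for numerical stability
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]


# Made-up logits over [Politics, Sports, Technology] for a politics article (label 0)
confident = [4.0, 0.5, 0.2]
uncertain = [0.1, 0.0, 0.2]
print(f"confident and correct: {cross_entropy(confident, 0):.3f}")
print(f"uncertain:             {cross_entropy(uncertain, 0):.3f}")
```

A loss near zero means the model puts almost all probability on the correct topic. With only three training examples, driving this loss down mostly reflects memorization rather than a model that generalizes to new articles.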
Exercise 5: Evaluate the Fine-Tuned Model
Task: Use the fine-tuned model to classify new articles into topics.
Solution:
import torch
# New articles
new_texts = [
    "The president addressed the nation on economic reforms.",
    "The basketball team secured a historic win last night.",
]
topics = ["Politics", "Sports", "Technology"]
# Switch to inference mode (disables dropout) and skip gradient tracking
model.eval()
with torch.no_grad():
    for text in new_texts:
        # Move inputs to the model's device (Trainer may have moved the model to a GPU)
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        outputs = model(**inputs)
        prediction = torch.argmax(outputs.logits, dim=1).item()
        print(f"Article: {text}\nPredicted Topic: {topics[prediction]}\n")
Expected Output (what an adequately trained model should produce; the three-example model above may predict differently):
Article: The president addressed the nation on economic reforms.
Predicted Topic: Politics
Article: The basketball team secured a historic win last night.
Predicted Topic: Sports
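Checking two hand-picked articles shows the mechanics but says little about model quality. A fuller evaluation compares predictions against gold labels on a held-out set. The sketch below computes accuracy and a confusion matrix in plain Python; the `evaluate_predictions` helper and the gold/predicted index lists are hypothetical, standing in for labels you would collect by running the prediction loop over a labeled test set.

```python
from collections import Counter

TOPICS = ["Politics", "Sports", "Technology"]


def evaluate_predictions(gold, predicted):
    """Return accuracy and a confusion matrix (rows = gold, columns = predicted)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # Count each (gold, predicted) pair once
    pairs = Counter(zip(gold, predicted))
    n = len(TOPICS)
    matrix = [[pairs[(g, p)] for p in range(n)] for g in range(n)]
    # Correct predictions lie on the diagonal
    correct = sum(matrix[i][i] for i in range(n))
    accuracy = correct / len(gold) if gold else 0.0
    return accuracy, matrix


# Hypothetical label indices for a small held-out set
gold = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
accuracy, matrix = evaluate_predictions(gold, predicted)
print(f"Accuracy: {accuracy:.2f}")
for topic, row in zip(TOPICS, matrix):
    print(f"{topic:<12} {row}")
```

Reading the off-diagonal cells shows which topics the model confuses, which accuracy alone hides.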
These exercises cover practical applications of sentiment analysis, NER, and text classification. You’ve learned how to use pre-trained models, fine-tune them for custom tasks, and apply them to real-world scenarios. By completing these tasks, you’re well-equipped to apply these NLP techniques in your own projects.