NLP con Transformers: fundamentos y aplicaciones principales

Chapter 6: Core NLP Applications

Practical Exercises for Chapter 6

These practical exercises will help you solidify your understanding of sentiment analysis, named entity recognition (NER), and text classification. Each exercise includes a solution with detailed code examples for hands-on practice.

Exercise 1: Sentiment Analysis

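Task: Perform sentiment analysis on a set of product reviews using a pre-trained BERT-family model (when no model is specified, the pipeline falls back to a DistilBERT checkpoint fine-tuned on SST-2).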

Solution:

from transformers import pipeline

# Load sentiment analysis pipeline
sentiment_analyzer = pipeline("sentiment-analysis")

# Product reviews
reviews = [
    "This product is amazing! Highly recommend.",
    "It was a complete waste of money.",
    "The product is okay, but not worth the price."
]

# Analyze sentiment
results = sentiment_analyzer(reviews)

# Display results
print("Sentiment Analysis Results:")
for review, result in zip(reviews, results):
    print(f"Review: {review}")
    print(f"Sentiment: {result['label']}, Score: {result['score']:.2f}\n")

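Expected Output (scores are illustrative and vary with the model version; the default checkpoint is a binary classifier, so it returns only POSITIVE or NEGATIVE):

Sentiment Analysis Results:
Review: This product is amazing! Highly recommend.
Sentiment: POSITIVE, Score: 0.99

Review: It was a complete waste of money.
Sentiment: NEGATIVE, Score: 0.97

Review: The product is okay, but not worth the price.
Sentiment: NEGATIVE, Score: 0.75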
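
Binary sentiment checkpoints, such as the pipeline's default, return only POSITIVE or NEGATIVE. If a NEUTRAL bucket is needed, one possible approach is to treat low-confidence predictions as neutral. The sketch below assumes a hypothetical helper named `three_way_label` and an arbitrary cutoff of 0.80; neither comes from the library, and the cutoff would need tuning on real data. The sample dictionaries are hand-written to mimic the shape of pipeline results.

```python
# Hypothetical helper: derive a three-way label from a binary pipeline result.
# The 0.80 cutoff is an assumed value for illustration, not a tuned threshold.
def three_way_label(result, neutral_below=0.80):
    """Return NEUTRAL when the classifier's confidence is below the cutoff."""
    if result["score"] < neutral_below:
        return "NEUTRAL"
    return result["label"]

# Hand-written dictionaries in the same shape the pipeline returns
sample_results = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.97},
    {"label": "NEGATIVE", "score": 0.62},
]

labels = [three_way_label(r) for r in sample_results]
print(labels)  # ['POSITIVE', 'NEGATIVE', 'NEUTRAL']
```

A model trained with an explicit neutral class is usually a better fit when neutral reviews matter; thresholding is only a quick approximation.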

Exercise 2: Named Entity Recognition

Task: Identify named entities in a legal document using a pre-trained NER model.

Solution:

from transformers import pipeline

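# Load NER pipeline; aggregation_strategy="simple" merges word pieces into whole entities
# (it replaces the deprecated grouped_entities=True flag)
ner_pipeline = pipeline("ner", aggregation_strategy="simple")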

# Legal text
text = "The contract was signed by John Doe on January 15, 2023, in New York City."

# Perform NER
results = ner_pipeline(text)

# Display results
print("Named Entities:")
for entity in results:
    print(f"Entity: {entity['word']}, Type: {entity['entity_group']}, Score: {entity['score']:.2f}")

Expected Output (scores are illustrative; the pipeline's default CoNLL-2003 checkpoint tags only PER, ORG, LOC, and MISC, so the date is not returned as an entity):

Named Entities:
Entity: John Doe, Type: PER, Score: 0.99
Entity: New York City, Type: LOC, Score: 0.97
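
Entity grouping turns per-token tags into whole-entity spans. The sketch below shows the idea on word-level BIO tags. It is a simplified illustration, not the library's implementation, which also handles subword pieces and scores. The function name `group_entities` and the hand-written tags are assumptions made for this example.

```python
def group_entities(tokens, tags):
    """Merge consecutive B-/I- tagged tokens into (text, type) spans."""
    spans = []
    words, ent_type = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == ent_type:
            words.append(token)  # continue the entity that is currently open
            continue
        if words:  # any other tag closes the open entity
            spans.append((" ".join(words), ent_type))
            words, ent_type = [], None
        if prefix in ("B", "I"):  # start a new entity (a stray I- tag is treated like B-)
            words, ent_type = [token], label
    if words:  # flush an entity that runs to the end of the sentence
        spans.append((" ".join(words), ent_type))
    return spans

# Hand-written tags for illustration; a real model would predict these
tokens = ["The", "contract", "was", "signed", "by", "John", "Doe",
          "in", "New", "York", "City", "."]
tags = ["O", "O", "O", "O", "O", "B-PER", "I-PER",
        "O", "B-LOC", "I-LOC", "I-LOC", "O"]

entities = group_entities(tokens, tags)
print(entities)  # [('John Doe', 'PER'), ('New York City', 'LOC')]
```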

Exercise 3: Custom Text Classification

Task: Classify customer support queries into categories like "Billing", "Technical Issue", or "General Inquiry."

Solution:

from transformers import pipeline

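# Load a zero-shot classification pipeline so we can supply our own category names.
# (distilbert-base-uncased has no trained classification head, so a plain
# "text-classification" pipeline on it would return arbitrary LABEL_0/LABEL_1 scores.)
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Target categories
candidate_labels = ["Billing", "Technical Issue", "General Inquiry"]

# Customer support queries
queries = [
    "I need help with my billing statement.",
    "The app crashes whenever I open it.",
    "Can you tell me about your subscription plans?"
]

# Perform classification; each result lists labels sorted by descending score
results = classifier(queries, candidate_labels=candidate_labels)

# Display the top-scoring label for each query
print("Classification Results:")
for query, result in zip(queries, results):
    print(f"Query: {query}")
    print(f"Label: {result['labels'][0]}, Score: {result['scores'][0]:.2f}\n")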

Exercise 4: Fine-Tune a Transformer for Text Classification

Task: Fine-tune a BERT model to classify news articles into topics like "Politics", "Sports", and "Technology."

Solution:

import torch
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from torch.utils.data import Dataset

# Define a custom dataset class
class NewsDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length=128):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = self.texts[idx]
        label = self.labels[idx]
        encoding = self.tokenizer(
            text, truncation=True, padding="max_length", max_length=self.max_length, return_tensors="pt"
        )
        # Drop the batch dimension added by return_tensors="pt"
        item = {key: val.squeeze(0) for key, val in encoding.items()}
        # Trainer expects a single dict per example, with the target under "labels"
        item["labels"] = torch.tensor(label)
        return item

# Example data
texts = [
    "The government passed a new law today.",
    "The local team won the championship game!",
    "New advancements in AI are transforming technology."
]
labels = [0, 1, 2]  # 0: Politics, 1: Sports, 2: Technology

# Load tokenizer and model
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

# Prepare dataset
dataset = NewsDataset(texts, labels, tokenizer)

# Define training arguments
# (no evaluation strategy is set because this toy example has no validation set)
training_args = TrainingArguments(
    output_dir="./news_results",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
)

# Fine-tune the model
trainer.train()
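
The integer labels in the fine-tuning example are tied to topic names only through a comment. A minimal sketch of keeping that mapping in code follows. The names `topics`, `id2label`, `label2id`, and `raw_labels` are chosen here for illustration. Hugging Face model configs accept `id2label` and `label2id` mappings, so dictionaries like these could be passed to `from_pretrained` alongside `num_labels`, though that step is not shown here.

```python
# Topic names in the order of their integer ids (0, 1, 2)
topics = ["Politics", "Sports", "Technology"]

# Build both directions of the mapping from the single list above
id2label = dict(enumerate(topics))
label2id = {name: idx for idx, name in id2label.items()}

# Hypothetical string-labeled data, encoded to the integers a classifier expects
raw_labels = ["Sports", "Politics", "Technology", "Sports"]
encoded = [label2id[name] for name in raw_labels]

print(encoded)              # [1, 0, 2, 1]
print(id2label[encoded[0]])  # Sports
```

Deriving both dictionaries from one list avoids the two directions drifting apart when a topic is added or reordered.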

Exercise 5: Evaluate the Fine-Tuned Model

Task: Use the fine-tuned model to classify new articles into topics.

Solution:

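import torch

# New articles
new_texts = [
    "The president addressed the nation on economic reforms.",
    "The basketball team secured a historic win last night."
]

label_names = ["Politics", "Sports", "Technology"]

# Tokenize and predict
model.eval()  # disable dropout for inference
for text in new_texts:
    # Move the inputs to the same device the Trainer left the model on
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():  # gradients are not needed for prediction
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=1).item()
    label = label_names[prediction]
    print(f"Article: {text}\nPredicted Topic: {label}\n")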

Expected Output (illustrative; a model fine-tuned on only three examples may predict differently):

Article: The president addressed the nation on economic reforms.
Predicted Topic: Politics

Article: The basketball team secured a historic win last night.
Predicted Topic: Sports
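
The prediction step takes the argmax of the model's logits, which gives a label but no confidence. A softmax over the same logits gives a probability for each topic. The sketch below uses plain Python and made-up logit values, so it is an illustration of the arithmetic rather than output from the fine-tuned model.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

topics = ["Politics", "Sports", "Technology"]
logits = [2.0, 0.5, -1.0]  # made-up values standing in for outputs.logits[0]

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability

print(f"Predicted Topic: {topics[best]}, Confidence: {probs[best]:.2f}")
# Predicted Topic: Politics, Confidence: 0.79
```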

These exercises cover practical applications of sentiment analysis, NER, and text classification. You’ve learned how to use pre-trained models, fine-tune them for custom tasks, and apply them to real-world scenarios. By completing these tasks, you’re well-equipped to implement these powerful NLP techniques in your projects.
