Chapter 6: Core NLP Applications
Practical Exercises for Chapter 6
These hands-on exercises will help you consolidate your understanding of sentiment analysis, named entity recognition (NER), and text classification. Each exercise includes a solution with detailed code examples for hands-on practice.
Exercise 1: Sentiment Analysis
Task: Perform sentiment analysis on a set of product reviews using a pre-trained BERT-based model.
Solution:
from transformers import pipeline

# Load the sentiment analysis pipeline (downloads a default fine-tuned model)
sentiment_analyzer = pipeline("sentiment-analysis")

# Product reviews
reviews = [
    "This product is amazing! Highly recommend.",
    "It was a complete waste of money.",
    "The product is okay, but not worth the price."
]

# Analyze sentiment
results = sentiment_analyzer(reviews)

# Display results
print("Sentiment Analysis Results:")
for review, result in zip(reviews, results):
    print(f"Review: {review}")
    print(f"Sentiment: {result['label']}, Score: {result['score']:.2f}\n")
Expected output (scores are illustrative; note that the default checkpoint distinguishes only POSITIVE and NEGATIVE, so a mixed review is forced into one of those two classes):
Sentiment Analysis Results:
Review: This product is amazing! Highly recommend.
Sentiment: POSITIVE, Score: 0.99
Review: It was a complete waste of money.
Sentiment: NEGATIVE, Score: 0.97
Review: The product is okay, but not worth the price.
Sentiment: NEGATIVE, Score: 0.87
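The default sentiment checkpoint outputs only POSITIVE or NEGATIVE. If you need a neutral class without switching models, one rough workaround is to treat low-confidence predictions as neutral. The sketch below does exactly that; the 0.8 threshold is an arbitrary illustrative choice, not a recommended value:

```python
# Map a binary sentiment result to a three-way label by thresholding the
# confidence score: predictions below the threshold are treated as neutral.
def to_three_way(result, threshold=0.8):
    if result["score"] < threshold:
        return "NEUTRAL"
    return result["label"]

print(to_three_way({"label": "NEGATIVE", "score": 0.97}))  # NEGATIVE
print(to_three_way({"label": "POSITIVE", "score": 0.62}))  # NEUTRAL
```

For production use, a checkpoint actually trained with a neutral class is the more reliable option.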
Exercise 2: Named Entity Recognition
Task: Identify named entities in a legal document using a pre-trained NER model.
Solution:
from transformers import pipeline

# Load the NER pipeline; aggregation_strategy="simple" merges word pieces
# into whole entities (it supersedes the older grouped_entities=True flag)
ner_pipeline = pipeline("ner", aggregation_strategy="simple")

# Legal text
text = "The contract was signed by John Doe on January 15, 2023, in New York City."

# Perform NER
results = ner_pipeline(text)

# Display results
print("Named Entities:")
for entity in results:
    print(f"Entity: {entity['word']}, Type: {entity['entity_group']}, Score: {entity['score']:.2f}")
Expected output (scores are illustrative; the default CoNLL-2003 checkpoint tags only PER, ORG, LOC, and MISC entity types, so the date is not extracted):
Named Entities:
Entity: John Doe, Type: PER, Score: 0.99
Entity: New York City, Type: LOC, Score: 0.97
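The pipeline returns a flat list of entity dictionaries. For downstream use, such as indexing a contract by party and location, it often helps to group mentions by entity type. A minimal post-processing sketch, using sample results in the same format the pipeline returns (scores are illustrative):

```python
from collections import defaultdict

# Sample output in the format returned by the NER pipeline above
results = [
    {"word": "John Doe", "entity_group": "PER", "score": 0.99},
    {"word": "New York City", "entity_group": "LOC", "score": 0.97},
]

def group_by_type(entities):
    """Collect entity mentions under their predicted type."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)

print(group_by_type(results))
# {'PER': ['John Doe'], 'LOC': ['New York City']}
```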
Exercise 3: Custom Text Classification
Task: Classify customer support queries into categories such as "Billing", "Technical Issue", or "General Inquiry".
Solution:
from transformers import pipeline

# A base checkpoint such as "distilbert-base-uncased" has no fine-tuned
# classification head, so its labels would be meaningless. The zero-shot
# classification pipeline lets us supply the categories directly instead.
classifier = pipeline("zero-shot-classification")

# Customer support queries
queries = [
    "I need help with my billing statement.",
    "The app crashes whenever I open it.",
    "Can you tell me about your subscription plans?"
]
candidate_labels = ["Billing", "Technical Issue", "General Inquiry"]

# Perform classification
results = classifier(queries, candidate_labels)

# Display results; labels come back sorted by score, highest first
print("Classification Results:")
for query, result in zip(queries, results):
    print(f"Query: {query}")
    print(f"Label: {result['labels'][0]}, Score: {result['scores'][0]:.2f}\n")
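When no fine-tuned head matches your categories, the zero-shot pipeline (`pipeline("zero-shot-classification")`) is a practical option: it scores arbitrary candidate labels, and with `multi_label=True` each label is scored independently, so a query can match several categories. The sketch below filters such a result dictionary by a threshold; the sample values are illustrative, not a real pipeline run:

```python
# Zero-shot output format: labels sorted by descending score. With
# multi_label=True the scores are independent, so more than one label
# can clear a given threshold.
result = {
    "sequence": "The app crashes and I was double-charged.",
    "labels": ["Technical Issue", "Billing", "General Inquiry"],
    "scores": [0.88, 0.81, 0.05],
}

def labels_above(result, threshold=0.5):
    """Keep every candidate label whose score clears the threshold."""
    return [l for l, s in zip(result["labels"], result["scores"]) if s >= threshold]

print(labels_above(result))  # ['Technical Issue', 'Billing']
```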
Exercise 4: Fine-Tuning a Transformer for Text Classification
Task: Fine-tune a BERT model to classify news articles into topics such as "Politics", "Sports", and "Technology".
Solution:
import torch
from torch.utils.data import Dataset
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments

# Define a custom dataset class
class NewsDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length=128):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        encoding = self.tokenizer(
            self.texts[idx], truncation=True, padding="max_length",
            max_length=self.max_length, return_tensors="pt"
        )
        # Trainer expects a single dict per example, with the target
        # stored under the "labels" key
        item = {key: val.squeeze(0) for key, val in encoding.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# Example data (a real fine-tuning run needs far more examples)
texts = [
    "The government passed a new law today.",
    "The local team won the championship game!",
    "New advancements in AI are transforming technology."
]
labels = [0, 1, 2]  # 0: Politics, 1: Sports, 2: Technology

# Load tokenizer and model
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

# Prepare dataset
dataset = NewsDataset(texts, labels, tokenizer)

# Define training arguments; no eval_dataset is supplied, so per-epoch
# evaluation is left disabled
training_args = TrainingArguments(
    output_dir="./news_results",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
)

# Fine-tune the model
trainer.train()
Exercise 5: Evaluate the Fine-Tuned Model
Task: Use the fine-tuned model to classify new articles by topic.
Solution:
import torch

# New articles
new_texts = [
    "The president addressed the nation on economic reforms.",
    "The basketball team secured a historic win last night."
]

# Tokenize and predict (model and tokenizer come from Exercise 4)
model.eval()
for text in new_texts:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=1).item()
    label = ["Politics", "Sports", "Technology"][prediction]
    print(f"Article: {text}\nPredicted Topic: {label}\n")
Expected output (illustrative; a model fine-tuned on only three examples will not predict reliably):
Article: The president addressed the nation on economic reforms.
Predicted Topic: Politics
Article: The basketball team secured a historic win last night.
Predicted Topic: Sports
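`argmax` discards how confident the model is. Applying softmax to the logits recovers a probability for each class, which is useful when you want to flag low-confidence predictions for human review. A self-contained sketch with illustrative logits for the three-class head:

```python
import torch

# Illustrative logits for a 3-class head; softmax turns them into
# probabilities that sum to 1
logits = torch.tensor([[2.0, 0.5, -1.0]])
probs = torch.softmax(logits, dim=1)
prediction = torch.argmax(probs, dim=1).item()
topics = ["Politics", "Sports", "Technology"]
print(topics[prediction], round(probs[0, prediction].item(), 2))  # Politics 0.79
```

The same `torch.softmax(outputs.logits, dim=1)` call can be applied directly to the model outputs in the loop above.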
These exercises cover practical applications of sentiment analysis, NER, and text classification. You have learned how to use pre-trained models, fine-tune them for specific tasks, and apply them to real-world scenarios. By completing these tasks, you are well prepared to put these powerful NLP techniques to work in your own projects.