Chapter 6: Core NLP Applications
Practical Exercises for Chapter 6
These hands-on exercises will help you consolidate your understanding of sentiment analysis, named entity recognition (NER), and text classification. Each exercise includes a solution with detailed code examples for hands-on practice.
Exercise 1: Sentiment Analysis
Task: Perform sentiment analysis on a set of product reviews using a pre-trained BERT-based model.
Solution:
from transformers import pipeline

# Load the sentiment analysis pipeline (downloads a default fine-tuned model)
sentiment_analyzer = pipeline("sentiment-analysis")

# Product reviews
reviews = [
    "This product is amazing! Highly recommend.",
    "It was a complete waste of money.",
    "The product is okay, but not worth the price."
]

# Analyze sentiment
results = sentiment_analyzer(reviews)

# Display results
print("Sentiment Analysis Results:")
for review, result in zip(reviews, results):
    print(f"Review: {review}")
    print(f"Sentiment: {result['label']}, Score: {result['score']:.2f}\n")
Expected output (scores are illustrative; note that the default checkpoint distinguishes only POSITIVE and NEGATIVE, so a mixed review is forced into one of those two classes):
Sentiment Analysis Results:
Review: This product is amazing! Highly recommend.
Sentiment: POSITIVE, Score: 0.99
Review: It was a complete waste of money.
Sentiment: NEGATIVE, Score: 0.97
Review: The product is okay, but not worth the price.
Sentiment: NEGATIVE, Score: 0.87
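The default sentiment checkpoint outputs only POSITIVE or NEGATIVE. If you need a neutral class without switching models, one rough workaround is to treat low-confidence predictions as neutral. The sketch below does exactly that; the 0.8 threshold is an arbitrary illustrative choice, not a recommended value:

```python
# Map a binary sentiment result to a three-way label by thresholding the
# confidence score: predictions below the threshold are treated as neutral.
def to_three_way(result, threshold=0.8):
    if result["score"] < threshold:
        return "NEUTRAL"
    return result["label"]

print(to_three_way({"label": "NEGATIVE", "score": 0.97}))  # NEGATIVE
print(to_three_way({"label": "POSITIVE", "score": 0.62}))  # NEUTRAL
```

For production use, a checkpoint actually trained with a neutral class is the more reliable option.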
Exercise 2: Named Entity Recognition
Task: Identify named entities in a legal document using a pre-trained NER model.
Solution:
from transformers import pipeline

# Load the NER pipeline; aggregation_strategy="simple" merges word pieces
# into whole entities (it supersedes the older grouped_entities=True flag)
ner_pipeline = pipeline("ner", aggregation_strategy="simple")

# Legal text
text = "The contract was signed by John Doe on January 15, 2023, in New York City."

# Perform NER
results = ner_pipeline(text)

# Display results
print("Named Entities:")
for entity in results:
    print(f"Entity: {entity['word']}, Type: {entity['entity_group']}, Score: {entity['score']:.2f}")
Expected output (scores are illustrative; the default CoNLL-2003 checkpoint tags only PER, ORG, LOC, and MISC entity types, so the date is not extracted):
Named Entities:
Entity: John Doe, Type: PER, Score: 0.99
Entity: New York City, Type: LOC, Score: 0.97
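The pipeline returns a flat list of entity dictionaries. For downstream use, such as indexing a contract by party and location, it often helps to group mentions by entity type. A minimal post-processing sketch, using sample results in the same format the pipeline returns (scores are illustrative):

```python
from collections import defaultdict

# Sample output in the format returned by the NER pipeline above
results = [
    {"word": "John Doe", "entity_group": "PER", "score": 0.99},
    {"word": "New York City", "entity_group": "LOC", "score": 0.97},
]

def group_by_type(entities):
    """Collect entity mentions under their predicted type."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)

print(group_by_type(results))
# {'PER': ['John Doe'], 'LOC': ['New York City']}
```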
Exercise 3: Custom Text Classification
Task: Classify customer support queries into categories such as "Billing", "Technical Issue", or "General Inquiry".
Solution:
from transformers import pipeline

# A base checkpoint such as "distilbert-base-uncased" has no fine-tuned
# classification head, so its labels would be meaningless. The zero-shot
# classification pipeline lets us supply the categories directly instead.
classifier = pipeline("zero-shot-classification")

# Customer support queries
queries = [
    "I need help with my billing statement.",
    "The app crashes whenever I open it.",
    "Can you tell me about your subscription plans?"
]
candidate_labels = ["Billing", "Technical Issue", "General Inquiry"]

# Perform classification
results = classifier(queries, candidate_labels)

# Display results; labels come back sorted by score, highest first
print("Classification Results:")
for query, result in zip(queries, results):
    print(f"Query: {query}")
    print(f"Label: {result['labels'][0]}, Score: {result['scores'][0]:.2f}\n")
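When no fine-tuned head matches your categories, the zero-shot pipeline (`pipeline("zero-shot-classification")`) is a practical option: it scores arbitrary candidate labels, and with `multi_label=True` each label is scored independently, so a query can match several categories. The sketch below filters such a result dictionary by a threshold; the sample values are illustrative, not a real pipeline run:

```python
# Zero-shot output format: labels sorted by descending score. With
# multi_label=True the scores are independent, so more than one label
# can clear a given threshold.
result = {
    "sequence": "The app crashes and I was double-charged.",
    "labels": ["Technical Issue", "Billing", "General Inquiry"],
    "scores": [0.88, 0.81, 0.05],
}

def labels_above(result, threshold=0.5):
    """Keep every candidate label whose score clears the threshold."""
    return [l for l, s in zip(result["labels"], result["scores"]) if s >= threshold]

print(labels_above(result))  # ['Technical Issue', 'Billing']
```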
Exercise 4: Fine-Tuning a Transformer for Text Classification
Task: Fine-tune a BERT model to classify news articles into topics such as "Politics", "Sports", and "Technology".
Solution:
import torch
from torch.utils.data import Dataset
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments

# Define a custom dataset class
class NewsDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length=128):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        encoding = self.tokenizer(
            self.texts[idx], truncation=True, padding="max_length",
            max_length=self.max_length, return_tensors="pt"
        )
        # Trainer expects a single dict per example, with the target
        # stored under the "labels" key
        item = {key: val.squeeze(0) for key, val in encoding.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# Example data (a real fine-tuning run needs far more examples)
texts = [
    "The government passed a new law today.",
    "The local team won the championship game!",
    "New advancements in AI are transforming technology."
]
labels = [0, 1, 2]  # 0: Politics, 1: Sports, 2: Technology

# Load tokenizer and model
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

# Prepare dataset
dataset = NewsDataset(texts, labels, tokenizer)

# Define training arguments; no eval_dataset is supplied, so per-epoch
# evaluation is left disabled
training_args = TrainingArguments(
    output_dir="./news_results",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
)

# Fine-tune the model
trainer.train()
Exercise 5: Evaluate the Fine-Tuned Model
Task: Use the fine-tuned model to classify new articles by topic.
Solution:
import torch

# New articles
new_texts = [
    "The president addressed the nation on economic reforms.",
    "The basketball team secured a historic win last night."
]

# Tokenize and predict (model and tokenizer come from Exercise 4)
model.eval()
for text in new_texts:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=1).item()
    label = ["Politics", "Sports", "Technology"][prediction]
    print(f"Article: {text}\nPredicted Topic: {label}\n")
Expected output (illustrative; a model fine-tuned on only three examples will not predict reliably):
Article: The president addressed the nation on economic reforms.
Predicted Topic: Politics
Article: The basketball team secured a historic win last night.
Predicted Topic: Sports
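`argmax` discards how confident the model is. Applying softmax to the logits recovers a probability for each class, which is useful when you want to flag low-confidence predictions for human review. A self-contained sketch with illustrative logits for the three-class head:

```python
import torch

# Illustrative logits for a 3-class head; softmax turns them into
# probabilities that sum to 1
logits = torch.tensor([[2.0, 0.5, -1.0]])
probs = torch.softmax(logits, dim=1)
prediction = torch.argmax(probs, dim=1).item()
topics = ["Politics", "Sports", "Technology"]
print(topics[prediction], round(probs[0, prediction].item(), 2))  # Politics 0.79
```

The same `torch.softmax(outputs.logits, dim=1)` call can be applied directly to the model outputs in the loop above.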
These exercises cover practical applications of sentiment analysis, NER, and text classification. You have learned how to use pre-trained models, fine-tune them for specific tasks, and apply them to real-world scenarios. By completing these tasks, you are well prepared to put these powerful NLP techniques to work in your own projects.