Chapter 13: Advanced Topics
13.5 Practical Exercises
Exercise 1: Transfer Learning
Experiment with different pre-trained models available in the Hugging Face Transformers library and fine-tune them for a text classification task of your choice.
Example:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load a pre-trained checkpoint together with a fresh two-label classification head
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

# Encode a single sentence and run it through the (not yet fine-tuned) model
inputs = tokenizer("I love this place!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)
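The snippet above only runs a single forward pass through the new classification head. To actually fine-tune the model, you can wrap it in the Trainer API. The sketch below shows one minimal way to do that, reusing the tokenizer and model defined above; it assumes the datasets library is installed and uses a small, shuffled slice of the IMDB reviews dataset purely as a stand-in for your own labeled data.

# Minimal fine-tuning sketch (assumption: `datasets` is installed; IMDB is only an example dataset)
from datasets import load_dataset
from transformers import TrainingArguments, Trainer

train_data = load_dataset("imdb", split="train").shuffle(seed=42).select(range(2000))

def tokenize(batch):
    # Pad/truncate every review so the batch can be converted into fixed-size tensors
    return tokenizer(batch["text"], padding="max_length", truncation=True)

train_data = train_data.map(tokenize, batched=True)

args = TrainingArguments(output_dir="distilbert-imdb", num_train_epochs=1,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args, train_dataset=train_data)
trainer.train()

After training, you can re-run the earlier forward pass and compare the logits to see how fine-tuning has shifted the model's predictions.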
Exercise 2: Natural Language Understanding
Using the spaCy library, extract entities, dependencies, and semantic roles from a corpus of text documents.
Example:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
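The entity loop covers the first part of the exercise. Dependency relations are available on every token, as in the short sketch below. Note that spaCy does not include a semantic role labeler out of the box, so for the semantic-role part of the exercise you would need an additional tool (a dedicated SRL model, for example one of those distributed with AllenNLP).

# Dependency relations for the same document: each token, its relation, and its syntactic head
for token in doc:
    print(token.text, token.dep_, token.head.text)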
Exercise 3: Natural Language Generation
Use the GPT-2 model to generate a short story based on a prompt of your choice. Experiment with different prompts and observe how the model responds.
Example:
from transformers import pipeline
generator = pipeline('text-generation', model='gpt2')
# do_sample=True draws tokens at random, so every run produces a different continuation
text = generator("Once upon a time in a far away land,", max_length=100, do_sample=True)
print(text[0]['generated_text'])
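Beyond the prompt itself, the sampling parameters have a large effect on the output. The call below is a sketch of a few knobs worth experimenting with; the specific values are only starting points, not recommendations.

# temperature, top_k and num_return_sequences are forwarded to the model's generate() method
stories = generator("Once upon a time in a far away land,",
                    max_length=100, do_sample=True,
                    temperature=0.9, top_k=50, num_return_sequences=3)
for story in stories:
    print(story['generated_text'])
    print('---')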
Exercise 4: Advanced Transformer Models
Fine-tune BERT, RoBERTa, and GPT-2 models for a specific NLP task (like named entity recognition, sentiment analysis, or text classification). Compare their performance.
Example:
# Fine-tuning BERT for named entity recognition
from transformers import BertTokenizerFast, BertForTokenClassification
import torch
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForTokenClassification.from_pretrained("bert-base-uncased", num_labels=3)
# Input needs to be tokenized and encoded; padding brings both sequences to the same length
inputs = tokenizer(["I love Paris", "Apple is a great company"], return_tensors="pt", padding=True, truncation=True)
# One label per token, including [CLS], [SEP], and padding; -100 is ignored by the loss.
# The label rows must match the padded sequence length (7 here, assuming each word maps to a single WordPiece).
labels = torch.tensor([[-100, 0, 0, 1, -100, -100, -100],   # [CLS] i love paris [SEP] [PAD] [PAD]
                       [-100, 1, 0, 0, 0, 0, -100]])        # [CLS] apple is a great company [SEP]
outputs = model(**inputs, labels=labels)
loss = outputs.loss
logits = outputs.logits
print(loss, logits)
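To compare architectures as the exercise asks, the same pattern applies to other checkpoints; the Auto classes select the matching model class from the checkpoint name. The snippet below is a sketch for RoBERTa (add_prefix_space=True is required by RoBERTa's tokenizer if you later feed it pre-split words with is_split_into_words=True).

# Same token-classification setup, different checkpoint
from transformers import AutoTokenizer, AutoModelForTokenClassification
roberta_tokenizer = AutoTokenizer.from_pretrained("roberta-base", add_prefix_space=True)
roberta_model = AutoModelForTokenClassification.from_pretrained("roberta-base", num_labels=3)
roberta_inputs = roberta_tokenizer(["I love Paris", "Apple is a great company"],
                                   return_tensors="pt", padding=True, truncation=True)
print(roberta_model(**roberta_inputs).logits.shape)

For GPT-2, keep in mind that the tokenizer has no padding token by default, so before padding batches for GPT2ForTokenClassification you would first set one (for example, tokenizer.pad_token = tokenizer.eos_token).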
The key to mastering these advanced topics is to practice and experiment. Don't be afraid to try different things and make mistakes. That's how you learn and grow in the field of NLP.
Remember that these are minimal examples; real use cases usually require more extensive preprocessing, model configuration, and training procedures. Always refer to the official documentation of the libraries for more detailed information and best practices.