Chapter 9: Text Summarization
9.4 Practical Exercises of Chapter 9: Text Summarization
Exercise 1: Extractive Summarization Using Gensim
In this exercise, we'll use Gensim, a Python library for topic modeling and document similarity analysis, to perform extractive summarization on a text. Note that the summarization module was removed in Gensim 4.0, so this exercise requires a 3.x release (for example, gensim==3.8.3).
# Extractive summarization with Gensim's TextRank-based summarizer (requires gensim < 4.0)
from gensim.summarization import summarize
text = """
The history of natural language processing generally started in the 1950s, although work can be found from earlier periods.
In 1950, Alan Turing published an article titled "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence.
The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English.
The authors claimed that within three or five years, machine translation would be a solved problem.
However, real progress was much slower, and after the ALPAC report in 1966, which found that ten-year-long research had failed to fulfill the expectations,
funding for machine translation was dramatically reduced.
Little further research in machine translation was conducted until the late 1980s when the first statistical machine translation systems were developed.
"""
print(summarize(text))
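If you need more control over the summary length, summarize also accepts a ratio or a word_count argument (both part of Gensim 3.x's documented API). A minimal sketch using the same text variable:
# Keep roughly 30% of the original sentences
print(summarize(text, ratio=0.3))
# Or cap the summary at about 30 words
print(summarize(text, word_count=30))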
Exercise 2: Abstractive Summarization Using BART
In this exercise, we'll use the BART model from the Hugging Face transformers library to perform abstractive summarization.
# Abstractive summarization with BART fine-tuned on the CNN/DailyMail dataset
from transformers import BartTokenizer, BartForConditionalGeneration
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
# Reuse the `text` variable from Exercise 1; truncate inputs to the model's 1024-token limit
inputs = tokenizer([text], max_length=1024, truncation=True, return_tensors='pt')
# Generate with beam search; min_length/max_length control the summary length in tokens
summary_ids = model.generate(inputs['input_ids'], num_beams=4, min_length=20, max_length=80, early_stopping=True)
print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])
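As a quicker alternative, the transformers pipeline API wraps the same model and handles tokenization, generation, and decoding in one call. A minimal sketch, assuming the same text variable from Exercise 1:
from transformers import pipeline
# The summarization pipeline returns a list of dicts with a 'summary_text' key
summarizer = pipeline('summarization', model='facebook/bart-large-cnn')
print(summarizer(text, max_length=80, min_length=20, do_sample=False)[0]['summary_text'])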
Exercise 3: Summarization Evaluation
Evaluate the quality of the summaries from the above exercises using ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores.
# The `rouge` package (install with: pip install rouge) computes ROUGE-1, ROUGE-2, and ROUGE-L
from rouge import Rouge
hypothesis = "The history of natural language processing started in the 1950s with the Turing test and the Georgetown experiment. However, real progress was slow, and funding was reduced after the ALPAC report in 1966. Research resumed in the late 1980s with the development of statistical machine translation systems."
reference = "The history of natural language processing started in the 1950s. Turing proposed the Turing test in 1950. The Georgetown experiment in 1954 involved automatic translation of Russian into English. The authors predicted that machine translation would be solved in a few years. However, the ALPAC report in 1966 found that research hadn't met expectations, leading to reduced funding. Research picked up again in the late 1980s with the advent of statistical machine translation."
rouge = Rouge()
scores = rouge.get_scores(hypothesis, reference)
print(scores)
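The output is a list with one dictionary per hypothesis, keyed by 'rouge-1', 'rouge-2', and 'rouge-l', each holding recall ('r'), precision ('p'), and F1 ('f'). A short sketch for pulling out just the F1 values:
# Print the F1 score for each ROUGE variant
for metric, values in scores[0].items():
    print(f"{metric}: F1 = {values['f']:.3f}")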
Feel free to adjust the parameters and inputs to fit your specific needs, and experiment with different models and texts for a deeper understanding of text summarization.