Natural Language Processing with Python

Chapter 9: Text Summarization

9.4 Practical Exercises of Chapter 9: Text Summarization

Exercise 1: Extractive Summarization Using Gensim

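In this exercise, we'll use Gensim, a Python library for topic modeling and document similarity analysis, to perform extractive summarization on a text. Extractive summarization selects the most important sentences from the source and returns them unchanged. Note that the gensim.summarization module was removed in Gensim 4.0, so this exercise requires a Gensim 3.x release.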

# Requires Gensim 3.x: the gensim.summarization module was removed in Gensim 4.0
from gensim.summarization import summarize

text = """
The history of natural language processing generally started in the 1950s, although work can be found from earlier periods.
In 1950, Alan Turing published an article titled "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence.
The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English.
The authors claimed that within three or five years, machine translation would be a solved problem.
However, real progress was much slower, and after the ALPAC report in 1966, which found that ten-year-long research had failed to fulfill the expectations, funding for machine translation was dramatically reduced.
Little further research in machine translation was conducted until the late 1980s when the first statistical machine translation systems were developed.
"""

# summarize() ranks sentences with a TextRank variant and returns the top-ranked ones.
# ratio (default 0.2) sets the fraction of sentences kept; word_count caps the length instead.
# Gensim logs a warning here because the text is shorter than the 10 sentences it expects.
print(summarize(text))
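Because Gensim's summarizer only ships with older releases, it helps to see the idea behind it. The following is a minimal from-scratch sketch of TextRank, not Gensim's code: each sentence becomes a node in a graph, edges are weighted by the word-overlap similarity from the original TextRank paper, and a weighted PageRank picks the most central sentences. Gensim's own implementation is a TextRank variant that weights edges with BM25 and applies its own preprocessing, so its output can differ. The function names, the naive sentence splitter, and the damping factor of 0.85 are assumptions made for this illustration.

```python
import math
import re

# A short sample taken from the Exercise 1 text
SAMPLE = (
    "The history of natural language processing generally started in the 1950s, "
    "although work can be found from earlier periods. "
    "The Georgetown experiment in 1954 involved fully automatic translation of more "
    "than sixty Russian sentences into English. "
    "The authors claimed that within three or five years, machine translation would "
    "be a solved problem. "
    "Little further research in machine translation was conducted until the late 1980s "
    "when the first statistical machine translation systems were developed."
)


def split_sentences(text):
    # Naive splitter: break wherever ., ! or ? is followed by whitespace
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


def tokenize(sentence):
    # Lowercased alphanumeric tokens (no stemming or stopword removal)
    return re.findall(r'[a-z0-9]+', sentence.lower())


def similarity(a, b):
    # TextRank word-overlap similarity: shared words / (log|a| + log|b|)
    common = len(set(a) & set(b))
    denom = math.log(len(a)) + math.log(len(b)) if a and b else 0.0
    return common / denom if common and denom > 0 else 0.0


def textrank_summary(text, num_sentences=2, damping=0.85, iterations=50):
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Weighted, undirected graph: edge weight = similarity between two sentences
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    totals = [sum(row) for row in weights]
    scores = [1.0] * n
    # Weighted PageRank computed by power iteration
    for _ in range(iterations):
        scores = [(1 - damping) + damping * sum(weights[j][i] / totals[j] * scores[j]
                                                for j in range(n) if totals[j] > 0)
                  for i in range(n)]
    # Take the highest-scoring sentences and restore their original order
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return ' '.join(sentences[i] for i in sorted(best))


print(textrank_summary(SAMPLE, num_sentences=2))
```

Returning the selected sentences in their original order keeps the summary readable, since extractive summaries reuse the source's own sentences verbatim.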

Exercise 2: Abstractive Summarization Using BART

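In this exercise, we'll use the BART model from the Hugging Face transformers library to perform abstractive summarization, which generates new sentences instead of copying sentences from the source. The facebook/bart-large-cnn checkpoint has been fine-tuned for news summarization on the CNN/DailyMail dataset, and the code below reuses the text variable defined in Exercise 1.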

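from transformers import BartTokenizer, BartForConditionalGeneration

# BART fine-tuned on the CNN/DailyMail news summarization dataset
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')

# Reuses text from Exercise 1; BART reads at most 1024 tokens, so longer inputs are truncated
inputs = tokenizer([text], max_length=1024, truncation=True, return_tensors='pt')

# Beam search with 4 beams; min_length and max_length bound the summary length in tokens
summary_ids = model.generate(inputs['input_ids'], attention_mask=inputs['attention_mask'],
                             num_beams=4, min_length=30, max_length=100, early_stopping=True)
print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])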
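The generate call relies on beam search. The sketch below is a simplified beam search over a hypothetical next-token table: NEXT is invented for illustration, whereas BART computes next-token probabilities with a neural network conditioned on the input text, and the Hugging Face implementation adds refinements such as length penalties. It shows two things: a wider beam can find a more probable sequence than greedy decoding (num_beams=1), and max_length caps how many tokens are generated, so a very small value cuts the output short.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token only
NEXT = {
    '<s>': {'nlp': 0.5, 'research': 0.4, 'the': 0.1},
    'nlp': {'began': 0.6, 'funding': 0.4},
    'research': {'began': 0.9, '</s>': 0.1},
    'the': {'history': 1.0},
    'history': {'began': 1.0},
    'began': {'in': 1.0},
    'in': {'1950s': 0.8, '</s>': 0.2},
    'funding': {'fell': 1.0},
    'fell': {'</s>': 1.0},
    '1950s': {'</s>': 1.0},
}


def beam_search(num_beams=4, max_length=10):
    beams = [(['<s>'], 0.0)]  # (tokens so far, cumulative log-probability)
    finished = []
    for _ in range(max_length):
        # Extend every active beam by every possible next token
        candidates = [(tokens + [tok], score + math.log(p))
                      for tokens, score in beams
                      for tok, p in NEXT[tokens[-1]].items()]
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Keep the num_beams best; sequences that emitted </s> are complete
        beams = []
        for tokens, score in candidates[:num_beams]:
            if tokens[-1] == '</s>':
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    # If nothing finished within max_length, fall back to the cut-off beams
    best_tokens, _ = max(finished or beams, key=lambda c: c[1])
    return [t for t in best_tokens if t not in ('<s>', '</s>')]


print(beam_search(num_beams=1))                # ['nlp', 'began', 'in', '1950s']
print(beam_search(num_beams=4))                # ['research', 'began', 'in', '1950s']
print(beam_search(num_beams=4, max_length=2))  # ['research', 'began']
```

Greedy decoding commits to "nlp" because it is the single most likely first token, but the full sequence starting with "research" has a higher total probability (0.288 versus 0.24), which only the wider beam discovers.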

Exercise 3: Summarization Evaluation

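Evaluate summary quality using ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores, which measure how much a generated summary (the hypothesis) overlaps with a human-written reference summary: ROUGE-1 counts shared words, ROUGE-2 shared word pairs, and ROUGE-L the longest common subsequence of words. The example below uses the rouge package (pip install rouge) with a sample hypothesis; to score the summaries from the previous exercises, assign their output to hypothesis instead.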

from rouge import Rouge

hypothesis = "The history of natural language processing started in the 1950s with the Turing test and the Georgetown experiment. However, real progress was slow, and funding was reduced after the ALPAC report in 1966. Research resumed in the late 1980s with the development of statistical machine translation systems."
reference = "The history of natural language processing started in the 1950s. Turing proposed the Turing test in 1950. The Georgetown experiment in 1954 involved automatic translation of Russian into English. The authors predicted that machine translation would be solved in a few years. However, the ALPAC report in 1966 found that research hadn't met expectations, leading to reduced funding. Research picked up again in the late 1980s with the advent of statistical machine translation."

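rouge = Rouge()
# get_scores returns one dict per hypothesis/reference pair with 'rouge-1', 'rouge-2' and
# 'rouge-l' entries, each holding F1 ('f'), precision ('p') and recall ('r')
scores = rouge.get_scores(hypothesis, reference)
print(scores)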
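To see what these numbers mean, here is a minimal from-scratch sketch of ROUGE-N and ROUGE-L. It is not the rouge package's code: tokenization here is a simple lowercase regular expression, and ROUGE-L is computed over the whole text as a single sequence, so results can differ slightly from the package's output. Precision is the share of the hypothesis's n-grams that appear in the reference, recall is the share of the reference's n-grams that appear in the hypothesis, and F1 is their harmonic mean. The function names are illustrative.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphanumeric tokens
    return re.findall(r'[a-z0-9]+', text.lower())


def prf(overlap, hyp_total, ref_total):
    # Precision, recall and F1 in the same f/p/r layout the rouge package prints
    p = overlap / hyp_total if hyp_total else 0.0
    r = overlap / ref_total if ref_total else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {'f': f, 'p': p, 'r': r}


def rouge_n(hypothesis, reference, n=1):
    def ngrams(words):
        return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    hyp, ref = ngrams(tokenize(hypothesis)), ngrams(tokenize(reference))
    # Clipped overlap: each n-gram counts at most min(hyp count, ref count) times
    overlap = sum((hyp & ref).values())
    return prf(overlap, sum(hyp.values()), sum(ref.values()))


def lcs_length(a, b):
    # Longest common subsequence by dynamic programming, one row at a time
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(hypothesis, reference):
    hyp, ref = tokenize(hypothesis), tokenize(reference)
    return prf(lcs_length(hyp, ref), len(hyp), len(ref))


hyp = "the cat sat on the mat"
ref = "the cat is on the mat"
print(rouge_n(hyp, ref, 1))  # 5 of 6 words shared
print(rouge_n(hyp, ref, 2))  # 3 of 5 word pairs shared
print(rouge_l(hyp, ref))     # LCS "the cat on the mat" has 5 words
```

Because ROUGE only counts surface overlap, an abstractive summary that paraphrases the reference can score lower than an extractive one even when it is equally accurate.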

Try adjusting the parameters, such as ratio in Exercise 1 or num_beams and max_length in Exercise 2, and experiment with other models and texts to build a deeper understanding of text summarization.