Chapter 7: Sentiment Analysis
7.5 Practical Exercises of Chapter 7: Sentiment Analysis
Exercise 7.5.1: Rule-Based Sentiment Analysis with TextBlob
In this exercise, you will use the TextBlob library in Python to perform rule-based sentiment analysis on a text.
from textblob import TextBlob
text = "I love this book. It's amazing."
blob = TextBlob(text)
print(blob.sentiment)
The sentiment property returns a namedtuple of the form Sentiment(polarity, subjectivity). The polarity score is a float within the range [-1.0, 1.0], where -1.0 is most negative and 1.0 is most positive. The subjectivity is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.
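A continuous polarity score often needs to be turned into a discrete label. The following is a minimal sketch of one way to do that with a cutoff. The helper name label_from_polarity and the 0.1 threshold are illustrative assumptions and are not part of TextBlob.

```python
def label_from_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse sentiment label.

    The threshold is an arbitrary illustrative choice; tune it for your data.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1.0, 1.0]")
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Example scores chosen by hand for illustration (not TextBlob output)
for score in (0.6, 0.0, -0.45):
    print(score, label_from_polarity(score))
```

With the TextBlob example above, the helper could be called as label_from_polarity(blob.sentiment.polarity).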
Exercise 7.5.2: Machine Learning Sentiment Analysis with Scikit-Learn
In this exercise, you will use the Scikit-learn library to create a sentiment analysis model. We'll use a simple CountVectorizer to convert text into a matrix of token counts, and a LinearSVC model for classification.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# This is a very basic example with minimal preprocessing and no hyperparameter tuning
# In a real-world scenario, you would want to clean your text data and tune your model
# Assume X is your list of texts and y is the corresponding sentiments
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
text_clf = Pipeline([('vect', CountVectorizer()),
('clf', LinearSVC())])
text_clf.fit(X_train, y_train)
predictions = text_clf.predict(X_test)
print(classification_report(y_test, predictions))
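To make the "matrix of token counts" concrete, here is a rough pure-Python sketch of the idea behind CountVectorizer. It is not scikit-learn's implementation: the functions tokenize and count_matrix and the two toy documents are hypothetical. It only assumes CountVectorizer's default behavior of lowercasing the text and keeping tokens of two or more word characters.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of two or more word characters,
    # roughly mirroring CountVectorizer's default token pattern
    return re.findall(r"\b\w\w+\b", text.lower())


def count_matrix(texts):
    """Return (vocabulary, rows) where rows[i][j] counts vocabulary[j] in texts[i]."""
    tokenized = [tokenize(t) for t in texts]
    # One column per distinct token, in alphabetical order
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


# Hypothetical toy documents, made up for this illustration
docs = ["I love this book", "I hate this book, I really hate it"]
vocab, matrix = count_matrix(docs)
print(vocab)
for row in matrix:
    print(row)
```

Each row is one document and each column is one vocabulary word, so the single-letter word "I" is dropped and the repeated word "hate" is counted twice in the second row. The classifier in the pipeline learns a weight per column from rows like these.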
Exercise 7.5.3: Deep Learning Sentiment Analysis with Keras
In this exercise, you will use Keras, a deep learning library in Python, to create a simple sentiment analysis model.
import numpy as np
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
# Embedding is imported from keras.layers; unused layers (Flatten, Activation) are omitted
from keras.layers import Dense, LSTM, Conv1D, MaxPooling1D, Dropout, Embedding
# Assume X is your list of texts and y is the corresponding sentiments
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Prepare tokenizer; cap the vocabulary so word indices stay below the Embedding input size (20000)
t = Tokenizer(num_words=20000)
t.fit_on_texts(X_train)
# Convert text into sequences of integers
sequences = t.texts_to_sequences(X_train)
test_sequences = t.texts_to_sequences(X_test)
# Pad the sequences so they are all the same length
data = pad_sequences(sequences, maxlen=100)
test_data = pad_sequences(test_sequences, maxlen=100)
# Define the model
model = Sequential()
model.add(Embedding(20000, 100, input_length=100))
model.add(Dropout(0.2))
model.add(Conv1D(64, 5, activation='relu'))
model.add(MaxPooling1D(pool_size=4))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Train the model
model.fit(data, np.array(y_train), validation_split=0.4, epochs=3)
# Evaluate on test set
loss, accuracy = model.evaluate(test_data, np.array(y_test))
print('Test Accuracy: %f' % (accuracy*100))
In the above code, we first tokenize the texts and convert them into sequences of integers. We then pad the sequences so they are all of the same length. After that, we define a Sequential model with an Embedding layer, a Dropout layer, a Conv1D layer, a MaxPooling1D layer, an LSTM layer, and a Dense layer. We then compile the model with the 'adam' optimizer and 'binary_crossentropy' as the loss function since this is a binary classification problem. After training the model, we evaluate it on the test set and print the test accuracy.
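Two steps in this pipeline are easy to check by hand: what padding does to a sequence, and how the sequence length shrinks through the Conv1D and MaxPooling1D layers. The sketch below uses hypothetical helper names (pad_or_truncate, conv1d_valid_length, max_pool_length) and is not Keras code. It assumes the default settings as I understand them: pad_sequences pads and truncates at the start of a sequence, Conv1D uses 'valid' padding with stride 1, and MaxPooling1D uses a stride equal to its pool size.

```python
def pad_or_truncate(sequence, maxlen, value=0):
    """Left-pad with `value`, or keep only the last `maxlen` items."""
    if len(sequence) >= maxlen:
        # Too long: drop items from the front
        return list(sequence[len(sequence) - maxlen:])
    # Too short: add padding values at the front
    return [value] * (maxlen - len(sequence)) + list(sequence)


def conv1d_valid_length(length, kernel_size):
    # 'valid' convolution with stride 1 loses kernel_size - 1 positions
    return length - kernel_size + 1


def max_pool_length(length, pool_size):
    # Non-overlapping pooling windows; any leftover positions are dropped
    return length // pool_size


# Hypothetical integer sequences, as a tokenizer might produce
sequences = [[4, 7], [1, 2, 3, 4, 5, 6]]
padded = [pad_or_truncate(s, maxlen=4) for s in sequences]
print(padded)  # [[0, 0, 4, 7], [3, 4, 5, 6]]

# Sequence length through the model above: 100 -> Conv1D(kernel 5) -> MaxPooling1D(4)
after_conv = conv1d_valid_length(100, 5)
after_pool = max_pool_length(after_conv, 4)
print(after_conv, after_pool)  # 96 24
```

Under these assumptions, the LSTM layer in the model receives 24 time steps of 64 features each, rather than the original 100 token positions.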
Together, these exercises give you hands-on practice with the three main approaches to sentiment analysis covered in this chapter: rule-based, machine learning, and deep learning.