Chapter 6: Advanced Level Exercises
Advanced Level Exercises Part 2
Exercise 26: Machine Learning
Concepts:
- Machine Learning
- Scikit-Learn library
- Data Preprocessing
- Feature Engineering
- Model Training
- Model Evaluation
Description: Write a Python script that uses machine learning techniques to train a model and make predictions on new data.
Solution:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Read the data into a pandas dataframe
df = pd.read_csv('data.csv')
# Check for missing values
if df.isnull().sum().sum() > 0:
print("Warning: Missing values detected. Filling with mean values.")
df = df.fillna(df.mean()) # Alternatively, df.dropna() to remove rows with NaN values
# Ensure target column exists
if 'target' not in df.columns:
raise ValueError("Error: 'target' column not found in dataset.")
# Split the data into features and labels
X = df.drop(columns=['target'])
y = df['target']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
# Scale the data using standardization
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train a logistic regression model with class balancing
model = LogisticRegression(random_state=42, class_weight='balanced')
model.fit(X_train_scaled, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test_scaled)
# Evaluate the model performance
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted') # Supports multi-class
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')
# Print evaluation metrics
print('Accuracy:', round(accuracy, 4))
print('Precision:', round(precision, 4))
print('Recall:', round(recall, 4))
print('F1 score:', round(f1, 4))
In this exercise, we first read a dataset into a pandas dataframe and handle any missing values. We split the data into training and testing sets using the train_test_split function from the sklearn.model_selection module, and standardize the features with the StandardScaler class from the sklearn.preprocessing module. We train a logistic regression model using the LogisticRegression class from the sklearn.linear_model module and make predictions on the test set. Finally, we evaluate the performance of the model with accuracy, precision, recall, and F1 score using the appropriate functions from the sklearn.metrics module.
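Once the model and scaler are fitted, the same preprocessing must be applied to any new observations before prediction. The following sketch assumes a hypothetical file new_data.csv containing the same feature columns as the training data:

# Load new, unlabeled data (hypothetical file with the same feature columns as X)
new_df = pd.read_csv('new_data.csv')
# Reuse the scaler fitted on the training data; do not refit it on new data
new_scaled = scaler.transform(new_df[X.columns])
# Predict class labels and class probabilities
new_labels = model.predict(new_scaled)
new_probs = model.predict_proba(new_scaled)
print('Predicted labels:', new_labels[:5])
print('Predicted probabilities:', new_probs[:5])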
Exercise 27: Web Development
Concepts:
- Web Development
- Flask framework
- HTML templates
- Routing
- HTTP methods
- Form handling
Description: Write a Python script that creates a web application using the Flask framework.
Solution:
from flask import Flask, render_template, request
app = Flask(__name__)
# Define a route for the home page
@app.route('/')
def home():
return render_template('home.html')
# Define a route for the contact page
@app.route('/contact', methods=['GET', 'POST'])
def contact():
if request.method == 'POST':
name = request.form['name']
email = request.form['email']
message = request.form['message']
# TODO: Process the form data
return 'Thanks for contacting us!'
else:
return render_template('contact.html')
if __name__ == '__main__':
app.run(debug=True)
In this exercise, we first import the Flask class from the flask module and create a new Flask application. We define routes for the home page and contact page using the route decorator and render HTML templates for both pages with the render_template function. We handle form submissions on the contact page using the request object and the POST method. Finally, we start the Flask application using the run method.
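To exercise these routes without starting a browser, Flask's built-in test client can be used. This is a minimal sketch; it assumes home.html and contact.html exist in the application's templates folder:

# Call the routes through Flask's test client (no server or browser needed)
with app.test_client() as client:
    # Request the home page
    home_response = client.get('/')
    print('Home page status:', home_response.status_code)
    # Submit the contact form
    contact_response = client.post('/contact', data={
        'name': 'Alice',
        'email': 'alice@example.com',
        'message': 'Hello!'
    })
    print(contact_response.data.decode())  # Expected: 'Thanks for contacting us!'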
Exercise 28: Data Streaming
Concepts:
- Data Streaming
- Kafka
- PyKafka library
- Stream Processing
Description: Write a Python script that streams data from a source and processes it in real-time.
Solution:
from pykafka import KafkaClient
import json
# Kafka broker configuration
KAFKA_BROKER = 'localhost:9092'
TOPIC_NAME = 'test'
try:
# Connect to Kafka broker
client = KafkaClient(hosts=KAFKA_BROKER)
# Get a reference to the topic
topic = client.topics[TOPIC_NAME]
# Create a consumer
consumer = topic.get_simple_consumer()
print(f"Connected to Kafka broker at {KAFKA_BROKER}, consuming messages from topic '{TOPIC_NAME}'...")
# Process messages in real-time
for message in consumer:
if message is not None:
try:
data = json.loads(message.value.decode('utf-8')) # Decode & parse JSON safely
print("Received message:", data)
# TODO: Process the data in real-time
except json.JSONDecodeError as e:
print(f"Error decoding JSON: {e} - Raw message: {message.value}")
except Exception as e:
print(f"Kafka connection error: {e}")
finally:
if 'consumer' in locals():
consumer.stop() # Ensure consumer is properly stopped
print("Kafka consumer stopped.")
In this exercise, we first connect to a Kafka broker using the KafkaClient class from the pykafka library. We get a reference to a topic and create a consumer for the topic using the get_simple_consumer method. We process messages in real time in a loop, reading each message's value attribute, decoding it, and parsing it with the json.loads function; JSON decoding errors are caught so a malformed message does not stop the stream, and the consumer is stopped cleanly when the loop ends.
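To have something to consume, messages can be published to the same topic with a PyKafka producer. A minimal sketch, assuming the same broker and topic as above:

from pykafka import KafkaClient
import json

client = KafkaClient(hosts='localhost:9092')
topic = client.topics['test']
# Publish one JSON-encoded message to the 'test' topic
with topic.get_sync_producer() as producer:
    payload = json.dumps({'event': 'ping', 'value': 42}).encode('utf-8')
    producer.produce(payload)
print("Message published to topic 'test'.")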
Exercise 29: Natural Language Processing
Concepts:
- Natural Language Processing
- NLTK library
- Tokenization
- Stemming
- Stop Words Removal
Description: Write a Python script that performs natural language processing tasks on a text corpus.
Solution:
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords
# Download NLTK data
nltk.download('punkt')
nltk.download('stopwords')
# Load the text corpus
with open('corpus.txt', 'r') as f:
corpus = f.read()
# Tokenize the corpus
tokens = word_tokenize(corpus)
# Remove stop words
stop_words = set(stopwords.words('english'))
filtered_tokens = [token for token in tokens if token.lower() not in stop_words]
# Stem the tokens
stemmer = PorterStemmer()
stemmed_tokens = [stemmer.stem(token) for token in filtered_tokens]
# Print the results
print('Original tokens:', tokens[:10])
print('Filtered tokens:', filtered_tokens[:10])
print('Stemmed tokens:', stemmed_tokens[:10])
In this exercise, we first download the necessary data from the NLTK library using the nltk.download function. We load a text corpus from a file and tokenize it with the word_tokenize function from the nltk.tokenize module. We remove stop words using the stopwords corpus from the NLTK library and stem the remaining tokens with the PorterStemmer class from the nltk.stem module. Finally, we print the original, filtered, and stemmed tokens.
Exercise 30: Distributed Systems
Concepts:
- Distributed Systems
- Pyro library
- Remote Method Invocation
- Client-Server Architecture
Description: Write a Python script that implements a distributed system using the Pyro library.
Solution:
import Pyro4
# Define a remote object class
@Pyro4.expose
class MyObject:
def method1(self, arg1):
return f"Processed method1 with argument: {arg1}"
def method2(self, arg2):
return f"Processed method2 with argument: {arg2}"
# Start the server
if __name__ == '__main__':
# Locate the name server
ns = Pyro4.locateNS()
# Create a Pyro daemon
daemon = Pyro4.Daemon()
# Register the remote object with the daemon
uri = daemon.register(MyObject)
# Register the object with the name server
ns.register('myobject', uri)
print(f"MyObject is now available. URI: {uri}")
# Run the server loop
daemon.requestLoop()
In this exercise, we first define a remote object class using the expose decorator from the Pyro4 library and implement two methods that can be invoked remotely by a client. We register the remote object with a Pyro4 daemon using its register method. We locate a running name server with the locateNS function (the name server must already be started, for example with python -m Pyro4.naming) and register the object's URI under a name. Finally, we start the server using the requestLoop method of the daemon.
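On the client side, the object can be looked up by the name it was registered under and its methods invoked as if they were local. A minimal sketch, assuming the name server and the server above are both running:

import Pyro4

# Look up the remote object by its registered name
remote_obj = Pyro4.Proxy('PYRONAME:myobject')
# Invoke the exposed methods remotely
print(remote_obj.method1('hello'))
print(remote_obj.method2(42))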
Exercise 31: Data Visualization
Concepts:
- Data Visualization
- Plotly library
- Line Chart
- Scatter Chart
- Bar Chart
- Heatmap
- Subplots
Description: Write a Python script that creates interactive visualizations of data using the Plotly library.
Solution:
import plotly.graph_objs as go
import pandas as pd
from plotly.subplots import make_subplots # Correct import
# Load the data
df = pd.read_csv('data.csv')
# Ensure 'quarter' is a string (for heatmap y-axis)
df['quarter'] = df['quarter'].astype(str)
# Create traces
trace1 = go.Scatter(x=df['year'], y=df['sales'], mode='lines', name='Sales')
trace2 = go.Scatter(x=df['year'], y=df['profit'], mode='markers', name='Profit')
trace3 = go.Bar(x=df['year'], y=df['expenses'], name='Expenses')
trace4 = go.Heatmap(x=df['year'], y=df['quarter'], z=df['revenue'], colorscale='Viridis', name='Revenue')
# Create subplots
fig = make_subplots(rows=2, cols=2, subplot_titles=('Sales', 'Profit', 'Expenses', 'Revenue'))
# Add traces correctly
fig.add_trace(trace1, row=1, col=1)
fig.add_trace(trace2, row=1, col=2)
fig.add_trace(trace3, row=2, col=1)
fig.add_trace(trace4, row=2, col=2)
# Update layout for better visualization
fig.update_layout(title='Financial Performance', height=800, width=1000)
# Display the chart
fig.show()
In this exercise, we first load a dataset into a pandas dataframe. We create several chart objects using the Scatter, Bar, and Heatmap classes from the plotly.graph_objs module. We create subplots using the make_subplots function from the plotly.subplots module and add the chart objects to the subplots with the add_trace method. We set the layout of the chart using the update_layout method and display the chart using the show method.
Exercise 32: Data Engineering
Concepts:
- Data Engineering
- SQLite
- Pandas library
- Data Transformation
- Data Integration
Description: Write a Python script that processes data from multiple sources and stores it in a database.
Solution:
import sqlite3
import pandas as pd
# Load data from multiple sources into pandas DataFrames
df1 = pd.read_csv('data1.csv')
df2 = pd.read_excel('data2.xlsx')
df3 = pd.read_json('data3.json')
# Standardize column names across datasets
expected_columns = ['date', 'amount', 'description'] # Adjust based on actual dataset
df1 = df1.reindex(columns=expected_columns, fill_value=None)
df2 = df2.reindex(columns=expected_columns, fill_value=None)
df3 = df3.reindex(columns=expected_columns, fill_value=None)
# Data Cleaning & Transformation
df1['date'] = pd.to_datetime(df1['date'], errors='coerce') # Handle invalid dates
df2['amount'] = df2['amount'].astype(float) / 100 # Convert to proper currency format
df3['description'] = df3['description'].astype(str).str.upper() # Ensure consistency
# Merge DataFrames while handling missing values
df = pd.concat([df1, df2, df3], axis=0).fillna({'amount': 0, 'description': 'UNKNOWN'})
# Store the data in a SQLite database safely
db_file = 'mydb.db'
table_name = 'mytable'
with sqlite3.connect(db_file) as conn:
df.to_sql(table_name, conn, if_exists='replace', index=False)
print(f"Data successfully saved to SQLite table '{table_name}' in '{db_file}'.")
In this exercise, we first load data from multiple sources into pandas dataframes using functions such as read_csv, read_excel, and read_json. We transform the data using pandas functions such as to_datetime, str.upper, and arithmetic operations. We combine the data into a single pandas dataframe using the concat function. Finally, we store the data in a SQLite database using the to_sql method of the pandas dataframe.
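To confirm the load, the table can be read straight back into a dataframe. A minimal sketch using the same database file and table name:

# Read the integrated data back out of SQLite for verification
with sqlite3.connect('mydb.db') as conn:
    check_df = pd.read_sql_query('SELECT * FROM mytable', conn)
print(check_df.head())
print(f"{len(check_df)} rows stored in 'mytable'.")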
Exercise 33: Natural Language Generation
Concepts:
- Natural Language Generation
- Markov Chains
- NLTK library
- Text Corpus
Description: Write a Python script that generates text using natural language generation techniques.
Solution:
import nltk
import random
import os
# Download necessary NLTK resources
nltk.download('punkt')
# Define corpus file
corpus_file = 'corpus.txt'
# Ensure the corpus file exists
if not os.path.exists(corpus_file):
raise FileNotFoundError(f"Error: The file '{corpus_file}' was not found.")
# Load the text corpus
with open(corpus_file, 'r', encoding='utf-8') as f:
corpus = f.read()
# Tokenize the corpus
tokens = nltk.word_tokenize(corpus)
# Build a dictionary of word transitions (Markov Chain)
chain = {}
for i in range(len(tokens) - 1):
word1 = tokens[i]
word2 = tokens[i + 1]
if word1 in chain:
chain[word1].append(word2)
else:
chain[word1] = [word2]
# Generate text using Markov chains
start_word = random.choice(list(chain.keys()))
sentence = [start_word.capitalize()]
while len(sentence) < 100: # Limit by word count
last_word = sentence[-1].lower() # Ensure consistent lookup
if last_word in chain:
next_word = random.choice(chain[last_word])
sentence.append(next_word)
else:
break # Stop if there are no next words
# Print the generated text
print(' '.join(sentence))
In this exercise, we first download the necessary data from the NLTK library using the nltk.download function. We load a text corpus from a file and tokenize it using the word_tokenize function from the nltk library. We build a dictionary of word transitions using a loop and generate text using Markov chains: we start by selecting a random word from the dictionary and then repeatedly pick a random next word from the list of possible transitions, adding words to the sentence until it reaches the specified length or a word with no known successors is reached. Finally, we print the generated text.
Exercise 34: Machine Learning
Concepts:
- Machine Learning
- Scikit-learn library
- Decision Tree Classifier
- Model Training
- Model Evaluation
Description: Write a Python script that trains a machine learning model using the scikit-learn library.
Solution:
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
# Load the iris dataset
iris = datasets.load_iris()
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target)
# Train a decision tree classifier with hyperparameter tuning
clf = DecisionTreeClassifier(max_depth=4, min_samples_split=5, random_state=42)
clf.fit(X_train, y_train)
# Make predictions
y_pred = clf.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', round(accuracy, 4))
print('\nClassification Report:\n', classification_report(y_test, y_pred, target_names=iris.target_names))
# Feature Importance Analysis
feature_importances = dict(zip(iris.feature_names, clf.feature_importances_))
print("\nFeature Importances:", feature_importances)
In this exercise, we first load the iris dataset from the scikit-learn library using the load_iris function. We split the data into training and testing sets using the train_test_split function. We train a decision tree classifier using the DecisionTreeClassifier class and the fit method. We evaluate the model by generating predictions with the predict method and computing the accuracy with the accuracy_score function from the sklearn.metrics module, print a full classification report, and inspect the feature importances learned by the tree.
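Because a single train/test split can be sensitive to how the data happens to be divided, a quick cross-validation gives a more stable estimate of the same model. A minimal sketch reusing the hyperparameters above:

from sklearn.model_selection import cross_val_score

# 5-fold cross-validation of the same decision tree settings on the full dataset
cv_clf = DecisionTreeClassifier(max_depth=4, min_samples_split=5, random_state=42)
scores = cross_val_score(cv_clf, iris.data, iris.target, cv=5)
print('Cross-validation scores:', scores.round(4))
print('Mean accuracy:', round(scores.mean(), 4))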
Exercise 35: Computer Vision
Concepts:
- Computer Vision
- OpenCV library
- Image Loading
- Image Filtering
- Image Segmentation
Description: Write a Python script that performs computer vision tasks on images using the OpenCV library.
Solution:
import cv2
import os
# Load an image safely
image_path = 'image.jpg'
if not os.path.exists(image_path):
raise FileNotFoundError(f"Error: '{image_path}' not found.")
img = cv2.imread(image_path)
# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Apply median filter
filtered = cv2.medianBlur(gray, 5)
# Apply adaptive thresholding
thresh = cv2.adaptiveThreshold(filtered, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
# Apply morphological operations
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
closed = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
# Find contours
contours_info = cv2.findContours(closed, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
contours = contours_info[0] if len(contours_info) == 2 else contours_info[1] # Safe unpacking
# Draw contours on the original image
cv2.drawContours(img, contours, -1, (0, 0, 255), 2)
# Save and display the processed images
cv2.imwrite('output_contours.jpg', img)
cv2.imwrite('output_thresholded.jpg', thresh)
cv2.imwrite('output_closed.jpg', closed)
print("Processing complete. Images saved as 'output_contours.jpg', 'output_thresholded.jpg', and 'output_closed.jpg'.")
# Display the images (comment out if running on a headless system)
cv2.imshow('Original', img)
cv2.imshow('Thresholded', thresh)
cv2.imshow('Closed', closed)
cv2.waitKey(0)
cv2.destroyAllWindows()
In this exercise, we first load an image using the imread function from the OpenCV library. We convert the image to grayscale using the cvtColor function and apply a median filter with the medianBlur function. We apply adaptive thresholding using the adaptiveThreshold function and morphological operations using the getStructuringElement and morphologyEx functions. We find contours in the image using the findContours function and draw them on the original image using the drawContours function. Finally, we save the processed images with the imwrite function and display them using the imshow function.
Exercise 36: Network Programming
Concepts:
- Network Programming
- Socket library
- Client-Server Architecture
- Protocol Implementation
Description: Write a Python script that communicates with a remote server using the socket library.
Solution:
import socket
# Create a socket object
s = socket.socket()
# Define the server address and port number
host = 'localhost'
port = 12345
# Connect to the server
s.connect((host, port))
# Send data to the server
s.send(b'Hello, server!')
# Receive data from the server
data = s.recv(1024)
# Close the socket
s.close()
# Print the received data
print('Received:', data.decode())
In this exercise, we first create a socket object using the socket class from the socket library. We define the address and port number of the server we want to connect to and connect to it using the connect method of the socket object. We send data to the server using the send method and receive data from the server using the recv method. Finally, we close the socket using the close method and print the received data.
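The client above expects something listening on port 12345. This is a minimal companion server sketch that accepts one connection, prints the client's message, and replies:

import socket

# Create a listening socket on the same host and port the client connects to
server = socket.socket()
server.bind(('localhost', 12345))
server.listen(1)
print('Server listening on port 12345...')
# Accept a single client connection
conn, addr = server.accept()
print('Connection from', addr)
# Read the client's message and send a reply
data = conn.recv(1024)
print('Server received:', data.decode())
conn.send(b'Hello, client!')
# Clean up
conn.close()
server.close()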
Exercise 37: Cloud Computing
Concepts:
- Cloud Computing
- Heroku
- Flask
- Web Application Deployment
Description: Write a Python script that deploys a Flask web application to the Heroku cloud platform.
Solution:
from flask import Flask
# Create a Flask application
app = Flask(__name__)
# Define a route
@app.route('/')
def hello():
return 'Hello, world!'
# Run the application (for development only)
if __name__ == '__main__':
app.run(host='0.0.0.0', port=5000, debug=True) # Set debug=True only for development
In this exercise, we create a simple Flask application that defines a single route and use the run method of the Flask object to run it locally. To deploy the application to the Heroku cloud platform, the project typically also needs a Procfile and a requirements.txt listing its dependencies, and the code is pushed to Heroku following the platform's deployment instructions.
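On Heroku the listening port is assigned at runtime through the PORT environment variable, and debug mode should be disabled in production. A hedged variation of the run block (in practice the Procfile usually starts a WSGI server such as gunicorn instead of app.run):

import os

if __name__ == '__main__':
    # Heroku supplies the port via the PORT environment variable;
    # fall back to 5000 for local development
    port = int(os.environ.get('PORT', 5000))
    app.run(host='0.0.0.0', port=port, debug=False)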
Exercise 38: Natural Language Processing
Concepts:
- Natural Language Processing
- spaCy library
- Named Entity Recognition
- Text Processing
Description: Write a Python script that performs named entity recognition on text using the spaCy library.
Solution:
import spacy
# Ensure the model is installed before running the script:
# Run: python -m spacy download en_core_web_sm
# Load the English language model
try:
nlp = spacy.load('en_core_web_sm')
except OSError:
raise OSError("Spacy model 'en_core_web_sm' not found. Run 'python -m spacy download en_core_web_sm' and try again.")
# Define some text to process
text = 'Barack Obama was born in Hawaii.'
# Process the text
doc = nlp(text)
# Extract named entities from the text
entities = [(ent.text, ent.label_) for ent in doc.ents]
# Display results
if entities:
print("\nNamed Entities Found:")
for text, label in entities:
print(f" - {text}: {label}")
else:
print("\nNo named entities found in the text.")
In this exercise, we first load the English language model using the load function from the spaCy library. We define some text to process and run it through the nlp pipeline. We extract named entities from the ents attribute of the processed document and print the text and label of each named entity.
Exercise 39: Deep Learning
Concepts:
- Deep Learning
- TensorFlow library
- Convolutional Neural Network
- Model Training
- Model Evaluation
Description: Write a Python script that trains a deep learning model using the TensorFlow library.
Solution:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping
# Load the CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Normalize the pixel values
train_images, test_images = train_images / 255.0, test_images / 255.0
# Data Augmentation to prevent overfitting
datagen = ImageDataGenerator(
rotation_range=15,
width_shift_range=0.1,
height_shift_range=0.1,
horizontal_flip=True
)
# Define the model architecture
model = models.Sequential([
layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(64, (3, 3), activation='relu'),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(128, (3, 3), activation='relu'),
layers.MaxPooling2D((2, 2)),
layers.Flatten(),
layers.Dense(128, activation='relu'),
layers.Dropout(0.5), # Prevent overfitting
layers.Dense(10, activation='softmax') # Use Softmax for probabilities
])
# Compile the model
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(),
metrics=['accuracy'])
# Define early stopping to stop training if no improvement
early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
# Train the model with data augmentation
model.fit(datagen.flow(train_images, train_labels, batch_size=64),
validation_data=(test_images, test_labels),
epochs=30, callbacks=[early_stopping])
# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('Test accuracy:', round(test_acc * 100, 2), '%')
In this exercise, we first load the CIFAR-10 dataset from the TensorFlow library using the load_data function and normalize the pixel values by dividing them by 255.0. We define a convolutional network using the Sequential class and layers such as Conv2D, MaxPooling2D, Flatten, Dense, and Dropout. We compile the model using the compile method and train it with the fit method, feeding augmented images from an ImageDataGenerator and stopping early with an EarlyStopping callback when the validation loss stops improving. Finally, we evaluate the model using the evaluate method and print the test accuracy.
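After training, the model can classify individual images: predict returns one softmax probability vector per image, and the index of the largest value is the predicted class. A minimal sketch on the first test image:

import numpy as np

# CIFAR-10 class names in label order
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']
# Predict the class of the first test image (slicing keeps the batch dimension)
probs = model.predict(test_images[:1])
predicted_class = int(np.argmax(probs[0]))
print('Predicted:', class_names[predicted_class])
print('Actual:   ', class_names[int(test_labels[0][0])])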
Exercise 40: Data Analysis
Concepts:
- Data Analysis
- Pandas library
- Data Cleaning
- Data Manipulation
- Data Visualization
Description: Write a Python script that analyzes data using the pandas library.
Solution:
import pandas as pd
import matplotlib.pyplot as plt
# Load the data
df = pd.read_csv('data.csv')
# Convert 'date' column to datetime format
df['date'] = pd.to_datetime(df['date'], errors='coerce')
# Drop rows with missing or invalid dates
df.dropna(subset=['date'], inplace=True)
# Convert 'price' and 'quantity' to numeric values (if not already)
df['price'] = pd.to_numeric(df['price'], errors='coerce')
df['quantity'] = pd.to_numeric(df['quantity'], errors='coerce')
# Drop rows with missing or invalid price/quantity
df.dropna(subset=['price', 'quantity'], inplace=True)
# Compute total sales
df['total_sales'] = df['price'] * df['quantity']
# Set date as index for proper resampling
df.set_index('date', inplace=True)
# Group by month and sum sales
monthly_sales = df.resample('M').sum()
# Visualize the data
plt.figure(figsize=(10, 5))
plt.plot(monthly_sales.index, monthly_sales['total_sales'], marker='o', linestyle='-')
plt.xlabel('Month')
plt.ylabel('Total Sales')
plt.title('Monthly Sales Trend')
plt.grid()
plt.xticks(rotation=45)
plt.show()
In this exercise, we first load data from a CSV file using the read_csv function from the pandas library. We clean the data by converting the date, price, and quantity columns to proper types with to_datetime and to_numeric and dropping rows with missing or invalid values using the dropna method. We calculate the total sales for each transaction, set the date as the index, and aggregate the sales by month using the resample method. Finally, we visualize the monthly totals with the plot function from the matplotlib library.
Exercise 41: Data Science
Concepts:
- Data Science
- NumPy library
- pandas library
- Matplotlib library
- Data Cleaning
- Data Manipulation
- Data Visualization
Description: Write a Python script that performs data analysis on a dataset using the NumPy, pandas, and Matplotlib libraries.
Solution:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Load the data
df = pd.read_csv('data.csv')
# Convert 'date' column to datetime format
df['date'] = pd.to_datetime(df['date'], errors='coerce')
# Drop rows with missing or invalid dates
df.dropna(subset=['date'], inplace=True)
# Convert 'price' and 'quantity' to numeric values (if not already)
df['price'] = pd.to_numeric(df['price'], errors='coerce')
df['quantity'] = pd.to_numeric(df['quantity'], errors='coerce')
# Drop rows with missing or invalid price/quantity
df.dropna(subset=['price', 'quantity'], inplace=True)
# Compute total sales
df['total_sales'] = df['price'] * df['quantity']
# Set date as index for proper resampling
df.set_index('date', inplace=True)
# Group by month and sum sales
monthly_sales = df.resample('M').sum()
# Analyze the data
print('Total Sales:', round(df['total_sales'].sum(), 2))
print('Average Price:', round(df['price'].mean(), 2))
print('Median Quantity:', df['quantity'].median())
# Visualize the data
plt.figure(figsize=(10, 5))
plt.plot(monthly_sales.index, monthly_sales['total_sales'], marker='o', linestyle='-', color='b', label='Total Sales')
plt.xlabel('Month')
plt.ylabel('Total Sales')
plt.title('Monthly Sales Trend')
plt.legend()
plt.grid()
plt.xticks(rotation=45)
plt.show()
In this exercise, we first load data from a CSV file using the read_csv function from the pandas library. We clean the data by converting the date, price, and quantity columns to proper types and dropping rows with missing or invalid values using the dropna method. We calculate the total sales for each transaction, set the date as the index, and aggregate the sales by month using the resample method. We perform some basic data analysis by calculating the total sales, average price, and median quantity. Finally, we visualize the monthly totals with the plot function from the matplotlib library.
Exercise 42: Machine Learning
Concepts:
- Machine Learning
- scikit-learn library
- Support Vector Machines
- Model Training
- Model Evaluation
Description: Write a Python script that trains a machine learning model using the scikit-learn library.
Solution:
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Load the iris dataset
iris = datasets.load_iris()
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target)
# Standardize the data (SVMs perform better with scaled data)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train a Support Vector Machine classifier
clf = svm.SVC(kernel='linear', C=1.0, random_state=42)
clf.fit(X_train_scaled, y_train)
# Predict the labels
y_pred = clf.predict(X_test_scaled)
# Evaluate the classifier
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.4f}\n')
# Print detailed evaluation metrics
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=iris.target_names))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
In this exercise, we first load the iris dataset from the scikit-learn library using the load_iris function. We split the data into training and testing sets using the train_test_split function and standardize the features with the StandardScaler class, since SVMs are sensitive to feature scale. We train a support vector machine classifier with a linear kernel using the SVC class from the scikit-learn library. We evaluate the classifier with the accuracy_score function and print the accuracy along with a classification report and confusion matrix.
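To classify a new measurement, it must be scaled with the same fitted scaler before being passed to the classifier. A minimal sketch with a hypothetical flower measurement:

import numpy as np

# A hypothetical new flower: sepal length, sepal width, petal length, petal width (cm)
new_sample = np.array([[5.9, 3.0, 4.2, 1.5]])
# Scale with the scaler fitted on the training data, then predict
new_sample_scaled = scaler.transform(new_sample)
prediction = clf.predict(new_sample_scaled)
print('Predicted species:', iris.target_names[prediction[0]])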
Exercise 43: Web Scraping
Concepts:
- Web Scraping
- BeautifulSoup library
- HTML Parsing
- Data Extraction
Description: Write a Python script that scrapes data from a website using the BeautifulSoup library.
Solution:
import requests
from bs4 import BeautifulSoup
# Define the target URL
url = 'https://en.wikipedia.org/wiki/Python_(programming_language)'
# Add headers to prevent request blocking
headers = {'User-Agent': 'Mozilla/5.0'}
# Fetch the HTML content of the website
response = requests.get(url, headers=headers)
# Check if the request was successful
if response.status_code != 200:
print(f"Error: Unable to fetch the page (Status Code: {response.status_code})")
exit()
# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')
# Extract the page title
title = soup.title.string
print(f"\nPage Title: {title}\n")
# Extract all valid links
base_url = 'https://en.wikipedia.org'
links = []
for link in soup.find_all('a', href=True): # Ensures 'href' exists
href = link.get('href')
# Convert relative Wikipedia links to absolute URLs
if href.startswith('/wiki/'):
full_url = base_url + href
links.append(full_url)
elif href.startswith('http'): # Keep only valid external links
links.append(href)
# Print the first 10 links for brevity
print("Extracted Links:")
for l in links[:10]: # Limit output for readability
print(l)
print(f"\nTotal Links Found: {len(links)}")
In this exercise, we first fetch the HTML content of a website using the get function from the requests library and check the response status code. We parse the HTML content using the BeautifulSoup class from the BeautifulSoup library. We extract the page title from the title attribute and collect links with the find_all method, converting relative Wikipedia links to absolute URLs before printing a sample of them.
Exercise 44: Database Programming
Concepts:
- Database Programming
- SQLite library
- SQL
- Data Retrieval
- Data Manipulation
Description: Write a Python script that interacts with a database using the SQLite library.
Solution:
import sqlite3
# Connect to the database using a context manager
with sqlite3.connect('data.db') as conn:
cursor = conn.cursor()
# Create a table (if it doesn't exist)
cursor.execute('''CREATE TABLE IF NOT EXISTS users (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
age INTEGER NOT NULL);''')
# Insert data into the table (use parameterized queries)
cursor.execute("INSERT INTO users (name, age) VALUES (?, ?)", ('John Doe', 30))
cursor.execute("INSERT INTO users (name, age) VALUES (?, ?)", ('Jane Doe', 25))
# Retrieve data from the table
cursor.execute('SELECT * FROM users')
users = cursor.fetchall() # Fetch all rows
print("\nUsers in database:")
for user in users:
print(user)
# Update data using a parameterized query
cursor.execute("UPDATE users SET age = ? WHERE name = ?", (35, 'John Doe'))
# Delete data using a parameterized query
cursor.execute("DELETE FROM users WHERE name = ?", ('Jane Doe',))
# Commit the changes (happens automatically with `with` statement)
conn.commit()
print("\nDatabase operations completed successfully.")
In this exercise, we first connect to a SQLite database using the connect function from the sqlite3 library, wrapped in a context manager. We create a table, insert data with parameterized queries, and retrieve and print the rows. We then update and delete data using further parameterized SQL commands. Finally, we commit the changes to the database; the context manager also commits the transaction automatically when the block exits without an error.
Exercise 45: Cloud Computing
Concepts:
- Cloud Computing
- AWS
- Flask library
- Boto3 library
- Web Application Deployment
Description: Write a Python script that deploys a web application to the AWS cloud platform using the Flask and Boto3 libraries.
Solution:
from flask import Flask
import boto3
import os
# Create a Flask application
app = Flask(__name__)
# AWS S3 Configuration
AWS_BUCKET_NAME = 'my-bucket'
AWS_REGION = 'us-east-1' # Change to your region
# Upload function for AWS S3
def upload_to_s3(file_name, bucket_name, object_name=None):
"""Uploads a file to S3"""
try:
s3 = boto3.client('s3') # Ensure credentials are configured
object_name = object_name or file_name # Default object name
# Upload file
s3.upload_file(file_name, bucket_name, object_name)
print(f"File '{file_name}' uploaded successfully to S3 bucket '{bucket_name}'.")
except Exception as e:
print(f"Error uploading to S3: {e}")
# Define a route
@app.route('/')
def hello():
return 'Hello, world! Flask is running!'
# Run the application
if __name__ == '__main__':
# Upload a file to S3 before starting Flask (Optional)
if os.path.exists('app.py'):
upload_to_s3('app.py', AWS_BUCKET_NAME)
# Run Flask server
app.run(host='0.0.0.0', port=5000, debug=True)
In this exercise, we create a simple Flask application that defines a single route and use the upload_file method from the Boto3 library to upload a file to an AWS S3 bucket (this requires the flask and boto3 packages and configured AWS credentials). Note that this is only a basic example; deploying a web application to the AWS cloud platform involves many additional steps, such as creating an EC2 instance, setting up a load balancer, configuring security groups, and more.
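Once an object is in S3, a time-limited download link can be generated with Boto3's generate_presigned_url. A minimal sketch, assuming the same bucket and configured AWS credentials:

def get_download_link(bucket_name, object_name, expires_in=3600):
    """Return a presigned URL that allows downloading the object for a limited time."""
    s3 = boto3.client('s3')
    return s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket_name, 'Key': object_name},
        ExpiresIn=expires_in
    )

# Example: a one-hour link to the uploaded app.py
# print(get_download_link(AWS_BUCKET_NAME, 'app.py'))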
Exercise 46: Natural Language Processing
Concepts:
- Natural Language Processing
- NLTK library
- Tokenization
- Part-of-Speech Tagging
- Named Entity Recognition
Description: Write a Python script that performs natural language processing on text data using the NLTK library.
Solution:
import nltk
# Download required NLTK models
nltk.download('punkt')
nltk.download('maxent_ne_chunker')
nltk.download('words')
nltk.download('averaged_perceptron_tagger')
# Load the text data
text = '''Apple Inc. is an American multinational technology company headquartered in Cupertino, California, that designs, develops, and sells consumer electronics, computer software, and online services. The company's hardware products include the iPhone smartphone, the iPad tablet computer, the Mac personal computer, the iPod portable media player, the Apple Watch smartwatch, the Apple TV digital media player, and the HomePod smart speaker. Apple's software includes the macOS and iOS operating systems, the iTunes media player, the Safari web browser, and the iLife and iWork creativity and productivity suites. Its online services include the iTunes Store, the iOS App Store, and Mac App Store, Apple Music, and iCloud.'''
# Tokenize the text
tokens = nltk.word_tokenize(text)
# Perform part-of-speech tagging
pos_tags = nltk.pos_tag(tokens)
# Perform named entity recognition
ne_tags = nltk.ne_chunk(pos_tags)
# Extract named entities
named_entities = {}
for chunk in ne_tags:
if hasattr(chunk, 'label'):
entity_type = chunk.label() # Get entity type (e.g., ORGANIZATION, PERSON)
entity_name = ' '.join(c[0] for c in chunk) # Join words in entity
if entity_type not in named_entities:
named_entities[entity_type] = []
named_entities[entity_type].append(entity_name)
# Print structured named entities
print("\nNamed Entities Found:")
for entity_type, names in named_entities.items():
print(f"{entity_type}: {', '.join(set(names))}") # Use `set()` to remove duplicates
In this exercise, we first load some text data and tokenize it using the word_tokenize function from the NLTK library. We perform part-of-speech tagging using the pos_tag function and named entity recognition using the ne_chunk function. We collect the named entities by checking whether each chunk has a label with the hasattr function, grouping the entity names under their entity types (such as ORGANIZATION or GPE) and printing the results.
Exercise 47: Big Data
Concepts:
- Big Data
- PySpark
- Apache Spark
- Data Processing
- MapReduce
Description: Write a PySpark script that processes data using the Spark framework.
Solution:
from pyspark import SparkContext, SparkConf
# Configure the Spark context
conf = SparkConf().setAppName('wordcount').setMaster('local[*]')
sc = SparkContext(conf=conf)
# Load the text data
text = sc.textFile('data.txt')
# Split the text into words and count the occurrences of each word
word_counts = text.flatMap(lambda line: line.split(' ')).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
# Print the word counts
for word, count in word_counts.collect():
print(word, count)
# Stop the Spark context
sc.stop()
In this exercise, we first configure the Spark context using the SparkConf and SparkContext classes from the PySpark library. We load some text data using the textFile method. We split the text into words and count the occurrences of each word using the flatMap, map, and reduceByKey methods. We print the word counts using the collect method. Finally, we stop the Spark context using the stop method.
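Collecting every word count to the driver can be expensive for large inputs; takeOrdered retrieves only the most frequent words. A minimal sketch (placed before sc.stop() so the RDD is still available):

# Retrieve only the ten most frequent words instead of collecting everything
top_words = word_counts.takeOrdered(10, key=lambda pair: -pair[1])
for word, count in top_words:
    print(word, count)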
Exercise 48: Cybersecurity
Concepts:
- Cybersecurity
- Scapy library
- Network Analysis
- Packet Sniffing
Description: Write a Python script that performs security analysis on a network using the Scapy library.
Solution:
from scapy.all import *
# Define a packet handler function
def packet_handler(packet):
if packet.haslayer(TCP):
if packet[TCP].flags & 2:
print('SYN packet detected:', packet.summary())
# Start the packet sniffer
sniff(prn=packet_handler, filter='tcp', store=0)
In this exercise, we use the Scapy library to perform security analysis on a network. We define a packet handler function that is called for each packet that is sniffed. We check if the packet is a TCP packet and if it has the SYN flag set. If so, we print a message indicating that a SYN packet has been detected, along with a summary of the packet.
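A natural extension is to tally SYN packets per source address, which is a rough indicator of scanning activity. A minimal sketch, assuming the script has sufficient privileges to sniff the network:

from scapy.all import sniff, TCP, IP
from collections import Counter

syn_counts = Counter()

def count_syns(packet):
    # Tally SYN packets by source IP address
    if packet.haslayer(TCP) and packet.haslayer(IP) and (packet[TCP].flags & 2):
        syn_counts[packet[IP].src] += 1

# Sniff 200 TCP packets, then report the most active sources
sniff(prn=count_syns, filter='tcp', store=0, count=200)
print('SYN packets per source IP:')
for src, count in syn_counts.most_common(5):
    print(f'  {src}: {count}')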
Exercise 49: Machine Learning
Concepts:
- Machine Learning
- Scikit-learn library
- Model Training
- Cross-Validation
- Grid Search
Description: Write a Python script that trains a machine learning model using the scikit-learn library.
Solution:
from sklearn import datasets
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
# Load the dataset
iris = datasets.load_iris()
# Split the dataset into features and target
X = iris.data
y = iris.target
# Define the hyperparameters to search
param_grid = {'n_neighbors': [3, 5, 7, 9], 'weights': ['uniform', 'distance']}
# Create a KNN classifier
knn = KNeighborsClassifier()
# Perform a grid search with cross-validation
grid_search = GridSearchCV(knn, param_grid, cv=5)
grid_search.fit(X, y)
# Print the best hyperparameters and the accuracy score
print('Best Hyperparameters:', grid_search.best_params_)
print('Accuracy Score:', grid_search.best_score_)
In this exercise, we use the scikit-learn library to train a machine learning model. We load a dataset using the load_iris function from the datasets module and split it into features and target. We define a dictionary of hyperparameters to search over in the param_grid variable and create a KNN classifier using the KNeighborsClassifier class. We perform a grid search with cross-validation using the GridSearchCV class and print the best hyperparameters and the corresponding accuracy score from the best_params_ and best_score_ attributes.
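GridSearchCV refits the best combination on the full dataset by default, so the tuned classifier is immediately available as best_estimator_. A minimal sketch with a hypothetical measurement:

# Use the refit best model for a new prediction
best_knn = grid_search.best_estimator_
# A hypothetical iris measurement: sepal length, sepal width, petal length, petal width (cm)
sample = [[6.1, 2.8, 4.7, 1.2]]
print('Predicted class:', iris.target_names[best_knn.predict(sample)[0]])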
Exercise 50: Computer Vision
Concepts:
- Computer Vision
- OpenCV library
- Image Processing
- Object Detection
Description: Write a Python script that performs image processing using the OpenCV library.
Solution:
import cv2
# Load the image
img = cv2.imread('image.jpg')
# Convert the image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Define a classifier for face detection
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
# Detect faces in the image
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
# Draw rectangles around the detected faces
for (x, y, w, h) in faces:
cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
# Display the image with the detected faces
cv2.imshow('image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
In this exercise, we use the OpenCV library to perform image processing. We load an image using the imread function and convert it to grayscale using the cvtColor function. We define a classifier for face detection using the CascadeClassifier class and a pre-trained classifier file, and detect faces in the image using the detectMultiScale function. We draw rectangles around the detected faces using the rectangle function and display the result using the imshow, waitKey, and destroyAllWindows functions.
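The cascade XML file must exist at the given path; the opencv-python package ships the standard cascades and exposes their directory as cv2.data.haarcascades. A hedged variation of the classifier setup:

# Load the face cascade bundled with the opencv-python package
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(cascade_path)
# Guard against a failed load (an empty classifier silently detects nothing)
if face_cascade.empty():
    raise IOError(f'Could not load cascade file: {cascade_path}')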
Advance Level Exercises Part 2
Exercise 26: Machine Learning
Concepts:
- Machine Learning
- Scikit-Learn library
- Data Preprocessing
- Feature Engineering
- Model Training
- Model Evaluation
Description: Write a Python script that uses machine learning techniques to train a model and make predictions on new data.
Solution:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Read the data into a pandas dataframe
df = pd.read_csv('data.csv')
# Check for missing values
if df.isnull().sum().sum() > 0:
print("Warning: Missing values detected. Filling with mean values.")
df = df.fillna(df.mean()) # Alternatively, df.dropna() to remove rows with NaN values
# Ensure target column exists
if 'target' not in df.columns:
raise ValueError("Error: 'target' column not found in dataset.")
# Split the data into features and labels
X = df.drop(columns=['target'])
y = df['target']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
# Scale the data using standardization
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train a logistic regression model with class balancing
model = LogisticRegression(random_state=42, class_weight='balanced')
model.fit(X_train_scaled, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test_scaled)
# Evaluate the model performance
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted') # Supports multi-class
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')
# Print evaluation metrics
print('Accuracy:', round(accuracy, 4))
print('Precision:', round(precision, 4))
print('Recall:', round(recall, 4))
print('F1 score:', round(f1, 4))
In this exercise, we first read a dataset into a pandas dataframe. We split the data into training and testing sets using the train_test_split
function from the sklearn.model_selection
module. We scale the data using standardization using the StandardScaler
class from the sklearn.preprocessing
module. We train a logistic regression model using the LogisticRegression
class from the sklearn.linear_model
module and make predictions on the test set. Finally, we evaluate the performance of the model using metrics such as accuracy, precision, recall, and F1 score using the appropriate functions from the sklearn.metrics
module.
Exercise 27: Web Development
Concepts:
- Web Development
- Flask framework
- HTML templates
- Routing
- HTTP methods
- Form handling
Description: Write a Python script that creates a web application using the Flask framework.
Solution:
from flask import Flask, render_template, request
app = Flask(__name__)
# Define a route for the home page
@app.route('/')
def home():
return render_template('home.html')
# Define a route for the contact page
@app.route('/contact', methods=['GET', 'POST'])
def contact():
if request.method == 'POST':
name = request.form['name']
email = request.form['email']
message = request.form['message']
# TODO: Process the form data
return 'Thanks for contacting us!'
else:
return render_template('contact.html')
if __name__ == '__main__':
app.run(debug=True)
In this exercise, we first import the Flask
class from the flask
module and create a new Flask application. We define routes for the home page and contact page using the route
decorator. We use the render_template
function to render HTML templates for the home page and contact page. We handle form submissions on the contact page using the request
object and the POST
method. Finally, we start the Flask application using the run
method.
Exercise 28: Data Streaming
Concepts:
- Data Streaming
- Kafka
- PyKafka library
- Stream Processing
Description: Write a Python script that streams data from a source and processes it in real-time.
Solution:
from pykafka import KafkaClient
import json
# Kafka broker configuration
KAFKA_BROKER = 'localhost:9092'
TOPIC_NAME = 'test'
try:
# Connect to Kafka broker
client = KafkaClient(hosts=KAFKA_BROKER)
# Get a reference to the topic
topic = client.topics[TOPIC_NAME]
# Create a consumer
consumer = topic.get_simple_consumer()
print(f"Connected to Kafka broker at {KAFKA_BROKER}, consuming messages from topic '{TOPIC_NAME}'...")
# Process messages in real-time
for message in consumer:
if message is not None:
try:
data = json.loads(message.value.decode('utf-8')) # Decode & parse JSON safely
print("Received message:", data)
# TODO: Process the data in real-time
except json.JSONDecodeError as e:
print(f"Error decoding JSON: {e} - Raw message: {message.value}")
except Exception as e:
print(f"Kafka connection error: {e}")
finally:
if 'consumer' in locals():
consumer.stop() # Ensure consumer is properly stopped
print("Kafka consumer stopped.")
In this exercise, we first connect to a Kafka broker using the KafkaClient
class from the pykafka
library. We get a reference to a topic and create a consumer for the topic using the get_simple_consumer
method. We process messages in real-time using a loop and the value
attribute of the messages. We parse the message data using the json.loads
function and process the data in real-time.
Exercise 29: Natural Language Processing
Concepts:
- Natural Language Processing
- NLTK library
- Tokenization
- Stemming
- Stop Words Removal
Description: Write a Python script that performs natural language processing tasks on a text corpus.
Solution:
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords
# Download NLTK data
nltk.download('punkt')
nltk.download('stopwords')
# Load the text corpus
with open('corpus.txt', 'r') as f:
corpus = f.read()
# Tokenize the corpus
tokens = word_tokenize(corpus)
# Remove stop words
stop_words = set(stopwords.words('english'))
filtered_tokens = [token for token in tokens if token.lower() not in stop_words]
# Stem the tokens
stemmer = PorterStemmer()
stemmed_tokens = [stemmer.stem(token) for token in filtered_tokens]
# Print the results
print('Original tokens:', tokens[:10])
print('Filtered tokens:', filtered_tokens[:10])
print('Stemmed tokens:', stemmed_tokens[:10])
In this exercise, we first download the necessary data from the NLTK library using the nltk.download
function. We load a text corpus from a file and tokenize the corpus using the word_tokenize
function from the nltk.tokenize
module. We remove stop words using the stopwords
corpus from the NLTK library and stem the tokens using the PorterStemmer
class from the nltk.stem
module. Finally, we print the results for the original, filtered, and stemmed tokens.
Exercise 30: Distributed Systems
Concepts:
- Distributed Systems
- Pyro library
- Remote Method Invocation
- Client-Server Architecture
Description: Write a Python script that implements a distributed system using the Pyro library.
Solution:
import Pyro4
# Define a remote object class
@Pyro4.expose
class MyObject:
def method1(self, arg1):
return f"Processed method1 with argument: {arg1}"
def method2(self, arg2):
return f"Processed method2 with argument: {arg2}"
# Start the server
if __name__ == '__main__':
# Locate the name server
ns = Pyro4.locateNS()
# Create a Pyro daemon
daemon = Pyro4.Daemon()
# Register the remote object with the daemon
uri = daemon.register(MyObject)
# Register the object with the name server
ns.register('myobject', uri)
print(f"MyObject is now available. URI: {uri}")
# Run the server loop
daemon.requestLoop()
In this exercise, we first define a remote object class using the expose
decorator from the Pyro4
library. We implement two methods that can be invoked remotely by a client. We register the remote object using the register
method of a Pyro4
daemon. We start the name server using the locateNS
function from the Pyro4
library and register the remote object with a name. Finally, we start the server using the requestLoop
method of the daemon.
I hope you find these exercises helpful! Let me know if you have any further questions.
Exercise 31: Data Visualization
Concepts:
- Data Visualization
- Plotly library
- Line Chart
- Scatter Chart
- Bar Chart
- Heatmap
- Subplots
Description: Write a Python script that creates interactive visualizations of data using the Plotly library.
Solution:
import plotly.graph_objs as go
import pandas as pd
from plotly.subplots import make_subplots # Correct import
# Load the data
df = pd.read_csv('data.csv')
# Ensure 'quarter' is a string (for heatmap y-axis)
df['quarter'] = df['quarter'].astype(str)
# Create traces
trace1 = go.Scatter(x=df['year'], y=df['sales'], mode='lines', name='Sales')
trace2 = go.Scatter(x=df['year'], y=df['profit'], mode='markers', name='Profit')
trace3 = go.Bar(x=df['year'], y=df['expenses'], name='Expenses')
trace4 = go.Heatmap(x=df['year'], y=df['quarter'], z=df['revenue'], colorscale='Viridis', name='Revenue')
# Create subplots
fig = make_subplots(rows=2, cols=2, subplot_titles=('Sales', 'Profit', 'Expenses', 'Revenue'))
# Add traces correctly
fig.add_trace(trace1, row=1, col=1)
fig.add_trace(trace2, row=1, col=2)
fig.add_trace(trace3, row=2, col=1)
fig.add_trace(trace4, row=2, col=2)
# Update layout for better visualization
fig.update_layout(title='Financial Performance', height=800, width=1000)
# Display the chart
fig.show()
In this exercise, we first load a dataset into a pandas dataframe. We create several chart objects using the Scatter
, Bar
, and Heatmap
classes from the plotly.graph_objs
module. We create subplots using the make_subplots
function from the plotly.subplots
module and add the chart objects to the subplots using the append_trace
method. We set the layout of the chart using the update_layout
method and display the chart using the show
method.
Exercise 32: Data Engineering
Concepts:
- Data Engineering
- SQLite
- Pandas library
- Data Transformation
- Data Integration
Description: Write a Python script that processes data from multiple sources and stores it in a database.
Solution:
import sqlite3
import pandas as pd
# Load data from multiple sources into pandas DataFrames
df1 = pd.read_csv('data1.csv')
df2 = pd.read_excel('data2.xlsx')
df3 = pd.read_json('data3.json')
# Standardize column names across datasets
expected_columns = ['date', 'amount', 'description'] # Adjust based on actual dataset
df1 = df1.reindex(columns=expected_columns, fill_value=None)
df2 = df2.reindex(columns=expected_columns, fill_value=None)
df3 = df3.reindex(columns=expected_columns, fill_value=None)
# Data Cleaning & Transformation
df1['date'] = pd.to_datetime(df1['date'], errors='coerce') # Handle invalid dates
df2['amount'] = df2['amount'].astype(float) / 100 # Convert to proper currency format
df3['description'] = df3['description'].astype(str).str.upper() # Ensure consistency
# Merge DataFrames while handling missing values
df = pd.concat([df1, df2, df3], axis=0).fillna({'amount': 0, 'description': 'UNKNOWN'})
# Store the data in a SQLite database safely
db_file = 'mydb.db'
table_name = 'mytable'
with sqlite3.connect(db_file) as conn:
df.to_sql(table_name, conn, if_exists='replace', index=False)
print(f"Data successfully saved to SQLite table '{table_name}' in '{db_file}'.")
In this exercise, we first load data from multiple sources into pandas dataframes using functions such as read_csv
, read_excel
, and read_json
. We transform the data using pandas functions such as to_datetime
, str.upper
, and arithmetic operations. We combine the data into a single pandas dataframe using the concat
function. Finally, we store the data in a SQLite database using the to_sql
method of the pandas dataframe.
Exercise 33: Natural Language Generation
Concepts:
- Natural Language Generation
- Markov Chains
- NLTK library
- Text Corpus
Description: Write a Python script that generates text using natural language generation techniques.
Solution:
import nltk
import random
import os
# Download necessary NLTK resources
nltk.download('punkt')
# Define corpus file
corpus_file = 'corpus.txt'
# Ensure the corpus file exists
if not os.path.exists(corpus_file):
raise FileNotFoundError(f"Error: The file '{corpus_file}' was not found.")
# Load the text corpus
with open(corpus_file, 'r', encoding='utf-8') as f:
corpus = f.read()
# Tokenize the corpus
tokens = nltk.word_tokenize(corpus)
# Build a dictionary of word transitions (Markov Chain)
chain = {}
for i in range(len(tokens) - 1):
word1 = tokens[i]
word2 = tokens[i + 1]
if word1 in chain:
chain[word1].append(word2)
else:
chain[word1] = [word2]
# Generate text using Markov chains
start_word = random.choice(list(chain.keys()))
sentence = [start_word.capitalize()]
while len(sentence) < 100: # Limit by word count
last_word = sentence[-1].lower() # Ensure consistent lookup
if last_word in chain:
next_word = random.choice(chain[last_word])
sentence.append(next_word)
else:
break # Stop if there are no next words
# Print the generated text
print(' '.join(sentence))
In this exercise, we first download the necessary data from the NLTK library using the nltk.download
function. We load a text corpus from a file and tokenize the corpus using the word_tokenize
function from the nltk
library. We build a dictionary of word transitions using a loop and generate text using Markov chains. We start by selecting a random word from the dictionary and then randomly select a next word from the list of possible transitions. We continue to add words to the sentence until it reaches a specified length. Finally, we print the generated text.
Exercise 34: Machine Learning
Concepts:
- Machine Learning
- Scikit-learn library
- Decision Tree Classifier
- Model Training
- Model Evaluation
Description: Write a Python script that trains a machine learning model using the scikit-learn library.
Solution:
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
# Load the iris dataset
iris = datasets.load_iris()
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target)
# Train a decision tree classifier with hyperparameter tuning
clf = DecisionTreeClassifier(max_depth=4, min_samples_split=5, random_state=42)
clf.fit(X_train, y_train)
# Make predictions
y_pred = clf.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', round(accuracy, 4))
print('\nClassification Report:\n', classification_report(y_test, y_pred, target_names=iris.target_names))
# Feature Importance Analysis
feature_importances = dict(zip(iris.feature_names, clf.feature_importances_))
print("\nFeature Importances:", feature_importances)
In this exercise, we first load the iris dataset from the scikit-learn library using the load_iris
function. We split the data into training and testing sets using the train_test_split
function. We train a decision tree classifier using the DecisionTreeClassifier
class and the fit
method. We make predictions with the predict method and evaluate them using the accuracy_score and classification_report functions from the sklearn.metrics module. Finally, we inspect the feature_importances_ attribute of the trained classifier to see how much each feature contributes to the tree's decisions.
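Because decision trees are easy to interpret, it can also be instructive to print the learned rules; a small optional sketch using scikit-learn's export_text helper, assuming the clf and iris objects from the script above:
from sklearn.tree import export_text
# Print the decision rules of the trained tree using the iris feature names
print(export_text(clf, feature_names=list(iris.feature_names)))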
Exercise 35: Computer Vision
Concepts:
- Computer Vision
- OpenCV library
- Image Loading
- Image Filtering
- Image Segmentation
Description: Write a Python script that performs computer vision tasks on images using the OpenCV library.
Solution:
import cv2
import os
# Load an image safely
image_path = 'image.jpg'
if not os.path.exists(image_path):
raise FileNotFoundError(f"Error: '{image_path}' not found.")
img = cv2.imread(image_path)
# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Apply median filter
filtered = cv2.medianBlur(gray, 5)
# Apply adaptive thresholding
thresh = cv2.adaptiveThreshold(filtered, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
# Apply morphological operations
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
closed = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
# Find contours
contours_info = cv2.findContours(closed, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
contours = contours_info[0] if len(contours_info) == 2 else contours_info[1] # Safe unpacking
# Draw contours on the original image
cv2.drawContours(img, contours, -1, (0, 0, 255), 2)
# Save and display the processed images
cv2.imwrite('output_contours.jpg', img)
cv2.imwrite('output_thresholded.jpg', thresh)
cv2.imwrite('output_closed.jpg', closed)
print("Processing complete. Images saved as 'output_contours.jpg', 'output_thresholded.jpg', and 'output_closed.jpg'.")
# Display the images (comment out if running on a headless system)
cv2.imshow('Original', img)
cv2.imshow('Thresholded', thresh)
cv2.imshow('Closed', closed)
cv2.waitKey(0)
cv2.destroyAllWindows()
In this exercise, we first load an image using the imread
function from the OpenCV library. We convert the image to grayscale using the cvtColor
function and apply a median filter to the image using the medianBlur
function. We apply adaptive thresholding to the image using the adaptiveThreshold
function and morphological operations to the image using the getStructuringElement
and morphologyEx
functions. We find contours in the image using the findContours
function and draw the contours on the original image using the drawContours
function. Finally, we save the processed images to disk with the imwrite function and display them using the imshow function (the display calls can be skipped on a headless system).
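If only the larger shapes are of interest, the detected contours can be filtered by area before drawing; a brief sketch, assuming contours and img from the script above and an arbitrary threshold of 100 pixels:
# Keep only contours whose enclosed area exceeds the (hypothetical) threshold
min_area = 100
large_contours = [c for c in contours if cv2.contourArea(c) > min_area]
print(f"Kept {len(large_contours)} of {len(contours)} contours")
cv2.drawContours(img, large_contours, -1, (0, 255, 0), 2)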
Exercise 36: Network Programming
Concepts:
- Network Programming
- Socket library
- Client-Server Architecture
- Protocol Implementation
Description: Write a Python script that communicates with a remote server using the socket library.
Solution:
import socket
# Create a socket object
s = socket.socket()
# Define the server address and port number
host = 'localhost'
port = 12345
# Connect to the server
s.connect((host, port))
# Send data to the server
s.send(b'Hello, server!')
# Receive data from the server
data = s.recv(1024)
# Close the socket
s.close()
# Print the received data
print('Received:', data.decode())
In this exercise, we first create a socket object using the socket
function from the socket library. We define the address and port number of the server we want to connect to. We connect to the server using the connect
method of the socket object. We send data to the server using the send
method and receive data from the server using the recv
method. Finally, we close the socket using the close
method and print the received data.
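The client above needs something to listen on the other end. For completeness, here is a minimal companion server sketch using the same socket library; it binds to the hypothetical localhost:12345 address used by the client and handles a single connection.
import socket
server = socket.socket()
server.bind(('localhost', 12345))  # Same host and port the client connects to
server.listen(1)                   # Accept one pending connection
print('Waiting for a client...')
conn, addr = server.accept()
print('Connected by', addr)
data = conn.recv(1024)             # Receive the client's message
conn.send(b'Hello, client! You said: ' + data)
conn.close()
server.close()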
Exercise 37: Cloud Computing
Concepts:
- Cloud Computing
- Heroku
- Flask
- Web Application Deployment
Description: Write a Python script that deploys a Flask web application to the Heroku cloud platform.
Solution:
from flask import Flask
# Create a Flask application
app = Flask(__name__)
# Define a route
@app.route('/')
def hello():
return 'Hello, world!'
# Run the application (for development only)
if __name__ == '__main__':
app.run(host='0.0.0.0', port=5000, debug=True) # Set debug=True only for development
In this exercise, we first install Flask, the only library the script itself requires. We create a simple Flask application that defines a single route and use the run method of the Flask object to run the application locally. Deploying the application to the Heroku cloud platform additionally requires a requirements.txt file and a Procfile, after which the code is pushed to the Heroku remote repository as described in Heroku's documentation.
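For reference, a typical Heroku deployment needs two small plain-text files next to the script and a handful of CLI commands. The sketch below is only illustrative and assumes the script is saved as app.py and that gunicorn is used as the production web server.
# requirements.txt
flask
gunicorn
# Procfile (a file named 'Procfile' with no extension)
web: gunicorn app:app
# Typical deployment commands, run from the project directory:
# heroku login
# heroku create
# git push heroku main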
Exercise 38: Natural Language Processing
Concepts:
- Natural Language Processing
- spaCy library
- Named Entity Recognition
- Text Processing
Description: Write a Python script that performs named entity recognition on text using the spaCy library.
Solution:
import spacy
# Ensure the model is installed before running the script:
# Run: python -m spacy download en_core_web_sm
# Load the English language model
try:
nlp = spacy.load('en_core_web_sm')
except OSError:
raise OSError("Spacy model 'en_core_web_sm' not found. Run 'python -m spacy download en_core_web_sm' and try again.")
# Define some text to process
text = 'Barack Obama was born in Hawaii.'
# Process the text
doc = nlp(text)
# Extract named entities from the text
entities = [(ent.text, ent.label_) for ent in doc.ents]
# Display results
if entities:
print("\nNamed Entities Found:")
for text, label in entities:
print(f" - {text}: {label}")
else:
print("\nNo named entities found in the text.")
In this exercise, we first load the English language model using the load
function from the spaCy library. We define some text to process and process the text using the nlp
function from the spaCy library. We extract named entities from the text using the ents
attribute of the processed text and print the text and label of each named entity.
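spaCy's entity labels (such as PERSON or GPE) are abbreviations; the library's explain helper translates them into plain English. A small optional sketch, assuming the entities list built above:
# Print a human-readable description for each entity label
for ent_text, label in entities:
    print(f"{ent_text}: {label} ({spacy.explain(label)})")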
Exercise 39: Deep Learning
Concepts:
- Deep Learning
- TensorFlow library
- Convolutional Neural Network
- Model Training
- Model Evaluation
Description: Write a Python script that trains a deep learning model using the TensorFlow library.
Solution:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping
# Load the CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Normalize the pixel values
train_images, test_images = train_images / 255.0, test_images / 255.0
# Data Augmentation to prevent overfitting
datagen = ImageDataGenerator(
rotation_range=15,
width_shift_range=0.1,
height_shift_range=0.1,
horizontal_flip=True
)
# Define the model architecture
model = models.Sequential([
layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(64, (3, 3), activation='relu'),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(128, (3, 3), activation='relu'),
layers.MaxPooling2D((2, 2)),
layers.Flatten(),
layers.Dense(128, activation='relu'),
layers.Dropout(0.5), # Prevent overfitting
layers.Dense(10, activation='softmax') # Use Softmax for probabilities
])
# Compile the model
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(),
metrics=['accuracy'])
# Define early stopping to stop training if no improvement
early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
# Train the model with data augmentation
model.fit(datagen.flow(train_images, train_labels, batch_size=64),
validation_data=(test_images, test_labels),
epochs=30, callbacks=[early_stopping])
# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('Test accuracy:', round(test_acc * 100, 2), '%')
In this exercise, we first load the CIFAR-10 dataset from the TensorFlow library using the load_data
function. We normalize the pixel values of the images by dividing them by 255.0. We define a deep learning model architecture using the Sequential
class from the TensorFlow library and various layers such as Conv2D
, MaxPooling2D
, Flatten
, Dense, and Dropout. We compile the model using the compile method and train it using the fit method, feeding augmented images from an ImageDataGenerator and using an EarlyStopping callback to stop training when the validation loss stops improving. We evaluate the model using the evaluate method and print the test accuracy.
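To see the trained network in action, its softmax output can be mapped back to a class name. A minimal sketch, assuming the model, test_images, and test_labels from the script above and using the standard CIFAR-10 label order:
import numpy as np
# CIFAR-10 class names in label order
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']
sample = test_images[:1]              # One test image, shape (1, 32, 32, 3)
probs = model.predict(sample)         # Softmax probabilities, shape (1, 10)
predicted = class_names[int(np.argmax(probs[0]))]
actual = class_names[int(test_labels[0][0])]
print('Predicted:', predicted, '| Actual:', actual)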
Exercise 40: Data Analysis
Concepts:
- Data Analysis
- Pandas library
- Data Cleaning
- Data Manipulation
- Data Visualization
Description: Write a Python script that analyzes data using the pandas library.
Solution:
import pandas as pd
import matplotlib.pyplot as plt
# Load the data
df = pd.read_csv('data.csv')
# Convert 'date' column to datetime format
df['date'] = pd.to_datetime(df['date'], errors='coerce')
# Drop rows with missing or invalid dates
df.dropna(subset=['date'], inplace=True)
# Convert 'price' and 'quantity' to numeric values (if not already)
df['price'] = pd.to_numeric(df['price'], errors='coerce')
df['quantity'] = pd.to_numeric(df['quantity'], errors='coerce')
# Drop rows with missing or invalid price/quantity
df.dropna(subset=['price', 'quantity'], inplace=True)
# Compute total sales
df['total_sales'] = df['price'] * df['quantity']
# Set date as index for proper resampling
df.set_index('date', inplace=True)
# Group by month and sum sales
monthly_sales = df.resample('M').sum()
# Visualize the data
plt.figure(figsize=(10, 5))
plt.plot(monthly_sales.index, monthly_sales['total_sales'], marker='o', linestyle='-')
plt.xlabel('Month')
plt.ylabel('Total Sales')
plt.title('Monthly Sales Trend')
plt.grid()
plt.xticks(rotation=45)
plt.show()
In this exercise, we first load data from a CSV file using the read_csv
function from the pandas library. We clean the data by converting the date, price, and quantity columns with to_datetime and to_numeric and removing any invalid rows with the dropna method. We compute the total sales for each transaction and aggregate the data by month using the resample method on the date index. We visualize the monthly totals using the plot
function from the matplotlib library.
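The resample call above is one of two equivalent ways to aggregate by month in pandas; the same result can be written with groupby and a Grouper object. A small optional sketch, assuming df is the frame from the script with the date column already set as its index:
# Equivalent monthly aggregation expressed with groupby + Grouper on the DatetimeIndex
monthly_sales_alt = df.groupby(pd.Grouper(freq='M'))['total_sales'].sum()
print(monthly_sales_alt.head())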
Exercise 41: Data Science
Concepts:
- Data Science
- NumPy library
- pandas library
- Matplotlib library
- Data Cleaning
- Data Manipulation
- Data Visualization
Description: Write a Python script that performs data analysis on a dataset using the NumPy, pandas, and Matplotlib libraries.
Solution:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Load the data
df = pd.read_csv('data.csv')
# Convert 'date' column to datetime format
df['date'] = pd.to_datetime(df['date'], errors='coerce')
# Drop rows with missing or invalid dates
df.dropna(subset=['date'], inplace=True)
# Convert 'price' and 'quantity' to numeric values (if not already)
df['price'] = pd.to_numeric(df['price'], errors='coerce')
df['quantity'] = pd.to_numeric(df['quantity'], errors='coerce')
# Drop rows with missing or invalid price/quantity
df.dropna(subset=['price', 'quantity'], inplace=True)
# Compute total sales
df['total_sales'] = df['price'] * df['quantity']
# Set date as index for proper resampling
df.set_index('date', inplace=True)
# Group by month and sum sales
monthly_sales = df.resample('M').sum()
# Analyze the data
print('Total Sales:', round(df['total_sales'].sum(), 2))
print('Average Price:', round(df['price'].mean(), 2))
print('Median Quantity:', df['quantity'].median())
# Visualize the data
plt.figure(figsize=(10, 5))
plt.plot(monthly_sales.index, monthly_sales['total_sales'], marker='o', linestyle='-', color='b', label='Total Sales')
plt.xlabel('Month')
plt.ylabel('Total Sales')
plt.title('Monthly Sales Trend')
plt.legend()
plt.grid()
plt.xticks(rotation=45)
plt.show()
In this exercise, we first load data from a CSV file using the read_csv
function from the pandas library. We clean the data by converting the date, price, and quantity columns with to_datetime and to_numeric and removing any invalid rows with the dropna method. We compute the total sales for each transaction and aggregate the data by month using the resample method on the date index. We perform some basic data analysis by calculating the total sales, average price, and median quantity, and we visualize the monthly totals using the plot
function from the matplotlib library.
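NumPy is imported in the script but never actually used; one natural place for it is summarizing the spread of the sales figures. A small optional sketch, assuming the df built above:
# Quartiles of the per-transaction sales values computed with NumPy
q25, q50, q75 = np.percentile(df['total_sales'], [25, 50, 75])
print('Sales quartiles (25%, 50%, 75%):', round(q25, 2), round(q50, 2), round(q75, 2))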
Exercise 42: Machine Learning
Concepts:
- Machine Learning
- scikit-learn library
- Support Vector Machines
- Model Training
- Model Evaluation
Description: Write a Python script that trains a machine learning model using the scikit-learn library.
Solution:
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Load the iris dataset
iris = datasets.load_iris()
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target)
# Standardize the data (SVMs perform better with scaled data)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train a Support Vector Machine classifier
clf = svm.SVC(kernel='linear', C=1.0, random_state=42)
clf.fit(X_train_scaled, y_train)
# Predict the labels
y_pred = clf.predict(X_test_scaled)
# Evaluate the classifier
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.4f}\n')
# Print detailed evaluation metrics
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=iris.target_names))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
In this exercise, we first load the iris dataset from the scikit-learn library using the load_iris
function. We split the data into training and testing sets using the train_test_split
function from the scikit-learn library and standardize the features with the StandardScaler class, since SVMs are sensitive to feature scales. We train a support vector machine classifier using the SVC class with a linear kernel. We evaluate the classifier using the accuracy_score, classification_report, and confusion_matrix functions from the sklearn.metrics module and print the results.
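Once trained, the classifier can label new measurements, provided they are passed through the same fitted scaler. A brief sketch with a made-up flower measurement (sepal length, sepal width, petal length, and petal width in centimetres):
# Hypothetical measurement of a new flower
new_flower = [[5.1, 3.5, 1.4, 0.2]]
new_flower_scaled = scaler.transform(new_flower)  # Reuse the scaler fitted on the training data
prediction = clf.predict(new_flower_scaled)[0]
print('Predicted species:', iris.target_names[prediction])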
Exercise 43: Web Scraping
Concepts:
- Web Scraping
- BeautifulSoup library
- HTML Parsing
- Data Extraction
Description: Write a Python script that scrapes data from a website using the BeautifulSoup library.
Solution:
import requests
from bs4 import BeautifulSoup
# Define the target URL
url = 'https://en.wikipedia.org/wiki/Python_(programming_language)'
# Add headers to prevent request blocking
headers = {'User-Agent': 'Mozilla/5.0'}
# Fetch the HTML content of the website
response = requests.get(url, headers=headers)
# Check if the request was successful
if response.status_code != 200:
print(f"Error: Unable to fetch the page (Status Code: {response.status_code})")
exit()
# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')
# Extract the page title
title = soup.title.string
print(f"\nPage Title: {title}\n")
# Extract all valid links
base_url = 'https://en.wikipedia.org'
links = []
for link in soup.find_all('a', href=True): # Ensures 'href' exists
href = link.get('href')
# Convert relative Wikipedia links to absolute URLs
if href.startswith('/wiki/'):
full_url = base_url + href
links.append(full_url)
elif href.startswith('http'): # Keep only valid external links
links.append(href)
# Print the first 10 links for brevity
print("Extracted Links:")
for l in links[:10]: # Limit output for readability
print(l)
print(f"\nTotal Links Found: {len(links)}")
In this exercise, we first fetch the HTML content of a website using the get
function from the requests library. We parse the HTML content using the BeautifulSoup
class from the bs4 package. We extract the page title from the title attribute of the parsed document and collect the hyperlinks with the find_all method, converting Wikipedia's relative links to absolute URLs and printing the first ten of them along with the total count.
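Rather than prefixing '/wiki/' links by hand, the standard library's urljoin resolves relative URLs of any form; a small alternative sketch, assuming soup and url from the script above:
from urllib.parse import urljoin
# Resolve every href against the page URL, keeping only http(s) results
links_alt = [urljoin(url, a['href']) for a in soup.find_all('a', href=True)]
links_alt = [link for link in links_alt if link.startswith('http')]
print('Total links (urljoin):', len(links_alt))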
Exercise 44: Database Programming
Concepts:
- Database Programming
- SQLite library
- SQL
- Data Retrieval
- Data Manipulation
Description: Write a Python script that interacts with a database using the SQLite library.
Solution:
import sqlite3
# Connect to the database using a context manager
with sqlite3.connect('data.db') as conn:
cursor = conn.cursor()
# Create a table (if it doesn't exist)
cursor.execute('''CREATE TABLE IF NOT EXISTS users (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
age INTEGER NOT NULL);''')
# Insert data into the table (use parameterized queries)
cursor.execute("INSERT INTO users (name, age) VALUES (?, ?)", ('John Doe', 30))
cursor.execute("INSERT INTO users (name, age) VALUES (?, ?)", ('Jane Doe', 25))
# Retrieve data from the table
cursor.execute('SELECT * FROM users')
users = cursor.fetchall() # Fetch all rows
print("\nUsers in database:")
for user in users:
print(user)
# Update data using a parameterized query
cursor.execute("UPDATE users SET age = ? WHERE name = ?", (35, 'John Doe'))
# Delete data using a parameterized query
cursor.execute("DELETE FROM users WHERE name = ?", ('Jane Doe',))
# Commit the changes (happens automatically with `with` statement)
conn.commit()
print("\nDatabase operations completed successfully.")
In this exercise, we first connect to a SQLite database using the connect
function from the sqlite3 module, used here as a context manager. We create a table and insert rows with parameterized SQL statements, which avoids SQL injection. We retrieve the rows with a SELECT query and print them, then update and delete rows with further parameterized statements. Finally, we commit the changes; the context manager also commits automatically when the block exits without an error, although it does not close the connection itself.
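When several rows need to be inserted at once, sqlite3's executemany method is shorter and faster than repeated execute calls; a minimal sketch against the same hypothetical users table:
import sqlite3
more_users = [('Alice', 28), ('Bob', 41), ('Carol', 35)]  # Hypothetical rows
with sqlite3.connect('data.db') as conn:
    conn.executemany("INSERT INTO users (name, age) VALUES (?, ?)", more_users)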
Exercise 45: Cloud Computing
Concepts:
- Cloud Computing
- AWS
- Flask library
- Boto3 library
- Web Application Deployment
Description: Write a Python script that deploys a web application to the AWS cloud platform using the Flask and Boto3 libraries.
Solution:
from flask import Flask
import boto3
import os
# Create a Flask application
app = Flask(__name__)
# AWS S3 Configuration
AWS_BUCKET_NAME = 'my-bucket'
AWS_REGION = 'us-east-1' # Change to your region
# Upload function for AWS S3
def upload_to_s3(file_name, bucket_name, object_name=None):
"""Uploads a file to S3"""
try:
s3 = boto3.client('s3') # Ensure credentials are configured
object_name = object_name or file_name # Default object name
# Upload file
s3.upload_file(file_name, bucket_name, object_name)
print(f"File '{file_name}' uploaded successfully to S3 bucket '{bucket_name}'.")
except Exception as e:
print(f"Error uploading to S3: {e}")
# Define a route
@app.route('/')
def hello():
return 'Hello, world! Flask is running!'
# Run the application
if __name__ == '__main__':
# Upload a file to S3 before starting Flask (Optional)
if os.path.exists('app.py'):
upload_to_s3('app.py', AWS_BUCKET_NAME)
# Run Flask server
app.run(host='0.0.0.0', port=5000, debug=True)
In this exercise, we first install the required libraries, Flask and Boto3. We create a simple Flask application that defines a single route, along with a helper function that uses the upload_file
method from the Boto3 library to upload the application to an AWS S3 bucket. Note that this is only a basic example and there are many additional steps involved in deploying a web application to the AWS cloud platform, such as creating an EC2 instance, setting up a load balancer, configuring security groups, and more.
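Once an object is in S3, Boto3 can also generate a temporary download link without making the bucket public; a short sketch, assuming the same hypothetical bucket name and the app.py object uploaded above:
import boto3
s3 = boto3.client('s3')
# Create a link that remains valid for one hour
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'app.py'},
    ExpiresIn=3600
)
print('Temporary download URL:', url)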
Exercise 46: Natural Language Processing
Concepts:
- Natural Language Processing
- NLTK library
- Tokenization
- Part-of-Speech Tagging
- Named Entity Recognition
Description: Write a Python script that performs natural language processing on text data using the NLTK library.
Solution:
import nltk
# Download required NLTK models
nltk.download('punkt')
nltk.download('maxent_ne_chunker')
nltk.download('words')
nltk.download('averaged_perceptron_tagger')
# Load the text data
text = '''Apple Inc. is an American multinational technology company headquartered in Cupertino, California, that designs, develops, and sells consumer electronics, computer software, and online services. The company's hardware products include the iPhone smartphone, the iPad tablet computer, the Mac personal computer, the iPod portable media player, the Apple Watch smartwatch, the Apple TV digital media player, and the HomePod smart speaker. Apple's software includes the macOS and iOS operating systems, the iTunes media player, the Safari web browser, and the iLife and iWork creativity and productivity suites. Its online services include the iTunes Store, the iOS App Store, and Mac App Store, Apple Music, and iCloud.'''
# Tokenize the text
tokens = nltk.word_tokenize(text)
# Perform part-of-speech tagging
pos_tags = nltk.pos_tag(tokens)
# Perform named entity recognition
ne_tags = nltk.ne_chunk(pos_tags)
# Extract named entities
named_entities = {}
for chunk in ne_tags:
if hasattr(chunk, 'label'):
entity_type = chunk.label() # Get entity type (e.g., ORGANIZATION, PERSON)
entity_name = ' '.join(c[0] for c in chunk) # Join words in entity
if entity_type not in named_entities:
named_entities[entity_type] = []
named_entities[entity_type].append(entity_name)
# Print structured named entities
print("\nNamed Entities Found:")
for entity_type, names in named_entities.items():
print(f"{entity_type}: {', '.join(set(names))}") # Use `set()` to remove duplicates
In this exercise, we first load some text data. We tokenize the text using the word_tokenize
function from the NLTK library. We perform part-of-speech tagging using the pos_tag
function from the NLTK library. We perform named entity recognition using the ne_chunk
function from the NLTK library. We collect the named entities by checking whether each chunk carries an entity label using the hasattr function and the label method, group the entity names by type (for example ORGANIZATION, PERSON, or GPE), and print each group with duplicates removed.
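Because the entities are grouped by type in a dictionary, pulling out a single category takes one line; for example, the organizations mentioned in the text (assuming the named_entities dictionary built above):
# List the distinct ORGANIZATION entities, if any were found
organizations = sorted(set(named_entities.get('ORGANIZATION', [])))
print('Organizations:', organizations)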
Exercise 47: Big Data
Concepts:
- Big Data
- PySpark
- Apache Spark
- Data Processing
- MapReduce
Description: Write a PySpark script that processes data using the Spark framework.
Solution:
from pyspark import SparkContext, SparkConf
# Configure the Spark context
conf = SparkConf().setAppName('wordcount').setMaster('local[*]')
sc = SparkContext(conf=conf)
# Load the text data
text = sc.textFile('data.txt')
# Split the text into words and count the occurrences of each word
word_counts = text.flatMap(lambda line: line.split(' ')).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
# Print the word counts
for word, count in word_counts.collect():
print(word, count)
# Stop the Spark context
sc.stop()
In this exercise, we first configure the Spark context using the SparkConf
and SparkContext
classes from the PySpark library. We load some text data using the textFile
method. We split the text into words and count the occurrences of each word using the flatMap
, map
, and reduceByKey
methods. We print the word counts using the collect
method. Finally, we stop the Spark context using the stop
method.
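Printing every word count can be overwhelming for a large file; the RDD API's takeOrdered method returns only the most frequent words. A small sketch, assuming the word_counts RDD from the script above (run it before sc.stop()):
# Ten most frequent words, sorted by descending count
top_ten = word_counts.takeOrdered(10, key=lambda pair: -pair[1])
for word, count in top_ten:
    print(word, count)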
Exercise 48: Cybersecurity
Concepts:
- Cybersecurity
- Scapy library
- Network Analysis
- Packet Sniffing
Description: Write a Python script that performs security analysis on a network using the Scapy library.
Solution:
from scapy.all import *
# Define a packet handler function
def packet_handler(packet):
if packet.haslayer(TCP):
if packet[TCP].flags & 2:
print('SYN packet detected:', packet.summary())
# Start the packet sniffer
sniff(prn=packet_handler, filter='tcp', store=0)
In this exercise, we use the Scapy library to perform security analysis on a network. We define a packet handler function that is called for each packet that is sniffed. We check if the packet is a TCP packet and if it has the SYN flag set. If so, we print a message indicating that a SYN packet has been detected, along with a summary of the packet.
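Note that packet sniffing normally requires administrator (root) privileges, and an unbounded sniff runs until it is interrupted. The sniff function also accepts count and timeout arguments to bound the capture; a small sketch, assuming the packet_handler defined above:
from scapy.all import sniff
# Capture at most 20 TCP packets, or stop after 30 seconds, whichever comes first
sniff(prn=packet_handler, filter='tcp', store=0, count=20, timeout=30)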
Exercise 49: Machine Learning
Concepts:
- Machine Learning
- Scikit-learn library
- Model Training
- Cross-Validation
- Grid Search
Description: Write a Python script that trains a machine learning model using the scikit-learn library.
Solution:
from sklearn import datasets
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
# Load the dataset
iris = datasets.load_iris()
# Split the dataset into features and target
X = iris.data
y = iris.target
# Define the hyperparameters to search
param_grid = {'n_neighbors': [3, 5, 7, 9], 'weights': ['uniform', 'distance']}
# Create a KNN classifier
knn = KNeighborsClassifier()
# Perform a grid search with cross-validation
grid_search = GridSearchCV(knn, param_grid, cv=5)
grid_search.fit(X, y)
# Print the best hyperparameters and the mean cross-validated accuracy of the best model
print('Best Hyperparameters:', grid_search.best_params_)
print('Best Cross-Validation Accuracy:', round(grid_search.best_score_, 4))
In this exercise, we use the scikit-learn library to train a machine learning model. We load a dataset using the load_iris
function from the datasets
module. We split the dataset into features and target. We define a dictionary of hyperparameters to search over using the param_grid
variable. We create a KNN classifier using the KNeighborsClassifier
class. We perform a grid search with cross-validation using the GridSearchCV
class. We print the best hyperparameters and the mean cross-validation accuracy using the best_params_ and best_score_ attributes.
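Because GridSearchCV refits the best model on the full dataset by default, the tuned classifier can be used directly for predictions through the best_estimator_ attribute; a brief sketch with a made-up flower measurement:
best_knn = grid_search.best_estimator_  # KNN refit with the best hyperparameters
sample = [[5.9, 3.0, 5.1, 1.8]]         # Hypothetical measurement
print('Predicted species:', iris.target_names[best_knn.predict(sample)[0]])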
Exercise 50: Computer Vision
Concepts:
- Computer Vision
- OpenCV library
- Image Processing
- Object Detection
Description: Write a Python script that performs image processing using the OpenCV library.
Solution:
import cv2
# Load the image
img = cv2.imread('image.jpg')
if img is None:
    raise FileNotFoundError("Error: 'image.jpg' could not be loaded.")
# Convert the image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Define a classifier for face detection (use the Haar cascade file bundled with OpenCV)
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
# Detect faces in the image
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
# Draw rectangles around the detected faces
for (x, y, w, h) in faces:
cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
# Display the image with the detected faces
cv2.imshow('image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
In this exercise, we use the OpenCV library to perform image processing. We load an image using the imread
function. We convert the image to grayscale using the cvtColor
function. We define a classifier for face detection using the CascadeClassifier
class and a pre-trained classifier file. We detect faces in the image using the detectMultiScale
function. We draw rectangles around the detected faces using the rectangle
function. We display the image with the detected faces using the imshow
, waitKey
, and destroyAllWindows
functions.
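On a headless machine imshow has no display to use, so reporting the result and saving the annotated image is often more practical. A small optional sketch, assuming faces and img from the script above:
# Report how many faces were found and write the annotated image to disk
print(f"Detected {len(faces)} face(s).")
cv2.imwrite('faces_detected.jpg', img)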
Advance Level Exercises Part 2
Exercise 26: Machine Learning
Concepts:
- Machine Learning
- Scikit-Learn library
- Data Preprocessing
- Feature Engineering
- Model Training
- Model Evaluation
Description: Write a Python script that uses machine learning techniques to train a model and make predictions on new data.
Solution:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Read the data into a pandas dataframe
df = pd.read_csv('data.csv')
# Check for missing values
if df.isnull().sum().sum() > 0:
print("Warning: Missing values detected. Filling with mean values.")
df = df.fillna(df.mean()) # Alternatively, df.dropna() to remove rows with NaN values
# Ensure target column exists
if 'target' not in df.columns:
raise ValueError("Error: 'target' column not found in dataset.")
# Split the data into features and labels
X = df.drop(columns=['target'])
y = df['target']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
# Scale the data using standardization
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train a logistic regression model with class balancing
model = LogisticRegression(random_state=42, class_weight='balanced')
model.fit(X_train_scaled, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test_scaled)
# Evaluate the model performance
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted') # Supports multi-class
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')
# Print evaluation metrics
print('Accuracy:', round(accuracy, 4))
print('Precision:', round(precision, 4))
print('Recall:', round(recall, 4))
print('F1 score:', round(f1, 4))
In this exercise, we first read a dataset into a pandas dataframe. We split the data into training and testing sets using the train_test_split
function from the sklearn.model_selection
module. We scale the data using standardization using the StandardScaler
class from the sklearn.preprocessing
module. We train a logistic regression model using the LogisticRegression
class from the sklearn.linear_model
module and make predictions on the test set. Finally, we evaluate the performance of the model using metrics such as accuracy, precision, recall, and F1 score using the appropriate functions from the sklearn.metrics
module.
Exercise 27: Web Development
Concepts:
- Web Development
- Flask framework
- HTML templates
- Routing
- HTTP methods
- Form handling
Description: Write a Python script that creates a web application using the Flask framework.
Solution:
from flask import Flask, render_template, request
app = Flask(__name__)
# Define a route for the home page
@app.route('/')
def home():
return render_template('home.html')
# Define a route for the contact page
@app.route('/contact', methods=['GET', 'POST'])
def contact():
if request.method == 'POST':
name = request.form['name']
email = request.form['email']
message = request.form['message']
# TODO: Process the form data
return 'Thanks for contacting us!'
else:
return render_template('contact.html')
if __name__ == '__main__':
app.run(debug=True)
In this exercise, we first import the Flask
class from the flask
module and create a new Flask application. We define routes for the home page and contact page using the route
decorator. We use the render_template
function to render HTML templates for the home page and contact page. We handle form submissions on the contact page using the request
object and the POST
method. Finally, we start the Flask application using the run
method.
Exercise 28: Data Streaming
Concepts:
- Data Streaming
- Kafka
- PyKafka library
- Stream Processing
Description: Write a Python script that streams data from a source and processes it in real-time.
Solution:
from pykafka import KafkaClient
import json
# Kafka broker configuration
KAFKA_BROKER = 'localhost:9092'
TOPIC_NAME = 'test'
try:
# Connect to Kafka broker
client = KafkaClient(hosts=KAFKA_BROKER)
# Get a reference to the topic
topic = client.topics[TOPIC_NAME]
# Create a consumer
consumer = topic.get_simple_consumer()
print(f"Connected to Kafka broker at {KAFKA_BROKER}, consuming messages from topic '{TOPIC_NAME}'...")
# Process messages in real-time
for message in consumer:
if message is not None:
try:
data = json.loads(message.value.decode('utf-8')) # Decode & parse JSON safely
print("Received message:", data)
# TODO: Process the data in real-time
except json.JSONDecodeError as e:
print(f"Error decoding JSON: {e} - Raw message: {message.value}")
except Exception as e:
print(f"Kafka connection error: {e}")
finally:
if 'consumer' in locals():
consumer.stop() # Ensure consumer is properly stopped
print("Kafka consumer stopped.")
In this exercise, we first connect to a Kafka broker using the KafkaClient
class from the pykafka
library. We get a reference to a topic and create a consumer for the topic using the get_simple_consumer
method. We process messages in real-time using a loop and the value
attribute of the messages. We parse the message data using the json.loads
function and process the data in real-time.
Exercise 29: Natural Language Processing
Concepts:
- Natural Language Processing
- NLTK library
- Tokenization
- Stemming
- Stop Words Removal
Description: Write a Python script that performs natural language processing tasks on a text corpus.
Solution:
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords
# Download NLTK data
nltk.download('punkt')
nltk.download('stopwords')
# Load the text corpus
with open('corpus.txt', 'r') as f:
corpus = f.read()
# Tokenize the corpus
tokens = word_tokenize(corpus)
# Remove stop words
stop_words = set(stopwords.words('english'))
filtered_tokens = [token for token in tokens if token.lower() not in stop_words]
# Stem the tokens
stemmer = PorterStemmer()
stemmed_tokens = [stemmer.stem(token) for token in filtered_tokens]
# Print the results
print('Original tokens:', tokens[:10])
print('Filtered tokens:', filtered_tokens[:10])
print('Stemmed tokens:', stemmed_tokens[:10])
In this exercise, we first download the necessary data from the NLTK library using the nltk.download
function. We load a text corpus from a file and tokenize the corpus using the word_tokenize
function from the nltk.tokenize
module. We remove stop words using the stopwords
corpus from the NLTK library and stem the tokens using the PorterStemmer
class from the nltk.stem
module. Finally, we print the results for the original, filtered, and stemmed tokens.
Exercise 30: Distributed Systems
Concepts:
- Distributed Systems
- Pyro library
- Remote Method Invocation
- Client-Server Architecture
Description: Write a Python script that implements a distributed system using the Pyro library.
Solution:
import Pyro4
# Define a remote object class
@Pyro4.expose
class MyObject:
def method1(self, arg1):
return f"Processed method1 with argument: {arg1}"
def method2(self, arg2):
return f"Processed method2 with argument: {arg2}"
# Start the server
if __name__ == '__main__':
# Locate the name server
ns = Pyro4.locateNS()
# Create a Pyro daemon
daemon = Pyro4.Daemon()
# Register the remote object with the daemon
uri = daemon.register(MyObject)
# Register the object with the name server
ns.register('myobject', uri)
print(f"MyObject is now available. URI: {uri}")
# Run the server loop
daemon.requestLoop()
In this exercise, we first define a remote object class using the expose
decorator from the Pyro4
library. We implement two methods that can be invoked remotely by a client. We register the remote object using the register
method of a Pyro4
daemon. We start the name server using the locateNS
function from the Pyro4
library and register the remote object with a name. Finally, we start the server using the requestLoop
method of the daemon.
I hope you find these exercises helpful! Let me know if you have any further questions.
Exercise 31: Data Visualization
Concepts:
- Data Visualization
- Plotly library
- Line Chart
- Scatter Chart
- Bar Chart
- Heatmap
- Subplots
Description: Write a Python script that creates interactive visualizations of data using the Plotly library.
Solution:
import plotly.graph_objs as go
import pandas as pd
from plotly.subplots import make_subplots # Correct import
# Load the data
df = pd.read_csv('data.csv')
# Ensure 'quarter' is a string (for heatmap y-axis)
df['quarter'] = df['quarter'].astype(str)
# Create traces
trace1 = go.Scatter(x=df['year'], y=df['sales'], mode='lines', name='Sales')
trace2 = go.Scatter(x=df['year'], y=df['profit'], mode='markers', name='Profit')
trace3 = go.Bar(x=df['year'], y=df['expenses'], name='Expenses')
trace4 = go.Heatmap(x=df['year'], y=df['quarter'], z=df['revenue'], colorscale='Viridis', name='Revenue')
# Create subplots
fig = make_subplots(rows=2, cols=2, subplot_titles=('Sales', 'Profit', 'Expenses', 'Revenue'))
# Add traces correctly
fig.add_trace(trace1, row=1, col=1)
fig.add_trace(trace2, row=1, col=2)
fig.add_trace(trace3, row=2, col=1)
fig.add_trace(trace4, row=2, col=2)
# Update layout for better visualization
fig.update_layout(title='Financial Performance', height=800, width=1000)
# Display the chart
fig.show()
In this exercise, we first load a dataset into a pandas dataframe. We create several chart objects using the Scatter
, Bar
, and Heatmap
classes from the plotly.graph_objs
module. We create subplots using the make_subplots
function from the plotly.subplots
module and add the chart objects to the subplots using the append_trace
method. We set the layout of the chart using the update_layout
method and display the chart using the show
method.
Exercise 32: Data Engineering
Concepts:
- Data Engineering
- SQLite
- Pandas library
- Data Transformation
- Data Integration
Description: Write a Python script that processes data from multiple sources and stores it in a database.
Solution:
import sqlite3
import pandas as pd
# Load data from multiple sources into pandas DataFrames
df1 = pd.read_csv('data1.csv')
df2 = pd.read_excel('data2.xlsx')
df3 = pd.read_json('data3.json')
# Standardize column names across datasets
expected_columns = ['date', 'amount', 'description'] # Adjust based on actual dataset
df1 = df1.reindex(columns=expected_columns, fill_value=None)
df2 = df2.reindex(columns=expected_columns, fill_value=None)
df3 = df3.reindex(columns=expected_columns, fill_value=None)
# Data Cleaning & Transformation
df1['date'] = pd.to_datetime(df1['date'], errors='coerce') # Handle invalid dates
df2['amount'] = df2['amount'].astype(float) / 100 # Convert to proper currency format
df3['description'] = df3['description'].astype(str).str.upper() # Ensure consistency
# Merge DataFrames while handling missing values
df = pd.concat([df1, df2, df3], axis=0).fillna({'amount': 0, 'description': 'UNKNOWN'})
# Store the data in a SQLite database safely
db_file = 'mydb.db'
table_name = 'mytable'
with sqlite3.connect(db_file) as conn:
df.to_sql(table_name, conn, if_exists='replace', index=False)
print(f"Data successfully saved to SQLite table '{table_name}' in '{db_file}'.")
In this exercise, we first load data from multiple sources into pandas dataframes using functions such as read_csv
, read_excel
, and read_json
. We transform the data using pandas functions such as to_datetime
, str.upper
, and arithmetic operations. We combine the data into a single pandas dataframe using the concat
function. Finally, we store the data in a SQLite database using the to_sql
method of the pandas dataframe.
Exercise 33: Natural Language Generation
Concepts:
- Natural Language Generation
- Markov Chains
- NLTK library
- Text Corpus
Description: Write a Python script that generates text using natural language generation techniques.
Solution:
import nltk
import random
import os
# Download necessary NLTK resources
nltk.download('punkt')
# Define corpus file
corpus_file = 'corpus.txt'
# Ensure the corpus file exists
if not os.path.exists(corpus_file):
raise FileNotFoundError(f"Error: The file '{corpus_file}' was not found.")
# Load the text corpus
with open(corpus_file, 'r', encoding='utf-8') as f:
corpus = f.read()
# Tokenize the corpus
tokens = nltk.word_tokenize(corpus)
# Build a dictionary of word transitions (Markov Chain)
chain = {}
for i in range(len(tokens) - 1):
word1 = tokens[i]
word2 = tokens[i + 1]
if word1 in chain:
chain[word1].append(word2)
else:
chain[word1] = [word2]
# Generate text using Markov chains
start_word = random.choice(list(chain.keys()))
sentence = [start_word.capitalize()]
while len(sentence) < 100: # Limit by word count
last_word = sentence[-1].lower() # Ensure consistent lookup
if last_word in chain:
next_word = random.choice(chain[last_word])
sentence.append(next_word)
else:
break # Stop if there are no next words
# Print the generated text
print(' '.join(sentence))
In this exercise, we first download the necessary data from the NLTK library using the nltk.download
function. We load a text corpus from a file and tokenize the corpus using the word_tokenize
function from the nltk
library. We build a dictionary of word transitions using a loop and generate text using Markov chains. We start by selecting a random word from the dictionary and then randomly select a next word from the list of possible transitions. We continue to add words to the sentence until it reaches a specified length. Finally, we print the generated text.
Exercise 34: Machine Learning
Concepts:
- Machine Learning
- Scikit-learn library
- Decision Tree Classifier
- Model Training
- Model Evaluation
Description: Write a Python script that trains a machine learning model using the scikit-learn library.
Solution:
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
# Load the iris dataset
iris = datasets.load_iris()
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target)
# Train a decision tree classifier with hyperparameter tuning
clf = DecisionTreeClassifier(max_depth=4, min_samples_split=5, random_state=42)
clf.fit(X_train, y_train)
# Make predictions
y_pred = clf.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', round(accuracy, 4))
print('\nClassification Report:\n', classification_report(y_test, y_pred, target_names=iris.target_names))
# Feature Importance Analysis
feature_importances = dict(zip(iris.feature_names, clf.feature_importances_))
print("\nFeature Importances:", feature_importances)
In this exercise, we first load the iris dataset from the scikit-learn library using the load_iris
function. We split the data into training and testing sets using the train_test_split
function. We train a decision tree classifier using the DecisionTreeClassifier
class and the fit
method. We evaluate the model using the predict
method and the accuracy_score
function from the sklearn.metrics
module.
Exercise 35: Computer Vision
Concepts:
- Computer Vision
- OpenCV library
- Image Loading
- Image Filtering
- Image Segmentation
Description: Write a Python script that performs computer vision tasks on images using the OpenCV library.
Solution:
import cv2
import os
# Load an image safely
image_path = 'image.jpg'
if not os.path.exists(image_path):
raise FileNotFoundError(f"Error: '{image_path}' not found.")
img = cv2.imread(image_path)
# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Apply median filter
filtered = cv2.medianBlur(gray, 5)
# Apply adaptive thresholding
thresh = cv2.adaptiveThreshold(filtered, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
# Apply morphological operations
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
closed = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
# Find contours
contours_info = cv2.findContours(closed, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
contours = contours_info[0] if len(contours_info) == 2 else contours_info[1] # Safe unpacking
# Draw contours on the original image
cv2.drawContours(img, contours, -1, (0, 0, 255), 2)
# Save and display the processed images
cv2.imwrite('output_contours.jpg', img)
cv2.imwrite('output_thresholded.jpg', thresh)
cv2.imwrite('output_closed.jpg', closed)
print("Processing complete. Images saved as 'output_contours.jpg', 'output_thresholded.jpg', and 'output_closed.jpg'.")
# Display the images (comment out if running on a headless system)
cv2.imshow('Original', img)
cv2.imshow('Thresholded', thresh)
cv2.imshow('Closed', closed)
cv2.waitKey(0)
cv2.destroyAllWindows()
In this exercise, we first load an image using the imread
function from the OpenCV library. We convert the image to grayscale using the cvtColor
function and apply a median filter to the image using the medianBlur
function. We apply adaptive thresholding to the image using the adaptiveThreshold
function and morphological operations to the image using the getStructuringElement
and morphologyEx
functions. We find contours in the image using the findContours
function and draw the contours on the original image using the drawContours
function. Finally, we display the images using the imshow
function.
I hope you find these exercises helpful! Let me know if you have any further questions.
Exercise 36: Network Programming
Concepts:
- Network Programming
- Socket library
- Client-Server Architecture
- Protocol Implementation
Description: Write a Python script that communicates with a remote server using the socket library.
Solution:
import socket
# Create a socket object
s = socket.socket()
# Define the server address and port number
host = 'localhost'
port = 12345
# Connect to the server
s.connect((host, port))
# Send data to the server
s.send(b'Hello, server!')
# Receive data from the server
data = s.recv(1024)
# Close the socket
s.close()
# Print the received data
print('Received:', data.decode())
In this exercise, we first create a socket object using the socket
function from the socket library. We define the address and port number of the server we want to connect to. We connect to the server using the connect
method of the socket object. We send data to the server using the send
method and receive data from the server using the recv
method. Finally, we close the socket using the close
method and print the received data.
Exercise 37: Cloud Computing
Concepts:
- Cloud Computing
- Heroku
- Flask
- Web Application Deployment
Description: Write a Python script that deploys a Flask web application to the Heroku cloud platform.
Solution:
from flask import Flask
# Create a Flask application
app = Flask(__name__)
# Define a route
@app.route('/')
def hello():
return 'Hello, world!'
# Run the application (for development only)
if __name__ == '__main__':
app.run(host='0.0.0.0', port=5000, debug=True) # Set debug=True only for development
In this exercise, we first install the required libraries for deploying a Flask web application to the Heroku cloud platform. We create a simple Flask application that defines a single route. We use the run
method of the Flask object to run the application locally. To deploy the application to the Heroku cloud platform, we need to follow the instructions provided by Heroku and push our code to a remote repository.
Exercise 38: Natural Language Processing
Concepts:
- Natural Language Processing
- spaCy library
- Named Entity Recognition
- Text Processing
Description: Write a Python script that performs named entity recognition on text using the spaCy library.
Solution:
import spacy
# Ensure the model is installed before running the script:
# Run: python -m spacy download en_core_web_sm
# Load the English language model
try:
nlp = spacy.load('en_core_web_sm')
except OSError:
raise OSError("Spacy model 'en_core_web_sm' not found. Run 'python -m spacy download en_core_web_sm' and try again.")
# Define some text to process
text = 'Barack Obama was born in Hawaii.'
# Process the text
doc = nlp(text)
# Extract named entities from the text
entities = [(ent.text, ent.label_) for ent in doc.ents]
# Display results
if entities:
print("\nNamed Entities Found:")
for text, label in entities:
print(f" - {text}: {label}")
else:
print("\nNo named entities found in the text.")
In this exercise, we first load the English language model using the load
function from the spaCy library. We define some text to process and process the text using the nlp
function from the spaCy library. We extract named entities from the text using the ents
attribute of the processed text and print the text and label of each named entity.
Exercise 39: Deep Learning
Concepts:
- Deep Learning
- TensorFlow library
- Convolutional Neural Network
- Model Training
- Model Evaluation
Description: Write a Python script that trains a deep learning model using the TensorFlow library.
Solution:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping
# Load the CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Normalize the pixel values
train_images, test_images = train_images / 255.0, test_images / 255.0
# Data Augmentation to prevent overfitting
datagen = ImageDataGenerator(
rotation_range=15,
width_shift_range=0.1,
height_shift_range=0.1,
horizontal_flip=True
)
# Define the model architecture
model = models.Sequential([
layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(64, (3, 3), activation='relu'),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(128, (3, 3), activation='relu'),
layers.MaxPooling2D((2, 2)),
layers.Flatten(),
layers.Dense(128, activation='relu'),
layers.Dropout(0.5), # Prevent overfitting
layers.Dense(10, activation='softmax') # Use Softmax for probabilities
])
# Compile the model
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(),
metrics=['accuracy'])
# Define early stopping to stop training if no improvement
early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
# Train the model with data augmentation
model.fit(datagen.flow(train_images, train_labels, batch_size=64),
validation_data=(test_images, test_labels),
epochs=30, callbacks=[early_stopping])
# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('Test accuracy:', round(test_acc * 100, 2), '%')
In this exercise, we first load the CIFAR-10 dataset from the TensorFlow library using the load_data
function. We normalize the pixel values of the images by dividing them by 255.0. We define a deep learning model architecture using the Sequential
class from the TensorFlow library and various layers such as Conv2D
, MaxPooling2D
, Flatten
, and Dense
. We compile the model using the compile
method and train the model using the fit
method. We evaluate the model using the evaluate
method and print the test accuracy.
Exercise 40: Data Analysis
Concepts:
- Data Analysis
- Pandas library
- Data Cleaning
- Data Manipulation
- Data Visualization
Description: Write a Python script that analyzes data using the pandas library.
Solution:
import pandas as pd
import matplotlib.pyplot as plt
# Load the data
df = pd.read_csv('data.csv')
# Convert 'date' column to datetime format
df['date'] = pd.to_datetime(df['date'], errors='coerce')
# Drop rows with missing or invalid dates
df.dropna(subset=['date'], inplace=True)
# Convert 'price' and 'quantity' to numeric values (if not already)
df['price'] = pd.to_numeric(df['price'], errors='coerce')
df['quantity'] = pd.to_numeric(df['quantity'], errors='coerce')
# Drop rows with missing or invalid price/quantity
df.dropna(subset=['price', 'quantity'], inplace=True)
# Compute total sales
df['total_sales'] = df['price'] * df['quantity']
# Set date as index for proper resampling
df.set_index('date', inplace=True)
# Group by month and sum sales
monthly_sales = df.resample('M').sum()
# Visualize the data
plt.figure(figsize=(10, 5))
plt.plot(monthly_sales.index, monthly_sales['total_sales'], marker='o', linestyle='-')
plt.xlabel('Month')
plt.ylabel('Total Sales')
plt.title('Monthly Sales Trend')
plt.grid()
plt.xticks(rotation=45)
plt.show()
In this exercise, we first load data from a CSV file using the read_csv
function from the pandas library. We clean the data by removing any rows with missing values using the dropna
method. We manipulate the data by calculating the total sales for each transaction and grouping the data by month using the groupby
method. We visualize the data by plotting the total sales for each month using the plot
function from the matplotlib library.
Exercise 41: Data Science
Concepts:
- Data Science
- NumPy library
- pandas library
- Matplotlib library
- Data Cleaning
- Data Manipulation
- Data Visualization
Description: Write a Python script that performs data analysis on a dataset using the NumPy, pandas, and Matplotlib libraries.
Solution:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Load the data
df = pd.read_csv('data.csv')
# Convert 'date' column to datetime format
df['date'] = pd.to_datetime(df['date'], errors='coerce')
# Drop rows with missing or invalid dates
df.dropna(subset=['date'], inplace=True)
# Convert 'price' and 'quantity' to numeric values (if not already)
df['price'] = pd.to_numeric(df['price'], errors='coerce')
df['quantity'] = pd.to_numeric(df['quantity'], errors='coerce')
# Drop rows with missing or invalid price/quantity
df.dropna(subset=['price', 'quantity'], inplace=True)
# Compute total sales
df['total_sales'] = df['price'] * df['quantity']
# Set date as index for proper resampling
df.set_index('date', inplace=True)
# Group by month and sum sales
monthly_sales = df.resample('M').sum()
# Analyze the data
print('Total Sales:', round(df['total_sales'].sum(), 2))
print('Average Price:', round(df['price'].mean(), 2))
print('Median Quantity:', df['quantity'].median())
# Visualize the data
plt.figure(figsize=(10, 5))
plt.plot(monthly_sales.index, monthly_sales['total_sales'], marker='o', linestyle='-', color='b', label='Total Sales')
plt.xlabel('Month')
plt.ylabel('Total Sales')
plt.title('Monthly Sales Trend')
plt.legend()
plt.grid()
plt.xticks(rotation=45)
plt.show()
In this exercise, we first load data from a CSV file using the read_csv
function from the pandas library. We clean the data by removing any rows with missing values using the dropna
method. We manipulate the data by calculating the total sales for each transaction and grouping the data by month using the groupby
method. We perform some basic data analysis by calculating the total sales, average price, and median quantity. We visualize the data by plotting the total sales for each month using the plot
function from the matplotlib library.
Exercise 42: Machine Learning
Concepts:
- Machine Learning
- scikit-learn library
- Support Vector Machines
- Model Training
- Model Evaluation
Description: Write a Python script that trains a machine learning model using the scikit-learn library.
Solution:
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Load the iris dataset
iris = datasets.load_iris()
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target)
# Standardize the data (SVMs perform better with scaled data)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train a Support Vector Machine classifier
clf = svm.SVC(kernel='linear', C=1.0, random_state=42)
clf.fit(X_train_scaled, y_train)
# Predict the labels
y_pred = clf.predict(X_test_scaled)
# Evaluate the classifier
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.4f}\n')
# Print detailed evaluation metrics
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=iris.target_names))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
In this exercise, we first load the iris dataset from the scikit-learn library using the load_iris
function. We split the data into training and testing sets using the train_test_split
function from the scikit-learn library. We train a support vector machine classifier using the SVC
class from the scikit-learn library with a linear kernel. We evaluate the classifier using the score
method and print the accuracy.
Exercise 43: Web Scraping
Concepts:
- Web Scraping
- BeautifulSoup library
- HTML Parsing
- Data Extraction
Description: Write a Python script that scrapes data from a website using the BeautifulSoup library.
Solution:
import requests
from bs4 import BeautifulSoup
# Define the target URL
url = 'https://en.wikipedia.org/wiki/Python_(programming_language)'
# Add headers to prevent request blocking
headers = {'User-Agent': 'Mozilla/5.0'}
# Fetch the HTML content of the website
response = requests.get(url, headers=headers)
# Check if the request was successful
if response.status_code != 200:
print(f"Error: Unable to fetch the page (Status Code: {response.status_code})")
exit()
# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')
# Extract the page title
title = soup.title.string
print(f"\nPage Title: {title}\n")
# Extract all valid links
base_url = 'https://en.wikipedia.org'
links = []
for link in soup.find_all('a', href=True): # Ensures 'href' exists
href = link.get('href')
# Convert relative Wikipedia links to absolute URLs
if href.startswith('/wiki/'):
full_url = base_url + href
links.append(full_url)
elif href.startswith('http'): # Keep only valid external links
links.append(href)
# Print the first 10 links for brevity
print("Extracted Links:")
for l in links[:10]: # Limit output for readability
print(l)
print(f"\nTotal Links Found: {len(links)}")
In this exercise, we first fetch the HTML content of a website using the get function from the requests library, sending a User-Agent header and checking the response status code. We parse the HTML content using the BeautifulSoup class from the bs4 package. We read the page title from the title attribute, collect links with the find_all method, convert relative Wikipedia links to absolute URLs, and print a sample of the results.
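As a small variation (a sketch, assuming the same page), the standard library's urljoin function can resolve relative links against the page URL instead of concatenating the base URL by hand, which also handles fragments and protocol-relative hrefs:
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/Python_(programming_language)'
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(response.text, 'html.parser')

# urljoin turns relative hrefs such as '/wiki/...' into absolute URLs
links = []
for a in soup.find_all('a', href=True):
    absolute = urljoin(url, a['href'])
    if absolute.startswith('http'):
        links.append(absolute)

print('Total links found:', len(links))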
Exercise 44: Database Programming
Concepts:
- Database Programming
- SQLite library
- SQL
- Data Retrieval
- Data Manipulation
Description: Write a Python script that interacts with a database using the SQLite library.
Solution:
import sqlite3
# Connect to the database using a context manager
with sqlite3.connect('data.db') as conn:
    cursor = conn.cursor()
    # Create a table (if it doesn't exist)
    cursor.execute('''CREATE TABLE IF NOT EXISTS users (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        age INTEGER NOT NULL);''')
    # Insert data into the table (use parameterized queries)
    cursor.execute("INSERT INTO users (name, age) VALUES (?, ?)", ('John Doe', 30))
    cursor.execute("INSERT INTO users (name, age) VALUES (?, ?)", ('Jane Doe', 25))
    # Retrieve data from the table
    cursor.execute('SELECT * FROM users')
    users = cursor.fetchall() # Fetch all rows
    print("\nUsers in database:")
    for user in users:
        print(user)
    # Update data using a parameterized query
    cursor.execute("UPDATE users SET age = ? WHERE name = ?", (35, 'John Doe'))
    # Delete data using a parameterized query
    cursor.execute("DELETE FROM users WHERE name = ?", ('Jane Doe',))
    # Commit the changes (happens automatically with `with` statement)
    conn.commit()
print("\nDatabase operations completed successfully.")
In this exercise, we first connect to a SQLite database using the connect function from the sqlite3 module, wrapping the connection in a with block so the transaction is committed automatically when it exits successfully. We create a table with a CREATE TABLE IF NOT EXISTS statement and insert rows using parameterized queries, which avoids SQL injection and quoting problems. We retrieve the rows with a SELECT statement and print them, then update and delete rows with further parameterized queries before committing the changes.
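A short follow-up sketch, assuming the users table from the solution already exists and using made-up rows: executemany inserts a batch of records with a single parameterized statement, and the same placeholder syntax works for filtered queries:
import sqlite3

rows = [('Alice', 28), ('Bob', 41), ('Carol', 35)]  # made-up sample data

with sqlite3.connect('data.db') as conn:
    cursor = conn.cursor()
    # Run the same parameterized INSERT once per tuple in the list
    cursor.executemany("INSERT INTO users (name, age) VALUES (?, ?)", rows)
    # Parameterized SELECT: fetch users older than a given age
    cursor.execute("SELECT name, age FROM users WHERE age > ?", (30,))
    for name, age in cursor.fetchall():
        print(name, age)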
Exercise 45: Cloud Computing
Concepts:
- Cloud Computing
- AWS
- Flask library
- Boto3 library
- Web Application Deployment
Description: Write a Python script that deploys a web application to the AWS cloud platform using the Flask and Boto3 libraries.
Solution:
from flask import Flask
import boto3
import os
# Create a Flask application
app = Flask(__name__)
# AWS S3 Configuration
AWS_BUCKET_NAME = 'my-bucket'
AWS_REGION = 'us-east-1' # Change to your region
# Upload function for AWS S3
def upload_to_s3(file_name, bucket_name, object_name=None):
    """Uploads a file to S3"""
    try:
        s3 = boto3.client('s3') # Ensure credentials are configured
        object_name = object_name or file_name # Default object name
        # Upload file
        s3.upload_file(file_name, bucket_name, object_name)
        print(f"File '{file_name}' uploaded successfully to S3 bucket '{bucket_name}'.")
    except Exception as e:
        print(f"Error uploading to S3: {e}")
# Define a route
@app.route('/')
def hello():
    return 'Hello, world! Flask is running!'
# Run the application
if __name__ == '__main__':
    # Upload a file to S3 before starting Flask (Optional)
    if os.path.exists('app.py'):
        upload_to_s3('app.py', AWS_BUCKET_NAME)
    # Run Flask server
    app.run(host='0.0.0.0', port=5000, debug=True)
In this exercise, we create a simple Flask application that defines a single route and use the upload_file method of the Boto3 S3 client to upload a file to an AWS S3 bucket; the bucket name is a placeholder and valid AWS credentials must already be configured. Note that this is only a basic example: fully deploying a web application to the AWS cloud platform involves many additional steps, such as creating an EC2 instance, setting up a load balancer, configuring security groups, and more.
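If the uploaded object needs to be shared without making the bucket public, Boto3 can also generate a time-limited presigned URL. A minimal sketch, with the bucket and key names as placeholders and AWS credentials assumed to be configured:
import boto3

s3 = boto3.client('s3')  # assumes credentials are already configured
presigned_url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'app.py'},  # placeholder names
    ExpiresIn=3600  # the link stays valid for one hour
)
print('Temporary download link:', presigned_url)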
Exercise 46: Natural Language Processing
Concepts:
- Natural Language Processing
- NLTK library
- Tokenization
- Part-of-Speech Tagging
- Named Entity Recognition
Description: Write a Python script that performs natural language processing on text data using the NLTK library.
Solution:
import nltk
# Download required NLTK models
nltk.download('punkt')
nltk.download('maxent_ne_chunker')
nltk.download('words')
nltk.download('averaged_perceptron_tagger')
# Load the text data
text = '''Apple Inc. is an American multinational technology company headquartered in Cupertino, California, that designs, develops, and sells consumer electronics, computer software, and online services. The company's hardware products include the iPhone smartphone, the iPad tablet computer, the Mac personal computer, the iPod portable media player, the Apple Watch smartwatch, the Apple TV digital media player, and the HomePod smart speaker. Apple's software includes the macOS and iOS operating systems, the iTunes media player, the Safari web browser, and the iLife and iWork creativity and productivity suites. Its online services include the iTunes Store, the iOS App Store, and Mac App Store, Apple Music, and iCloud.'''
# Tokenize the text
tokens = nltk.word_tokenize(text)
# Perform part-of-speech tagging
pos_tags = nltk.pos_tag(tokens)
# Perform named entity recognition
ne_tags = nltk.ne_chunk(pos_tags)
# Extract named entities
named_entities = {}
for chunk in ne_tags:
    if hasattr(chunk, 'label'):
        entity_type = chunk.label() # Get entity type (e.g., ORGANIZATION, PERSON)
        entity_name = ' '.join(c[0] for c in chunk) # Join words in entity
        if entity_type not in named_entities:
            named_entities[entity_type] = []
        named_entities[entity_type].append(entity_name)
# Print structured named entities
print("\nNamed Entities Found:")
for entity_type, names in named_entities.items():
    print(f"{entity_type}: {', '.join(set(names))}") # Use `set()` to remove duplicates
In this exercise, we first load some text data. We tokenize the text using the word_tokenize function from the NLTK library, perform part-of-speech tagging with the pos_tag function, and run named entity recognition with the ne_chunk function. We then walk the resulting tree, using hasattr to find chunks that carry a label (for example ORGANIZATION, PERSON, or GPE), group the entity names by type via the label method, and print each type with duplicate names removed.
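A small extension, sketched with a short made-up sentence: Counter from the standard library can tally how often each recognized entity is mentioned (the exact entity types returned depend on the NLTK models):
from collections import Counter

import nltk

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

sentence = "Apple is based in Cupertino and Tim Cook leads Apple."
tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sentence)))

# Tally (entity type, entity text) pairs so repeated mentions are counted
entity_counts = Counter(
    (chunk.label(), ' '.join(word for word, tag in chunk))
    for chunk in tree
    if hasattr(chunk, 'label')
)
for (etype, name), count in entity_counts.most_common():
    print(f"{etype}: {name} ({count})")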
Exercise 47: Big Data
Concepts:
- Big Data
- PySpark
- Apache Spark
- Data Processing
- MapReduce
Description: Write a PySpark script that processes data using the Spark framework.
Solution:
from pyspark import SparkContext, SparkConf
# Configure the Spark context
conf = SparkConf().setAppName('wordcount').setMaster('local[*]')
sc = SparkContext(conf=conf)
# Load the text data
text = sc.textFile('data.txt')
# Split the text into words and count the occurrences of each word
word_counts = text.flatMap(lambda line: line.split(' ')).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
# Print the word counts
for word, count in word_counts.collect():
    print(word, count)
# Stop the Spark context
sc.stop()
In this exercise, we first configure the Spark context using the SparkConf
and SparkContext
classes from the PySpark library. We load some text data using the textFile
method. We split the text into words and count the occurrences of each word using the flatMap
, map
, and reduceByKey
methods. We print the word counts using the collect
method. Finally, we stop the Spark context using the stop
method.
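When the input is large, collect() pulls every (word, count) pair back to the driver. A hedged variation of the same job, assuming the same data.txt input, that keeps only the ten most frequent words:
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName('wordcount-top').setMaster('local[*]')
sc = SparkContext(conf=conf)

text = sc.textFile('data.txt')
word_counts = (text.flatMap(lambda line: line.split(' '))
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

# Sort by count in descending order and bring only the top 10 back to the driver
top_words = word_counts.sortBy(lambda pair: pair[1], ascending=False).take(10)
for word, count in top_words:
    print(word, count)

sc.stop()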
Exercise 48: Cybersecurity
Concepts:
- Cybersecurity
- Scapy library
- Network Analysis
- Packet Sniffing
Description: Write a Python script that performs security analysis on a network using the Scapy library.
Solution:
from scapy.all import *
# Define a packet handler function
def packet_handler(packet):
    if packet.haslayer(TCP):
        if packet[TCP].flags & 2:
            print('SYN packet detected:', packet.summary())
# Start the packet sniffer
sniff(prn=packet_handler, filter='tcp', store=0)
In this exercise, we use the Scapy library to perform security analysis on a network. We define a packet handler function that is called for each sniffed packet. We check whether the packet is a TCP packet and whether its SYN flag (bit 0x02 of the TCP flags field) is set; if so, we print a message indicating that a SYN packet has been detected, along with a summary of the packet. Note that packet sniffing usually requires administrator or root privileges.
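Building on the same idea, the sketch below stops after 200 packets (an arbitrary cutoff) and tallies SYN packets per source address, which can hint at a port scan; it needs the same elevated privileges as the original sniffer:
from collections import Counter

from scapy.all import IP, TCP, sniff

syn_sources = Counter()

def packet_handler(packet):
    # The SYN flag is bit 0x02 of the TCP flags field
    if packet.haslayer(TCP) and packet.haslayer(IP) and packet[TCP].flags & 0x02:
        syn_sources[packet[IP].src] += 1

# Sniff a bounded number of TCP packets instead of running indefinitely
sniff(prn=packet_handler, filter='tcp', store=0, count=200)

print('SYN packets per source address:')
for src, count in syn_sources.most_common():
    print(src, count)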
Exercise 49: Machine Learning
Concepts:
- Machine Learning
- Scikit-learn library
- Model Training
- Cross-Validation
- Grid Search
Description: Write a Python script that trains a machine learning model using the scikit-learn library.
Solution:
from sklearn import datasets
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
# Load the dataset
iris = datasets.load_iris()
# Split the dataset into features and target
X = iris.data
y = iris.target
# Define the hyperparameters to search
param_grid = {'n_neighbors': [3, 5, 7, 9], 'weights': ['uniform', 'distance']}
# Create a KNN classifier
knn = KNeighborsClassifier()
# Perform a grid search with cross-validation
grid_search = GridSearchCV(knn, param_grid, cv=5)
grid_search.fit(X, y)
# Print the best hyperparameters and the accuracy score
print('Best Hyperparameters:', grid_search.best_params_)
print('Accuracy Score:', grid_search.best_score_)
In this exercise, we use the scikit-learn library to train a machine learning model. We load a dataset using the load_iris
function from the datasets
module. We split the dataset into features and target. We define a dictionary of hyperparameters to search over using the param_grid
variable. We create a KNN classifier using the KNeighborsClassifier
class. We perform a grid search with 5-fold cross-validation using the GridSearchCV class. Finally, we print the best hyperparameters and the best cross-validated accuracy using the best_params_ and best_score_ attributes.
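The solution imports cross_val_score without using it; one way it could be used, sketched below, is to re-check the tuned model with a separate cross-validation run on best_estimator_ (the choice of cv=10 here is arbitrary):
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

param_grid = {'n_neighbors': [3, 5, 7, 9], 'weights': ['uniform', 'distance']}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid_search.fit(X, y)

# Re-estimate the accuracy of the best estimator with 10-fold cross-validation
scores = cross_val_score(grid_search.best_estimator_, X, y, cv=10)
print('Best Hyperparameters:', grid_search.best_params_)
print('Mean 10-fold Accuracy:', round(scores.mean(), 4))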
Exercise 50: Computer Vision
Concepts:
- Computer Vision
- OpenCV library
- Image Processing
- Object Detection
Description: Write a Python script that performs image processing using the OpenCV library.
Solution:
import cv2
# Load the image
img = cv2.imread('image.jpg')
# Convert the image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Define a classifier for face detection
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
# Detect faces in the image
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
# Draw rectangles around the detected faces
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
# Display the image with the detected faces
cv2.imshow('image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
In this exercise, we use the OpenCV library to perform image processing. We load an image using the imread
function. We convert the image to grayscale using the cvtColor
function. We define a classifier for face detection using the CascadeClassifier
class and a pre-trained classifier file. We detect faces in the image using the detectMultiScale
function. We draw rectangles around the detected faces using the rectangle
function. We display the image with the detected faces using the imshow
, waitKey
, and destroyAllWindows
functions.
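The solution assumes the Haar cascade XML file is in the working directory. If the opencv-python package is installed, the bundled cascades can be loaded from cv2.data.haarcascades instead, as in this sketch, which also guards against a missing image file:
import cv2

img = cv2.imread('image.jpg')
if img is None:
    raise FileNotFoundError("Could not read 'image.jpg'")

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# cv2.data.haarcascades points at the cascade files shipped with opencv-python
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(cascade_path)

faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
print(f"Detected {len(faces)} face(s)")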