Chapter 6: Advanced Level Exercises
Advanced Level Exercises Part 1
Exercise 1: File Parsing
Concepts:
- File I/O
- Regular expressions
Description: Write a Python script that reads a text file and extracts all URLs that are present in the file. The output should be a list of URLs.
Solution:
import re
# Open the file for reading
with open('input_file.txt', 'r') as f:
    # Read the file contents
    file_contents = f.read()
# Use regular expression to extract URLs
urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', file_contents)
# Print the list of URLs
print(urls)
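The regular expression above can match the same URL more than once if it appears repeatedly in the file; an optional refinement, continuing the script above, is to deduplicate the matches while preserving their order:
# Optional refinement: deduplicate the extracted URLs while preserving order
unique_urls = list(dict.fromkeys(urls))
print(unique_urls)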
Exercise 2: Data Analysis
Concepts:
- File I/O
- Data manipulation
- Pandas library
Description: Write a Python script that reads a CSV file containing sales data and calculates the total sales revenue for each product category.
Solution:
import pandas as pd
# Read the CSV file into a pandas dataframe
df = pd.read_csv('sales_data.csv')
# Group the data by product category and sum the sales revenue
total_revenue = df.groupby('Product Category')['Sales Revenue'].sum()
# Print the total revenue for each product category
print(total_revenue)
Exercise 3: Web Scraping
Concepts:
- Web scraping
- Requests library
- Beautiful Soup library
- CSV file I/O
Description: Write a Python script that scrapes the title and price of all products listed on an e-commerce website and stores them in a CSV file.
Solution:
import requests
from bs4 import BeautifulSoup
import csv
# Define the target URL
url = 'https://www.example.com/products'
# Headers to mimic a real browser request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
# Make a GET request to the website
response = requests.get(url, headers=headers)
# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content using Beautiful Soup
    soup = BeautifulSoup(response.content, 'html.parser')
    # Find all product titles and prices
    titles = [title.get_text(strip=True) for title in soup.find_all('h3', class_='product-title')]
    prices = [price.get_text(strip=True) for price in soup.find_all('div', class_='product-price')]
    # Zip the titles and prices together
    data = list(zip(titles, prices))
    # Write the data to a CSV file with headers
    with open('product_data.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['Product Title', 'Price'])  # Add headers
        writer.writerows(data)
    print("Scraping completed. Data saved to 'product_data.csv'.")
else:
    print(f"Failed to retrieve the webpage. Status code: {response.status_code}")
Exercise 4: Multithreading
Concepts:
- Multithreading
- Requests library
- Threading library
Description: Write a Python script that uses multithreading to download multiple images from a URL list simultaneously.
Solution:
import requests
import threading
# URL list of images to download
url_list = ['https://www.example.com/image1.jpg', 'https://www.example.com/image2.jpg', 'https://www.example.com/image3.jpg']
# Function to download an image from a URL
def download_image(url):
    response = requests.get(url)
    with open(url.split('/')[-1], 'wb') as f:
        f.write(response.content)

# Create a thread for each URL and start them all simultaneously
threads = []
for url in url_list:
    thread = threading.Thread(target=download_image, args=(url,))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()
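As a variant, the standard library's concurrent.futures module can manage the thread pool and the joins for us; a minimal sketch reusing the same download_image function and url_list from above:
from concurrent.futures import ThreadPoolExecutor

# Download the images with a managed pool of worker threads
with ThreadPoolExecutor(max_workers=4) as executor:
    executor.map(download_image, url_list)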
Exercise 5: Machine Learning
Concepts:
- Machine learning
- Scikit-learn library
Description: Write a Python script that trains a machine learning model on a dataset and uses it to predict the output for new data.
Solution:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Read the dataset into a pandas dataframe
df = pd.read_csv('dataset.csv')
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df[['feature1', 'feature2']], df['target'], test_size=0.2, random_state=42)
# Train a linear regression model on the training data
model = LinearRegression()
model.fit(X_train, y_train)
# Use the model to predict the output for the testing data
y_pred = model.predict(X_test)
# Evaluate the model performance using the mean squared error metric
mse = ((y_test - y_pred) ** 2).mean()
print("Mean squared error:", mse)
In this exercise, we first read a dataset into a pandas dataframe. Then, we split the data into training and testing sets using the train_test_split function from the sklearn.model_selection module. We trained a linear regression model on the training data using the LinearRegression class from the sklearn.linear_model module. Finally, we used the trained model to predict the output for the testing data and evaluated the model performance using the mean squared error metric.
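The description also asks for predictions on new data; a short sketch, assuming the same two feature columns used above (the values here are made up):
# Predict the target for new, unseen observations (hypothetical feature values)
new_data = pd.DataFrame({'feature1': [5.1, 6.2], 'feature2': [3.4, 2.8]})
new_pred = model.predict(new_data)
print("Predictions for new data:", new_pred)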
Exercise 6: Natural Language Processing
Concepts:
- Natural Language Processing
- Sentiment Analysis
- NLTK library
Description: Write a Python script that reads a text file and performs sentiment analysis on the text using a pre-trained NLP model.
Solution:
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# Ensure the VADER lexicon is downloaded
nltk.download('vader_lexicon')
# Read the text file into a string
with open('input_file.txt', 'r', encoding='utf-8') as f:
    text = f.read()
# Create a SentimentIntensityAnalyzer object
sid = SentimentIntensityAnalyzer()
# Perform sentiment analysis on the text
scores = sid.polarity_scores(text)
# Print the sentiment scores
print(scores)
In this exercise, we first read a text file into a string. Then, we create a SentimentIntensityAnalyzer object from the nltk.sentiment.vader module. We use the polarity_scores method of the SentimentIntensityAnalyzer object to perform sentiment analysis on the text and get a dictionary of sentiment scores.
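The scores dictionary contains neg, neu, pos, and compound values. A common convention (an assumption, not part of VADER itself) is to classify the overall sentiment from the compound score:
# Classify the overall sentiment from the compound score (thresholds are conventional, not mandatory)
compound = scores['compound']
if compound >= 0.05:
    sentiment = 'positive'
elif compound <= -0.05:
    sentiment = 'negative'
else:
    sentiment = 'neutral'
print('Overall sentiment:', sentiment)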
Exercise 7: Web Development
Concepts:
- Web Development
- Flask framework
- File Uploads
Description: Write a Python script that creates a web application using the Flask framework that allows users to upload a file and performs some processing on the file.
Solution:
from flask import Flask, render_template, request
import os
app = Flask(__name__)
# Set the path for file uploads
UPLOAD_FOLDER = 'uploads'
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
# Ensure the upload directory exists
if not os.path.exists(UPLOAD_FOLDER):
    os.makedirs(UPLOAD_FOLDER)

# Route for the home page
@app.route('/')
def index():
    return render_template('index.html')

# Route for file uploads
@app.route('/upload', methods=['POST'])
def upload():
    if 'file' not in request.files:
        return 'No file part', 400
    file = request.files['file']
    if file.filename == '':
        return 'No selected file', 400
    # Save the file to the uploads folder
    file.save(os.path.join(app.config['UPLOAD_FOLDER'], file.filename))
    return 'File uploaded successfully'

if __name__ == '__main__':
    app.run(debug=True)
In this exercise, we first import the Flask module and create a Flask application. We set up a route for the home page that returns an HTML template, and a route for file uploads that receives an uploaded file and saves it to a designated uploads folder. Any processing of the uploaded file can be performed inside the upload function.
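One detail worth noting: the uploaded filename comes straight from the client, so it should be sanitized before being used in a path. Werkzeug, which Flask depends on, provides secure_filename for this; the save line inside upload could be replaced with the following sketch:
from werkzeug.utils import secure_filename

# Sanitize the client-supplied filename before building the save path
filename = secure_filename(file.filename)
file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))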
Exercise 8: Data Visualization
Concepts:
- Data Visualization
- Matplotlib library
- Candlestick Charts
Description: Write a Python script that reads a CSV file containing stock market data and plots a candlestick chart of the data.
Solution:
import pandas as pd
import matplotlib.pyplot as plt
import mplfinance as mpf
# Read the CSV file into a pandas dataframe
df = pd.read_csv('stock_data.csv', parse_dates=['Date'])
df.set_index('Date', inplace=True) # Set Date as index
# Plot the candlestick chart using mplfinance
mpf.plot(df, type='candle', style='charles', title='Stock Market Data', ylabel='Price')
# Display the chart
plt.show()
In this exercise, we first read a CSV file containing stock market data into a pandas dataframe, parse the date column as datetimes, and set it as the index. We then plot the candlestick chart using the plot function from the mplfinance module, which handles the date axis, labels, and title for us, and display the chart using the show function from the matplotlib.pyplot module.
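mplfinance expects the dataframe to contain columns named Open, High, Low, Close (and optionally Volume). If stock_data.csv uses different capitalization, rename the columns before plotting; a sketch assuming lowercase names in the file:
# Rename lowercase CSV columns to the names mplfinance expects (assumed input column names)
df = df.rename(columns={'open': 'Open', 'high': 'High', 'low': 'Low', 'close': 'Close', 'volume': 'Volume'})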
Exercise 9: Machine Learning
Concepts:
- Machine Learning
- Scikit-learn library
Description: Write a Python script that reads a dataset containing information about different types of flowers and trains a machine learning model to predict the type of a flower based on its features.
Solution:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Read the dataset into a pandas dataframe
df = pd.read_csv('flower_data.csv')
# Check for missing values
if df.isnull().sum().sum() > 0:
    df = df.dropna()  # Drop rows with missing values
# Define feature columns and target column
X = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
y = df['species']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize the feature values
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train a logistic regression model on the training data
model = LogisticRegression(solver='saga', max_iter=5000) # Increased iterations & changed solver
model.fit(X_train, y_train)
# Use the model to predict the output for the testing data
y_pred = model.predict(X_test)
# Evaluate the model performance using the accuracy score metric
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
In this exercise, we first read a dataset containing information about different types of flowers into a pandas dataframe. We split the data into training and testing sets using the train_test_split function from the sklearn.model_selection module and standardize the features with StandardScaler. We trained a logistic regression model on the training data using the LogisticRegression class from the sklearn.linear_model module. Finally, we used the trained model to predict the output for the testing data and evaluated the model performance using the accuracy score metric.
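To classify a new flower, the same fitted scaler must be applied to its measurements before calling predict; a short sketch with made-up measurements:
# Predict the species of a new flower (hypothetical measurements, scaled with the fitted scaler)
new_flower = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]],
                          columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
print("Predicted species:", model.predict(scaler.transform(new_flower))[0])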
Exercise 10: Data Analysis
Concepts:
- Data Analysis
- Recommendation Systems
- Collaborative Filtering
- Surprise library
Description: Write a Python script that reads a CSV file containing customer purchase data and generates a recommendation system that recommends products to customers based on their purchase history.
Solution:
import pandas as pd
from surprise import Dataset, Reader, SVD, accuracy
from surprise.model_selection import train_test_split
# Read the CSV file into a pandas dataframe
df = pd.read_csv('purchase_data.csv')
# Ensure that the dataset has no missing values
df = df.dropna(subset=['customer_id', 'product_id', 'rating'])
# Convert the pandas dataframe to a Surprise dataset
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['customer_id', 'product_id', 'rating']], reader)
# Split the data into training and testing sets
trainset, testset = train_test_split(data, test_size=0.2)
# Train an SVD model on the training data
model = SVD(n_factors=50, n_epochs=20, lr_all=0.005, reg_all=0.02)
model.fit(trainset)
# Use the model to predict the output for the testing data
predictions = model.test(testset)
# Evaluate the model performance using the root mean squared error metric
rmse = accuracy.rmse(predictions)
print("RMSE:", rmse)
# Recommend products to customers based on their purchase history
customer_ids = df['customer_id'].unique()
product_ids = df['product_id'].unique()
recommendations = {}
for customer_id in customer_ids:
    purchased_products = set(df[df['customer_id'] == customer_id]['product_id'].values)
    potential_recommendations = []
    for product_id in product_ids:
        if product_id not in purchased_products:
            pred = model.predict(customer_id, product_id)
            potential_recommendations.append((product_id, pred.est))
    # Sort by predicted rating and take the top 5 recommendations
    top_recommendations = sorted(potential_recommendations, key=lambda x: x[1], reverse=True)[:5]
    recommendations[customer_id] = top_recommendations

# Display recommendations
for customer, recs in recommendations.items():
    print(f"Customer {customer} recommended products: {recs}")
In this exercise, we first read a CSV file containing customer purchase data into a pandas dataframe. We convert the pandas dataframe to a Surprise dataset using the Reader and Dataset classes from the surprise module. We split the data into training and testing sets using the train_test_split function from the surprise.model_selection module. We trained an SVD model on the training data using the SVD class from the surprise module. We used the trained model to predict the output for the testing data and evaluated the model performance using the root mean squared error metric. Finally, we recommended products to customers based on their purchase history using the trained model.
Exercise 11: Computer Vision
Concepts:
- Computer Vision
- Object Detection
- OpenCV library
- Pre-trained models
Description: Write a Python script that reads an image and performs object detection on the image using a pre-trained object detection model.
Solution:
import cv2
import numpy as np
# Read the image file
img = cv2.imread('image.jpg')
# Check if the image is loaded correctly
if img is None:
    raise FileNotFoundError("Error: Image file not found or unable to load.")
# Load the pre-trained object detection model
model = cv2.dnn.readNetFromTensorflow('frozen_inference_graph.pb', 'ssd_mobilenet_v2_coco_2018_03_29.pbtxt')
# Prepare the input image for the model
blob = cv2.dnn.blobFromImage(img, size=(300, 300), swapRB=True, crop=False)
model.setInput(blob)
# Perform object detection
output = model.forward()
# Loop through detected objects and draw bounding boxes
h, w, _ = img.shape # Get image dimensions
for detection in output[0, 0, :, :]:
    confidence = float(detection[2])
    if confidence > 0.5:
        x1 = int(detection[3] * w)
        y1 = int(detection[4] * h)
        x2 = int(detection[5] * w)
        y2 = int(detection[6] * h)
        # Draw bounding box with label and confidence score
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
        label = f'Confidence: {confidence:.2f}'
        cv2.putText(img, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
# Display the image with detections
cv2.imshow('Object Detection', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
In this exercise, we first read an image file into a NumPy array using the imread function from the cv2 module of OpenCV. We load a pre-trained object detection model using the readNetFromTensorflow function from the cv2.dnn module. We set the input image to the model and perform object detection using the setInput and forward methods of the model object. Finally, we loop through the detected objects and draw bounding boxes around them using the rectangle function from the cv2 module.
Exercise 12: Natural Language Processing
Concepts:
- Natural Language Processing
- Topic Modeling
- Latent Dirichlet Allocation
- Gensim library
Description: Write a Python script that reads a text file and performs topic modeling on the text using Latent Dirichlet Allocation (LDA).
Solution:
import gensim
from gensim import corpora
from gensim.models import LdaModel
# Read the text file into a list of strings
with open('input_file.txt', 'r') as f:
    text = f.readlines()
# Remove newlines and convert to lowercase
text = [line.strip().lower() for line in text]
# Tokenize the text into words
tokens = [line.split() for line in text]
# Create a dictionary of words and their frequency
dictionary = corpora.Dictionary(tokens)
# Create a bag-of-words representation of the text
corpus = [dictionary.doc2bow(token) for token in tokens]
# Train an LDA model on the text
model = LdaModel(corpus, id2word=dictionary, num_topics=5, passes=10)
# Print the topics and their associated words
for topic in model.print_topics(num_words=5):
    print(topic)
In this exercise, we first read a text file into a list of strings. We preprocess the text by removing newlines, converting to lowercase, and tokenizing into words using the split method. We create a dictionary of words and their frequency and create a bag-of-words representation of the text using the doc2bow method of the dictionary object. We train an LDA model on the corpus using the LdaModel class from the gensim.models module. Finally, we print the topics and their associated words using the print_topics method of the model object.
Exercise 13: Web Scraping
Concepts:
- Web Scraping
- Beautiful Soup library
- Requests library
- CSV file handling
Description: Write a Python script that scrapes a website for product information and saves the information to a CSV file.
Solution:
import requests
from bs4 import BeautifulSoup
import csv
# Define the URL of the website to scrape
url = 'https://www.example.com/products'
# Add headers to mimic a browser request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
# Send a request to the website
response = requests.get(url, headers=headers)
# Check if the request was successful
if response.status_code != 200:
    print(f"Failed to fetch data. Status Code: {response.status_code}")
    exit()
# Parse the HTML content of the response using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')
# Find all the product listings on the page
listings = soup.find_all('div', class_='product-listing')
# Write the product information to a CSV file
with open('products.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Product Name', 'Price', 'Description'])
    for listing in listings:
        name = listing.find('h3')
        price = listing.find('span', class_='price')
        description = listing.find('p')
        # Extract text safely, handling missing elements
        name = name.get_text(strip=True) if name else 'N/A'
        price = price.get_text(strip=True) if price else 'N/A'
        description = description.get_text(strip=True) if description else 'N/A'
        writer.writerow([name, price, description])
print("Scraping completed. Data saved to 'products.csv'.")
In this exercise, we first define the URL of the website to scrape and send a request to the website using the get function from the requests module. We parse the HTML content of the response using Beautiful Soup and find all the product listings on the page using the find_all method. We write the product information to a CSV file using the csv module.
Exercise 14: Big Data Processing
Concepts:
- Big Data Processing
- PySpark
- Data Transformations
- Aggregation
- Parquet file format
Description: Write a PySpark script that reads a CSV file containing customer purchase data, performs some data transformations and aggregation, and saves the results to a Parquet file.
Solution:
from pyspark.sql import SparkSession
# Create a SparkSession object
spark = SparkSession.builder.appName('customer-purchases').getOrCreate()
# Verify if the file exists before reading (optional but useful)
import os
if not os.path.exists('customer_purchases.csv'):
    raise FileNotFoundError("Error: The file 'customer_purchases.csv' does not exist.")
# Read the CSV file into a Spark DataFrame
df = spark.read.csv('customer_purchases.csv', header=True, inferSchema=True)
# Perform some data transformations
df = df.filter((df['purchase_date'] >= '2020-01-01') & (df['purchase_date'] <= '2020-12-31'))
df = df.select('customer_id', 'product_id', 'price')
# Group by customer and calculate total spending
df = df.groupBy('customer_id').sum('price').withColumnRenamed('sum(price)', 'total_spent')
# Save the results to a Parquet file
df.write.mode('overwrite').parquet('customer_spending.parquet')
print("Processing completed. Data saved to 'customer_spending.parquet'.")
In this exercise, we first create a SparkSession object using the SparkSession class from the pyspark.sql module. We read a CSV file containing customer purchase data into a Spark DataFrame using the read.csv method. We perform some data transformations on the DataFrame using the filter, select, and groupBy methods. Finally, we save the results to a Parquet file using the write.parquet method.
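As a quick sanity check, the Parquet output can be read back into a DataFrame before stopping the session; a short sketch continuing the script above:
# Read the Parquet output back and show a few rows to verify the aggregation
result = spark.read.parquet('customer_spending.parquet')
result.show(5)
spark.stop()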
Exercise 15: DevOps
Concepts:
- DevOps
- Fabric library
Description: Write a Python script that automates the deployment of a web application to a remote server using the Fabric library.
Solution:
from fabric import Connection
import getpass
# Define the host and user credentials for the remote server
host = 'example.com'
user = 'user'
password = getpass.getpass("Enter SSH password: ") # Secure password entry
# Define the path to the web application on the local machine and the remote server
local_path = '/path/to/local/app'
remote_path = '/path/to/remote/app'
# Create a connection to the remote server
c = Connection(host=host, user=user, connect_kwargs={'password': password})
# Ensure the remote directory exists
c.run(f'mkdir -p {remote_path}')
# Upload the local files to the remote server
# (Fabric's put() transfers single files, so pack the app into an archive first)
c.local(f'tar czf /tmp/app.tar.gz -C {local_path} .')
c.put('/tmp/app.tar.gz', f'{remote_path}/app.tar.gz')
c.run(f'tar xzf {remote_path}/app.tar.gz -C {remote_path}')
# Change to the application directory
with c.cd(remote_path):
    # Install required dependencies
    c.run('sudo apt-get update && sudo apt-get install -y python3-pip')
    c.run('pip3 install -r requirements.txt')
    # Start the web application in the background
    c.run('nohup python3 app.py > app.log 2>&1 &', pty=False)
print("Deployment completed successfully.")
In this exercise, we first define the host and user credentials for the remote server, along with the path to the web application on the local machine and on the remote server. We create a connection to the remote server using the Connection class from the fabric module, pack the application into an archive, upload it using the put method of the connection object, and unpack it remotely. We install any required dependencies on the remote server using the run method of the connection object. Finally, we start the web application on the remote server using the run method.
Exercise 16: Reinforcement Learning
Concepts:
- Reinforcement Learning
- Q-Learning
- OpenAI Gym library
Description: Write a Python script that implements a reinforcement learning algorithm to teach an agent to play a simple game.
Solution:
import gym
import numpy as np
import time
# Create the FrozenLake environment
env = gym.make("FrozenLake-v1", is_slippery=True)
# Initialize the Q-table
Q = np.zeros([env.observation_space.n, env.action_space.n])
# Set hyperparameters
alpha = 0.8 # Learning rate
gamma = 0.95 # Discount factor
epsilon = 0.1 # Exploration probability
num_episodes = 2000 # Training episodes
# Train the agent using Q-learning
for episode in range(num_episodes):
    state, _ = env.reset()
    done = False
    while not done:
        # Choose action using epsilon-greedy policy
        if np.random.uniform() < epsilon:
            action = env.action_space.sample()  # Random action (exploration)
        else:
            action = np.argmax(Q[state, :])  # Best action from Q-table
        # Take the action and observe the next state
        next_state, reward, done, _, _ = env.step(action)
        # Update Q-value using the Bellman equation
        Q[state, action] = (1 - alpha) * Q[state, action] + alpha * (reward + gamma * np.max(Q[next_state, :]))
        # Move to the next state
        state = next_state
# Test the agent by playing the game
state, _ = env.reset()
done = False
print("\nTesting trained agent:\n")
while not done:
    action = np.argmax(Q[state, :])
    next_state, reward, done, _, _ = env.step(action)
    # Render the environment
    env.render()
    time.sleep(0.5)  # Pause for visibility
    state = next_state
print("\nGame Over!")
In this exercise, we first create an OpenAI Gym environment for the game using the make function from the gym module. We define the Q-table for the agent as a NumPy array and set the hyperparameters for the Q-learning algorithm. We train the agent using the Q-learning algorithm by looping through a specified number of episodes and updating the Q-table based on the rewards and next states. Finally, we test the agent by playing the game using the Q-table and visualizing the game using the render method.
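A single greedy episode says little about how good the learned policy is on a slippery lake; a rough sketch, continuing the script above, for estimating its success rate over many evaluation episodes (the 100-episode count is arbitrary):
# Estimate the success rate of the greedy policy over several episodes
eval_episodes = 100
successes = 0
for _ in range(eval_episodes):
    state, _ = env.reset()
    terminated = truncated = False
    while not (terminated or truncated):
        action = int(np.argmax(Q[state, :]))
        state, reward, terminated, truncated, _ = env.step(action)
    successes += int(reward == 1.0)
print(f"Success rate over {eval_episodes} episodes: {successes / eval_episodes:.2%}")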
Exercise 17: Time Series Analysis
Concepts:
- Time Series Analysis
- Data Preprocessing
- Data Visualization
- ARIMA model
- Statsmodels library
Description: Write a Python script that reads a CSV file containing time series data, performs some data preprocessing and visualization, and fits a time series model to the data.
Solution:
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
# Read the CSV file into a pandas dataframe
df = pd.read_csv('time_series.csv')
# Convert the date column to a datetime object and set it as the index
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
# Check for missing values before resampling
if df.isnull().values.any():
    df = df.fillna(method='ffill')
# Ensure the column name is correct
target_col = df.columns[0] # Assuming first column is the time series value
# Resample the data to a monthly frequency
df = df.resample('M').mean()
# Plot the time series data
plt.figure(figsize=(10, 5))
plt.plot(df.index, df[target_col], label="Time Series")
plt.xlabel("Date")
plt.ylabel("Value")
plt.title("Time Series Visualization")
plt.legend()
plt.grid()
plt.show()
# Fit an ARIMA model
model = sm.tsa.ARIMA(df[target_col].dropna(), order=(1, 1, 1)) # Use dropna() to avoid errors
results = model.fit()
# Print the model summary
print(results.summary())
In this exercise, we first read a CSV file containing time series data into a pandas dataframe. We convert the date column to a datetime object and set it as the index, fill any missing values using forward fill, and resample the data to a monthly frequency. We visualize the data using the plot function from the matplotlib.pyplot module. Finally, we fit an ARIMA model to the data using the ARIMA class exposed through the statsmodels.api module and print the summary of the model using the summary method of the results object.
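With the fitted results object, future values can be projected as well; a minimal sketch (the 12-step horizon is arbitrary):
# Forecast the next 12 periods from the fitted ARIMA model
forecast = results.forecast(steps=12)
print(forecast)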
Exercise 18: Computer Networking
Concepts:
- Computer Networking
- TCP/IP Protocol
- Socket Programming
Description: Write a Python script that implements a simple TCP server that accepts client connections and sends and receives data.
Solution:
import socket
# Define the host and port for the server
host = 'localhost'
port = 12345
# Create a socket object
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Bind the socket to the host and port
s.bind((host, port))
# Listen for incoming connections
s.listen(1)
print('Server listening on', host, port)
# Accept a client connection
conn, addr = s.accept()
print('Connected by', addr)
# Send data to the client
conn.sendall(b'Hello, client!')
# Receive data from the client
data = conn.recv(1024)
print('Received:', data.decode())
# Close the connection
conn.close()
In this exercise, we first define the host and port for the server. We create a socket object using the socket function from the socket module and bind the socket to the host and port using the bind method. We listen for incoming connections using the listen method and accept a client connection using the accept method, which returns a connection object and the address of the client. We send data to the client using the sendall method of the connection object and receive data from the client using the recv method. Finally, we close the connection using the close method.
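To try the server, a matching client is needed; a minimal sketch that connects to the same host and port, reads the greeting, and replies:
import socket

# Simple TCP client for the server above
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('localhost', 12345))
# Receive the server's greeting
print('Received:', client.recv(1024).decode())
# Send a reply back to the server
client.sendall(b'Hello, server!')
client.close()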
Exercise 19: Data Analysis and Visualization
Concepts:
- Data Analysis
- Data Visualization
- PDF Report Generation
- Pandas library
- Matplotlib library
- ReportLab library
Description: Write a Python script that reads a CSV file containing sales data for a retail store, performs some data analysis and visualization, and saves the results to a PDF report.
Solution:
import pandas as pd
import matplotlib.pyplot as plt
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
import os
# Read the CSV file into a pandas dataframe
df = pd.read_csv('sales_data.csv')
# Calculate the total sales by category and month
totals = df.groupby(['category', 'month'])['sales'].sum()
# Get unique categories
categories = df['category'].unique()
# Create subplots dynamically based on the number of categories
fig, axes = plt.subplots(nrows=len(categories), ncols=1, figsize=(8.5, 11))
# Ensure `axes` is always iterable (even if there's only one category)
if len(categories) == 1:
    axes = [axes]

# Plot total sales by category and month
for i, category in enumerate(categories):
    totals.loc[category].plot(ax=axes[i], kind='bar', title=f"Category: {category}")
    axes[i].set_ylabel("Sales")
plt.tight_layout()
plt.savefig('sales_plot.png') # Save the figure
plt.close(fig) # Close to free memory
# Create a PDF report
pdf_filename = 'sales_report.pdf'
c = canvas.Canvas(pdf_filename, pagesize=letter)
# Add title and description
c.setFont("Helvetica-Bold", 16)
c.drawString(50, 750, 'Sales Report')
c.setFont("Helvetica", 12)
c.drawString(50, 730, 'Total Sales by Category and Month')
# Add the image to the PDF if it exists
if os.path.exists('sales_plot.png'):
    c.drawImage('sales_plot.png', 50, 450, width=500, height=300)
# Save and close the PDF
c.showPage()
c.save()
print(f"Report saved as {pdf_filename}")
In this exercise, we first read a CSV file containing sales data for a retail store into a pandas dataframe. We calculate the total sales by category and month using the groupby and sum methods. We plot the total sales by category and month using pandas' plot method on Matplotlib axes and save the figure to a PNG file. Finally, we generate a PDF report using the Canvas class from the reportlab.pdfgen module and embed the saved plot with the drawImage method.
Exercise 20: Machine Learning
Concepts:
- Machine Learning
- Convolutional Neural Networks
- Keras library
- MNIST dataset
Description: Write a Python script that trains a machine learning model to classify images of handwritten digits from the MNIST dataset.
Solution:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Normalize the pixel values and reshape the data
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)
# Define the CNN model
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),  # Fully connected layer
    layers.Dropout(0.5),  # Prevent overfitting
    layers.Dense(10, activation='softmax')
])
# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test), batch_size=64)
# Evaluate the model on the test data
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print('Test accuracy:', test_acc)
In this exercise, we first load the MNIST dataset using the load_data function from the keras.datasets.mnist module. We normalize the pixel values and reshape the data. We define a convolutional neural network model using the Sequential class and various layers from the layers module of Keras. We compile the model using the compile method with the Adam optimizer and sparse categorical crossentropy loss function. We train the model using the fit method and evaluate the model on the test data using the evaluate method.
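Once trained, the model can classify individual images; a short usage sketch for the first test image:
import numpy as np

# Predict the digit for the first test image and compare it with the true label
probs = model.predict(x_test[:1])
print("Predicted digit:", np.argmax(probs[0]), "| True label:", y_test[0])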
Exercise 21: Natural Language Processing
Concepts:
- Natural Language Processing
- Text Preprocessing
- Text Representation
- Topic Modeling
- Latent Dirichlet Allocation
- Gensim library
Description: Write a Python script that uses natural language processing techniques to analyze a corpus of text data and extract useful insights.
Solution:
import gensim
from gensim import corpora
from gensim.models import LdaModel
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
# Download required resources
nltk.download('stopwords')
nltk.download('punkt')
# Read the text data into a pandas dataframe
df = pd.read_csv('text_data.csv')
# Handle missing values
df['text'] = df['text'].fillna('')
# Define stop words and clean text
stop_words = set(stopwords.words('english'))
def preprocess_text(text):
    tokens = word_tokenize(text.lower())  # Tokenization & lowercasing
    return [word for word in tokens if word.isalnum() and word not in stop_words]  # Remove punctuation & stopwords
df['cleaned_text'] = df['text'].apply(preprocess_text)
# Create a document-term matrix
texts = df['cleaned_text'].tolist()
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
# Train LDA model
num_topics = 5
lda_model = LdaModel(corpus, num_topics=num_topics, id2word=dictionary, passes=10)
# Print topics and top words for each
for topic_id, words in lda_model.show_topics(num_topics=num_topics, formatted=False):
    print(f'Topic {topic_id}:', ', '.join(word for word, _ in words))
# Convert topic distributions into a structured DataFrame
topic_dists = [{f"Topic_{topic}": prob for topic, prob in lda_model.get_document_topics(doc, minimum_probability=0)} for doc in corpus]
topic_df = pd.DataFrame(topic_dists)
# Merge topic distributions with original data
df = pd.concat([df, topic_df], axis=1)
# Save the results
df.to_csv('text_data_topics.csv', index=False)
print("Saved processed data to 'text_data_topics.csv'.")
In this exercise, we first read a corpus of text data into a pandas dataframe. We define the stop words using the stopwords corpus from the nltk.corpus module and remove them from the text data using a list comprehension and the apply method of pandas. We create a dictionary and a bag-of-words corpus from the text data using the Dictionary class and its doc2bow method from the gensim.corpora module. We perform topic modeling with latent Dirichlet allocation (LDA) using the LdaModel class and extract the topic distributions for each document. Finally, we save the results to a CSV file using the to_csv method of pandas.
Exercise 22: Web Scraping
Concepts:
- Web Scraping
- HTML Parsing
- BeautifulSoup library
- CSV File I/O
Description: Write a Python script that scrapes data from a website using the BeautifulSoup library and saves it to a CSV file.
Solution:
import requests
from bs4 import BeautifulSoup
import csv
# Define the URL to scrape
url = 'https://www.example.com'
# Headers to mimic a real browser request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
# Send a GET request
response = requests.get(url, headers=headers)
# Check if the request was successful
if response.status_code != 200:
    print(f"Error: Unable to fetch data (Status Code: {response.status_code})")
    exit()
# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')
# Extract data
data = []
for item in soup.find_all('div', class_='item'):
    name_tag = item.find('h3')
    price_tag = item.find('span', class_='price')
    # Extract text safely, handling missing elements
    name = name_tag.get_text(strip=True) if name_tag else 'N/A'
    price = price_tag.get_text(strip=True) if price_tag else 'N/A'
    data.append([name, price])
# Save to CSV
csv_filename = 'data.csv'
with open(csv_filename, 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Name', 'Price'])  # Add headers
    writer.writerows(data)
print(f"Scraping completed. Data saved to '{csv_filename}'.")
In this exercise, we first define the URL to scrape and send a GET request with the requests library, then parse the HTML content using the BeautifulSoup library. We extract the data from the HTML content using the find_all and find methods of the soup object. Finally, we save the data to a CSV file using the csv module.
Exercise 23: Database Interaction
Concepts:
- Database Interaction
- SQLite database
- SQL queries
- SQLite3 module
Description: Write a Python script that interacts with a database to retrieve and manipulate data.
Solution:
import sqlite3
# Connect to the database
conn = sqlite3.connect('example.db')
# Create a cursor object
c = conn.cursor()
# Execute an SQL query to create a table
c.execute('''CREATE TABLE IF NOT EXISTS customers
             (id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)''')
# Execute an SQL query to insert data into the table
c.execute("INSERT INTO customers (name, email, phone) VALUES ('John Smith', 'john@example.com', '555-1234')")
# Execute an SQL query to retrieve data from the table
c.execute("SELECT * FROM customers")
rows = c.fetchall()
for row in rows:
    print(row)
# Execute an SQL query to update data in the table
c.execute("UPDATE customers SET phone='555-5678' WHERE name='John Smith'")
# Execute an SQL query to delete data from the table
c.execute("DELETE FROM customers WHERE name='John Smith'")
# Commit the changes to the database
conn.commit()
# Close the database connection
conn.close()
In this exercise, we first connect to an SQLite database using the connect function from the sqlite3 module. We create a cursor object using the cursor method of the connection object and execute SQL queries using the execute method of the cursor object. We retrieve data from the table using the fetchall method and print the results. We update data in the table using the UPDATE statement and delete data from the table using the DELETE statement. Finally, we commit the changes to the database and close the connection.
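When queries contain user-supplied values, it is safer to use parameterized queries than to build SQL strings by hand; a short sketch against the same table, with hypothetical values:
# Parameterized queries let sqlite3 handle quoting and protect against SQL injection
conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute("INSERT INTO customers (name, email, phone) VALUES (?, ?, ?)",
          ('Jane Doe', 'jane@example.com', '555-9876'))
c.execute("SELECT * FROM customers WHERE name = ?", ('Jane Doe',))
print(c.fetchall())
conn.commit()
conn.close()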
Exercise 24: Parallel Processing
Concepts:
- Parallel Processing
- Multiprocessing
- Process Pool
- CPU-bound tasks
Description: Write a Python script that performs a time-consuming computation using parallel processing to speed up the computation.
Solution:
import time
import multiprocessing
# Define a CPU-bound function that takes noticeable time to compute
def compute(num):
    # Sum of squares up to num -- deliberately loop-based so multiple processes pay off
    total = 0
    for i in range(num):
        total += i * i
    return total

if __name__ == '__main__':
    # Create a process pool with the number of CPUs available
    num_cpus = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(num_cpus)
    # Generate a list of numbers to compute
    num_list = [10000000] * num_cpus
    # Compute the results using parallel processing
    start_time = time.time()
    results = pool.map(compute, num_list)
    # Close the pool properly
    pool.close()
    pool.join()
    end_time = time.time()
    # Print the results and computation time
    print('Results:', results)
    print('Computation time:', end_time - start_time, 'seconds')
In this exercise, we first define a CPU-bound function that takes a long time to compute. We then create a process pool using the Pool function from the multiprocessing module with the number of CPUs available. We generate a list of numbers to compute and compute the results using the map method of the process pool. Finally, we print the results and computation time.
Exercise 25: Image Processing
Concepts:
- Image Processing
- Pillow library
- Image Manipulation
- Image Filtering
Description: Write a Python script that performs basic image processing operations on an image file.
Solution:
from PIL import Image, ImageFilter
import os
# Define image paths
input_path = 'example.jpg'
output_path = 'processed.jpg'
# Check if the input file exists
if not os.path.exists(input_path):
    raise FileNotFoundError(f"Error: The file '{input_path}' was not found.")

try:
    # Open the image file using a context manager
    with Image.open(input_path) as image:
        # Display the original image (optional, may not work in all environments)
        image.show()
        # Resize the image
        image = image.resize((500, 500))
        # Convert the image to grayscale
        image = image.convert('L')
        # Apply a Gaussian blur filter
        image = image.filter(ImageFilter.GaussianBlur(radius=2))
        # Save the processed image to a file
        image.save(output_path)
        # Display the processed image
        image.show()
    print(f"Processed image saved as '{output_path}'.")
except Exception as e:
    print(f"An error occurred: {e}")
In this exercise, we first open an image file using the Image class from the Pillow library. We resize the image using the resize method and convert it to grayscale using the convert method with the 'L' mode. We apply a Gaussian blur filter using the filter method with the GaussianBlur class from the ImageFilter module. Finally, we save the processed image to a file using the save method and display it using the show method.
Advance Level Exercises Part 1
Exercise 1: File Parsing
Concepts:
- File I/O
- Regular expressions
Description: Write a Python script that reads a text file and extracts all URLs that are present in the file. The output should be a list of URLs.
Solution:
import re
# Open the file for reading
with open('input_file.txt', 'r') as f:
# Read the file contents
file_contents = f.read()
# Use regular expression to extract URLs
urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', file_contents)
# Print the list of URLs
print(urls)
Exercise 2: Data Analysis
Concepts:
- File I/O
- Data manipulation
- Pandas library
Description: Write a Python script that reads a CSV file containing sales data and calculates the total sales revenue for each product category.
Solution:
import pandas as pd
# Read the CSV file into a pandas dataframe
df = pd.read_csv('sales_data.csv')
# Group the data by product category and sum the sales revenue
total_revenue = df.groupby('Product Category')['Sales Revenue'].sum()
# Print the total revenue for each product category
print(total_revenue)
Exercise 3: Web Scraping
Concepts
- Web scraping
- Requests library
- Beautiful Soup library
- CSV file I/O
Description: Write a Python script that scrapes the title and price of all products listed on an e-commerce website and stores them in a CSV file.
Solution:
import requests
from bs4 import BeautifulSoup
import csv
# Define the target URL
url = 'https://www.example.com/products'
# Headers to mimic a real browser request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
# Make a GET request to the website
response = requests.get(url, headers=headers)
# Check if the request was successful
if response.status_code == 200:
# Parse the HTML content using Beautiful Soup
soup = BeautifulSoup(response.content, 'html.parser')
# Find all product titles and prices
titles = [title.get_text(strip=True) for title in soup.find_all('h3', class_='product-title')]
prices = [price.get_text(strip=True) for price in soup.find_all('div', class_='product-price')]
# Zip the titles and prices together
data = list(zip(titles, prices))
# Write the data to a CSV file with headers
with open('product_data.csv', 'w', newline='', encoding='utf-8') as f:
writer = csv.writer(f)
writer.writerow(['Product Title', 'Price']) # Add headers
writer.writerows(data)
print("Scraping completed. Data saved to 'product_data.csv'.")
else:
print(f"Failed to retrieve the webpage. Status code: {response.status_code}")
Exercise 4: Multithreading
Concepts:
- Multithreading
- Requests library
- Threading library
Description: Write a Python script that uses multithreading to download multiple images from a URL list simultaneously.
Solution:
import requests
import threading
# URL list of images to download
url_list = ['https://www.example.com/image1.jpg', 'https://www.example.com/image2.jpg', 'https://www.example.com/image3.jpg']
# Function to download an image from a URL
def download_image(url):
response = requests.get(url)
with open(url.split('/')[-1], 'wb') as f:
f.write(response.content)
# Create a thread for each URL and start them all simultaneously
threads = []
for url in url_list:
thread = threading.Thread(target=download_image, args=(url,))
threads.append(thread)
thread.start()
# Wait for all threads to finish
for thread in threads:
thread.join()
Exercise 5: Machine Learning
Concepts:
- Machine learning
- Scikit-learn library
Description: Write a Python script that trains a machine learning model on a dataset and uses it to predict the output for new data.
Solution:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Read the dataset into a pandas dataframe
df = pd.read_csv('dataset.csv')
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df[['feature1', 'feature2']], df['target'], test_size=0.2, random_state=42)
# Train a linear regression model on the training data
model = LinearRegression()
model.fit(X_train, y_train)
# Use the model to predict the output for the testing data
y_pred = model.predict(X_test)
# Evaluate the model performance using the mean squared error metric
mse = ((y_test - y_pred) ** 2).mean()
print("Mean squared error:", mse)
In this exercise, we first read a dataset into a pandas dataframe. Then, we split the data into training and testing sets using the train_test_split
function from the sklearn.model_selection
module. We trained a linear regression model on the training data using the LinearRegression
class from the sklearn.linear_model
module. Finally, we used the trained model to predict the output for the testing data and evaluated the model performance using the mean squared error metric.
Exercise 6: Natural Language Processing
Concepts:
- Natural Language Processing
- Sentiment Analysis
- NLTK library
Description: Write a Python script that reads a text file and performs sentiment analysis on the text using a pre-trained NLP model.
Solution:
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# Ensure the VADER lexicon is downloaded
nltk.download('vader_lexicon')
# Read the text file into a string
with open('input_file.txt', 'r', encoding='utf-8') as f:
text = f.read()
# Create a SentimentIntensityAnalyzer object
sid = SentimentIntensityAnalyzer()
# Perform sentiment analysis on the text
scores = sid.polarity_scores(text)
# Print the sentiment scores
print(scores)
In this exercise, we first read a text file into a string. Then, we create a SentimentIntensityAnalyzer
object from the nltk.sentiment.vader
module. We use the polarity_scores
method of the SentimentIntensityAnalyzer
object to perform sentiment analysis on the text and get a dictionary of sentiment scores.
Exercise 7: Web Development
Concepts:
- Web Development
- Flask framework
- File Uploads
Description: Write a Python script that creates a web application using the Flask framework that allows users to upload a file and performs some processing on the file.
Solution:
from flask import Flask, render_template, request
import os
app = Flask(__name__)
# Set the path for file uploads
UPLOAD_FOLDER = 'uploads'
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
# Ensure the upload directory exists
if not os.path.exists(UPLOAD_FOLDER):
os.makedirs(UPLOAD_FOLDER)
# Route for the home page
@app.route('/')
def index():
return render_template('index.html')
# Route for file uploads
@app.route('/upload', methods=['POST'])
def upload():
if 'file' not in request.files:
return 'No file part', 400
file = request.files['file']
if file.filename == '':
return 'No selected file', 400
# Save the file to the uploads folder
file.save(os.path.join(app.config['UPLOAD_FOLDER'], file.filename))
return 'File uploaded successfully'
if __name__ == '__main__':
app.run(debug=True)
In this exercise, we first import the Flask module and create a Flask application. We set up a route for the home page that returns an HTML template. We set up a route for file uploads that receives an uploaded file and saves it to a designated uploads folder. We can perform processing on the uploaded file inside the upload
function.
Exercise 8: Data Visualization
Concepts:
- Data Visualization
- Matplotlib library
- Candlestick Charts
Description: Write a Python script that reads a CSV file containing stock market data and plots a candlestick chart of the data.
Solution:
import pandas as pd
import matplotlib.pyplot as plt
import mplfinance as mpf
# Read the CSV file into a pandas dataframe
df = pd.read_csv('stock_data.csv', parse_dates=['Date'])
df.set_index('Date', inplace=True) # Set Date as index
# Plot the candlestick chart using mplfinance
mpf.plot(df, type='candle', style='charles', title='Stock Market Data', ylabel='Price')
# Display the chart
plt.show()
In this exercise, we first read a CSV file containing stock market data into a pandas dataframe. We convert the date column to Matplotlib dates format and create a figure and axis objects. We plot the candlestick chart using the candlestick_ohlc
function from the mpl_finance
module. We format the x-axis as dates and set the axis labels and title. Finally, we display the chart using the show
function from the matplotlib.pyplot
module.
Exercise 9: Machine Learning
Concepts:
- Machine Learning
- Scikit-learn library
Description: Write a Python script that reads a dataset containing information about different types of flowers and trains a machine learning model to predict the type of a flower based on its features.
Solution:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Read the dataset into a pandas dataframe
df = pd.read_csv('flower_data.csv')
# Check for missing values
if df.isnull().sum().sum() > 0:
df = df.dropna() # Drop rows with missing values
# Define feature columns and target column
X = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
y = df['species']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize the feature values
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train a logistic regression model on the training data
model = LogisticRegression(solver='saga', max_iter=5000) # Increased iterations & changed solver
model.fit(X_train, y_train)
# Use the model to predict the output for the testing data
y_pred = model.predict(X_test)
# Evaluate the model performance using the accuracy score metric
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
In this exercise, we first read a dataset containing information about different types of flowers into a pandas dataframe. We split the data into training and testing sets using the train_test_split
function from the sklearn.model_selection
module. We trained a logistic regression model on the training data using the LogisticRegression
class from the sklearn.linear_model
module. Finally, we used the trained model to predict the output for the testing data and evaluated the model performance using the accuracy score metric.
Exercise 10: Data Analysis
Concepts:
- Data Analysis
- Recommendation Systems
- Collaborative Filtering
- Surprise library
Description: Write a Python script that reads a CSV file containing customer purchase data and generates a recommendation system that recommends products to customers based on their purchase history.
Solution:
import pandas as pd
from surprise import Dataset, Reader, SVD, accuracy
from surprise.model_selection import train_test_split
# Read the CSV file into a pandas dataframe
df = pd.read_csv('purchase_data.csv')
# Ensure that the dataset has no missing values
df = df.dropna(subset=['customer_id', 'product_id', 'rating'])
# Convert the pandas dataframe to a Surprise dataset
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['customer_id', 'product_id', 'rating']], reader)
# Split the data into training and testing sets
trainset, testset = train_test_split(data, test_size=0.2)
# Train an SVD model on the training data
model = SVD(n_factors=50, n_epochs=20, lr_all=0.005, reg_all=0.02)
model.fit(trainset)
# Use the model to predict the output for the testing data
predictions = model.test(testset)
# Evaluate the model performance using the root mean squared error metric
rmse = accuracy.rmse(predictions)
print("RMSE:", rmse)
# Recommend products to customers based on their purchase history
customer_ids = df['customer_id'].unique()
product_ids = df['product_id'].unique()
recommendations = {}
for customer_id in customer_ids:
purchased_products = set(df[df['customer_id'] == customer_id]['product_id'].values)
potential_recommendations = []
for product_id in product_ids:
if product_id not in purchased_products:
pred = model.predict(customer_id, product_id)
potential_recommendations.append((product_id, pred.est))
# Sort by predicted rating and take the top 5 recommendations
top_recommendations = sorted(potential_recommendations, key=lambda x: x[1], reverse=True)[:5]
recommendations[customer_id] = top_recommendations
# Display recommendations
for customer, recs in recommendations.items():
print(f"Customer {customer} recommended products: {recs}")
In this exercise, we first read a CSV file containing customer purchase data into a pandas dataframe. We convert the pandas dataframe to a surprise dataset using the Reader
and Dataset
classes from the surprise
module. We split the data into training and testing sets using the train_test_split
function from the surprise.model_selection
module. We trained an SVD model on the training data using the SVD
class from the surprise
module. We used the trained model to predict the output for the testing data and evaluated the model performance using the root mean squared error metric. Finally, we recommended products to customers based on their purchase history using the trained model.
Exercise 11: Computer Vision
Concepts:
- Computer Vision
- Object Detection
- OpenCV library
- Pre-trained models
Description: Write a Python script that reads an image and performs object detection on the image using a pre-trained object detection model.
Solution:
import cv2
import numpy as np
# Read the image file
img = cv2.imread('image.jpg')
# Check if the image is loaded correctly
if img is None:
raise FileNotFoundError("Error: Image file not found or unable to load.")
# Load the pre-trained object detection model
model = cv2.dnn.readNetFromTensorflow('frozen_inference_graph.pb', 'ssd_mobilenet_v2_coco_2018_03_29.pbtxt')
# Prepare the input image for the model
blob = cv2.dnn.blobFromImage(img, size=(300, 300), swapRB=True, crop=False)
model.setInput(blob)
# Perform object detection
output = model.forward()
# Loop through detected objects and draw bounding boxes
h, w, _ = img.shape # Get image dimensions
for detection in output[0, 0, :, :]:
confidence = float(detection[2])
if confidence > 0.5:
x1 = int(detection[3] * w)
y1 = int(detection[4] * h)
x2 = int(detection[5] * w)
y2 = int(detection[6] * h)
# Draw bounding box with label and confidence score
cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
label = f'Confidence: {confidence:.2f}'
cv2.putText(img, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
# Display the image with detections
cv2.imshow('Object Detection', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
In this exercise, we first read an image file into a NumPy array using the imread
function from the cv2
module of OpenCV. We load a pre-trained object detection model using the readNetFromTensorflow
function from the cv2.dnn
module. We set the input image to the model and perform object detection using the setInput
and forward
methods of the model object. Finally, we loop through the detected objects and draw bounding boxes around them using the rectangle
function from the cv2
module.
Exercise 12: Natural Language Processing
Concepts:
- Natural Language Processing
- Topic Modeling
- Latent Dirichlet Allocation
- Gensim library
Description: Write a Python script that reads a text file and performs topic modeling on the text using Latent Dirichlet Allocation (LDA).
Solution:
import gensim
from gensim import corpora
from gensim.models import LdaModel
# Read the text file into a list of strings
with open('input_file.txt', 'r') as f:
text = f.readlines()
# Remove newlines and convert to lowercase
text = [line.strip().lower() for line in text]
# Tokenize the text into words
tokens = [line.split() for line in text]
# Create a dictionary of words and their frequency
dictionary = corpora.Dictionary(tokens)
# Create a bag-of-words representation of the text
corpus = [dictionary.doc2bow(token) for token in tokens]
# Train an LDA model on the text
model = LdaModel(corpus, id2word=dictionary, num_topics=5, passes=10)
# Print the topics and their associated words
for topic in model.print_topics(num_words=5):
print(topic)
In this exercise, we first read a text file into a list of strings. We preprocess the text by removing newlines, converting to lowercase, and tokenizing into words using the split method. We create a dictionary of words and their frequencies and build a bag-of-words representation of the text using the doc2bow method of the dictionary object. We train an LDA model on the corpus using the LdaModel class from the gensim.models module. Finally, we print the topics and their associated words using the print_topics method of the model object.
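Once the model is trained, it can also infer the topic mixture of text it has not seen before; a minimal sketch (the example sentence is made up):
new_doc = "machine learning models need lots of data".lower().split()
# get_document_topics returns (topic id, probability) pairs for the new document
print(model.get_document_topics(dictionary.doc2bow(new_doc)))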
Exercise 13: Web Scraping
Concepts:
- Web Scraping
- Beautiful Soup library
- Requests library
- CSV file handling
Description: Write a Python script that scrapes a website for product information and saves the information to a CSV file.
Solution:
import requests
from bs4 import BeautifulSoup
import csv
# Define the URL of the website to scrape
url = 'https://www.example.com/products'
# Add headers to mimic a browser request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
# Send a request to the website
response = requests.get(url, headers=headers)
# Check if the request was successful
if response.status_code != 200:
    print(f"Failed to fetch data. Status Code: {response.status_code}")
    exit()
# Parse the HTML content of the response using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')
# Find all the product listings on the page
listings = soup.find_all('div', class_='product-listing')
# Write the product information to a CSV file
with open('products.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Product Name', 'Price', 'Description'])
    for listing in listings:
        name = listing.find('h3')
        price = listing.find('span', class_='price')
        description = listing.find('p')
        # Extract text safely, handling missing elements
        name = name.get_text(strip=True) if name else 'N/A'
        price = price.get_text(strip=True) if price else 'N/A'
        description = description.get_text(strip=True) if description else 'N/A'
        writer.writerow([name, price, description])
print("Scraping completed. Data saved to 'products.csv'.")
In this exercise, we first define the URL of the website to scrape and send a request to the website using the get function from the requests module. We parse the HTML content of the response using Beautiful Soup and find all the product listings on the page using the find_all method. Finally, we write the product information to a CSV file using the csv module.
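Product catalogues usually span several pages, so one natural extension is to repeat the request/parse step over a page query parameter before writing the CSV. A rough sketch, where the '?page=' parameter is an assumption about the example site:
listings = []
for page in range(1, 4):  # hypothetical pages 1-3
    response = requests.get(f'{url}?page={page}', headers=headers)
    if response.status_code != 200:
        break
    soup = BeautifulSoup(response.content, 'html.parser')
    listings.extend(soup.find_all('div', class_='product-listing'))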
Exercise 14: Big Data Processing
Concepts:
- Big Data Processing
- PySpark
- Data Transformations
- Aggregation
- Parquet file format
Description: Write a PySpark script that reads a CSV file containing customer purchase data, performs some data transformations and aggregation, and saves the results to a Parquet file.
Solution:
from pyspark.sql import SparkSession
# Create a SparkSession object
spark = SparkSession.builder.appName('customer-purchases').getOrCreate()
# Verify if the file exists before reading (optional but useful)
import os
if not os.path.exists('customer_purchases.csv'):
    raise FileNotFoundError("Error: The file 'customer_purchases.csv' does not exist.")
# Read the CSV file into a Spark DataFrame
df = spark.read.csv('customer_purchases.csv', header=True, inferSchema=True)
# Perform some data transformations
df = df.filter((df['purchase_date'] >= '2020-01-01') & (df['purchase_date'] <= '2020-12-31'))
df = df.select('customer_id', 'product_id', 'price')
# Group by customer and calculate total spending
df = df.groupBy('customer_id').sum('price').withColumnRenamed('sum(price)', 'total_spent')
# Save the results to a Parquet file
df.write.mode('overwrite').parquet('customer_spending.parquet')
print("Processing completed. Data saved to 'customer_spending.parquet'.")
In this exercise, we first create a SparkSession object using the SparkSession class from the pyspark.sql module. We read a CSV file containing customer purchase data into a Spark DataFrame using the read.csv method. We perform some data transformations on the DataFrame using the filter, select, and groupBy methods. Finally, we save the results to a Parquet file using the write.parquet method.
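A quick way to verify the output is to read the Parquet file back into a DataFrame and inspect a few rows before stopping the session:
spending = spark.read.parquet('customer_spending.parquet')
spending.show(5)  # Print the first five aggregated rows
spark.stop()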
Exercise 15: DevOps
Concepts:
- DevOps
- Fabric library
Description: Write a Python script that automates the deployment of a web application to a remote server using the Fabric library.
Solution:
from fabric import Connection
import getpass
# Define the host and user credentials for the remote server
host = 'example.com'
user = 'user'
password = getpass.getpass("Enter SSH password: ") # Secure password entry
# Define the path to the web application on the local machine and the remote server
local_path = '/path/to/local/app'
remote_path = '/path/to/remote/app'
# Create a connection to the remote server
c = Connection(host=host, user=user, connect_kwargs={'password': password})
# Ensure the remote directory exists
c.run(f'mkdir -p {remote_path}')
# Upload the application to the remote server
# (Connection.put transfers a single file, so the app is packed into a tar archive first)
c.local(f'tar -czf /tmp/app.tar.gz -C {local_path} .')
c.put('/tmp/app.tar.gz', f'{remote_path}/app.tar.gz')
c.run(f'tar -xzf {remote_path}/app.tar.gz -C {remote_path}')
# Change to the application directory
with c.cd(remote_path):
    # Install required dependencies
    c.run('sudo apt-get update && sudo apt-get install -y python3-pip')
    c.run('pip3 install -r requirements.txt')
    # Start the web application in the background
    c.run('nohup python3 app.py > app.log 2>&1 &', pty=False)
print("Deployment completed successfully.")
In this exercise, we first define the host and user credentials for the remote server and the paths to the web application on the local machine and the remote server. We create a connection to the remote server using the Connection class from the fabric module. We pack the application into a tar archive, upload it using the put method of the connection object, and extract it on the server. We install the required dependencies on the remote server using the run method of the connection object. Finally, we start the web application on the remote server using the run method.
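After the deployment it is worth checking that the process actually started; a small optional addition using the same connection (the process name matches the app.py started above):
# Confirm the app is running and show the last few log lines
c.run('pgrep -f app.py || echo "app.py is not running"', warn=True)
c.run(f'tail -n 5 {remote_path}/app.log', warn=True)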
Exercise 16: Reinforcement Learning
Concepts:
- Reinforcement Learning
- Q-Learning
- OpenAI Gym library
Description: Write a Python script that implements a reinforcement learning algorithm to teach an agent to play a simple game.
Solution:
import gym
import numpy as np
import time
# Create the FrozenLake environment
env = gym.make("FrozenLake-v1", is_slippery=True)
# Initialize the Q-table
Q = np.zeros([env.observation_space.n, env.action_space.n])
# Set hyperparameters
alpha = 0.8 # Learning rate
gamma = 0.95 # Discount factor
epsilon = 0.1 # Exploration probability
num_episodes = 2000 # Training episodes
# Train the agent using Q-learning
for episode in range(num_episodes):
    state, _ = env.reset()
    done = False
    while not done:
        # Choose action using epsilon-greedy policy
        if np.random.uniform() < epsilon:
            action = env.action_space.sample()  # Random action (exploration)
        else:
            action = np.argmax(Q[state, :])  # Best action from Q-table
        # Take the action and observe the next state (termination and truncation are reported separately)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Update Q-value using the Bellman equation
        Q[state, action] = (1 - alpha) * Q[state, action] + alpha * (reward + gamma * np.max(Q[next_state, :]))
        # Move to the next state
        state = next_state
# Test the agent by playing the game (recreate the environment in text-rendering mode)
env = gym.make("FrozenLake-v1", is_slippery=True, render_mode="ansi")
state, _ = env.reset()
done = False
print("\nTesting trained agent:\n")
while not done:
    action = np.argmax(Q[state, :])
    next_state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    # Render the environment as text
    print(env.render())
    time.sleep(0.5)  # Pause for visibility
    state = next_state
print("\nGame Over!")
In this exercise, we first create an OpenAI Gym environment for the game using the make function from the gym module. We define the Q-table for the agent as a NumPy array and set the hyperparameters for the Q-learning algorithm. We train the agent by looping through a specified number of episodes and updating the Q-table based on the rewards and next states. Finally, we test the agent by playing the game greedily from the Q-table and visualizing each step using the render method.
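To put a number on how good the learned policy is, you can replay it greedily (no exploration) for many episodes and count how often the agent reaches the goal; a minimal sketch reusing the Q-table from above:
eval_env = gym.make("FrozenLake-v1", is_slippery=True)
wins = 0
for _ in range(1000):
    s, _ = eval_env.reset()
    finished = False
    while not finished:
        s, r, term, trunc, _ = eval_env.step(np.argmax(Q[s, :]))
        finished = term or trunc
    wins += r  # the reward is 1 only when the goal state is reached
print("Success rate:", wins / 1000)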
Exercise 17: Time Series Analysis
Concepts:
- Time Series Analysis
- Data Preprocessing
- Data Visualization
- ARIMA model
- Statsmodels library
Description: Write a Python script that reads a CSV file containing time series data, performs some data preprocessing and visualization, and fits a time series model to the data.
Solution:
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
# Read the CSV file into a pandas dataframe
df = pd.read_csv('time_series.csv')
# Convert the date column to a datetime object and set it as the index
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
# Check for missing values before resampling
if df.isnull().values.any():
    df = df.fillna(method='ffill')
# Ensure the column name is correct
target_col = df.columns[0] # Assuming first column is the time series value
# Resample the data to a monthly frequency
df = df.resample('M').mean()
# Plot the time series data
plt.figure(figsize=(10, 5))
plt.plot(df.index, df[target_col], label="Time Series")
plt.xlabel("Date")
plt.ylabel("Value")
plt.title("Time Series Visualization")
plt.legend()
plt.grid()
plt.show()
# Fit an ARIMA model
model = sm.tsa.ARIMA(df[target_col].dropna(), order=(1, 1, 1)) # Use dropna() to avoid errors
results = model.fit()
# Print the model summary
print(results.summary())
In this exercise, we first read a CSV file containing time series data into a pandas dataframe. We convert the date column to a datetime object and set it as the index. We fill any missing values using forward fill and resample the data to a monthly frequency. We visualize the data using the plot function from the matplotlib.pyplot module. Finally, we fit an ARIMA model to the data using the ARIMA class from the statsmodels.api module and print the summary of the model using the summary method of the results object.
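The fitted results object can also produce out-of-sample forecasts, for example over a 12-month horizon:
# Forecast the next 12 monthly values from the fitted ARIMA model
forecast = results.forecast(steps=12)
print(forecast)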
Exercise 18: Computer Networking
Concepts:
- Computer Networking
- TCP/IP Protocol
- Socket Programming
Description: Write a Python script that implements a simple TCP server that accepts client connections and sends and receives data.
Solution:
import socket
# Define the host and port for the server
host = 'localhost'
port = 12345
# Create a socket object
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Bind the socket to the host and port
s.bind((host, port))
# Listen for incoming connections
s.listen(1)
print('Server listening on', host, port)
# Accept a client connection
conn, addr = s.accept()
print('Connected by', addr)
# Send data to the client
conn.sendall(b'Hello, client!')
# Receive data from the client
data = conn.recv(1024)
print('Received:', data.decode())
# Close the connection
conn.close()
In this exercise, we first define the host and port for the server. We create a socket object using the socket function from the socket module and bind it to the host and port using the bind method. We listen for incoming connections using the listen method and accept a client connection using the accept method, which returns a connection object and the address of the client. We send data to the client using the sendall method of the connection object and receive data from the client using the recv method. Finally, we close the connection using the close method.
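To exercise the server, you can run a small client in a second terminal; a minimal sketch that matches the host and port used above:
import socket
# Minimal client: connect, read the greeting, send a reply
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('localhost', 12345))
print('Server says:', client.recv(1024).decode())
client.sendall(b'Hello, server!')
client.close()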
Exercise 19: Data Analysis and Visualization
Concepts:
- Data Analysis
- Data Visualization
- PDF Report Generation
- Pandas library
- Matplotlib library
- ReportLab library
Description: Write a Python script that reads a CSV file containing sales data for a retail store, performs some data analysis and visualization, and saves the results to a PDF report.
Solution:
import pandas as pd
import matplotlib.pyplot as plt
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
import os
# Read the CSV file into a pandas dataframe
df = pd.read_csv('sales_data.csv')
# Calculate the total sales by category and month
totals = df.groupby(['category', 'month'])['sales'].sum()
# Get unique categories
categories = df['category'].unique()
# Create subplots dynamically based on the number of categories
fig, axes = plt.subplots(nrows=len(categories), ncols=1, figsize=(8.5, 11))
# Ensure `axes` is always iterable (even if there's only one category)
if len(categories) == 1:
    axes = [axes]
# Plot total sales by category and month
for i, category in enumerate(categories):
    totals.loc[category].plot(ax=axes[i], kind='bar', title=f"Category: {category}")
    axes[i].set_ylabel("Sales")
plt.tight_layout()
plt.savefig('sales_plot.png') # Save the figure
plt.close(fig) # Close to free memory
# Create a PDF report
pdf_filename = 'sales_report.pdf'
c = canvas.Canvas(pdf_filename, pagesize=letter)
# Add title and description
c.setFont("Helvetica-Bold", 16)
c.drawString(50, 750, 'Sales Report')
c.setFont("Helvetica", 12)
c.drawString(50, 730, 'Total Sales by Category and Month')
# Add the image to the PDF if it exists
if os.path.exists('sales_plot.png'):
    c.drawImage('sales_plot.png', 50, 450, width=500, height=300)
# Save and close the PDF
c.showPage()
c.save()
print(f"Report saved as {pdf_filename}")
In this exercise, we first read a CSV file containing sales data for a retail store into a pandas dataframe. We calculate the total sales by category and month using the groupby and sum methods. We plot the total sales by category and month using the plot method from the matplotlib.pyplot module and save the plot to a PNG file. Finally, we generate a PDF report using the Canvas class and drawImage method from the reportlab library.
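If the report should also list the overall totals as text, a few drawString calls can be added before c.showPage() and c.save(); a sketch assuming the same category and sales columns:
y = 430
for category, total in df.groupby('category')['sales'].sum().items():
    c.drawString(50, y, f"{category}: {total}")
    y -= 15  # move down one line per category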
Exercise 20: Machine Learning
Concepts:
- Machine Learning
- Convolutional Neural Networks
- Keras library
- MNIST dataset
Description: Write a Python script that trains a machine learning model to classify images of handwritten digits from the MNIST dataset.
Solution:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Normalize the pixel values and reshape the data
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)
# Define the CNN model
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),  # Fully connected layer
    layers.Dropout(0.5),  # Prevent overfitting
    layers.Dense(10, activation='softmax')
])
# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test), batch_size=64)
# Evaluate the model on the test data
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print('Test accuracy:', test_acc)
In this exercise, we first load the MNIST dataset using the load_data function from the keras.datasets.mnist module. We normalize the pixel values and reshape the data using NumPy. We define a convolutional neural network model using the Sequential class and various layers from the layers module of Keras. We compile the model using the compile method with the Adam optimizer and sparse categorical crossentropy loss function. We train the model using the fit method and evaluate it on the test data using the evaluate method.
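After training, the network can classify individual images; a minimal sketch that predicts the first test image and compares it with the true label:
import numpy as np
probs = model.predict(x_test[:1])  # probabilities over the 10 digit classes
print("Predicted digit:", np.argmax(probs), "- true label:", y_test[0])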
Exercise 21: Natural Language Processing
Concepts:
- Natural Language Processing
- Text Preprocessing
- Text Representation
- Topic Modeling
- Latent Dirichlet Allocation
- Gensim library
Description: Write a Python script that uses natural language processing techniques to analyze a corpus of text data and extract useful insights.
Solution:
import gensim
from gensim import corpora
from gensim.models import LdaModel
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
# Download required resources
nltk.download('stopwords')
nltk.download('punkt')
# Read the text data into a pandas dataframe
df = pd.read_csv('text_data.csv')
# Handle missing values
df['text'] = df['text'].fillna('')
# Define stop words and clean text
stop_words = set(stopwords.words('english'))
def preprocess_text(text):
    tokens = word_tokenize(text.lower())  # Tokenization & lowercasing
    return [word for word in tokens if word.isalnum() and word not in stop_words]  # Remove punctuation & stopwords
df['cleaned_text'] = df['text'].apply(preprocess_text)
# Create a document-term matrix
texts = df['cleaned_text'].tolist()
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
# Train LDA model
num_topics = 5
lda_model = LdaModel(corpus, num_topics=num_topics, id2word=dictionary, passes=10)
# Print topics and top words for each
for topic_id, words in lda_model.show_topics(num_topics=num_topics, formatted=False):
    print(f'Topic {topic_id}:', ', '.join(word for word, _ in words))
# Convert topic distributions into a structured DataFrame
topic_dists = [{f"Topic_{topic}": prob for topic, prob in lda_model.get_document_topics(doc, minimum_probability=0)} for doc in corpus]
topic_df = pd.DataFrame(topic_dists)
# Merge topic distributions with original data
df = pd.concat([df, topic_df], axis=1)
# Save the results
df.to_csv('text_data_topics.csv', index=False)
print("Saved processed data to 'text_data_topics.csv'.")
In this exercise, we first read a corpus of text data into a pandas dataframe. We define the stop words using the stopwords corpus from the nltk.corpus module and remove them from the text with a list comprehension applied through the apply method of pandas. We create a document-term matrix from the text data using the Dictionary class and the doc2bow method from the gensim corpora module. We perform topic modeling using latent Dirichlet allocation (LDA) with the LdaModel class and extract the topic distributions for each document. Finally, we save the results to a CSV file using the to_csv method of pandas.
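Whether five topics is a sensible choice can be checked with a topic-coherence score, which you could compare across different values of num_topics; a minimal sketch using the objects defined above:
from gensim.models import CoherenceModel
coherence = CoherenceModel(model=lda_model, texts=texts, dictionary=dictionary, coherence='c_v')
print("Coherence score:", coherence.get_coherence())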
Exercise 22: Web Scraping
Concepts:
- Web Scraping
- HTML Parsing
- BeautifulSoup library
- CSV File I/O
Description: Write a Python script that scrapes data from a website using the BeautifulSoup library and saves it to a CSV file.
Solution:
import requests
from bs4 import BeautifulSoup
import csv
# Define the URL to scrape
url = 'https://www.example.com'
# Headers to mimic a real browser request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
# Send a GET request
response = requests.get(url, headers=headers)
# Check if the request was successful
if response.status_code != 200:
    print(f"Error: Unable to fetch data (Status Code: {response.status_code})")
    exit()
# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')
# Extract data
data = []
for item in soup.find_all('div', class_='item'):
    name_tag = item.find('h3')
    price_tag = item.find('span', class_='price')
    # Extract text safely, handling missing elements
    name = name_tag.get_text(strip=True) if name_tag else 'N/A'
    price = price_tag.get_text(strip=True) if price_tag else 'N/A'
    data.append([name, price])
# Save to CSV
csv_filename = 'data.csv'
with open(csv_filename, 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Name', 'Price'])  # Add headers
    writer.writerows(data)
print(f"Scraping completed. Data saved to '{csv_filename}'.")
In this exercise, we first define the URL to scrape, fetch the page using the requests library, and parse the HTML content using the BeautifulSoup library. We extract the data from the HTML content using the find_all and find methods of the soup object. Finally, we save the data to a CSV file using the csv module.
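When the same script has to fetch several pages, it is polite to pause between requests; a rough sketch (the page URLs are placeholders):
import time
for page_url in ['https://www.example.com/page/1', 'https://www.example.com/page/2']:
    response = requests.get(page_url, headers=headers)
    # ... parse the response as above ...
    time.sleep(1)  # simple rate limiting between requests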
Exercise 23: Database Interaction
Concepts:
- Database Interaction
- SQLite database
- SQL queries
- SQLite3 module
Description: Write a Python script that interacts with a database to retrieve and manipulate data.
Solution:
import sqlite3
# Connect to the database
conn = sqlite3.connect('example.db')
# Create a cursor object
c = conn.cursor()
# Execute an SQL query to create a table
c.execute('''CREATE TABLE IF NOT EXISTS customers
(id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)''')
# Execute an SQL query to insert data into the table
c.execute("INSERT INTO customers (name, email, phone) VALUES ('John Smith', 'john@example.com', '555-1234')")
# Execute an SQL query to retrieve data from the table
c.execute("SELECT * FROM customers")
rows = c.fetchall()
for row in rows:
    print(row)
# Execute an SQL query to update data in the table
c.execute("UPDATE customers SET phone='555-5678' WHERE name='John Smith'")
# Execute an SQL query to delete data from the table
c.execute("DELETE FROM customers WHERE name='John Smith'")
# Commit the changes to the database
conn.commit()
# Close the database connection
conn.close()
In this exercise, we first connect to an SQLite database using the connect function from the sqlite3 module. We create a cursor object using the cursor method of the connection object and execute SQL queries using the execute method of the cursor object. We retrieve data from the table using the fetchall method and print the results. We update data in the table using the UPDATE statement and delete data from the table using the DELETE statement. Finally, we commit the changes to the database and close the connection.
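When the values come from user input, it is safer to pass them as query parameters than to build the SQL string by hand; a minimal sketch using the same cursor before the connection is closed (the example customer is made up):
c.execute("INSERT INTO customers (name, email, phone) VALUES (?, ?, ?)",
          ('Jane Doe', 'jane@example.com', '555-9876'))
c.execute("SELECT * FROM customers WHERE name = ?", ('Jane Doe',))
print(c.fetchone())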
Exercise 24: Parallel Processing
Concepts:
- Parallel Processing
- Multiprocessing
- Process Pool
- CPU-bound tasks
Description: Write a Python script that performs a time-consuming computation using parallel processing to speed up the computation.
Solution:
import time
import multiprocessing
# Define an optimized CPU-bound function
def compute(num):
    return num * (num - 1) // 2  # Uses O(1) formula instead of a loop
if __name__ == '__main__':
    # Create a process pool with the number of CPUs available
    num_cpus = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(num_cpus)
    # Generate a list of numbers to compute
    num_list = [10000000] * num_cpus
    # Compute the results using parallel processing
    start_time = time.time()
    results = pool.map(compute, num_list)
    # Close the pool properly
    pool.close()
    pool.join()
    end_time = time.time()
    # Print the results and computation time
    print('Results:', results)
    print('Computation time:', end_time - start_time, 'seconds')
In this exercise, we first define the CPU-bound function to apply to each input. We then create a process pool using the Pool function from the multiprocessing module with the number of CPUs available. We generate a list of numbers to compute and calculate the results in parallel using the map method of the process pool. Finally, we print the results and the computation time.
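The same fan-out/fan-in pattern can also be written with the standard library's concurrent.futures module, which manages the pool for you; a minimal sketch reusing the compute function above (the four inputs are just an example):
from concurrent.futures import ProcessPoolExecutor
if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:  # pool is created and cleaned up automatically
        results = list(executor.map(compute, [10000000] * 4))
    print(results)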
Exercise 25: Image Processing
Concepts:
- Image Processing
- Pillow library
- Image Manipulation
- Image Filtering
Description: Write a Python script that performs basic image processing operations on an image file.
Solution:
from PIL import Image, ImageFilter
import os
# Define image paths
input_path = 'example.jpg'
output_path = 'processed.jpg'
# Check if the input file exists
if not os.path.exists(input_path):
    raise FileNotFoundError(f"Error: The file '{input_path}' was not found.")
try:
    # Open the image file using a context manager
    with Image.open(input_path) as image:
        # Display the original image (optional, may not work in all environments)
        image.show()
        # Resize the image
        image = image.resize((500, 500))
        # Convert the image to grayscale
        image = image.convert('L')
        # Apply a Gaussian blur filter
        image = image.filter(ImageFilter.GaussianBlur(radius=2))
        # Save the processed image to a file
        image.save(output_path)
        # Display the processed image
        image.show()
    print(f"Processed image saved as '{output_path}'.")
except Exception as e:
    print(f"An error occurred: {e}")
In this exercise, we first open an image file using the Image class from the Pillow library. We resize the image using the resize method and convert it to grayscale using the convert method with the 'L' mode. We apply a Gaussian blur filter using the filter method with the GaussianBlur class from the ImageFilter module. Finally, we save the processed image to a file using the save method and display it using the show method.
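A few other common one-line Pillow operations, sketched on the same input file in case you want to extend the exercise:
with Image.open(input_path) as im:
    im.rotate(90).save('rotated.jpg')  # rotate 90 degrees counter-clockwise
    im.crop((0, 0, 100, 100)).save('cropped.jpg')  # keep the top-left 100x100 region
    im.thumbnail((128, 128))  # shrink in place, preserving aspect ratio
    im.save('thumbnail.jpg')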
I hope you find these exercises useful.
Advance Level Exercises Part 1
Exercise 1: File Parsing
Concepts:
- File I/O
- Regular expressions
Description: Write a Python script that reads a text file and extracts all URLs that are present in the file. The output should be a list of URLs.
Solution:
import re
# Open the file for reading
with open('input_file.txt', 'r') as f:
# Read the file contents
file_contents = f.read()
# Use regular expression to extract URLs
urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', file_contents)
# Print the list of URLs
print(urls)
Exercise 2: Data Analysis
Concepts:
- File I/O
- Data manipulation
- Pandas library
Description: Write a Python script that reads a CSV file containing sales data and calculates the total sales revenue for each product category.
Solution:
import pandas as pd
# Read the CSV file into a pandas dataframe
df = pd.read_csv('sales_data.csv')
# Group the data by product category and sum the sales revenue
total_revenue = df.groupby('Product Category')['Sales Revenue'].sum()
# Print the total revenue for each product category
print(total_revenue)
Exercise 3: Web Scraping
Concepts
- Web scraping
- Requests library
- Beautiful Soup library
- CSV file I/O
Description: Write a Python script that scrapes the title and price of all products listed on an e-commerce website and stores them in a CSV file.
Solution:
import requests
from bs4 import BeautifulSoup
import csv
# Define the target URL
url = 'https://www.example.com/products'
# Headers to mimic a real browser request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
# Make a GET request to the website
response = requests.get(url, headers=headers)
# Check if the request was successful
if response.status_code == 200:
# Parse the HTML content using Beautiful Soup
soup = BeautifulSoup(response.content, 'html.parser')
# Find all product titles and prices
titles = [title.get_text(strip=True) for title in soup.find_all('h3', class_='product-title')]
prices = [price.get_text(strip=True) for price in soup.find_all('div', class_='product-price')]
# Zip the titles and prices together
data = list(zip(titles, prices))
# Write the data to a CSV file with headers
with open('product_data.csv', 'w', newline='', encoding='utf-8') as f:
writer = csv.writer(f)
writer.writerow(['Product Title', 'Price']) # Add headers
writer.writerows(data)
print("Scraping completed. Data saved to 'product_data.csv'.")
else:
print(f"Failed to retrieve the webpage. Status code: {response.status_code}")
Exercise 4: Multithreading
Concepts:
- Multithreading
- Requests library
- Threading library
Description: Write a Python script that uses multithreading to download multiple images from a URL list simultaneously.
Solution:
import requests
import threading
# URL list of images to download
url_list = ['https://www.example.com/image1.jpg', 'https://www.example.com/image2.jpg', 'https://www.example.com/image3.jpg']
# Function to download an image from a URL
def download_image(url):
response = requests.get(url)
with open(url.split('/')[-1], 'wb') as f:
f.write(response.content)
# Create a thread for each URL and start them all simultaneously
threads = []
for url in url_list:
thread = threading.Thread(target=download_image, args=(url,))
threads.append(thread)
thread.start()
# Wait for all threads to finish
for thread in threads:
thread.join()
Exercise 5: Machine Learning
Concepts:
- Machine learning
- Scikit-learn library
Description: Write a Python script that trains a machine learning model on a dataset and uses it to predict the output for new data.
Solution:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Read the dataset into a pandas dataframe
df = pd.read_csv('dataset.csv')
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df[['feature1', 'feature2']], df['target'], test_size=0.2, random_state=42)
# Train a linear regression model on the training data
model = LinearRegression()
model.fit(X_train, y_train)
# Use the model to predict the output for the testing data
y_pred = model.predict(X_test)
# Evaluate the model performance using the mean squared error metric
mse = ((y_test - y_pred) ** 2).mean()
print("Mean squared error:", mse)
In this exercise, we first read a dataset into a pandas dataframe. Then, we split the data into training and testing sets using the train_test_split
function from the sklearn.model_selection
module. We trained a linear regression model on the training data using the LinearRegression
class from the sklearn.linear_model
module. Finally, we used the trained model to predict the output for the testing data and evaluated the model performance using the mean squared error metric.
Exercise 6: Natural Language Processing
Concepts:
- Natural Language Processing
- Sentiment Analysis
- NLTK library
Description: Write a Python script that reads a text file and performs sentiment analysis on the text using a pre-trained NLP model.
Solution:
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# Ensure the VADER lexicon is downloaded
nltk.download('vader_lexicon')
# Read the text file into a string
with open('input_file.txt', 'r', encoding='utf-8') as f:
text = f.read()
# Create a SentimentIntensityAnalyzer object
sid = SentimentIntensityAnalyzer()
# Perform sentiment analysis on the text
scores = sid.polarity_scores(text)
# Print the sentiment scores
print(scores)
In this exercise, we first read a text file into a string. Then, we create a SentimentIntensityAnalyzer
object from the nltk.sentiment.vader
module. We use the polarity_scores
method of the SentimentIntensityAnalyzer
object to perform sentiment analysis on the text and get a dictionary of sentiment scores.
Exercise 7: Web Development
Concepts:
- Web Development
- Flask framework
- File Uploads
Description: Write a Python script that creates a web application using the Flask framework that allows users to upload a file and performs some processing on the file.
Solution:
from flask import Flask, render_template, request
import os
app = Flask(__name__)
# Set the path for file uploads
UPLOAD_FOLDER = 'uploads'
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
# Ensure the upload directory exists
if not os.path.exists(UPLOAD_FOLDER):
os.makedirs(UPLOAD_FOLDER)
# Route for the home page
@app.route('/')
def index():
return render_template('index.html')
# Route for file uploads
@app.route('/upload', methods=['POST'])
def upload():
if 'file' not in request.files:
return 'No file part', 400
file = request.files['file']
if file.filename == '':
return 'No selected file', 400
# Save the file to the uploads folder
file.save(os.path.join(app.config['UPLOAD_FOLDER'], file.filename))
return 'File uploaded successfully'
if __name__ == '__main__':
app.run(debug=True)
In this exercise, we first import the Flask module and create a Flask application. We set up a route for the home page that returns an HTML template. We set up a route for file uploads that receives an uploaded file and saves it to a designated uploads folder. We can perform processing on the uploaded file inside the upload
function.
Exercise 8: Data Visualization
Concepts:
- Data Visualization
- Matplotlib library
- Candlestick Charts
Description: Write a Python script that reads a CSV file containing stock market data and plots a candlestick chart of the data.
Solution:
import pandas as pd
import matplotlib.pyplot as plt
import mplfinance as mpf
# Read the CSV file into a pandas dataframe
df = pd.read_csv('stock_data.csv', parse_dates=['Date'])
df.set_index('Date', inplace=True) # Set Date as index
# Plot the candlestick chart using mplfinance
mpf.plot(df, type='candle', style='charles', title='Stock Market Data', ylabel='Price')
# Display the chart
plt.show()
In this exercise, we first read a CSV file containing stock market data into a pandas dataframe. We convert the date column to Matplotlib dates format and create a figure and axis objects. We plot the candlestick chart using the candlestick_ohlc
function from the mpl_finance
module. We format the x-axis as dates and set the axis labels and title. Finally, we display the chart using the show
function from the matplotlib.pyplot
module.
Exercise 9: Machine Learning
Concepts:
- Machine Learning
- Scikit-learn library
Description: Write a Python script that reads a dataset containing information about different types of flowers and trains a machine learning model to predict the type of a flower based on its features.
Solution:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Read the dataset into a pandas dataframe
df = pd.read_csv('flower_data.csv')
# Check for missing values
if df.isnull().sum().sum() > 0:
df = df.dropna() # Drop rows with missing values
# Define feature columns and target column
X = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
y = df['species']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize the feature values
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train a logistic regression model on the training data
model = LogisticRegression(solver='saga', max_iter=5000) # Increased iterations & changed solver
model.fit(X_train, y_train)
# Use the model to predict the output for the testing data
y_pred = model.predict(X_test)
# Evaluate the model performance using the accuracy score metric
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
In this exercise, we first read a dataset containing information about different types of flowers into a pandas dataframe. We split the data into training and testing sets using the train_test_split
function from the sklearn.model_selection
module. We trained a logistic regression model on the training data using the LogisticRegression
class from the sklearn.linear_model
module. Finally, we used the trained model to predict the output for the testing data and evaluated the model performance using the accuracy score metric.
Exercise 10: Data Analysis
Concepts:
- Data Analysis
- Recommendation Systems
- Collaborative Filtering
- Surprise library
Description: Write a Python script that reads a CSV file containing customer purchase data and generates a recommendation system that recommends products to customers based on their purchase history.
Solution:
import pandas as pd
from surprise import Dataset, Reader, SVD, accuracy
from surprise.model_selection import train_test_split
# Read the CSV file into a pandas dataframe
df = pd.read_csv('purchase_data.csv')
# Ensure that the dataset has no missing values
df = df.dropna(subset=['customer_id', 'product_id', 'rating'])
# Convert the pandas dataframe to a Surprise dataset
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['customer_id', 'product_id', 'rating']], reader)
# Split the data into training and testing sets
trainset, testset = train_test_split(data, test_size=0.2)
# Train an SVD model on the training data
model = SVD(n_factors=50, n_epochs=20, lr_all=0.005, reg_all=0.02)
model.fit(trainset)
# Use the model to predict the output for the testing data
predictions = model.test(testset)
# Evaluate the model performance using the root mean squared error metric
rmse = accuracy.rmse(predictions)
print("RMSE:", rmse)
# Recommend products to customers based on their purchase history
customer_ids = df['customer_id'].unique()
product_ids = df['product_id'].unique()
recommendations = {}
for customer_id in customer_ids:
purchased_products = set(df[df['customer_id'] == customer_id]['product_id'].values)
potential_recommendations = []
for product_id in product_ids:
if product_id not in purchased_products:
pred = model.predict(customer_id, product_id)
potential_recommendations.append((product_id, pred.est))
# Sort by predicted rating and take the top 5 recommendations
top_recommendations = sorted(potential_recommendations, key=lambda x: x[1], reverse=True)[:5]
recommendations[customer_id] = top_recommendations
# Display recommendations
for customer, recs in recommendations.items():
print(f"Customer {customer} recommended products: {recs}")
In this exercise, we first read a CSV file containing customer purchase data into a pandas dataframe. We convert the pandas dataframe to a surprise dataset using the Reader
and Dataset
classes from the surprise
module. We split the data into training and testing sets using the train_test_split
function from the surprise.model_selection
module. We trained an SVD model on the training data using the SVD
class from the surprise
module. We used the trained model to predict the output for the testing data and evaluated the model performance using the root mean squared error metric. Finally, we recommended products to customers based on their purchase history using the trained model.
Exercise 11: Computer Vision
Concepts:
- Computer Vision
- Object Detection
- OpenCV library
- Pre-trained models
Description: Write a Python script that reads an image and performs object detection on the image using a pre-trained object detection model.
Solution:
import cv2
import numpy as np
# Read the image file
img = cv2.imread('image.jpg')
# Check if the image is loaded correctly
if img is None:
raise FileNotFoundError("Error: Image file not found or unable to load.")
# Load the pre-trained object detection model
model = cv2.dnn.readNetFromTensorflow('frozen_inference_graph.pb', 'ssd_mobilenet_v2_coco_2018_03_29.pbtxt')
# Prepare the input image for the model
blob = cv2.dnn.blobFromImage(img, size=(300, 300), swapRB=True, crop=False)
model.setInput(blob)
# Perform object detection
output = model.forward()
# Loop through detected objects and draw bounding boxes
h, w, _ = img.shape # Get image dimensions
for detection in output[0, 0, :, :]:
confidence = float(detection[2])
if confidence > 0.5:
x1 = int(detection[3] * w)
y1 = int(detection[4] * h)
x2 = int(detection[5] * w)
y2 = int(detection[6] * h)
# Draw bounding box with label and confidence score
cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
label = f'Confidence: {confidence:.2f}'
cv2.putText(img, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
# Display the image with detections
cv2.imshow('Object Detection', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
In this exercise, we first read an image file into a NumPy array using the imread
function from the cv2
module of OpenCV. We load a pre-trained object detection model using the readNetFromTensorflow
function from the cv2.dnn
module. We set the input image to the model and perform object detection using the setInput
and forward
methods of the model object. Finally, we loop through the detected objects and draw bounding boxes around them using the rectangle
function from the cv2
module.
Exercise 12: Natural Language Processing
Concepts:
- Natural Language Processing
- Topic Modeling
- Latent Dirichlet Allocation
- Gensim library
Description: Write a Python script that reads a text file and performs topic modeling on the text using Latent Dirichlet Allocation (LDA).
Solution:
import gensim
from gensim import corpora
from gensim.models import LdaModel
# Read the text file into a list of strings
with open('input_file.txt', 'r') as f:
text = f.readlines()
# Remove newlines and convert to lowercase
text = [line.strip().lower() for line in text]
# Tokenize the text into words
tokens = [line.split() for line in text]
# Create a dictionary of words and their frequency
dictionary = corpora.Dictionary(tokens)
# Create a bag-of-words representation of the text
corpus = [dictionary.doc2bow(token) for token in tokens]
# Train an LDA model on the text
model = LdaModel(corpus, id2word=dictionary, num_topics=5, passes=10)
# Print the topics and their associated words
for topic in model.print_topics(num_words=5):
print(topic)
In this exercise, we first read a text file into a list of strings. We preprocess the text by removing newlines, converting to lowercase, and tokenizing into words using the split
method. We create a dictionary of words and their frequency and create a bag-of-words representation of the text using the doc2bow
method of the dictionary object. We train an LDA model on the corpus using the LdaModel
class from the gensim.models
module. Finally, we print the topics and their associated words using the print_topics
method of the model object.
Exercise 13: Web Scraping
Concepts:
- Web Scraping
- Beautiful Soup library
- Requests library
- CSV file handling
Description: Write a Python script that scrapes a website for product information and saves the information to a CSV file.
Solution:
import requests
from bs4 import BeautifulSoup
import csv
# Define the URL of the website to scrape
url = 'https://www.example.com/products'
# Add headers to mimic a browser request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
# Send a request to the website
response = requests.get(url, headers=headers)
# Check if the request was successful
if response.status_code != 200:
print(f"Failed to fetch data. Status Code: {response.status_code}")
exit()
# Parse the HTML content of the response using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')
# Find all the product listings on the page
listings = soup.find_all('div', class_='product-listing')
# Write the product information to a CSV file
with open('products.csv', 'w', newline='', encoding='utf-8') as f:
writer = csv.writer(f)
writer.writerow(['Product Name', 'Price', 'Description'])
for listing in listings:
name = listing.find('h3')
price = listing.find('span', class_='price')
description = listing.find('p')
# Extract text safely, handling missing elements
name = name.get_text(strip=True) if name else 'N/A'
price = price.get_text(strip=True) if price else 'N/A'
description = description.get_text(strip=True) if description else 'N/A'
writer.writerow([name, price, description])
print("Scraping completed. Data saved to 'products.csv'.")
In this exercise, we first define the URL of the website to scrape and send a request to the website using the get
function from the requests
module. We parse the HTML content of the response using Beautiful Soup and find all the product listings on the page using the find_all
method. We write the product information to a CSV file using the csv
module.
Exercise 14: Big Data Processing
Concepts:
- Big Data Processing
- PySpark
- Data Transformations
- Aggregation
- Parquet file format
Description: Write a PySpark script that reads a CSV file containing customer purchase data, performs some data transformations and aggregation, and saves the results to a Parquet file.
Solution:
from pyspark.sql import SparkSession
# Create a SparkSession object
spark = SparkSession.builder.appName('customer-purchases').getOrCreate()
# Verify if the file exists before reading (optional but useful)
import os
if not os.path.exists('customer_purchases.csv'):
raise FileNotFoundError("Error: The file 'customer_purchases.csv' does not exist.")
# Read the CSV file into a Spark DataFrame
df = spark.read.csv('customer_purchases.csv', header=True, inferSchema=True)
# Perform some data transformations
df = df.filter((df['purchase_date'] >= '2020-01-01') & (df['purchase_date'] <= '2020-12-31'))
df = df.select('customer_id', 'product_id', 'price')
# Group by customer and calculate total spending
df = df.groupBy('customer_id').sum('price').withColumnRenamed('sum(price)', 'total_spent')
# Save the results to a Parquet file
df.write.mode('overwrite').parquet('customer_spending.parquet')
print("Processing completed. Data saved to 'customer_spending.parquet'.")
In this exercise, we first create a SparkSession object using the SparkSession
class from the pyspark.sql
module. We read a CSV file containing customer purchase data into a Spark DataFrame using the read.csv
method. We perform some data transformations on the DataFrame using the filter
, select
, and groupBy
methods. Finally, we save the results to a Parquet file using the write.parquet
method.
Exercise 15: DevOps
Concepts:
- DevOps
- Fabric library
Description: Write a Python script that automates the deployment of a web application to a remote server using the Fabric library.
Solution:
from fabric import Connection
import getpass
# Define the host and user credentials for the remote server
host = 'example.com'
user = 'user'
password = getpass.getpass("Enter SSH password: ") # Secure password entry
# Define the path to the web application on the local machine and the remote server
local_path = '/path/to/local/app'
remote_path = '/path/to/remote/app'
# Create a connection to the remote server
c = Connection(host=host, user=user, connect_kwargs={'password': password})
# Ensure the remote directory exists
c.run(f'mkdir -p {remote_path}')
# Upload the local files to the remote server
c.put(local_path, remote_path, recursive=True) # Enables recursive copy
# Change to the application directory
with c.cd(remote_path):
# Install required dependencies
c.run('sudo apt-get update && sudo apt-get install -y python3-pip')
c.run('pip3 install -r requirements.txt')
# Start the web application in the background
c.run('nohup python3 app.py > app.log 2>&1 &', pty=False)
print("Deployment completed successfully.")
In this exercise, we first define the host and user credentials for the remote server. We define the path to the web application on the local machine and the remote server. We create a connection to the remote server using the Connection
class from the fabric
module. We upload the local files to the remote server using the put
method of the connection object. We install any required dependencies on the remote server using the run
method of the connection object. Finally, we start the web application on the remote server using the run
method.
Exercise 16: Reinforcement Learning
Concepts:
- Reinforcement Learning
- Q-Learning
- OpenAI Gym library
Description: Write a Python script that implements a reinforcement learning algorithm to teach an agent to play a simple game.
Solution:
import gym
import numpy as np
import time
# Create the FrozenLake environment
env = gym.make("FrozenLake-v1", is_slippery=True)
# Initialize the Q-table
Q = np.zeros([env.observation_space.n, env.action_space.n])
# Set hyperparameters
alpha = 0.8 # Learning rate
gamma = 0.95 # Discount factor
epsilon = 0.1 # Exploration probability
num_episodes = 2000 # Training episodes
# Train the agent using Q-learning
for episode in range(num_episodes):
state, _ = env.reset()
done = False
while not done:
# Choose action using epsilon-greedy policy
if np.random.uniform() < epsilon:
action = env.action_space.sample() # Random action (exploration)
else:
action = np.argmax(Q[state, :]) # Best action from Q-table
# Take the action and observe the next state
next_state, reward, done, _, _ = env.step(action)
# Update Q-value using the Bellman equation
Q[state, action] = (1 - alpha) * Q[state, action] + alpha * (reward + gamma * np.max(Q[next_state, :]))
# Move to the next state
state = next_state
# Test the agent by playing the game
state, _ = env.reset()
done = False
print("\nTesting trained agent:\n")
while not done:
action = np.argmax(Q[state, :])
next_state, reward, done, _, _ = env.step(action)
# Render the environment
env.render()
time.sleep(0.5) # Pause for visibility
state = next_state
print("\nGame Over!")
In this exercise, we first create an OpenAI Gym environment for the game using the make
function from the gym
module. We define the Q-table for the agent as a NumPy array and set the hyperparameters for the Q-learning algorithm. We train the agent using the Q-learning algorithm by looping through a specified number of episodes and updating the Q-table based on the rewards and next states. Finally, we test the agent by playing the game using the Q-table and visualizing the game using the render
method.
Exercise 17: Time Series Analysis
Concepts:
- Time Series Analysis
- Data Preprocessing
- Data Visualization
- ARIMA model
- Statsmodels library
Description: Write a Python script that reads a CSV file containing time series data, performs some data preprocessing and visualization, and fits a time series model to the data.
Solution:
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
# Read the CSV file into a pandas dataframe
df = pd.read_csv('time_series.csv')
# Convert the date column to a datetime object and set it as the index
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
# Check for missing values before resampling
if df.isnull().values.any():
df = df.fillna(method='ffill')
# Ensure the column name is correct
target_col = df.columns[0] # Assuming first column is the time series value
# Resample the data to a monthly frequency
df = df.resample('M').mean()
# Plot the time series data
plt.figure(figsize=(10, 5))
plt.plot(df.index, df[target_col], label="Time Series")
plt.xlabel("Date")
plt.ylabel("Value")
plt.title("Time Series Visualization")
plt.legend()
plt.grid()
plt.show()
# Fit an ARIMA model
model = sm.tsa.ARIMA(df[target_col].dropna(), order=(1, 1, 1)) # Use dropna() to avoid errors
results = model.fit()
# Print the model summary
print(results.summary())
In this exercise, we first read a CSV file containing time series data into a pandas dataframe. We convert the date column to a datetime object and set it as the index. We resample the data to a monthly frequency and fill any missing values using forward fill. We visualize the data using the plot
function from the matplotlib.pyplot
module. Finally, we fit an ARIMA model to the data using the ARIMA
function from the statsmodels.api
module and print the summary of the model using the summary
method of the results object.
Exercise 18: Computer Networking
Concepts:
- Computer Networking
- TCP/IP Protocol
- Socket Programming
Description: Write a Python script that implements a simple TCP server that accepts client connections and sends and receives data.
Solution:
import socket
# Define the host and port for the server
host = 'localhost'
port = 12345
# Create a socket object
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Bind the socket to the host and port
s.bind((host, port))
# Listen for incoming connections
s.listen(1)
print('Server listening on', host, port)
# Accept a client connection
conn, addr = s.accept()
print('Connected by', addr)
# Send data to the client
conn.sendall(b'Hello, client!')
# Receive data from the client
data = conn.recv(1024)
print('Received:', data.decode())
# Close the connection
conn.close()
In this exercise, we first define the host and port for the server. We create a socket object using the socket
function from the socket
module and bind the socket to the host and port using the bind
method. We listen for incoming connections using the listen
method and accept a client connection using the accept
method, which returns a connection object and the address of the client. We send data to the client using the sendall
method of the connection object and receive data from the client using the recv
method. Finally, we close the connection using the close
method.
Exercise 19: Data Analysis and Visualization
Concepts:
- Data Analysis
- Data Visualization
- PDF Report Generation
- Pandas library
- Matplotlib library
- ReportLab library
Description: Write a Python script that reads a CSV file containing sales data for a retail store, performs some data analysis and visualization, and saves the results to a PDF report.
Solution:
import pandas as pd
import matplotlib.pyplot as plt
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
import os
# Read the CSV file into a pandas dataframe
df = pd.read_csv('sales_data.csv')
# Calculate the total sales by category and month
totals = df.groupby(['category', 'month'])['sales'].sum()
# Get unique categories
categories = df['category'].unique()
# Create subplots dynamically based on the number of categories
fig, axes = plt.subplots(nrows=len(categories), ncols=1, figsize=(8.5, 11))
# Ensure `axes` is always iterable (even if there's only one category)
if len(categories) == 1:
axes = [axes]
# Plot total sales by category and month
for i, category in enumerate(categories):
totals.loc[category].plot(ax=axes[i], kind='bar', title=f"Category: {category}")
axes[i].set_ylabel("Sales")
plt.tight_layout()
plt.savefig('sales_plot.png') # Save the figure
plt.close(fig) # Close to free memory
# Create a PDF report
pdf_filename = 'sales_report.pdf'
c = canvas.Canvas(pdf_filename, pagesize=letter)
# Add title and description
c.setFont("Helvetica-Bold", 16)
c.drawString(50, 750, 'Sales Report')
c.setFont("Helvetica", 12)
c.drawString(50, 730, 'Total Sales by Category and Month')
# Add the image to the PDF if it exists
if os.path.exists('sales_plot.png'):
c.drawImage('sales_plot.png', 50, 450, width=500, height=300)
# Save and close the PDF
c.showPage()
c.save()
print(f"Report saved as {pdf_filename}")
In this exercise, we first read a CSV file containing sales data for a retail store into a pandas dataframe. We calculate the total sales by category and month using the groupby
and sum
methods. We plot the total sales by category and month using the plot
function from the matplotlib.pyplot
module and save the plot to a PNG file. Finally, we generate a PDF report using the Canvas
and Image
functions from the reportlab
module.
Exercise 20: Machine Learning
Concepts:
- Machine Learning
- Convolutional Neural Networks
- Keras library
- MNIST dataset
Description: Write a Python script that trains a machine learning model to classify images of handwritten digits from the MNIST dataset.
Solution:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Normalize the pixel values and reshape the data
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)
# Define the CNN model
model = keras.Sequential([
layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(28, 28, 1)),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
layers.MaxPooling2D((2, 2)),
layers.Flatten(),
layers.Dense(128, activation='relu'), # Added a fully connected layer
layers.Dropout(0.5), # Prevent overfitting
layers.Dense(10, activation='softmax')
])
# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test), batch_size=64)
# Evaluate the model on the test data
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print('Test accuracy:', test_acc)
In this exercise, we first load the MNIST dataset using the load_data
function from the keras.datasets.mnist
module. We normalize the pixel values and reshape the data using NumPy. We define a convolutional neural network model using the Sequential
class and various layers from the layers
module of Keras. We compile the model using the compile
method with the Adam optimizer and sparse categorical crossentropy loss function. We train the model using the fit
method and evaluate the model on the test data using the evaluate
method.
Exercise 21: Natural Language Processing
Concepts:
- Natural Language Processing
- Text Preprocessing
- Text Representation
- Topic Modeling
- Latent Dirichlet Allocation
- Gensim library
Description: Write a Python script that uses natural language processing techniques to analyze a corpus of text data and extract useful insights.
Solution:
import gensim
from gensim import corpora
from gensim.models import LdaModel
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
# Download required resources
nltk.download('stopwords')
nltk.download('punkt')
# Read the text data into a pandas dataframe
df = pd.read_csv('text_data.csv')
# Handle missing values
df['text'] = df['text'].fillna('')
# Define stop words and clean text
stop_words = set(stopwords.words('english'))
def preprocess_text(text):
    tokens = word_tokenize(text.lower())  # Tokenization & lowercasing
    return [word for word in tokens if word.isalnum() and word not in stop_words]  # Remove punctuation & stopwords
df['cleaned_text'] = df['text'].apply(preprocess_text)
# Create a document-term matrix
texts = df['cleaned_text'].tolist()
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
# Train LDA model
num_topics = 5
lda_model = LdaModel(corpus, num_topics=num_topics, id2word=dictionary, passes=10)
# Print topics and top words for each
for topic_id, words in lda_model.show_topics(num_topics=num_topics, formatted=False):
    print(f'Topic {topic_id}:', ', '.join(word for word, _ in words))
# Convert topic distributions into a structured DataFrame
topic_dists = [{f"Topic_{topic}": prob for topic, prob in lda_model.get_document_topics(doc, minimum_probability=0)} for doc in corpus]
topic_df = pd.DataFrame(topic_dists)
# Merge topic distributions with original data
df = pd.concat([df, topic_df], axis=1)
# Save the results
df.to_csv('text_data_topics.csv', index=False)
print("Saved processed data to 'text_data_topics.csv'.")
In this exercise, we first read a corpus of text data into a pandas dataframe. We define the stop words using the stopwords corpus from the nltk.corpus module and remove them with a tokenizing helper applied through the apply method of pandas. We build a bag-of-words representation of the text using the Dictionary class from gensim.corpora and its doc2bow method. We perform topic modeling with latent Dirichlet allocation (LDA) using the LdaModel class, extract the topic distribution for each document, and finally save the results to a CSV file using the to_csv method of pandas.
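Beyond the per-topic word lists, it is often useful to know which topic dominates each document. A short follow-up sketch, assuming the lda_model, corpus, and df variables from the script above:
# Label each document with its single most probable topic
def dominant_topic(bow):
    topics = lda_model.get_document_topics(bow, minimum_probability=0)
    return max(topics, key=lambda pair: pair[1])[0]
df['dominant_topic'] = [dominant_topic(doc) for doc in corpus]
print(df[['text', 'dominant_topic']].head())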
Exercise 22: Web Scraping
Concepts:
- Web Scraping
- HTML Parsing
- BeautifulSoup library
- CSV File I/O
Description: Write a Python script that scrapes data from a website using the BeautifulSoup library and saves it to a CSV file.
Solution:
import requests
from bs4 import BeautifulSoup
import csv
# Define the URL to scrape
url = 'https://www.example.com'
# Headers to mimic a real browser request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
# Send a GET request
response = requests.get(url, headers=headers)
# Check if the request was successful
if response.status_code != 200:
    print(f"Error: Unable to fetch data (Status Code: {response.status_code})")
    exit()
# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')
# Extract data
data = []
for item in soup.find_all('div', class_='item'):
    name_tag = item.find('h3')
    price_tag = item.find('span', class_='price')
    # Extract text safely, handling missing elements
    name = name_tag.get_text(strip=True) if name_tag else 'N/A'
    price = price_tag.get_text(strip=True) if price_tag else 'N/A'
    data.append([name, price])
# Save to CSV
csv_filename = 'data.csv'
with open(csv_filename, 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Name', 'Price'])  # Add headers
    writer.writerows(data)
print(f"Scraping completed. Data saved to '{csv_filename}'.")
In this exercise, we first define the URL to scrape, fetch the page with the requests library, and parse the HTML content using the BeautifulSoup library. We extract the data with the find_all and find methods of the soup object, handling elements that may be missing, and finally save the results to a CSV file using the csv module.
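Real listings are usually split across several pages. One way to extend the script is to loop over a page-number parameter and pause between requests to stay polite; the URL pattern and the 'item' class below are hypothetical placeholders:
import time
import requests
from bs4 import BeautifulSoup
base_url = 'https://www.example.com/items?page={}'  # hypothetical URL pattern
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
all_names = []
for page in range(1, 4):  # scrape the first three pages
    response = requests.get(base_url.format(page), headers=headers, timeout=10)
    if response.status_code != 200:
        break  # stop at the first page that fails to load
    soup = BeautifulSoup(response.content, 'html.parser')
    for item in soup.find_all('div', class_='item'):
        name_tag = item.find('h3')
        all_names.append(name_tag.get_text(strip=True) if name_tag else 'N/A')
    time.sleep(1)  # pause between requests
print(f"Collected {len(all_names)} items across pages.")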
Exercise 23: Database Interaction
Concepts:
- Database Interaction
- SQLite database
- SQL queries
- SQLite3 module
Description: Write a Python script that interacts with a database to retrieve and manipulate data.
Solution:
import sqlite3
# Connect to the database
conn = sqlite3.connect('example.db')
# Create a cursor object
c = conn.cursor()
# Execute an SQL query to create a table
c.execute('''CREATE TABLE IF NOT EXISTS customers
(id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)''')
# Execute an SQL query to insert data into the table
c.execute("INSERT INTO customers (name, email, phone) VALUES ('John Smith', 'john@example.com', '555-1234')")
# Execute an SQL query to retrieve data from the table
c.execute("SELECT * FROM customers")
rows = c.fetchall()
for row in rows:
    print(row)
# Execute an SQL query to update data in the table
c.execute("UPDATE customers SET phone='555-5678' WHERE name='John Smith'")
# Execute an SQL query to delete data from the table
c.execute("DELETE FROM customers WHERE name='John Smith'")
# Commit the changes to the database
conn.commit()
# Close the database connection
conn.close()
In this exercise, we first connect to an SQLite database using the connect function from the sqlite3 module. We create a cursor object with the cursor method of the connection and run SQL queries with the cursor's execute method. We retrieve data from the table with fetchall and print the results, update rows with an UPDATE statement, and delete rows with a DELETE statement. Finally, we commit the changes to the database and close the connection.
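One detail worth adding to this pattern: when values come from user input, pass them as query parameters instead of formatting them into the SQL string, so SQLite handles the quoting and injection is avoided. A minimal sketch, using an in-memory database so it runs on its own:
import sqlite3
# Parameterized queries with ? placeholders
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''CREATE TABLE IF NOT EXISTS customers
             (id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)''')
new_customer = ('Jane Doe', 'jane@example.com', '555-9876')
c.execute("INSERT INTO customers (name, email, phone) VALUES (?, ?, ?)", new_customer)
c.execute("SELECT * FROM customers WHERE name = ?", ('Jane Doe',))
print(c.fetchone())
conn.commit()
conn.close()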
Exercise 24: Parallel Processing
Concepts:
- Parallel Processing
- Multiprocessing
- Process Pool
- CPU-bound tasks
Description: Write a Python script that performs a time-consuming computation using parallel processing to speed up the computation.
Solution:
import time
import multiprocessing
# Define a CPU-bound function: summing a large range keeps each worker busy
def compute(num):
    total = 0
    for i in range(num):
        total += i
    return total
if __name__ == '__main__':
    # Create a process pool with the number of CPUs available
    num_cpus = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(num_cpus)
    # Generate a list of numbers to compute
    num_list = [10000000] * num_cpus
    # Compute the results using parallel processing
    start_time = time.time()
    results = pool.map(compute, num_list)
    # Close the pool properly
    pool.close()
    pool.join()
    end_time = time.time()
    # Print the results and computation time
    print('Results:', results)
    print('Computation time:', end_time - start_time, 'seconds')
In this exercise, we first define a CPU-bound function that takes a noticeable amount of time to run. We then create a process pool using the Pool class from the multiprocessing module, sized to the number of available CPUs. We build a list of inputs, compute the results in parallel with the pool's map method, and finally print the results and the elapsed time.
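To see what the pool actually buys you, it helps to time the same workload serially and in parallel. A small standalone sketch (the summation function is repeated so the snippet runs on its own); note that on machines with few cores, or for small inputs, the cost of starting the worker processes can outweigh the gain:
import time
import multiprocessing

def compute(num):
    # Same CPU-bound summation as above
    total = 0
    for i in range(num):
        total += i
    return total

if __name__ == '__main__':
    num_list = [10_000_000] * multiprocessing.cpu_count()
    # Serial baseline
    start = time.time()
    serial_results = [compute(n) for n in num_list]
    serial_time = time.time() - start
    # Parallel version, using the pool as a context manager
    start = time.time()
    with multiprocessing.Pool() as pool:
        parallel_results = pool.map(compute, num_list)
    parallel_time = time.time() - start
    print(f"Serial: {serial_time:.2f}s, parallel: {parallel_time:.2f}s")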
Exercise 25: Image Processing
Concepts:
- Image Processing
- Pillow library
- Image Manipulation
- Image Filtering
Description: Write a Python script that performs basic image processing operations on an image file.
Solution:
from PIL import Image, ImageFilter
import os
# Define image paths
input_path = 'example.jpg'
output_path = 'processed.jpg'
# Check if the input file exists
if not os.path.exists(input_path):
    raise FileNotFoundError(f"Error: The file '{input_path}' was not found.")
try:
    # Open the image file using a context manager
    with Image.open(input_path) as image:
        # Display the original image (optional, may not work in all environments)
        image.show()
        # Resize the image
        image = image.resize((500, 500))
        # Convert the image to grayscale
        image = image.convert('L')
        # Apply a Gaussian blur filter
        image = image.filter(ImageFilter.GaussianBlur(radius=2))
        # Save the processed image to a file
        image.save(output_path)
        # Display the processed image
        image.show()
    print(f"Processed image saved as '{output_path}'.")
except Exception as e:
    print(f"An error occurred: {e}")
In this exercise, we first open an image file using the Image class from the Pillow library. We resize the image with the resize method and convert it to grayscale with the convert method and the 'L' mode. We apply a Gaussian blur using the filter method with the GaussianBlur filter from the ImageFilter module. Finally, we save the processed image with the save method and display it with the show method.
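The same operations can be applied to a whole folder of images. A brief sketch; the 'images' and 'processed' directory names are placeholders:
from pathlib import Path
from PIL import Image, ImageFilter
input_dir = Path('images')        # hypothetical input folder
output_dir = Path('processed')
output_dir.mkdir(exist_ok=True)
for path in input_dir.glob('*'):
    if path.suffix.lower() not in {'.jpg', '.jpeg', '.png'}:
        continue  # skip non-image files
    with Image.open(path) as img:
        img = img.resize((500, 500)).convert('L')
        img = img.filter(ImageFilter.GaussianBlur(radius=2))
        img.save(output_dir / path.name)
print("Batch processing finished.")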
I hope you find these exercises useful! Let me know if you have any further questions.