Data Analysis Foundations with Python

Chapter 16: Case Study 1: Sales Data Analysis

16.4 Practical Exercises: Sales Data Analysis

After diving deep into our case study on Sales Data Analysis, it's time for some hands-on practice. Below are practical exercises that complement the material we've covered. Remember, the best way to solidify your understanding is through application!  

Exercise 1: Data Exploration

  1. Load the 'sales_data.csv' file into a DataFrame.
  2. Display the first 5 rows of the DataFrame.
  3. Provide basic statistics for each numeric column.

Solution

import pandas as pd

# Load the data
df = pd.read_csv('sales_data.csv')

# Display the first 5 rows
print(df.head())

# Basic statistics
print(df.describe())
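Before reading the summary statistics, it can help to check column types and missing values, since `describe()` summarizes only numeric columns by default. The sketch below uses a small made-up DataFrame as a stand-in for `sales_data.csv`; the column names follow the ones used in this section, but the values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for sales_data.csv; the values are invented.
sample = pd.DataFrame({
    "Product_Name": ["Phone", "Laptop", "Phone", "Tablet"],
    "Quantity_Sold": [3, 1, 5, 2],
    "Revenue": [1500.0, 1200.0, None, 600.0],
})

# Column dtypes show which columns describe() will summarize by default.
print(sample.dtypes)

# Count missing values per column.
missing = sample.isna().sum()
print(missing)

# describe() covers numeric columns only; include="all" adds the others.
numeric_stats = sample.describe()
all_stats = sample.describe(include="all")
print(numeric_stats)
print(all_stats)
```

On the real file, the same calls apply to `df` directly.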

Download the sales_data.csv file here.

Exercise 2: Data Visualization

  1. Create a bar chart showing the total revenue generated by each product.
  2. Create a pie chart representing the percentage of total sales for each product.

Solution

import matplotlib.pyplot as plt

# Bar chart
df_grouped = df.groupby('Product_Name')['Revenue'].sum()
plt.bar(df_grouped.index, df_grouped.values)
plt.xlabel('Product Name')
plt.ylabel('Total Revenue')
plt.title('Total Revenue by Product')
plt.show()

# Pie chart
plt.pie(df_grouped, labels=df_grouped.index, autopct='%1.1f%%')
plt.title('Percentage of Total Sales by Product')
plt.show()
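The percentages that `autopct` prints on the pie chart can also be computed directly, which is useful for checking the chart or reporting the numbers in a table. This is a minimal sketch on invented data, not the contents of `sales_data.csv`; `sample`, `revenue_by_product`, and `share_pct` are illustrative names.

```python
import pandas as pd

# Hypothetical sales rows; the values are invented for illustration.
sample = pd.DataFrame({
    "Product_Name": ["Phone", "Laptop", "Phone", "Tablet"],
    "Revenue": [1500.0, 1200.0, 2500.0, 800.0],
})

# Total revenue per product, largest first.
revenue_by_product = (
    sample.groupby("Product_Name")["Revenue"].sum().sort_values(ascending=False)
)

# Each product's share of total revenue, as a percentage rounded to one decimal.
share_pct = (revenue_by_product / revenue_by_product.sum() * 100).round(1)
print(share_pct)
```

Sorting the grouped totals before plotting also tends to make the bar chart easier to read.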

Exercise 3: Simple Predictive Modeling

  1. Train a simple linear regression model to predict revenue from the quantity sold for the product "Phone".
  2. Evaluate the model using R-squared.

Solution

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Filter the data for the product "Phone"
df_phone = df[df['Product_Name'] == 'Phone']

# Train the model
X = df_phone[['Quantity_Sold']]
y = df_phone['Revenue']
model = LinearRegression()
model.fit(X, y)

# Make predictions
y_pred = model.predict(X)

# Evaluate the model (note: this R-squared is computed on the training data itself)
print('R-squared:', r2_score(y, y_pred))
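The solution above scores the model on the same rows it was trained on, which tends to give an optimistic R-squared. A common alternative is to hold out part of the data for evaluation. The sketch below assumes synthetic data (a roughly linear relation between quantity and revenue with added noise) in place of the "Phone" rows; the slope of 500 and the noise level are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the "Phone" rows; the numbers are invented.
rng = np.random.default_rng(0)
quantity = rng.integers(1, 20, size=200).reshape(-1, 1)
revenue = 500.0 * quantity.ravel() + rng.normal(0, 200, size=200)

# Hold out 25% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    quantity, revenue, test_size=0.25, random_state=0
)

# Fit on the training rows only, then score on the unseen rows.
held_out_model = LinearRegression().fit(X_train, y_train)
test_r2 = r2_score(y_test, held_out_model.predict(X_test))
print("Held-out R-squared:", test_r2)
```

With the real data, `quantity` and `revenue` would be replaced by `X` and `y` from the solution above.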

Exercise 4: Advanced

Try different machine learning algorithms to predict revenue, and compare their performance.

Solution

Note: The code will vary depending on the algorithms you choose. The focus here is on showing how to fit several algorithms and compare their performance.

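from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.svm import SVR

# X and y are the "Phone" features and target defined in Exercise 3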

# Random Forest
rf_model = RandomForestRegressor()
rf_model.fit(X, y)
rf_y_pred = rf_model.predict(X)
print('Random Forest R-squared:', r2_score(y, rf_y_pred))

# Support Vector Regression
svr_model = SVR()
svr_model.fit(X, y)
svr_y_pred = svr_model.predict(X)
print('SVR R-squared:', r2_score(y, svr_y_pred))
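One way to make the comparison more even-handed is to score every model with the same cross-validation procedure instead of on the training data. The following is a sketch under stated assumptions: the data is synthetic, the `models` dictionary and its labels are illustrative, and wrapping SVR in a `StandardScaler` pipeline is a choice made here, not part of the solution above. SVR with default settings is sensitive to the scale of both features and target, so it may still score poorly on data like this.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the "Phone" rows; the numbers are invented.
rng = np.random.default_rng(0)
quantity = rng.integers(1, 20, size=200).reshape(-1, 1)
revenue = 500.0 * quantity.ravel() + rng.normal(0, 200, size=200)

# Candidate models, keyed by a display name.
models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "SVR (scaled features)": make_pipeline(StandardScaler(), SVR()),
}

# Mean 5-fold cross-validated R-squared for each model.
scores = {}
for name, estimator in models.items():
    cv_r2 = cross_val_score(estimator, quantity, revenue, cv=5, scoring="r2")
    scores[name] = cv_r2.mean()
    print(f"{name}: mean R-squared = {scores[name]:.3f}")
```

Because every model sees the same folds and the same metric, the resulting numbers can be compared directly.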
