Chapter 16: Case Study 1: Sales Data Analysis
16.4 Practical Exercises: Sales Data Analysis
Now that we've worked through the Sales Data Analysis case study, it's time for some hands-on practice. The exercises below build on the material we've covered. Remember, the best way to solidify your understanding is to apply it!
Exercise 1: Data Exploration
- Load the 'sales_data.csv' file into a DataFrame.
- Display the first 5 rows of the DataFrame.
- Provide basic statistics for each numeric column.
Solution
import pandas as pd
# Load the data
df = pd.read_csv('sales_data.csv')
# Display the first 5 rows
print(df.head())
# Basic statistics
print(df.describe())
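Before trusting the summary statistics, it can help to check column types and missing values, since describe() silently skips missing entries. The sketch below is only an illustration: it uses a small made-up DataFrame in place of sales_data.csv, and the column names Quantity_Sold and Revenue are assumed to match the ones used in the later exercises.

```python
import pandas as pd

# Hypothetical stand-in for sales_data.csv; the values are invented.
df = pd.DataFrame({
    "Product_Name": ["Phone", "Laptop", "Phone", "Tablet"],
    "Quantity_Sold": [3, 1, 5, 2],
    "Revenue": [1500.0, 1200.0, None, 600.0],
})

# Column types: describe() summarizes only numeric columns by default
print(df.dtypes)

# Number of missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Summary statistics; the 'count' row excludes missing values
numeric_summary = df.describe()
print(numeric_summary)
```

If the count row differs between columns, some rows have missing values and may need cleaning before the later exercises.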
Exercise 2: Data Visualization
- Create a bar chart showing the total revenue generated by each product.
- Create a pie chart showing each product's share of total revenue.
Solution
import matplotlib.pyplot as plt
# Bar chart
df_grouped = df.groupby('Product_Name')['Revenue'].sum()
plt.bar(df_grouped.index, df_grouped.values)
plt.xlabel('Product Name')
plt.ylabel('Total Revenue')
plt.title('Total Revenue by Product')
plt.show()
# Pie chart
plt.pie(df_grouped, labels=df_grouped.index, autopct='%1.1f%%')
plt.title('Percentage of Total Sales by Product')
plt.show()
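The percentages that the pie chart displays can also be computed directly, which is a quick way to sanity-check the plot. This is a minimal sketch on a made-up DataFrame (a stand-in for sales_data.csv, with the same Product_Name and Revenue column names assumed above).

```python
import pandas as pd

# Hypothetical stand-in for sales_data.csv; the values are invented.
df = pd.DataFrame({
    "Product_Name": ["Phone", "Laptop", "Phone", "Tablet"],
    "Revenue": [1500.0, 1000.0, 500.0, 1000.0],
})

# Total revenue per product, largest first
revenue_by_product = (
    df.groupby("Product_Name")["Revenue"].sum().sort_values(ascending=False)
)

# Each product's share of total revenue, in percent
share_pct = (revenue_by_product / revenue_by_product.sum() * 100).round(1)
print(share_pct)
```

Sorting the grouped totals before plotting also tends to make the bar chart easier to read.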
Exercise 3: Simple Predictive Modeling
- Train a simple linear regression model to predict revenue from the quantity sold for the product "Phone".
- Evaluate the model using R-squared.
Solution
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
# Filter the data for the product "Phone"
df_phone = df[df['Product_Name'] == 'Phone']
# Train the model
X = df_phone[['Quantity_Sold']]
y = df_phone['Revenue']
model = LinearRegression()
model.fit(X, y)
# Make predictions
y_pred = model.predict(X)
# Evaluate the model (note: R-squared is computed on the same data used for training)
print('R-squared:', r2_score(y, y_pred))
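The solution above scores the model on the same rows it was trained on, which tends to give an optimistic R-squared. A common alternative is to hold out part of the data for evaluation. The sketch below assumes synthetic "Phone" data (generated here, not taken from sales_data.csv) and an arbitrary 80/20 split; the column names follow the solution above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the "Phone" rows: revenue is roughly 500 per unit plus noise.
rng = np.random.default_rng(0)
quantity = rng.integers(1, 20, size=200)
revenue = 500.0 * quantity + rng.normal(0, 100, size=200)
df_phone = pd.DataFrame({"Quantity_Sold": quantity, "Revenue": revenue})

X = df_phone[["Quantity_Sold"]]
y = df_phone["Revenue"]

# Hold out 20% of the rows; the model never sees them during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# R-squared on the held-out rows only
test_r2 = r2_score(y_test, model.predict(X_test))
print("Held-out R-squared:", test_r2)
```

On real data, the held-out score will usually be somewhat lower than the training score; a large gap suggests overfitting.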
Exercise 4: Advanced
Try different machine learning algorithms to predict revenue, and compare their performance.
Solution
Note: This is one possible solution; your choice of algorithms may differ. The focus here is on showing how to fit several models and compare their performance. The code reuses X, y, and r2_score from Exercise 3.
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
# Random Forest (random_state fixed so the result is reproducible)
rf_model = RandomForestRegressor(random_state=42)
rf_model.fit(X, y)
rf_y_pred = rf_model.predict(X)
print('Random Forest R-squared:', r2_score(y, rf_y_pred))
# Support Vector Regression
svr_model = SVR()
svr_model.fit(X, y)
svr_y_pred = svr_model.predict(X)
print('SVR R-squared:', r2_score(y, svr_y_pred))
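Comparing models on their training scores can favor flexible models such as random forests, which can fit the training rows very closely. One way to make the comparison fairer is k-fold cross-validation. The sketch below is an illustration under assumptions: the data is synthetic (not sales_data.csv), 5 folds is an arbitrary choice, and the SVR is wrapped in a scaling pipeline because SVR is sensitive to scale; even so, its default settings may score poorly on a target measured in hundreds or thousands.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the "Phone" rows: revenue is roughly 500 per unit plus noise.
rng = np.random.default_rng(0)
quantity = rng.integers(1, 20, size=200)
revenue = 500.0 * quantity + rng.normal(0, 100, size=200)
X = pd.DataFrame({"Quantity_Sold": quantity})
y = pd.Series(revenue, name="Revenue")

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR()),
}

# Mean R-squared over 5 folds; each fold is scored on rows not used for fitting
scores = {}
for name, estimator in models.items():
    fold_scores = cross_val_score(estimator, X, y, cv=5, scoring="r2")
    scores[name] = fold_scores.mean()
    print(f"{name} mean R-squared: {scores[name]:.3f}")
```

If one model scores much lower than the others, it is worth checking its hyperparameters and input scaling before concluding that the algorithm is a poor fit.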