# Chapter 16: Case Study 1: Sales Data Analysis

## 16.4 Practical Exercises: Sales Data Analysis

After diving deep into our case study on Sales Data Analysis, it's time for some hands-on practice. Below are practical exercises that complement the material we've covered. Remember, the best way to solidify your understanding is through application!

### Exercise 1: Data Exploration

- Load the 'sales_data.csv' file into a DataFrame.
- Display the first 5 rows of the DataFrame.
- Provide basic statistics for each numeric column.

**Solution**

```python
import pandas as pd

# Load the data
df = pd.read_csv('sales_data.csv')

# Display the first 5 rows
print(df.head())

# Basic statistics
print(df.describe())
```

Download the sales_data.csv file here.
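
If the CSV is not at hand, the same exploration steps can be tried on a small in-memory DataFrame. The sketch below assumes the column names used later in this section (`Product_Name`, `Quantity_Sold`, `Revenue`); the rows are invented for illustration and are not taken from sales_data.csv. It also checks column types and missing values, which complements `head()` and `describe()`.

```python
import pandas as pd

# Hypothetical stand-in for sales_data.csv; the values are made up.
sample = pd.DataFrame({
    'Product_Name': ['Phone', 'Laptop', 'Phone', 'Tablet'],
    'Quantity_Sold': [3, 1, 5, 2],
    'Revenue': [1500.0, 1200.0, 2500.0, 600.0],
})

# Column types and missing-value counts
print(sample.dtypes)
print(sample.isna().sum())

# describe() summarizes only the numeric columns by default
stats = sample.describe()
print(stats)
```

Because `Product_Name` holds text, it is left out of the summary; passing `include='all'` to `describe()` would add it.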

### Exercise 2: Data Visualization

- Create a bar chart showing the total revenue generated by each product.
- Create a pie chart representing the percentage of total sales for each product.

**Solution**

```python
import matplotlib.pyplot as plt

# Bar chart
df_grouped = df.groupby('Product_Name')['Revenue'].sum()
plt.bar(df_grouped.index, df_grouped.values)
plt.xlabel('Product Name')
plt.ylabel('Total Revenue')
plt.title('Total Revenue by Product')
plt.show()

# Pie chart
plt.pie(df_grouped, labels=df_grouped.index, autopct='%1.1f%%')
plt.title('Percentage of Total Sales by Product')
plt.show()
```
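
The percentages that `autopct` prints on the pie chart can also be computed directly, which is a handy check on the plot. Here is a minimal sketch, using made-up rows in place of sales_data.csv and assuming the same `Product_Name` and `Revenue` columns:

```python
import pandas as pd

# Hypothetical rows; the real file's values will differ.
sample = pd.DataFrame({
    'Product_Name': ['Phone', 'Laptop', 'Phone', 'Tablet'],
    'Revenue': [1500.0, 1000.0, 2500.0, 5000.0],
})

# Total revenue per product, largest first
revenue_by_product = (
    sample.groupby('Product_Name')['Revenue'].sum().sort_values(ascending=False)
)

# Each product's share of total revenue, in percent
share_pct = revenue_by_product / revenue_by_product.sum() * 100
print(share_pct.round(1))
```

Sorting before plotting also tends to make a bar chart easier to read, since the largest contributors appear first.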

### Exercise 3: Simple Predictive Modeling

- Train a simple linear regression model to predict the revenue based on the quantity sold for the product "Phone."
- Evaluate the model using R-squared.

**Solution**

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Filter the data for the product "Phone"
df_phone = df[df['Product_Name'] == 'Phone']

# Train the model
X = df_phone[['Quantity_Sold']]
y = df_phone['Revenue']
model = LinearRegression()
model.fit(X, y)

# Make predictions
y_pred = model.predict(X)

# Evaluate the model
print('R-squared:', r2_score(y, y_pred))
```
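
The solution above scores the model on the same rows it was trained on, which tends to give an optimistic R-squared. One common alternative is to hold out part of the data for evaluation. The sketch below is not part of the original solution: it uses scikit-learn's `train_test_split` and a synthetic stand-in for the "Phone" rows (revenue of roughly 500 per unit plus noise, an invented relationship) so that it runs without the CSV.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the "Phone" rows; these numbers are invented.
rng = np.random.default_rng(0)
quantity = rng.integers(1, 20, size=100)
revenue = 500.0 * quantity + rng.normal(0, 200, size=100)
df_phone = pd.DataFrame({'Quantity_Sold': quantity, 'Revenue': revenue})

X = df_phone[['Quantity_Sold']]
y = df_phone['Revenue']

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# R-squared on rows the model did not see during training
test_r2 = r2_score(y_test, model.predict(X_test))
print('Held-out R-squared:', test_r2)
```

With the real data, replacing the synthetic `df_phone` with the filtered DataFrame from the solution should be the only change needed.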

### Exercise 4: Advanced

Try different machine learning algorithms to predict revenue, and compare their performance.

**Solution**

*Note: The code for different algorithms may vary. The focus here is on showing how to use multiple algorithms and compare their performance.*

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

# Reuses X, y, and r2_score from Exercise 3

# Random Forest
rf_model = RandomForestRegressor()
rf_model.fit(X, y)
rf_y_pred = rf_model.predict(X)
print('Random Forest R-squared:', r2_score(y, rf_y_pred))

# Support Vector Regression
svr_model = SVR()
svr_model.fit(X, y)
svr_y_pred = svr_model.predict(X)
print('SVR R-squared:', r2_score(y, svr_y_pred))
```
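
To compare several algorithms on equal footing, it can help to score each one with the same cross-validation procedure rather than on its own training data. The following is a sketch under stated assumptions, not the original solution: the data is synthetic (an invented linear relationship with noise), the `models` dictionary and the choice of 5 folds are illustrative, and `cross_val_score` is scikit-learn's standard helper for this.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

# Synthetic data standing in for quantity sold and revenue; values are invented.
rng = np.random.default_rng(0)
X = rng.integers(1, 20, size=(100, 1)).astype(float)
y = 500.0 * X[:, 0] + rng.normal(0, 200, size=100)

# Candidate models to compare (an illustrative selection)
models = {
    'Linear Regression': LinearRegression(),
    'Random Forest': RandomForestRegressor(n_estimators=50, random_state=0),
    'SVR': SVR(),
}

# Mean 5-fold cross-validated R-squared for each model
scores = {}
for name, estimator in models.items():
    cv_r2 = cross_val_score(estimator, X, y, cv=5, scoring='r2')
    scores[name] = cv_r2.mean()
    print(f'{name}: mean R-squared = {scores[name]:.3f}')
```

SVR with default settings is sensitive to the scale of the features and target, so a low score there may reflect missing scaling rather than an unsuitable algorithm.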
