# Chapter 16: Case Study 1: Sales Data Analysis

## 16.3 Predictive Modeling

With the data explored through EDA and visualization, the next step is to use what we learned to make predictions. Predictive modeling fits a statistical model to historical data and uses it to estimate values we have not yet observed, which lets us plan ahead rather than only look back.

For sales data, a predictive model can estimate future revenue, point to areas for improvement, and guide how we allocate resources. Think of it as a magic wand backed by data: it can surface insights that are hard to see from summary tables and charts alone.

In this section of our Sales Data Analysis case study, we'll build a simple predictive model step by step. We'll prepare the data, train a linear regression model, evaluate its performance, and use it to predict future sales.

So, are you ready to take your data analysis skills to the next level? Let's dive in and explore the fascinating world of predictive modeling!

### 16.3.1 Preprocessing for Predictive Modeling

Before we build a model, let's make sure our data is in the right format. We've already cleaned the data in the previous section, so the only remaining step is to scale the two columns we plan to use, `Quantity` and `TotalSales`.

```python
from sklearn.preprocessing import StandardScaler

# Create a new DataFrame for modeling
df_for_modeling = df_monthly_sales[['Quantity', 'TotalSales']]

# Scale the features
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df_for_modeling)
```
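To make the scaling step concrete, here is a minimal sketch of what `StandardScaler` does to each column: it subtracts the column mean and divides by the column's population standard deviation. The `toy_sales` DataFrame and its values are hypothetical and stand in for `df_monthly_sales`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical monthly figures, used only for illustration
toy_sales = pd.DataFrame({
    "Quantity": [100, 150, 200, 250, 300],
    "TotalSales": [1000.0, 1600.0, 1900.0, 2600.0, 2900.0],
})

toy_scaler = StandardScaler()
toy_scaled = toy_scaler.fit_transform(toy_sales)

# The same result computed by hand: (x - mean) / population standard deviation
manual_scaled = (toy_sales - toy_sales.mean()) / toy_sales.std(ddof=0)

print(toy_scaled.mean(axis=0))  # approximately 0 for each column
print(toy_scaled.std(axis=0))   # approximately 1 for each column
print(np.allclose(toy_scaled, manual_scaled.to_numpy()))
```

After scaling, both columns have a mean of about 0 and a standard deviation of about 1, so they are expressed in comparable units.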

### 16.3.2 Model Selection and Training

For our sales data, we'll use a simple linear regression model to predict `TotalSales` based on `Quantity`.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Split the data into training and test sets
X = df_scaled[:, 0].reshape(-1, 1)  # scaled Quantity
y = df_scaled[:, 1]                 # scaled TotalSales

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the model
model = LinearRegression()
model.fit(X_train, y_train)
```
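To see what the fitted model has learned, it can help to train one on data where the true relationship is known. The sketch below uses synthetic data (an assumption for illustration, not the case-study dataset) generated from a line with slope 12 and intercept 50, then reads the estimates back from `coef_` and `intercept_`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (illustrative assumption): sales = 12 * quantity + 50, plus noise
rng = np.random.default_rng(0)
quantity = rng.uniform(100, 500, size=60).reshape(-1, 1)
total_sales = 12.0 * quantity.ravel() + 50.0 + rng.normal(0, 20, size=60)

Xq_train, Xq_test, ys_train, ys_test = train_test_split(
    quantity, total_sales, test_size=0.2, random_state=42
)

demo_model = LinearRegression()
demo_model.fit(Xq_train, ys_train)

# The fitted line is: prediction = coef_[0] * quantity + intercept_
print(f"slope: {demo_model.coef_[0]:.2f}")
print(f"intercept: {demo_model.intercept_:.2f}")
print(f"train rows: {len(Xq_train)}, test rows: {len(Xq_test)}")
```

The estimated slope should land close to 12, and the 80/20 split leaves 48 rows for training and 12 for testing.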

### 16.3.3 Model Evaluation

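Let's assess how well our model performs on the test set using two metrics: RMSE and R-squared. Because `TotalSales` was standardized, the RMSE here is expressed in standard deviations of `TotalSales` rather than in currency units.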

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Make predictions on the held-out test set
y_pred = model.predict(X_test)

# Calculate the performance metrics
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

print(f'RMSE: {rmse}')
print(f'R-squared: {r2}')
```
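To make the two metrics less of a black box, the following sketch computes them by hand and compares the results with scikit-learn. The `actual` and `predicted` arrays are hypothetical values chosen for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted values, for illustration only
actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.5, 5.5, 7.0, 10.0])

# RMSE: square root of the mean squared residual
residuals = actual - predicted
rmse_manual = np.sqrt(np.mean(residuals ** 2))

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(rmse_manual, np.sqrt(mean_squared_error(actual, predicted)))
print(r2_manual, r2_score(actual, predicted))
```

A lower RMSE means predictions sit closer to the actual values, while an R-squared near 1 means the model explains most of the variation in the target.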

### 16.3.4 Making Future Predictions

Now that our model is trained and evaluated, let's make some future sales predictions.

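```python
import numpy as np

# Quantities we want sales predictions for
future_quantity = np.array([1200, 1400, 1600])

# The scaler was fitted on two columns (Quantity, TotalSales), so it cannot
# transform a single column directly. Scale Quantity with its own statistics.
future_quantity_scaled = ((future_quantity - scaler.mean_[0]) / scaler.scale_[0]).reshape(-1, 1)
future_sales_scaled = model.predict(future_quantity_scaled)

# Undo the scaling on the TotalSales column to get actual sales values
future_sales = future_sales_scaled * scaler.scale_[1] + scaler.mean_[1]

print(f"Predicted Future Sales: {future_sales}")
```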
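The scaling and unscaling steps can also be delegated to a scikit-learn pipeline. The sketch below is one possible alternative, not the approach used in this case study: it scales only `Quantity` and leaves `TotalSales` in its original units, so predictions need no inverse transform. The historical values are hypothetical and follow an exact line so the result is easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical monthly data, for illustration only: sales = 10 * quantity exactly
quantity_hist = np.array([800, 900, 1000, 1100, 1300]).reshape(-1, 1)
sales_hist = 10.0 * quantity_hist.ravel()

# The pipeline scales Quantity, then fits the regression on unscaled TotalSales
sales_pipeline = make_pipeline(StandardScaler(), LinearRegression())
sales_pipeline.fit(quantity_hist, sales_hist)

# New quantities are scaled automatically with the statistics learned during fit
future_qty = np.array([1200, 1400, 1600]).reshape(-1, 1)
future_pred = sales_pipeline.predict(future_qty)
print(future_pred)
```

Because the pipeline stores the scaler's statistics from training, the same transformation is applied consistently at prediction time.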

And voilà! You now have a predictive model for your sales data, ready to guide you in your future endeavors. Isn’t that exciting?

Feel empowered, because understanding the past and present through EDA, and peeking into the future with predictive modeling, can be the keys to your business success!


