Feature Engineering for Modern Machine Learning with Scikit-Learn

Chapter 2: Feature Engineering for Predictive Models

2.3 Practical Exercises for Chapter 2

These exercises will help you practice feature engineering techniques specifically for classification and regression models. Each exercise comes with a solution that includes code for guidance.

Exercise 1: Calculate Recency for Each Customer

In a retail dataset, calculate the Recency feature for each customer, which represents the number of days since their last purchase. Use this feature to predict customer engagement.

  1. Load the dataset.
  2. Convert the PurchaseDate column to datetime.
  3. Calculate Recency as the number of days since the most recent purchase.
import pandas as pd

# Sample retail data
data = {'CustomerID': [1, 2, 1, 3, 2],
        'PurchaseDate': ['2023-07-01', '2023-07-10', '2023-07-15', '2023-07-20', '2023-08-01']}
df = pd.DataFrame(data)

# Solution: Calculate Recency
df['PurchaseDate'] = pd.to_datetime(df['PurchaseDate'])
most_recent_date = df['PurchaseDate'].max()
df['Recency'] = (most_recent_date - df['PurchaseDate']).dt.days

# Get the minimum recency for each customer
recency_df = df.groupby('CustomerID')['Recency'].min().reset_index()

print("\nData with Recency Feature:")
print(recency_df)

In this solution:

Recency is calculated as the number of days between each customer's last purchase and the most recent purchase date in the dataset. Lower values indicate more recent engagement.
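The solution measures recency against the latest date in the data. A common variation is to measure it against a fixed reference date instead. The sketch below assumes a hypothetical `snapshot_date` of 2023-08-15, which is not part of the exercise, and reuses the same sample data.

```python
import pandas as pd

# Same sample retail data as in the exercise
df = pd.DataFrame({
    'CustomerID': [1, 2, 1, 3, 2],
    'PurchaseDate': ['2023-07-01', '2023-07-10', '2023-07-15', '2023-07-20', '2023-08-01'],
})
df['PurchaseDate'] = pd.to_datetime(df['PurchaseDate'])

# Hypothetical reference date (an assumption, not part of the exercise data)
snapshot_date = pd.Timestamp('2023-08-15')

# Find each customer's last purchase, then measure the gap to the snapshot
last_purchase = df.groupby('CustomerID')['PurchaseDate'].max()
recency_snapshot = (snapshot_date - last_purchase).dt.days.rename('Recency').reset_index()

print(recency_snapshot)
```

Taking the last purchase per customer first and subtracting afterwards gives the same ranking as the exercise solution, with every value shifted by the gap between the snapshot date and the latest purchase in the data.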

Exercise 2: Calculate Average Purchase Value (Monetary Value)

Calculate the Average Purchase Value for each customer, indicating their typical spending behavior. This is a key feature for predicting customer lifetime value (CLTV).

  1. Load the dataset.
  2. Group by CustomerID and calculate the average Total Spend.
# Sample retail data with Total Spend
data = {'CustomerID': [1, 2, 1, 3, 2],
        'Total Spend': [200, 150, 300, 250, 400]}
df = pd.DataFrame(data)

# Solution: Calculate Average Purchase Value
monetary_value_df = df.groupby('CustomerID')['Total Spend'].mean().reset_index()
monetary_value_df.rename(columns={'Total Spend': 'AvgPurchaseValue'}, inplace=True)

print("\nData with Average Purchase Value Feature:")
print(monetary_value_df)

In this solution:

AvgPurchaseValue represents each customer’s average transaction value, providing insight into their spending habits.

Exercise 3: Calculate Purchase Frequency for Each Customer

Calculate Purchase Frequency for each customer, indicating how often they make purchases. High purchase frequency often correlates with high engagement and loyalty.

  1. Load the dataset.
  2. Group by CustomerID and count the number of transactions.
# Sample retail data with Purchase Frequency
data = {'CustomerID': [1, 2, 1, 3, 2, 3, 1],
        'PurchaseDate': ['2023-07-01', '2023-07-10', '2023-07-15', '2023-07-20', '2023-08-01', '2023-08-05', '2023-08-10']}
df = pd.DataFrame(data)

# Solution: Calculate Purchase Frequency
frequency_df = df.groupby('CustomerID').size().reset_index(name='Frequency')

print("\nData with Frequency Feature:")
print(frequency_df)

In this solution:

Frequency is calculated as the number of transactions per CustomerID, showing how often each customer engages with the service.
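Recency, Frequency, and AvgPurchaseValue can also be computed in one pass with pandas named aggregation. The sketch below assumes a single transaction table that has both a purchase date and a spend amount per row. Pairing the dates from Exercise 1 with the amounts from Exercise 2 row by row is an illustrative assumption, not something the exercises state.

```python
import pandas as pd

# Hypothetical combined transactions: Exercise 1 dates paired with Exercise 2 amounts
df = pd.DataFrame({
    'CustomerID': [1, 2, 1, 3, 2],
    'PurchaseDate': ['2023-07-01', '2023-07-10', '2023-07-15', '2023-07-20', '2023-08-01'],
    'Total Spend': [200, 150, 300, 250, 400],
})
df['PurchaseDate'] = pd.to_datetime(df['PurchaseDate'])

# One groupby produces the last purchase date, transaction count, and mean spend
rfm = df.groupby('CustomerID').agg(
    LastPurchase=('PurchaseDate', 'max'),
    Frequency=('PurchaseDate', 'count'),
    AvgPurchaseValue=('Total Spend', 'mean'),
).reset_index()

# Recency: days between each customer's last purchase and the latest date in the data
rfm['Recency'] = (df['PurchaseDate'].max() - rfm['LastPurchase']).dt.days
rfm = rfm[['CustomerID', 'Recency', 'Frequency', 'AvgPurchaseValue']]

print(rfm)
```

The result has one row per customer, which is the shape a model expects as input.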

Exercise 4: Calculate Purchase Trend Using Spending Data

Calculate Purchase Trend to capture changes in customer spending over time. For each customer, use their monthly spending trend to determine if their spending is increasing, decreasing, or stable.

  1. Load the dataset.
  2. Convert PurchaseDate to month and group by CustomerID and Month.
  3. Calculate the slope of spending over time for each customer.
import numpy as np

# Sample retail data with PurchaseDate and Total Spend
data = {'CustomerID': [1, 1, 1, 2, 2, 3, 3],
        'PurchaseDate': ['2023-07-01', '2023-08-01', '2023-09-01', '2023-07-01', '2023-08-01', '2023-07-01', '2023-08-01'],
        'Total Spend': [200, 250, 300, 400, 350, 150, 100]}
df = pd.DataFrame(data)
df['PurchaseDate'] = pd.to_datetime(df['PurchaseDate'])
df['PurchaseMonth'] = df['PurchaseDate'].dt.to_period('M')

# Calculate monthly spending and slope
monthly_spend = df.groupby(['CustomerID', 'PurchaseMonth'])['Total Spend'].sum().reset_index()

# Function to calculate trend slope from one customer's monthly totals
def calculate_trend(spend):
    x = np.arange(len(spend))  # Month index: 0, 1, 2, ...
    y = spend.values
    if len(x) > 1:
        return np.polyfit(x, y, 1)[0]  # Linear trend slope
    return 0.0  # A single month carries no trend

# Apply trend calculation to each customer's monthly spend
trend_df = monthly_spend.groupby('CustomerID')['Total Spend'].apply(calculate_trend).reset_index(name='PurchaseTrend')

print("\nData with Purchase Trend Feature:")
print(trend_df)

In this solution:

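PurchaseTrend is the slope of a straight line fitted to each customer's monthly spending, expressed as the change in spend per month. Positive values indicate rising spend, negative values indicate falling spend, and values near zero indicate stable spend.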
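To turn the slope into the increasing, decreasing, or stable label the exercise asks about, a cutoff is needed for what counts as "near zero". The sketch below uses a hypothetical helper, `classify_trend`, with an assumed tolerance of 10 currency units per month. Both the helper and the threshold are illustrative choices, not part of the original solution.

```python
import numpy as np

def classify_trend(slope, tolerance=10.0):
    """Label a slope as increasing, decreasing, or stable (hypothetical helper)."""
    if slope > tolerance:
        return 'increasing'
    if slope < -tolerance:
        return 'decreasing'
    return 'stable'

# Customer 1 from the exercise: 200, 250, 300 over three consecutive months
months = np.arange(3)
spend = np.array([200, 250, 300])

# Degree-1 polyfit returns [slope, intercept]; the slope is the monthly change
slope = np.polyfit(months, spend, 1)[0]

print(round(slope, 2), classify_trend(slope))
```

For Customer 1, spend rises by 50 each month, so the fitted slope is about 50 and the label is increasing. A suitable tolerance depends on the scale of spending in the actual data.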

Exercise 5: Build a Logistic Regression Model Using Engineered Features

Using features such as Recency, Frequency, and Monetary Value, train a Logistic Regression model to predict churn.

  1. Calculate each feature (recency, frequency, monetary value).
  2. Use these features to train a logistic regression model.
  3. Evaluate model performance with accuracy.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Sample engineered data with churn label
data = {'CustomerID': [1, 2, 3, 4, 5],
        'Recency': [10, 30, 5, 40, 15],
        'Frequency': [5, 2, 7, 1, 3],
        'AvgPurchaseValue': [200, 150, 250, 100, 300],
        'Churn': [0, 1, 0, 1, 0]}  # 0: Not Churned, 1: Churned
df = pd.DataFrame(data)

# Define features and target
X = df[['Recency', 'Frequency', 'AvgPurchaseValue']]
y = df['Churn']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train logistic regression model
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)

# Predictions and evaluation
y_pred = log_reg.predict(X_test)
print("Model Accuracy:", accuracy_score(y_test, y_pred))

In this solution:

  • Recency, Frequency, and AvgPurchaseValue are used as features in a logistic regression model to predict churn.
  • Model accuracy is calculated to assess performance.
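Because the three features sit on different scales (days, counts, and currency), one possible refinement is to standardize them before fitting. The sketch below wraps `StandardScaler` and `LogisticRegression` in a scikit-learn pipeline and fits it on all five sample rows purely for illustration. Skipping the train-test split here is an assumption made to keep the example small, not a recommended evaluation practice.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same engineered sample data as in the exercise
df = pd.DataFrame({
    'Recency': [10, 30, 5, 40, 15],
    'Frequency': [5, 2, 7, 1, 3],
    'AvgPurchaseValue': [200, 150, 250, 100, 300],
    'Churn': [0, 1, 0, 1, 0],
})
X = df[['Recency', 'Frequency', 'AvgPurchaseValue']]
y = df['Churn']

# Scale each feature to zero mean and unit variance, then fit the classifier
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# Probability of churn (class 1) for each customer
churn_proba = model.predict_proba(X)[:, 1]

# Coefficients on the standardized features, labeled by feature name
coefs = pd.Series(model.named_steps['logisticregression'].coef_[0], index=X.columns)

print(churn_proba.round(3))
print(coefs)
```

With standardized inputs, the coefficient magnitudes are roughly comparable across features, which makes it easier to see which engineered feature the model leans on.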

These exercises cover essential feature engineering steps, from calculating customer engagement metrics like Recency and Frequency to implementing models that leverage engineered features. By working through these exercises, you’ll gain hands-on experience in building and evaluating features for predictive modeling.
