Chapter 7: Feature Creation & Interaction Terms
7.3 Practical Exercises for Chapter 7
Now that we’ve explored the concepts of feature creation and interaction terms, it’s time to apply these techniques with some hands-on practical exercises. Each exercise is designed to help you practice creating new features, generating polynomial features, and constructing cross-features and interaction terms. Where necessary, solutions with code are provided.
Exercise 1: Creating a Logarithmic Feature
You are given a dataset with the feature Income, which has a skewed distribution. Your task is to:
Create a new feature, LogIncome, by applying a logarithmic transformation to the Income feature.
Solution:
import numpy as np
import pandas as pd
# Sample data
data = {'Income': [30000, 50000, 75000, 120000, 250000]}
df = pd.DataFrame(data)
# Apply the natural logarithm (np.log) to create the LogIncome feature; Income must be strictly positive
df['LogIncome'] = np.log(df['Income'])
# View the original and new features
print(df)
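The solution above relies on every income being positive. If a real dataset contained zero incomes, np.log would return -inf for those rows. A minimal sketch of one common workaround, np.log1p, follows; the df_zero frame and its values are hypothetical and not part of the exercise data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample that includes a zero income (not the exercise data)
df_zero = pd.DataFrame({'Income': [0, 30000, 50000, 250000]})

# np.log1p computes log(1 + x), so a zero income maps to 0.0 instead of -inf
df_zero['Log1pIncome'] = np.log1p(df_zero['Income'])

# np.expm1 computes exp(x) - 1 and undoes the transformation
df_zero['IncomeRecovered'] = np.expm1(df_zero['Log1pIncome'])

print(df_zero)
```

For large incomes the difference between log(x) and log(1 + x) is negligible, so the choice mostly matters near zero.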
Exercise 2: Extracting Date Features
You are working with a dataset containing a column SaleDate, which records the sale date of houses. Your task is to:
Extract three new features from the SaleDate column: YearSold, MonthSold, and DayOfWeekSold.
Solution:
# Sample data with a date column
data = {'SaleDate': ['2022-01-05', '2021-06-15', '2020-09-22', '2019-11-30']}
df = pd.DataFrame(data)
# Convert SaleDate to a datetime object
df['SaleDate'] = pd.to_datetime(df['SaleDate'])
# Extract new features: year, month (1-12), and day of the week (Monday=0, Sunday=6)
df['YearSold'] = df['SaleDate'].dt.year
df['MonthSold'] = df['SaleDate'].dt.month
df['DayOfWeekSold'] = df['SaleDate'].dt.dayofweek
# View the new features
print(df)
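Because dayofweek encodes Monday as 0 and Sunday as 6, the extracted value can feed further derived features. As one possible extension (an assumption, not part of the exercise), the sketch below builds a hypothetical IsWeekendSale flag from the same sample dates.

```python
import pandas as pd

# Same sample dates as in the exercise
df_dates = pd.DataFrame({'SaleDate': ['2022-01-05', '2021-06-15', '2020-09-22', '2019-11-30']})
df_dates['SaleDate'] = pd.to_datetime(df_dates['SaleDate'])

# Day of the week as an integer (Monday=0, Sunday=6)
df_dates['DayOfWeekSold'] = df_dates['SaleDate'].dt.dayofweek

# Hypothetical derived flag: 1 if the sale happened on Saturday (5) or Sunday (6)
df_dates['IsWeekendSale'] = (df_dates['DayOfWeekSold'] >= 5).astype(int)

print(df_dates)
```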
Exercise 3: Creating a Cross-feature
You are working with a dataset that contains the features HouseSize (in square feet) and NumBedrooms. Your task is to:
Create a new feature, SizePerBedroom, by dividing HouseSize by NumBedrooms to normalize house size by the number of bedrooms.
Solution:
# Sample data
data = {'HouseSize': [2000, 2500, 3000, 3500, 4000],
        'NumBedrooms': [3, 4, 4, 5, 6]}
df = pd.DataFrame(data)
# Create a new feature (square feet per bedroom) by dividing HouseSize by NumBedrooms
df['SizePerBedroom'] = df['HouseSize'] / df['NumBedrooms']
# View the new feature
print(df)
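Ratio features like this one break down when the denominator can be zero, for example a studio listed with 0 bedrooms. The sketch below shows one way to guard against that; the df_ratio data and the SqFtPerBedroom column name are hypothetical, and leaving the result as NaN is only one of several reasonable choices.

```python
import numpy as np
import pandas as pd

# Hypothetical data that includes a studio with 0 bedrooms
df_ratio = pd.DataFrame({'HouseSize': [600, 2000, 3000],
                         'NumBedrooms': [0, 3, 4]})

# Keep bedroom counts only where they are positive; other rows become NaN
bedrooms = df_ratio['NumBedrooms'].where(df_ratio['NumBedrooms'] > 0)

# Dividing by NaN yields NaN instead of inf, so the bad row is easy to spot
df_ratio['SqFtPerBedroom'] = df_ratio['HouseSize'] / bedrooms

print(df_ratio)
```

The NaN rows can then be imputed or dropped in the same way as any other missing value.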
Exercise 4: Generating Polynomial Features
You are given a dataset with a single feature Age. Your task is to:
Create polynomial features of degree 2 (squared terms) and degree 3 (cubed terms) for the Age feature.
Solution:
from sklearn.preprocessing import PolynomialFeatures
# Sample data
data = {'Age': [25, 30, 35, 40, 45]}
df = pd.DataFrame(data)
# Initialize PolynomialFeatures with degree=3, which generates every power up to 3 (Age, Age^2, Age^3)
poly = PolynomialFeatures(degree=3, include_bias=False)
# Generate polynomial features
polynomial_features = poly.fit_transform(df[['Age']])
# Create a DataFrame for the polynomial features
df_poly = pd.DataFrame(polynomial_features, columns=['Age', 'Age^2', 'Age^3'])
# View the polynomial features
print(df_poly)
Exercise 5: Creating Interaction Terms
You are working with a dataset that contains the features HousePrice, HouseSize, and YearBuilt. Your task is to:
Create three interaction terms: Price_Size_Interaction (HousePrice * HouseSize), Price_Year_Interaction (HousePrice * YearBuilt), and Size_Year_Interaction (HouseSize * YearBuilt).
Solution:
# Sample data
data = {'HousePrice': [300000, 500000, 700000],
        'HouseSize': [1500, 2000, 2500],
        'YearBuilt': [1990, 2000, 2010]}
df = pd.DataFrame(data)
# Create interaction terms
df['Price_Size_Interaction'] = df['HousePrice'] * df['HouseSize']
df['Price_Year_Interaction'] = df['HousePrice'] * df['YearBuilt']
df['Size_Year_Interaction'] = df['HouseSize'] * df['YearBuilt']
# View the interaction terms
print(df)
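Writing each product by hand is manageable for three features but scales poorly. The sketch below generates all pairwise products with itertools.combinations; the df_int frame and the `_x_` naming scheme are illustrative assumptions rather than the exercise's required names.

```python
from itertools import combinations

import pandas as pd

# Same sample data as in the exercise
df_int = pd.DataFrame({'HousePrice': [300000, 500000, 700000],
                       'HouseSize': [1500, 2000, 2500],
                       'YearBuilt': [1990, 2000, 2010]})

base_cols = ['HousePrice', 'HouseSize', 'YearBuilt']

# One product column per unordered pair of base features
for left, right in combinations(base_cols, 2):
    df_int[f'{left}_x_{right}'] = df_int[left] * df_int[right]

print(df_int)
```

With n base features this produces n(n-1)/2 new columns, so it is usually worth restricting base_cols to features where an interaction is plausible.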
Exercise 6: Combining Polynomial and Interaction Features
You are working with the same dataset from Exercise 5, and now your task is to:
Create a polynomial interaction feature by squaring the Price_Size_Interaction term to capture higher-order effects.
Solution:
# Reuse df from Exercise 5 (it already contains Price_Size_Interaction) and square that term
df['Price_Size_Interaction_Squared'] = df['Price_Size_Interaction'] ** 2
# View the polynomial interaction feature
print(df)
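Squaring a product of two large features produces very large numbers: for the sample data the squared term reaches about 3e18, roughly a third of the int64 maximum. One possible precaution, shown here as a sketch rather than as part of the exercise, is to cast to float before squaring and then standardize the result. The df_sq frame and the Squared_Standardized column name are hypothetical.

```python
import numpy as np
import pandas as pd

# Price and size columns from the Exercise 5 sample data
df_sq = pd.DataFrame({'HousePrice': [300000, 500000, 700000],
                      'HouseSize': [1500, 2000, 2500]})

# Cast the interaction to float64 so squaring cannot overflow an integer dtype
interaction = (df_sq['HousePrice'] * df_sq['HouseSize']).astype('float64')
df_sq['Price_Size_Interaction_Squared'] = interaction ** 2

# Standardize (z-score) so the feature is on a scale comparable to others
col = df_sq['Price_Size_Interaction_Squared']
df_sq['Squared_Standardized'] = (col - col.mean()) / col.std()

print(df_sq)
```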
These exercises give you hands-on practice with feature creation and interaction terms, helping you understand how to generate new features and uncover hidden relationships in the data. By mastering these techniques, you can enhance your machine learning models' performance and better capture the complexity of the relationships in your datasets. Keep experimenting with different features and interactions to see how they impact your model!