Chapter 5: Transforming and Scaling Features
5.3 Practical Exercises for Chapter 5
Now that you’ve completed Chapter 5, it’s time to apply what you’ve learned. These exercises cover scaling (Min-Max scaling and standardization) and non-linear transformations (logarithmic, square root, cube root, Box-Cox, and Yeo-Johnson). Each exercise includes a solution so you can check your work.
Exercise 1: Min-Max Scaling
You are working with a dataset containing the following columns: Age and Income. Your task is to:
Apply Min-Max scaling to both the Age and Income columns, transforming the values to a range between 0 and 1.
Solution:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
# Sample data
data = {'Age': [25, 40, 35, 50, 60],
        'Income': [40000, 50000, 60000, 80000, 100000]}
df = pd.DataFrame(data)
# Initialize the Min-Max Scaler
scaler = MinMaxScaler()
# Apply the scaler to the dataframe
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
# View the scaled dataframe
print(df_scaled)
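As a quick check on the solution above, Min-Max scaling can also be computed by hand as (x - min) / (max - min) for each column. The sketch below reuses the sample data from the exercise; the names `manual_scaled` and `sklearn_scaled` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Same sample data as in the exercise
df = pd.DataFrame({'Age': [25, 40, 35, 50, 60],
                   'Income': [40000, 50000, 60000, 80000, 100000]})

# Manual Min-Max scaling: (x - min) / (max - min), column by column
manual_scaled = (df - df.min()) / (df.max() - df.min())

# scikit-learn result for comparison
sklearn_scaled = MinMaxScaler().fit_transform(df)

print(manual_scaled)
print(np.allclose(manual_scaled.to_numpy(), sklearn_scaled))
```

If the two results agree, the comparison prints True, and each column's smallest value maps to 0 while its largest maps to 1.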
Exercise 2: Standardization (Z-Score Normalization)
You are working with the same dataset as in Exercise 1. This time, your task is to:
Apply Standardization (Z-score normalization) to both the Age and Income columns.
Solution:
import pandas as pd
from sklearn.preprocessing import StandardScaler
# Sample data
data = {'Age': [25, 40, 35, 50, 60],
        'Income': [40000, 50000, 60000, 80000, 100000]}
df = pd.DataFrame(data)
# Initialize the Standard Scaler
scaler = StandardScaler()
# Apply the scaler to the dataframe
df_standardized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
# View the standardized dataframe
print(df_standardized)
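One detail worth verifying by hand: StandardScaler divides by the population standard deviation (ddof=0), whereas pandas' `std()` defaults to the sample standard deviation (ddof=1). The following sketch, which assumes the same sample data and uses the illustrative names `manual_z` and `sklearn_z`, reproduces the scaler's output manually.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Same sample data as in the exercise
df = pd.DataFrame({'Age': [25, 40, 35, 50, 60],
                   'Income': [40000, 50000, 60000, 80000, 100000]})

# z = (x - mean) / std, using the population standard deviation (ddof=0)
manual_z = (df - df.mean()) / df.std(ddof=0)

# scikit-learn result for comparison
sklearn_z = StandardScaler().fit_transform(df)

print(np.allclose(manual_z.to_numpy(), sklearn_z))
print(manual_z.mean())       # close to 0 for each column
print(manual_z.std(ddof=0))  # close to 1 for each column
```

Using `df.std()` without `ddof=0` would give slightly different values on a dataset this small, which is a common source of confusion when comparing pandas and scikit-learn results.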
Exercise 3: Logarithmic Transformation
You are working with a dataset containing the column HousePrices, which has values that are highly skewed to the right. Your task is to:
Apply a logarithmic transformation to the HousePrices column.
Solution:
import numpy as np
import pandas as pd
# Sample data with a right-skewed distribution
data = {'HousePrices': [50000, 120000, 250000, 500000, 1200000, 2500000]}
df = pd.DataFrame(data)
# Apply a logarithmic transformation
df['LogHousePrices'] = np.log(df['HousePrices'])
# View the transformed data
print(df)
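To see what the logarithm does to the distribution, it can help to compare skewness before and after the transformation and to confirm that `np.exp` undoes it. The sketch below assumes the same sample prices; the `with_zero` series is a hypothetical example added to show `np.log1p`, which computes log(1 + x) and is often used when a column contains zeros.

```python
import numpy as np
import pandas as pd

# Same sample data as in the exercise
df = pd.DataFrame({'HousePrices': [50000, 120000, 250000, 500000, 1200000, 2500000]})
df['LogHousePrices'] = np.log(df['HousePrices'])

# Compare skewness before and after the transformation
skew_before = df['HousePrices'].skew()
skew_after = df['LogHousePrices'].skew()
print(f"Skewness before: {skew_before:.3f}, after: {skew_after:.3f}")

# np.exp reverses np.log, recovering the original prices
recovered = np.exp(df['LogHousePrices'])
print(np.allclose(recovered, df['HousePrices']))

# Hypothetical column containing a zero: np.log(0) is -inf, np.log1p(0) is 0
with_zero = pd.Series([0, 10, 100, 1000])
log1p_values = np.log1p(with_zero)
print(log1p_values)
```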
Exercise 4: Square Root Transformation
You are working with the same HousePrices data. Your task is to:
Apply a square root transformation to the HousePrices column.
Solution:
# Apply a square root transformation (reuses np and df from Exercise 3)
df['SqrtHousePrices'] = np.sqrt(df['HousePrices'])
# View the transformed data
print(df)
Exercise 5: Cube Root Transformation
You are given a dataset with the column PropertyValues, which contains both positive and negative values. Your task is to:
Apply a cube root transformation to the PropertyValues column.
Solution:
import numpy as np
import pandas as pd
# Sample data with both positive and negative values
data = {'PropertyValues': [-8000, -5000, 0, 5000, 10000, 20000]}
df = pd.DataFrame(data)
# Apply a cube root transformation
df['CubeRootPropertyValues'] = np.cbrt(df['PropertyValues'])
# View the transformed data
print(df)
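The reason the solution uses `np.cbrt` rather than raising values to the power 1/3 is that a fractional power of a negative float is undefined in real arithmetic, so NumPy returns NaN. A minimal sketch of the difference, using the same sample values (the names `cube_root` and `naive` are illustrative):

```python
import numpy as np
import pandas as pd

# Same sample values as in the exercise, stored as floats
values = pd.Series([-8000, -5000, 0, 5000, 10000, 20000], dtype=float)

# np.cbrt keeps the sign of each value
cube_root = np.cbrt(values)

# A fractional power yields NaN for negative inputs
with np.errstate(invalid='ignore'):
    naive = np.power(values.to_numpy(), 1 / 3)

print(cube_root)
print(naive)

# Cubing the result recovers the original values
print(np.allclose(cube_root ** 3, values))
```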
Exercise 6: Box-Cox Transformation
You are working with a dataset containing Income values, which are positive but moderately skewed. Your task is to:
Apply the Box-Cox transformation to the Income column using Scikit-learn's PowerTransformer.
Solution:
from sklearn.preprocessing import PowerTransformer
import pandas as pd
# Sample data (positive values only for Box-Cox)
data = {'Income': [30000, 50000, 100000, 200000, 500000]}
df = pd.DataFrame(data)
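# Apply the Box-Cox transformation (the output is also standardized by default)
boxcox_transformer = PowerTransformer(method='box-cox')
# fit_transform returns an (n, 1) array; ravel() flattens it into a single column
df['BoxCoxIncome'] = boxcox_transformer.fit_transform(df[['Income']]).ravel()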
# View the transformed data
print(df)
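After fitting, it is often useful to inspect the power parameter that PowerTransformer estimated and to map transformed values back to the original scale. The sketch below assumes the same sample data and relies on the fitted `lambdas_` attribute and the `inverse_transform` method; the variable names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer

# Same sample data as in the exercise
df = pd.DataFrame({'Income': [30000, 50000, 100000, 200000, 500000]})

transformer = PowerTransformer(method='box-cox')
transformed = transformer.fit_transform(df[['Income']])

# One estimated lambda per input column
print(transformer.lambdas_)

# By default (standardize=True) the output has zero mean and unit variance
print(transformed.mean(), transformed.std())

# inverse_transform maps the values back to the original scale
recovered = transformer.inverse_transform(transformed)
print(np.allclose(recovered.ravel(), df['Income'], rtol=1e-6))
```

A lambda near 0 indicates that the fitted transformation behaves much like a logarithm, while a lambda near 1 means the data needed little reshaping.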
Exercise 7: Yeo-Johnson Transformation
You are working with a dataset containing the column Profit, which contains both positive and negative values. Your task is to:
Apply the Yeo-Johnson transformation to the Profit column using Scikit-learn’s PowerTransformer.
Solution:
from sklearn.preprocessing import PowerTransformer
import pandas as pd
# Sample data with positive and negative values
data = {'Profit': [-5000, -2000, 0, 3000, 15000]}
df = pd.DataFrame(data)
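# Apply the Yeo-Johnson transformation (the output is also standardized by default)
yeojohnson_transformer = PowerTransformer(method='yeo-johnson')
# fit_transform returns an (n, 1) array; ravel() flattens it into a single column
df['YeoJohnsonProfit'] = yeojohnson_transformer.fit_transform(df[['Profit']]).ravel()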
# View the transformed data
print(df)
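To make the difference from Exercise 6 concrete, the sketch below tries both methods on the same Profit data. It assumes that scikit-learn raises a ValueError when Box-Cox is given non-positive values; the `boxcox_failed` flag is a name introduced here for illustration.

```python
import pandas as pd
from sklearn.preprocessing import PowerTransformer

# Same sample data as in the exercise
df = pd.DataFrame({'Profit': [-5000, -2000, 0, 3000, 15000]})

# Box-Cox requires strictly positive data, so this attempt is expected to fail
try:
    PowerTransformer(method='box-cox').fit_transform(df[['Profit']])
    boxcox_failed = False
except ValueError as err:
    boxcox_failed = True
    print(f"Box-Cox rejected the data: {err}")

# Yeo-Johnson accepts zero and negative values
yeojohnson = PowerTransformer(method='yeo-johnson')
transformed = yeojohnson.fit_transform(df[['Profit']])
print(transformed.ravel())
```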
These exercises give you hands-on experience with scaling, standardization, and non-linear transformations, including logarithmic, square root, cube root, Box-Cox, and Yeo-Johnson transformations. With practice, you will be able to choose a suitable transformation for a given distribution and bring features onto comparable scales before training a machine learning model. Keep experimenting with these methods on data with different distributions!