Data Engineering Foundations

Chapter 5: Transforming and Scaling Features

5.3 Practical Exercises for Chapter 5

Now that you’ve completed Chapter 5, it’s time to apply what you’ve learned through hands-on practical exercises. These exercises cover Min-Max scaling, standardization, and non-linear transformations: logarithmic, square root, cube root, and power (Box-Cox and Yeo-Johnson) transformations. Each exercise includes a solution to help you solidify your understanding of these key concepts.

Exercise 1: Min-Max Scaling

You are working with a dataset containing the following columns: Age and Income. Your task is to:

Apply Min-Max scaling to both the Age and Income columns, transforming the values to a range between 0 and 1.

Solution:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Sample data
data = {'Age': [25, 40, 35, 50, 60],
        'Income': [40000, 50000, 60000, 80000, 100000]}

df = pd.DataFrame(data)

# Initialize the Min-Max Scaler
scaler = MinMaxScaler()

# Apply the scaler to the dataframe
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

# View the scaled dataframe
print(df_scaled)
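
As a quick check on what the scaler computes, the sketch below applies the Min-Max formula x' = (x - min) / (max - min) by hand with pandas and compares the result with MinMaxScaler. The names manual and sklearn_scaled are illustrative and not part of the original solution.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({'Age': [25, 40, 35, 50, 60],
                   'Income': [40000, 50000, 60000, 80000, 100000]})

# Min-Max formula applied column by column: (x - min) / (max - min)
manual = (df - df.min()) / (df.max() - df.min())

# The same transformation through scikit-learn
sklearn_scaled = MinMaxScaler().fit_transform(df)

print(manual)
# Check that both approaches agree up to floating-point tolerance
print(np.allclose(manual.to_numpy(), sklearn_scaled))
```

In each column, the smallest value maps to 0 and the largest to 1.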

Exercise 2: Standardization (Z-Score Normalization)

You are working with the same dataset as in Exercise 1. This time, your task is to:

Apply Standardization (Z-score normalization) to both the Age and Income columns.

Solution:

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Sample data
data = {'Age': [25, 40, 35, 50, 60],
        'Income': [40000, 50000, 60000, 80000, 100000]}

df = pd.DataFrame(data)

# Initialize the Standard Scaler
scaler = StandardScaler()

# Apply the scaler to the dataframe
df_standardized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

# View the standardized dataframe
print(df_standardized)
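
To see what standardization does, the sketch below computes z = (x - mean) / std by hand and compares it with StandardScaler. One detail worth knowing: StandardScaler divides by the population standard deviation (ddof=0), while pandas' std() defaults to the sample version (ddof=1), so ddof=0 is passed explicitly here. The names manual and sklearn_standardized are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({'Age': [25, 40, 35, 50, 60],
                   'Income': [40000, 50000, 60000, 80000, 100000]})

# z-score by hand, using the population standard deviation (ddof=0)
manual = (df - df.mean()) / df.std(ddof=0)

# The same transformation through scikit-learn
sklearn_standardized = StandardScaler().fit_transform(df)

# Check that both approaches agree up to floating-point tolerance
matches = np.allclose(manual.to_numpy(), sklearn_standardized)
print(matches)
```

Using the pandas default (ddof=1) instead would give slightly smaller absolute values on a dataset this small.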

Exercise 3: Logarithmic Transformation

You are working with a dataset containing the column HousePrices, which has values that are highly skewed to the right. Your task is to:

Apply a logarithmic transformation to the HousePrices column.

Solution:

import numpy as np
import pandas as pd

# Sample data with a right-skewed distribution
data = {'HousePrices': [50000, 120000, 250000, 500000, 1200000, 2500000]}

df = pd.DataFrame(data)

# Apply a logarithmic transformation
df['LogHousePrices'] = np.log(df['HousePrices'])

# View the transformed data
print(df)
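
One way to confirm that the transformation reduced the skew is to compare the skewness before and after. The sketch below uses pandas' Series.skew() on the same sample values. The variable names (prices, raw_skew, log_skew) are illustrative, and with only six values the numbers are a rough indication rather than a reliable estimate.

```python
import numpy as np
import pandas as pd

prices = pd.Series([50000, 120000, 250000, 500000, 1200000, 2500000],
                   name='HousePrices')

# Skewness of the raw values (positive means a long right tail)
raw_skew = prices.skew()

# Skewness after the natural-log transformation
log_skew = np.log(prices).skew()

print(f"Raw skew: {raw_skew:.3f}")
print(f"Log skew: {log_skew:.3f}")
```

Note that np.log is undefined for zero and negative values. For data that can contain zeros, np.log1p, which computes log(1 + x), is a common alternative.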

Exercise 4: Square Root Transformation

You are working with the same HousePrices data as in Exercise 3. Your task is to:

Apply a square root transformation to the HousePrices column.

Solution:

# Reuses np and df (with the HousePrices column) from the Exercise 3 solution
# Apply a square root transformation
df['SqrtHousePrices'] = np.sqrt(df['HousePrices'])

# View the transformed data
print(df)

Exercise 5: Cube Root Transformation

You are given a dataset with the column PropertyValues, which contains both positive and negative values. Your task is to:

Apply a cube root transformation to the PropertyValues column.

Solution:

import numpy as np
import pandas as pd

# Sample data with both positive and negative values
data = {'PropertyValues': [-8000, -5000, 0, 5000, 10000, 20000]}

df = pd.DataFrame(data)

# Apply a cube root transformation
df['CubeRootPropertyValues'] = np.cbrt(df['PropertyValues'])

# View the transformed data
print(df)

Exercise 6: Box-Cox Transformation

You are working with a dataset containing Income values, which are positive but moderately skewed. Your task is to:

Apply the Box-Cox transformation to the Income column using Scikit-learn's PowerTransformer.

Solution:

from sklearn.preprocessing import PowerTransformer
import pandas as pd

# Sample data (positive values only for Box-Cox)
data = {'Income': [30000, 50000, 100000, 200000, 500000]}

df = pd.DataFrame(data)

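# Apply the Box-Cox transformation (PowerTransformer also standardizes the output by default)
boxcox_transformer = PowerTransformer(method='box-cox')
# fit_transform returns a 2-D array of shape (n_samples, 1); flatten it to fill one column
df['BoxCoxIncome'] = boxcox_transformer.fit_transform(df[['Income']]).ravel()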

# View the transformed data
print(df)
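
After fitting, it can be useful to look at the power parameter that was estimated. The sketch below reads it from the transformer's lambdas_ attribute and checks the effect of the default standardize=True setting. The names transformer and transformed are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer

df = pd.DataFrame({'Income': [30000, 50000, 100000, 200000, 500000]})

# standardize=True is the default: the output is rescaled to zero mean, unit variance
transformer = PowerTransformer(method='box-cox')
transformed = transformer.fit_transform(df[['Income']])

# lambdas_ holds one fitted power parameter per input column
print("Fitted lambda:", transformer.lambdas_)

# Mean and standard deviation of the transformed column
print("Mean:", transformed.mean())
print("Std: ", transformed.std())
```

Passing standardize=False to PowerTransformer returns the Box-Cox values without this final rescaling step.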

Exercise 7: Yeo-Johnson Transformation

You are working with a dataset containing the column Profit, which contains both positive and negative values. Your task is to:

Apply the Yeo-Johnson transformation to the Profit column using Scikit-learn’s PowerTransformer.

Solution:

from sklearn.preprocessing import PowerTransformer
import pandas as pd

# Sample data with positive and negative values
data = {'Profit': [-5000, -2000, 0, 3000, 15000]}

df = pd.DataFrame(data)

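# Apply the Yeo-Johnson transformation (PowerTransformer also standardizes the output by default)
yeojohnson_transformer = PowerTransformer(method='yeo-johnson')
# fit_transform returns a 2-D array of shape (n_samples, 1); flatten it to fill one column
df['YeoJohnsonProfit'] = yeojohnson_transformer.fit_transform(df[['Profit']]).ravel()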

# View the transformed data
print(df)
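
A fitted PowerTransformer can also map transformed values back to the original scale, which helps when predictions or summaries need to be reported in the original units. The sketch below is a minimal round-trip check using inverse_transform. The names transformer, transformed, and recovered are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer

df = pd.DataFrame({'Profit': [-5000, -2000, 0, 3000, 15000]})

transformer = PowerTransformer(method='yeo-johnson')
transformed = transformer.fit_transform(df[['Profit']])

# Map the transformed values back to the original scale
recovered = transformer.inverse_transform(transformed)

# Check that the round trip reproduces the input up to floating-point tolerance
round_trip_ok = np.allclose(recovered.ravel(), df['Profit'].to_numpy(), atol=1e-6)
print(round_trip_ok)
```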

These practical exercises provide hands-on experience with scaling, standardization, and non-linear transformations, including logarithmic, square root, cube root, Box-Cox, and Yeo-Johnson transformations. By practicing these techniques, you can preprocess data for machine learning models with more confidence, bringing features onto comparable scales and reducing skew where it matters. Keep practicing and exploring these methods to handle a variety of data distributions!
