Chapter 8: Understanding EDA
8.4 Practical Exercises for Chapter 8: Understanding EDA
Exercise 1: Understanding the Importance of EDA
Load a dataset of your choice. Perform initial explorations like .head(), .info(), and .describe() to understand the data.
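# Example Solution:
import pandas as pd

# Load the dataset (replace the path with your own file)
df = pd.read_csv('your_dataset.csv')
print(df.head())      # first five rows
df.info()             # prints dtypes and non-null counts itself; it returns None, so no print() is needed
print(df.describe())  # summary statistics for the numerical columns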
The your_dataset.csv file used in these example solutions is available for download.
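If the CSV file is not at hand, the same first-look inspection can be tried on a small in-memory DataFrame. This is a minimal sketch: the rows below are hypothetical sample values, and the column names (Age, Income, Gender, Country) are assumed to match those used in the later exercises.

```python
import pandas as pd

# Hypothetical sample data standing in for your_dataset.csv;
# the column names match those used in the later exercises.
df = pd.DataFrame({
    "Age": [23, 35, 31, 52, 46, 29],
    "Income": [32000, 54000, 48000, 120000, 75000, 41000],
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "Country": ["US", "UK", "US", "DE", "UK", "US"],
})

print(df.shape)       # (rows, columns)
print(df.head())      # first five rows
df.info()             # dtypes and non-null counts per column
print(df.describe())  # count, mean, std, min, quartiles, max for numeric columns
```

With mixed column types, .describe() summarizes only the numerical columns by default, so Gender and Country do not appear in its output.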
Exercise 2: Identifying Types of Data
Identify at least two columns in your dataset that contain categorical data and two that contain numerical data.
# Example Solution:
# Categorical: 'Gender', 'Country'
# Numerical: 'Age', 'Income'
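One way to check such a split programmatically is to group columns by dtype. The sketch below assumes hypothetical sample data with the same column names as the example solution; in practice you would run it on the DataFrame loaded in Exercise 1.

```python
import pandas as pd

# Hypothetical sample data; replace with the DataFrame loaded in Exercise 1.
df = pd.DataFrame({
    "Age": [23, 35, 31, 52],
    "Income": [32000.0, 54000.0, 48000.0, 120000.0],
    "Gender": ["F", "M", "F", "M"],
    "Country": ["US", "UK", "US", "DE"],
})

# Numerical columns: integer and float dtypes
numerical_cols = df.select_dtypes(include="number").columns.tolist()

# Candidate categorical columns: everything that is not numeric (strings, category, ...)
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print("Numerical:", numerical_cols)
print("Categorical:", categorical_cols)

# A small number of distinct values is another hint that a column is categorical
print(df.nunique())
```

The dtype is only a starting point: a numeric column such as a postal code or an ID is usually better treated as categorical, so the result still needs a human check.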
Exercise 3: Calculating Descriptive Statistics
Calculate the mean, median, and standard deviation of a numerical column in your dataset.
# Example Solution:
mean_age = df['Age'].mean()
median_age = df['Age'].median()
std_age = df['Age'].std()  # sample standard deviation (ddof=1 by default)
print(f"Mean Age: {mean_age}")
print(f"Median Age: {median_age}")
print(f"Standard Deviation of Age: {std_age}")
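The same three statistics can be computed for every numerical column at once with .agg(). This is a sketch on hypothetical sample values, not data from your_dataset.csv.

```python
import pandas as pd

# Hypothetical sample values for two numerical columns
df = pd.DataFrame({
    "Age": [23, 35, 31, 52, 46, 29],
    "Income": [32000, 54000, 48000, 120000, 75000, 41000],
})

# One row per statistic, one column per numerical column
stats = df.select_dtypes(include="number").agg(["mean", "median", "std"])
print(stats)
```

In this sample the mean Income (about 61,667) sits well above the median (51,000) because of the single 120,000 value; a gap like this is an early hint of the skewness examined in Exercise 4.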
Exercise 4: Understanding Skewness and Kurtosis
Compute the skewness and kurtosis for a numerical column in your dataset.
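# Example Solution:
# Skewness: 0 for a symmetric distribution, positive when the right tail is longer
skewness = df['Income'].skew()
# pandas reports excess kurtosis (Fisher's definition), so a normal distribution gives 0
kurtosis = df['Income'].kurt()
print(f"Skewness of Income: {skewness}")
print(f"Kurtosis of Income: {kurtosis}")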
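To build intuition for what these numbers mean, it can help to compute them on synthetic samples whose shape is known in advance. The sketch below draws hypothetical data from a normal distribution (symmetric) and an exponential distribution (long right tail); the seed and sample sizes are arbitrary choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Synthetic samples with known shapes (not real data)
samples = pd.DataFrame({
    "symmetric": rng.normal(loc=50, scale=10, size=10_000),
    "right_skewed": rng.exponential(scale=10, size=10_000),
})

# skew() and kurt() work column-wise; kurt() returns excess kurtosis
summary = pd.DataFrame({
    "skewness": samples.skew(),
    "kurtosis": samples.kurt(),
})
print(summary)
```

The normal sample should show skewness and excess kurtosis close to 0, while the exponential distribution has a theoretical skewness of 2 and excess kurtosis of 6, so its estimates should land near those values. A positive skewness like this is typical of income-like columns.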