Machine Learning with Python

Chapter 4: Supervised Learning

4.4 Practical Exercises of Chapter 4: Supervised Learning

Exercise 1: Regression Analysis

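Using the Boston Housing dataset, perform a simple linear regression analysis to predict the median value of owner-occupied homes (MEDV) from the average number of rooms per dwelling (RM). Evaluate your model using the MAE, MSE, RMSE, and R-squared metrics. Note that load_boston was removed in scikit-learn 1.2, so the solution below loads the data from OpenML with fetch_openml.

import numpy as np
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Load the Boston Housing dataset from OpenML
# (load_boston was removed in scikit-learn 1.2; this call needs network access on first use)
boston = fetch_openml(name="boston", version=1, as_frame=True)

# Create a DataFrame with the features and the target column
df = boston.data.copy()
df['MEDV'] = boston.target.astype(float)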

# Create a LinearRegression model
model = LinearRegression()

# Fit the model
model.fit(df[['RM']], df['MEDV'])

# Predict new values
predictions = model.predict(df[['RM']])

# Calculate metrics
mae = mean_absolute_error(df['MEDV'], predictions)
mse = mean_squared_error(df['MEDV'], predictions)
rmse = np.sqrt(mse)
r2 = r2_score(df['MEDV'], predictions)

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
print("R-squared:", r2)

Exercise 2: Classification Techniques

Using the Iris dataset available in Scikit-learn, perform a logistic regression analysis to predict the species of iris. Evaluate your model using the accuracy, precision, recall, F1 score, and ROC AUC score metrics.

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

# Load the Iris dataset
iris = load_iris()

# Create a DataFrame
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target

# Separate the four features from the target
X = df[iris.feature_names]
y = df['species']

# Create a LogisticRegression model (max_iter raised so the solver has room to converge)
model = LogisticRegression(max_iter=200)

# Fit the model
model.fit(X, y)

# Predict class labels
predictions = model.predict(X)

# Calculate metrics (macro-averaged over the three species)
accuracy = accuracy_score(y, predictions)
precision = precision_score(y, predictions, average='macro')
recall = recall_score(y, predictions, average='macro')
f1 = f1_score(y, predictions, average='macro')

# ROC AUC is computed from predicted class probabilities, not hard labels
probabilities = model.predict_proba(X)
roc_auc = roc_auc_score(y, probabilities, multi_class='ovr')

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("ROC AUC Score:", roc_auc)

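These exercises will allow you to apply the concepts learned in this chapter and gain hands-on experience with regression and classification techniques, as well as evaluation metrics for supervised learning. Note that both solutions score the model on the same data it was trained on, so the reported metrics are optimistic; evaluating on a held-out test set gives a more realistic estimate.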

Chapter 4 Conclusion

In this chapter, we delved into the world of supervised learning, one of the most widely used types of machine learning. We started by exploring regression analysis, a statistical method used to predict a continuous outcome. We learned about simple and multiple linear regression, and how these techniques can be used to model the relationship between a dependent variable and one or more independent variables. We also discussed the evaluation metrics used for regression models, including Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared.
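To make the regression metrics concrete, the following sketch computes MAE, MSE, RMSE, and R-squared directly from their definitions with NumPy. The function name `regression_metrics` and the toy values are hypothetical and chosen only for illustration; in practice the scikit-learn functions used in the exercises do the same job.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R-squared from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    mae = np.mean(np.abs(errors))          # average absolute error
    mse = np.mean(errors ** 2)             # average squared error
    rmse = np.sqrt(mse)                    # same units as the target
    ss_res = np.sum(errors ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot             # share of variance explained

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}

# Hypothetical toy values, not taken from any dataset in this chapter
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because RMSE is the square root of MSE, it is expressed in the same units as the target, which often makes it easier to interpret than MSE itself.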

Next, we turned our attention to classification techniques, which are used to predict a categorical outcome. We discussed several popular classification algorithms, including logistic regression, decision trees, support vector machines, k-nearest neighbors, random forest, and gradient boosting. Each of these techniques has its strengths and weaknesses, and the choice of which to use depends on the specific problem and data at hand.
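One way to compare the algorithms listed above on a given problem is cross-validation. The sketch below, which assumes the Iris dataset and mostly default hyperparameters (both choices are ours, for illustration only), scores each classifier with 5-fold cross-validated accuracy. It is a starting point for comparison, not a recommendation of any particular model.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Mostly default settings; random_state is fixed only for reproducibility
classifiers = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "support vector machine": SVC(),
    "k-nearest neighbors": KNeighborsClassifier(),
    "random forest": RandomForestClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    # cross_val_score uses stratified folds and accuracy for classifiers by default
    fold_scores = cross_val_score(clf, X, y, cv=5)
    scores[name] = float(fold_scores.mean())
    print(f"{name:24s} mean accuracy = {scores[name]:.3f}")
```

On a small, clean dataset like Iris the differences between models tend to be slight; the comparison becomes more informative on larger or noisier data.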

We also discussed the evaluation metrics used for classification models, including accuracy, precision, recall, F1 score, and Area Under the Receiver Operating Characteristic (ROC) Curve. We emphasized the importance of understanding these metrics and using them correctly, as each provides a different perspective on the model's performance.
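To see how these metrics offer different perspectives, the sketch below derives accuracy, precision, recall, and F1 from the four confusion-matrix counts of a binary problem. The helper `binary_classification_metrics` and the labels are hypothetical, made up for illustration.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Derive accuracy, precision, recall, and F1 from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels: 4 positives and 6 negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

results = binary_classification_metrics(y_true, y_pred)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Here the model finds three of the four positives (recall 0.75) but two of its five positive predictions are wrong (precision 0.6), a distinction that accuracy alone (0.7) does not reveal.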

Finally, we provided practical exercises for you to apply the concepts learned in this chapter. These exercises involved performing regression and classification analyses on real-world datasets and evaluating the performance of the models using the appropriate metrics.

As we conclude this chapter, it's important to remember that supervised learning is a powerful tool, but it's not without its challenges. Overfitting, underfitting, and the bias-variance tradeoff all affect how well your models generalize to new data. Furthermore, the quality of your results depends heavily on the quality of your data and on how well the chosen model suits your data and task.
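Overfitting and underfitting can be illustrated by comparing training and test error as model complexity grows. The sketch below is a minimal example under assumed conditions: synthetic noisy samples of a sine curve, a simple random hold-out split, and polynomial fits of degree 1, 3, and 9. None of these choices come from the chapter's exercises.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): noisy samples of a sine curve
x = rng.uniform(0.0, 1.0, size=40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

# Hold out a quarter of the points for testing
indices = rng.permutation(x.size)
test_idx, train_idx = indices[:10], indices[10:]
x_train, y_train = x[train_idx], y[train_idx]
x_test, y_test = x[test_idx], y[test_idx]

def mse(y_true, y_pred):
    """Mean squared error between two arrays."""
    return float(np.mean((y_true - y_pred) ** 2))

results = {}
for degree in (1, 3, 9):
    # Least-squares polynomial fit on the training points only
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = mse(y_train, np.polyval(coeffs, x_train))
    test_mse = mse(y_test, np.polyval(coeffs, x_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

Training error can only go down as the degree increases, because each higher-degree polynomial contains the lower-degree ones as special cases. Test error has no such guarantee: a straight line typically underfits a sine curve, while a high-degree polynomial may start fitting the noise, which is why the held-out score is the one to watch.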

In the next chapter, we will explore unsupervised learning, another major type of machine learning. Unlike supervised learning, which involves learning from labeled data, unsupervised learning involves learning from unlabeled data. This presents its own set of challenges and opportunities, which we will discuss in detail.

As you continue your journey into the world of machine learning, remember that the key to success is practice. The more you work with these techniques and the more data you get your hands on, the more comfortable you will become with these tools and the better you will get at extracting valuable insights from data. Happy learning!
