# Chapter 4: Supervised Learning

## 4.4 Practical Exercises

**Exercise 1: Regression Analysis**

Using the Boston Housing dataset, perform a simple linear regression analysis to predict the median value of owner-occupied homes. (Note that `load_boston` was removed from Scikit-learn in version 1.2; the dataset can still be fetched from OpenML.) Evaluate your model using the MAE, MSE, RMSE, and R-squared metrics.

```python
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# load_boston was removed in scikit-learn 1.2; fetch the dataset from OpenML instead
boston = fetch_openml(name="boston", version=1, as_frame=True)

# Create a DataFrame with the features and the target (median home value, MEDV)
df = boston.data.copy()
df['MEDV'] = boston.target

# Fit a simple linear regression on the average number of rooms (RM)
model = LinearRegression()
model.fit(df[['RM']], df['MEDV'])

# Predict on the same data (kept simple here; in practice, use a train/test split)
predictions = model.predict(df[['RM']])

# Calculate the evaluation metrics
mae = mean_absolute_error(df['MEDV'], predictions)
mse = mean_squared_error(df['MEDV'], predictions)
rmse = np.sqrt(mse)
r2 = r2_score(df['MEDV'], predictions)

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
print("R-squared:", r2)
```

**Exercise 2: Classification Techniques**

Using the Iris dataset available in Scikit-learn, perform a logistic regression analysis to predict the species of iris. Evaluate your model using the accuracy, precision, recall, F1 score, and ROC AUC score metrics.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

# Load the Iris dataset
iris = load_iris()

# Create a DataFrame
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target
features = iris.feature_names  # sepal/petal length and width

# Fit a logistic regression model (max_iter raised so the solver converges)
model = LogisticRegression(max_iter=1000)
model.fit(df[features], df['species'])

# Predict on the same data (kept simple here; in practice, use a train/test split)
predictions = model.predict(df[features])

# Calculate metrics (macro-averaged across the three classes)
accuracy = accuracy_score(df['species'], predictions)
precision = precision_score(df['species'], predictions, average='macro')
recall = recall_score(df['species'], predictions, average='macro')
f1 = f1_score(df['species'], predictions, average='macro')

# ROC AUC needs class probabilities, not hard class predictions
probabilities = model.predict_proba(df[features])
roc_auc = roc_auc_score(df['species'], probabilities, multi_class='ovr')

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("ROC AUC Score:", roc_auc)
```

These exercises will allow you to apply the concepts learned in this chapter and gain hands-on experience with regression and classification techniques, as well as evaluation metrics for supervised learning.

**Chapter 4 Conclusion**

In this chapter, we delved into the world of supervised learning, one of the most widely used types of machine learning. We started by exploring regression analysis, a statistical method used to predict a continuous outcome. We learned about simple and multiple linear regression, and how these techniques can be used to model the relationship between a dependent variable and one or more independent variables. We also discussed the evaluation metrics used for regression models, including Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared.
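To make these definitions concrete, all four regression metrics can be computed by hand with NumPy. The arrays below are made-up toy values, purely for illustration:

```python
import numpy as np

# Made-up true values and predictions, purely for illustration
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))                   # average absolute error
mse = np.mean(errors ** 2)                      # average squared error
rmse = np.sqrt(mse)                             # back in the units of y
ss_res = np.sum(errors ** 2)                    # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot                        # fraction of variance explained

print(mae, mse, rmse, r2)
```

Note that MAE and RMSE are in the same units as the target, while R-squared is unitless, which is why they are usually reported together.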

Next, we turned our attention to classification techniques, which are used to predict a categorical outcome. We discussed several popular classification algorithms, including logistic regression, decision trees, support vector machines, k-nearest neighbors, random forest, and gradient boosting. Each of these techniques has its strengths and weaknesses, and the choice of which to use depends on the specific problem and data at hand.
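Because Scikit-learn gives all of these algorithms a common fit/predict interface, trying them side by side is straightforward. The sketch below compares the classifiers named above on the Iris data with default hyperparameters (chosen here purely for illustration, not as a recommendation):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# The classification algorithms discussed in this chapter, default settings
classifiers = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Support vector machine": SVC(),
    "k-nearest neighbors": KNeighborsClassifier(),
    "Random forest": RandomForestClassifier(random_state=42),
    "Gradient boosting": GradientBoostingClassifier(random_state=42),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(f"{name}: test accuracy = {clf.score(X_test, y_test):.3f}")
```

On a dataset as easy as Iris the scores will be close; the differences between these algorithms matter much more on larger, noisier problems.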

We also discussed the evaluation metrics used for classification models, including accuracy, precision, recall, F1 score, and Area Under the Receiver Operating Characteristic (ROC) Curve. We emphasized the importance of understanding these metrics and using them correctly, as each provides a different perspective on the model's performance.
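For a binary problem, all of the threshold-based metrics fall directly out of the confusion matrix. The counts below are invented for illustration:

```python
# Made-up binary confusion-matrix counts, purely for illustration
tp, fp, fn, tn = 40, 10, 5, 45

accuracy = (tp + tn) / (tp + fp + fn + tn)          # all correct / all predictions
precision = tp / (tp + fp)                          # of predicted positives, how many were right
recall = tp / (tp + fn)                             # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(accuracy, precision, recall, f1)
```

Working through such counts by hand makes the trade-off visible: lowering the decision threshold tends to raise recall at the cost of precision, and vice versa.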

Finally, we provided practical exercises for you to apply the concepts learned in this chapter. These exercises involved performing regression and classification analyses on real-world datasets and evaluating the performance of the models using the appropriate metrics.

As we conclude this chapter, it's important to remember that supervised learning is a powerful tool, but it's not without its challenges. Issues such as overfitting, underfitting, and bias-variance tradeoff can affect the performance of your models. Furthermore, the quality of your results depends heavily on the quality of your data and the appropriateness of the chosen model for your data and task.
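A held-out test set is the standard way to spot overfitting: a model that memorizes the training data scores perfectly there but worse on unseen data. A minimal sketch, using an unconstrained decision tree on Iris as an illustrative example:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# An unconstrained tree keeps splitting until it fits the training set exactly
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)  # typically 1.0: the tree memorized
test_acc = tree.score(X_test, y_test)     # the honest estimate of performance
print("Train accuracy:", train_acc)
print("Test accuracy:", test_acc)
```

A gap between the two scores is the warning sign; restricting model capacity (for example via `max_depth`) and using cross-validation are common remedies.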

In the next chapter, we will explore unsupervised learning, another major type of machine learning. Unlike supervised learning, which involves learning from labeled data, unsupervised learning involves learning from unlabeled data. This presents its own set of challenges and opportunities, which we will discuss in detail.

As you continue your journey into the world of machine learning, remember that the key to success is practice. The more you work with these techniques and the more data you get your hands on, the more comfortable you will become with these tools and the better you will get at extracting valuable insights from data. Happy learning!
