Data Analysis Foundations with Python

Chapter 15: Unsupervised Learning

15.4 Practical Exercises for Chapter 15: Unsupervised Learning

Exercise 1: K-means Clustering 

Task: Cluster the following set of 2D points into 2 clusters using K-means.

import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Sample Data
X = np.array([[1, 2],
              [5, 8],
              [1.5, 1.8],
              [8, 8],
              [1, 0.6],
              [9, 11]])

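# Implement K-means (fixed seed and explicit n_init so results are reproducible)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(X)
labels = kmeans.labels_

# Visualizing the clusters: color each point by its assigned cluster label
colors = ['r', 'g']
plt.scatter(X[:, 0], X[:, 1], c=[colors[label] for label in labels])
# Mark the cluster centers with a black 'x'
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], marker='x', c='k')
plt.show()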

Questions:

  1. What are the coordinates of the cluster centers?
  2. How does the number of clusters affect the result?
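
One way to explore both questions is to print the fitted centers and compare the inertia (the within-cluster sum of squared distances) across several values of k. The sketch below is only a starting point: the helper name `inertia_by_k`, the seed, and the range of k values are illustrative choices, not part of the exercise.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same sample data as in the exercise
X = np.array([[1, 2], [5, 8], [1.5, 1.8], [8, 8], [1, 0.6], [9, 11]])

def inertia_by_k(X, ks, seed=0):
    """Fit K-means once per k and return a dict {k: inertia}.

    Hypothetical helper for this sketch; the seed is fixed only for reproducibility.
    """
    results = {}
    for k in ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        results[k] = model.inertia_
    return results

# Question 1: coordinates of the two cluster centers
centers = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X).cluster_centers_
print("Cluster centers:\n", centers)

# Question 2: how the fit changes as the number of clusters grows
inertias = inertia_by_k(X, range(1, 6))
for k, value in inertias.items():
    print(f"k={k}: inertia={value:.2f}")
```

Inertia generally shrinks as k grows, so a lower value alone does not mean a better clustering. Look for the point where adding clusters stops giving a large drop.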

Exercise 2: Principal Component Analysis (PCA)

Task: Apply PCA to reduce the dimensions of the Iris dataset and then plot it.

from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
iris = load_iris()
X = iris.data
y = iris.target

# Apply PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Visualizing the result
sns.scatterplot(x=X_pca[:, 0], y=X_pca[:, 1], hue=y)
plt.show()

Questions:

  1. How does PCA affect the interpretability of the data?
  2. What are the first and second principal components?
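
To dig into these questions, it can help to inspect the fitted PCA object directly. The sketch below prints the share of variance each component explains and the weight each original feature contributes to it. The dictionary of loadings is just one convenient way to display `components_`, chosen here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
pca = PCA(n_components=2).fit(iris.data)

# Fraction of the total variance captured by each principal component
explained = pca.explained_variance_ratio_
print("Explained variance ratio:", explained)

# Each row of components_ is one principal component, expressed as
# weights on the four original features
for i, component in enumerate(pca.components_, start=1):
    loadings = dict(zip(iris.feature_names, component.round(3)))
    print(f"PC{i}:", loadings)
```

Each principal component is a weighted mix of all four measurements, which is why the new axes are harder to name than the original features even though they summarize the data compactly.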

Exercise 3: Anomaly Detection with Isolation Forest

Task: Detect anomalies in a simple dataset using Isolation Forest.

import numpy as np
from sklearn.ensemble import IsolationForest

# Sample data (8 normal points and 2 anomalies)
X = np.array([[1, 1],
              [2, 2],
              [3, 3],
              [4, 4],
              [5, 5],
              [6, 6],
              [7, 7],
              [8, 8],
              [100, 100],
              [200, 200]])

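# Apply Isolation Forest (contamination=0.2 matches 2 anomalies out of 10 points;
# random_state is fixed so repeated runs give the same predictions)
clf = IsolationForest(contamination=0.2, random_state=42)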
clf.fit(X)
predictions = clf.predict(X)

# Print predictions (-1 indicates anomaly)
print("Predictions:", predictions)

Questions:

  1. Which points were classified as anomalies?
  2. How does the 'contamination' parameter affect the outcome?
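
A quick way to see the effect of 'contamination' is to refit the model with several values and count how many points are flagged each time. This is a minimal sketch: the helper `count_anomalies`, the seed, and the list of contamination values are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Same sample data as in the exercise
X = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5],
              [6, 6], [7, 7], [8, 8], [100, 100], [200, 200]])

def count_anomalies(X, contamination, seed=0):
    """Return how many points are labeled -1 for a given contamination.

    Hypothetical helper for this sketch; the fixed seed keeps the forest
    identical across calls so only the threshold changes.
    """
    clf = IsolationForest(contamination=contamination, random_state=seed)
    predictions = clf.fit_predict(X)
    return int((predictions == -1).sum())

levels = [0.1, 0.2, 0.3, 0.4]
counts = [count_anomalies(X, c) for c in levels]
for c, n in zip(levels, counts):
    print(f"contamination={c}: {n} point(s) flagged")
```

Because 'contamination' sets the score threshold below which a point is labeled an anomaly, raising it can only flag the same number of points or more, including points that look perfectly normal.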

Feel free to adjust the code as you like, try different parameter settings, and explore how they affect your results. Remember, the more you practice, the more comfortable you'll become with these powerful techniques. Happy coding! 
