Chapter 5: Unsupervised Learning Techniques

Practical Exercises Chapter 5

Exercise 1: K-Means Clustering

Task: You are given a synthetic dataset containing two features. Use K-Means clustering to group the data into three clusters and visualize the clusters with their centroids.

Solution:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Data: Features 1 and 2
X = np.array([[2.5, 3.1], [1.8, 2.3], [3.4, 3.0], [4.1, 4.2], [1.9, 2.8],
              [3.6, 3.7], [2.2, 3.5], [4.0, 4.5]])

# Apply K-Means clustering with 3 clusters (n_init set explicitly for
# consistent behavior across scikit-learn versions)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(X)

# Get cluster labels and centroids
labels = kmeans.labels_
centroids = kmeans.cluster_centers_

# Plot the clusters and centroids
plt.scatter(X[:, 0], X[:, 1], c=labels, s=50, cmap='viridis')
plt.scatter(centroids[:, 0], centroids[:, 1], s=200, c='red', marker='x')
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("K-Means Clustering")
plt.show()
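
Once the model is fitted, it can also assign new observations to the learned clusters. A minimal sketch, assuming a hypothetical new point in the same two-feature space:

# Assign a new (hypothetical) observation to the nearest learned centroid
new_point = np.array([[3.0, 3.2]])
print(f"New point belongs to cluster: {kmeans.predict(new_point)[0]}")
print(f"Total within-cluster variance (inertia): {kmeans.inertia_:.2f}")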

Exercise 2: Dimensionality Reduction with PCA

Task: You have a dataset with five features. Use Principal Component Analysis (PCA) to reduce the dimensionality to two components and visualize the 2D projection of the data.

Solution:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Data: 5 features
X = np.array([[2.5, 1.2, 3.4, 0.8, 1.5],
              [1.9, 2.1, 1.8, 2.3, 0.7],
              [3.1, 2.5, 2.2, 1.8, 2.0],
              [2.2, 3.4, 2.9, 3.1, 1.8],
              [4.5, 4.0, 3.5, 2.9, 2.7]])

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA to reduce to 2 components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Plot the 2D projection
plt.scatter(X_pca[:, 0], X_pca[:, 1], s=100)
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("PCA Projection")
plt.show()
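
It is worth checking how much of the original variance the two components retain; PCA stores this after fitting. A short follow-up, reusing the fitted pca object from above:

# Report how much variance each retained component captures
print(f"Explained variance ratio: {pca.explained_variance_ratio_}")
print(f"Total variance retained: {pca.explained_variance_ratio_.sum():.2%}")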

Exercise 3: t-SNE for Dimensionality Reduction

Task: Use t-SNE to reduce the dimensionality of a dataset with three features to two dimensions. Visualize the 2D t-SNE projection.

Solution:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Data: 3 features
X = np.array([[2.1, 3.2, 1.1],
              [1.8, 2.5, 3.6],
              [3.0, 3.1, 1.5],
              [2.5, 2.9, 0.8],
              [1.9, 2.4, 3.2]])

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply t-SNE to reduce to 2 dimensions (perplexity must be smaller than
# the number of samples; the default of 30 would fail on this 5-point dataset)
tsne = TSNE(n_components=2, perplexity=3, random_state=42)
X_tsne = tsne.fit_transform(X_scaled)

# Plot the 2D t-SNE projection
plt.scatter(X_tsne[:, 0], X_tsne[:, 1], s=100)
plt.xlabel("t-SNE Dimension 1")
plt.ylabel("t-SNE Dimension 2")
plt.title("t-SNE Projection")
plt.show()
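
t-SNE output depends heavily on the perplexity setting, which must stay below the number of samples (5 here). A minimal sketch, reusing the scaled data from above, that compares two valid settings side by side:

# Compare embeddings for two valid perplexity values
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, perp in zip(axes, [2, 3]):
    emb = TSNE(n_components=2, perplexity=perp, random_state=42).fit_transform(X_scaled)
    ax.scatter(emb[:, 0], emb[:, 1], s=100)
    ax.set_title(f"perplexity = {perp}")
plt.show()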

Exercise 4: UMAP for Dimensionality Reduction

Task: Use UMAP to reduce the dimensionality of a dataset with four features to two dimensions. Visualize the 2D UMAP projection.

Solution:

import umap  # provided by the umap-learn package (pip install umap-learn)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

# Data: 4 features
X = np.array([[3.1, 2.0, 3.8, 4.0],
              [1.9, 1.5, 3.1, 2.3],
              [2.8, 3.0, 1.5, 3.8],
              [3.4, 2.9, 2.7, 3.5],
              [2.1, 1.8, 2.9, 2.6]])

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply UMAP to reduce to 2 dimensions (n_neighbors must be smaller than
# the number of samples, so use 4 for this 5-point dataset)
umap_model = umap.UMAP(n_neighbors=4, min_dist=0.3, random_state=42)
X_umap = umap_model.fit_transform(X_scaled)

# Plot the 2D UMAP projection
plt.scatter(X_umap[:, 0], X_umap[:, 1], s=100)
plt.xlabel("UMAP Dimension 1")
plt.ylabel("UMAP Dimension 2")
plt.title("UMAP Projection")
plt.show()
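
Unlike scikit-learn's t-SNE, a fitted UMAP model can project data it has never seen via transform. A minimal sketch, assuming a hypothetical new four-feature observation:

# Project a new (hypothetical) observation into the learned 2D embedding
new_point = scaler.transform([[2.7, 2.2, 3.0, 3.1]])
print(f"UMAP embedding of new point: {umap_model.transform(new_point)}")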

Exercise 5: Clustering Evaluation with Silhouette Score

Task: Apply K-Means clustering to the following dataset and calculate the Silhouette Score to evaluate the clustering performance.

Solution:

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import numpy as np

# Data: Features 1 and 2
X = np.array([[2.5, 3.5], [3.1, 2.9], [1.8, 2.7], [4.2, 3.6], [3.5, 4.0],
              [1.9, 3.3], [4.5, 3.2], [2.0, 2.8]])

# Apply K-Means clustering with 3 clusters (n_init set explicitly for
# consistent behavior across scikit-learn versions)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(X)
labels = kmeans.labels_

# Calculate the Silhouette Score
silhouette_avg = silhouette_score(X, labels)
print(f"Silhouette Score: {silhouette_avg:.2f}")

Exercise 6: Dimensionality Reduction Evaluation with Explained Variance

Task: Apply PCA to reduce the following dataset from five features to three components. Calculate and plot the explained variance ratio for each component.

Solution:

from sklearn.decomposition import PCA
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

# Data: 5 features
X = np.array([[2.5, 3.1, 2.8, 4.0, 2.1],
              [3.0, 2.7, 1.9, 2.8, 3.6],
              [1.9, 2.3, 3.7, 3.4, 2.9],
              [4.2, 3.6, 4.1, 2.9, 3.5],
              [3.6, 4.0, 2.9, 2.2, 3.0]])

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA to reduce to 3 components
pca = PCA(n_components=3)
pca.fit(X_scaled)

# Plot the explained variance ratio for each component
explained_variance = pca.explained_variance_ratio_
plt.bar(range(1, 4), explained_variance, tick_label=["PC1", "PC2", "PC3"])
plt.xlabel("Principal Components")
plt.ylabel("Explained Variance Ratio")
plt.title("Explained Variance by Principal Components")
plt.show()
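
The cumulative explained variance is a common way to decide how many components are enough. A short follow-up, reusing the fitted pca object:

# Cumulative variance retained as components are added
cumulative = np.cumsum(pca.explained_variance_ratio_)
for i, total in enumerate(cumulative, start=1):
    print(f"First {i} component(s) retain {total:.1%} of the variance")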

These practical exercises showcase a range of unsupervised learning techniques, including clustering, dimensionality reduction, and evaluation metrics. Each exercise reinforces key concepts from Chapter 5, offering you hands-on experience in implementing and assessing these powerful methods. 
