Chapter 6: Introduction to Feature Selection with Lasso and Ridge
6.3 Practical Exercises: Chapter 6
In this exercise section, we’ll apply regularization techniques for feature selection using Lasso and Ridge. These exercises will help solidify your understanding of L1 and L2 regularization and hyperparameter tuning.
Exercise 1: Applying Lasso for Feature Selection
Objective: Use Lasso regression to identify the most important features from a dataset and observe how changing the alpha parameter affects feature selection.
Instructions:
- Load a dataset with at least 15 features.
- Apply Lasso regression and experiment with different values of alpha.
- List the non-zero coefficients (selected features) for each alpha value and plot them to visualize which features remain relevant as alpha increases.
Solution:
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
import matplotlib.pyplot as plt
import numpy as np
# Generate synthetic data with 15 features
X, y = make_regression(n_samples=100, n_features=15, noise=0.1, random_state=42)
# Define different alpha values to test
alpha_values = [0.01, 0.1, 1, 5, 10]
selected_features = {}
# Apply Lasso for each alpha value and list the selected (non-zero) features
for alpha in alpha_values:
    lasso = Lasso(alpha=alpha, max_iter=10000)
    lasso.fit(X, y)
    selected_features[alpha] = lasso.coef_
    print(f"alpha={alpha}: non-zero features {np.flatnonzero(lasso.coef_)}")
# Plot the coefficients for each alpha
plt.figure(figsize=(10, 6))
for alpha, coefs in selected_features.items():
    plt.plot(range(len(coefs)), coefs, marker='o', label=f'alpha={alpha}')
plt.axhline(0, color='gray', linestyle='--')
plt.xlabel("Feature Index")
plt.ylabel("Coefficient Value")
plt.legend()
plt.title("Lasso Coefficients for Different Alpha Values")
plt.show()
This code shows how the model selects features by adjusting alpha. As alpha increases, more coefficients are driven to zero, removing features with weaker relationships to the target variable.
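Keep in mind that the L1 penalty acts on raw coefficient magnitudes, so features measured on very different scales are penalized unevenly. The synthetic data above is roughly on a common scale, but with real datasets it is usually safer to standardize before fitting. A minimal sketch of that workflow, assuming a generic feature matrix and using a scikit-learn Pipeline, might look like this:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression
import numpy as np
# Placeholder data; substitute your own feature matrix and target
X, y = make_regression(n_samples=100, n_features=15, noise=0.1, random_state=42)
# Scale features before applying the L1 penalty so all coefficients are penalized comparably
model = make_pipeline(StandardScaler(), Lasso(alpha=0.1, max_iter=10000))
model.fit(X, y)
# Indices of the features Lasso kept (non-zero coefficients)
selected = np.flatnonzero(model.named_steps['lasso'].coef_)
print("Selected feature indices:", selected)
Because scaling is fit inside the pipeline, the same transformation is reused consistently whenever the model is applied to new data, and the selected indices still map one-to-one to the original columns.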
Exercise 2: Tuning Lasso with Grid Search
Objective: Use GridSearchCV to find the optimal alpha for Lasso regression based on model performance.
Instructions:
- Load a dataset and split it into training and testing sets.
- Use GridSearchCV to identify the best alpha from a predefined range of values.
- Evaluate the model with the best alpha on the test set.
Solution:
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.datasets import make_regression
# Generate synthetic data
X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=42)
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Define range of alpha values for GridSearch
alpha_values = {'alpha': [0.001, 0.01, 0.1, 1, 10]}
# Initialize Lasso and GridSearchCV
lasso = Lasso(max_iter=10000)
grid_search = GridSearchCV(lasso, alpha_values, cv=5, scoring='neg_mean_squared_error')
grid_search.fit(X_train, y_train)
# Best alpha value
best_alpha = grid_search.best_params_['alpha']
print("Optimal alpha for Lasso:", best_alpha)
# Evaluate model with best alpha
best_lasso = Lasso(alpha=best_alpha, max_iter=10000)
best_lasso.fit(X_train, y_train)
y_pred = best_lasso.predict(X_test)
print("Test MSE with optimal alpha:", mean_squared_error(y_test, y_pred))
This exercise demonstrates how to use GridSearchCV to fine-tune alpha for Lasso, improving feature selection and minimizing error.
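Since GridSearchCV refits the best estimator on the full training set by default (refit=True), retraining a separate Lasso is optional. A shorter variant of the evaluation step, continuing from the fitted grid_search above, could be:
# grid_search already holds the best model refit on all of X_train
y_pred = grid_search.predict(X_test)
print("Best estimator:", grid_search.best_estimator_)
print("Test MSE via refit best estimator:", mean_squared_error(y_test, y_pred))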
Exercise 3: Applying Ridge Regression with Cross-Validation
Objective: Explore Ridge regression and determine the optimal regularization strength for a dataset with multicollinear features.
Instructions:
- Load a dataset with multicollinear features (e.g., a dataset with correlated variables).
- Use Ridge regression with cross-validation to determine the best alpha value.
- Compare model performance on the training and testing sets.
Solution:
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
import numpy as np
# Generate synthetic data with correlated features:
# feature 4 is a near-copy of feature 0, so the design matrix is multicollinear
np.random.seed(42)
X = np.random.rand(100, 5)
X[:, 4] = X[:, 0] + np.random.normal(0, 0.01, 100)
y = X @ np.array([2, 4, -3, 1, 5]) + np.random.normal(0, 0.1, 100)
# Define the range of alpha values to evaluate
alpha_values = [0.01, 0.1, 1, 10, 100]
ridge_scores = []
# Evaluate each alpha with 5-fold cross-validation
for alpha in alpha_values:
    ridge = Ridge(alpha=alpha)
    scores = cross_val_score(ridge, X, y, cv=5, scoring='neg_mean_squared_error')
    ridge_scores.append((alpha, np.mean(scores)))
# Find the alpha with the best (least negative) mean score
best_alpha, best_score = max(ridge_scores, key=lambda x: x[1])
print("Optimal alpha for Ridge:", best_alpha)
print("Cross-validated MSE:", -best_score)
In this exercise:
- Cross-Validation: We apply cross-validation to evaluate Ridge regression’s performance across different alpha values.
- Comparison: The optimal alpha reduces multicollinearity effects, stabilizing coefficient estimates and improving generalization.
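The solution above stops at the cross-validated score. To complete the comparison of training and testing performance called for in the instructions, one possible sketch, reusing X, y, and best_alpha from the code above, is shown below (scikit-learn's RidgeCV class offers a built-in alternative for the alpha search itself):
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Hold out a test set to compare training and testing performance at the chosen alpha
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
ridge_best = Ridge(alpha=best_alpha)
ridge_best.fit(X_train, y_train)
print("Train MSE:", mean_squared_error(y_train, ridge_best.predict(X_train)))
print("Test MSE:", mean_squared_error(y_test, ridge_best.predict(X_test)))
If the two MSE values are close, the chosen alpha is generalizing well; a much larger test error would suggest the regularization strength needs revisiting.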
Exercise 4: Using Randomized Search for Efficient Lasso Tuning
Objective: Use RandomizedSearchCV to efficiently tune alpha for Lasso on a high-dimensional dataset.
Instructions:
- Load a high-dimensional dataset.
- Define a logarithmic range of alpha values and apply RandomizedSearchCV.
- Compare the best model with the baseline Lasso model using test data.
Solution:
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import RandomizedSearchCV
import numpy as np
# Generate synthetic data with high dimensionality (50 features, 100 samples)
X, y = make_regression(n_samples=100, n_features=50, noise=0.1, random_state=42)
# Define a logarithmic alpha search space for RandomizedSearchCV
alpha_values = {'alpha': np.logspace(-4, 1, 100)}
# Initialize Lasso and RandomizedSearchCV (samples 10 alpha values at random)
lasso = Lasso(max_iter=10000)
random_search = RandomizedSearchCV(lasso, alpha_values, cv=5, scoring='neg_mean_squared_error', n_iter=10, random_state=42)
random_search.fit(X, y)
# Display best alpha and score
best_alpha = random_search.best_params_['alpha']
print("Optimal alpha for Lasso (Randomized Search):", best_alpha)
print("Best cross-validated score (negative MSE):", random_search.best_score_)
In this example:
- High-Dimensional Data: We create a dataset with many features and use Randomized Search to quickly identify a suitable alpha value for Lasso.
- Logarithmic Range: By defining a broad, logarithmic range for alpha, we efficiently explore the search space without exhaustive tuning.
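The instructions also ask for a comparison against a baseline Lasso on held-out test data, which the solution above skips because it fits the search on all 100 samples. A sketch of that comparison, assuming a fresh train/test split and scikit-learn's default alpha=1.0 as the baseline, might look like the following (in a stricter setup the randomized search itself would also be rerun on the training split only):
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Split the data so the comparison uses samples the search never saw
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Baseline: Lasso with the default alpha=1.0
baseline = Lasso(max_iter=10000).fit(X_train, y_train)
# Tuned: Lasso with the alpha found by RandomizedSearchCV
tuned = Lasso(alpha=best_alpha, max_iter=10000).fit(X_train, y_train)
print("Baseline Lasso test MSE:", mean_squared_error(y_test, baseline.predict(X_test)))
print("Tuned Lasso test MSE:", mean_squared_error(y_test, tuned.predict(X_test)))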
These exercises provide hands-on experience with regularization techniques and hyperparameter tuning for feature selection using Lasso and Ridge. By understanding how to select and tune these parameters, you can enhance model performance, reduce overfitting, and achieve more interpretable results.