Chapter 9: Practical Projects
9.4 Project 4: Time Series Forecasting with LSTMs (Improved)
Time series forecasting plays a pivotal role across numerous domains, including but not limited to financial analysis, meteorological predictions, and demand estimation in supply chain management. This project delves into the application of Long Short-Term Memory (LSTM) networks, a sophisticated type of recurrent neural network, for the purpose of predicting future values within a time series. Our specific focus lies in the realm of stock price prediction, a challenging and economically significant application of time series forecasting.
Building upon our original project, we aim to implement a series of enhancements designed to significantly boost both the performance and robustness of our model. These improvements encompass various aspects of the machine learning pipeline, from data preprocessing and feature engineering to model architecture and training methodologies. By incorporating these advancements, we seek to create a more accurate, reliable, and interpretable forecasting system that can effectively capture the complex patterns and dependencies inherent in stock price movements.
Through this enhanced approach, we not only aim to improve predictive accuracy but also to gain deeper insights into the underlying factors driving stock price fluctuations. This project serves as a comprehensive exploration of state-of-the-art techniques in time series forecasting, demonstrating the potential of advanced machine learning methods to tackle real-world financial prediction challenges.
9.4.1 Data Collection and Preprocessing
To enhance the robustness of our dataset, we will implement comprehensive data collection and preprocessing steps. This expansion involves gathering a wider range of historical data, incorporating additional relevant features, and applying advanced preprocessing techniques.
By doing so, we aim to create a more comprehensive and informative dataset that captures the nuanced patterns and relationships within the stock price movements. This improved dataset will serve as a solid foundation for our LSTM model, potentially leading to more accurate and reliable predictions.
import pandas as pd
import numpy as np
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
# Fetch more historical data and additional features
stock_data = yf.download('GOOGL', start='2000-01-01', end='2023-12-31')
stock_data['Returns'] = stock_data['Close'].pct_change()
stock_data['MA50'] = stock_data['Close'].rolling(window=50).mean()
stock_data['MA200'] = stock_data['Close'].rolling(window=200).mean()
stock_data['Volume_MA'] = stock_data['Volume'].rolling(window=20).mean()
stock_data.dropna(inplace=True)
# Normalize the data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(stock_data[['Close', 'Volume', 'Returns', 'MA50', 'MA200', 'Volume_MA']])
# Create sequences
def create_sequences(data, seq_length):
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:(i + seq_length), :])
        y.append(data[i + seq_length, 0])
    return np.array(X), np.array(y)
sequence_length = 60
X, y = create_sequences(scaled_data, sequence_length)
# Split the data chronologically (no shuffling), so the test set contains only sequences that come after the training period
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
Here's a breakdown:
- Data Collection: The code uses the yfinance library to download historical stock data for Google (GOOGL) from January 1, 2000, to December 31, 2023.
- Feature Engineering: Several new features are created:
- Returns: Percentage change in closing price
- MA50: 50-day moving average of closing price
- MA200: 200-day moving average of closing price
- Volume_MA: 20-day moving average of trading volume
- Data Normalization: The MinMaxScaler is used to scale all features to a range between 0 and 1, which is important for neural network training.
- Sequence Creation: A function create_sequences() is defined to generate input sequences and corresponding target values. It uses a sliding window approach with a sequence length of 60 days.
- Data Splitting: The dataset is split chronologically into training and testing sets, with the most recent 20% of the sequences reserved for testing. Shuffling is disabled so that no future data leaks into the training set.
This preprocessing pipeline creates a robust dataset that captures various aspects of stock price movements, providing a solid foundation for the LSTM model to learn from.
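One refinement worth considering: in the pipeline above, the MinMaxScaler is fitted on the full history before splitting, so scaling statistics from the test period influence the training data. A minimal sketch of a leakage-free variant is shown below; it assumes the stock_data DataFrame and create_sequences() defined above, and the variable names with the _nl suffix are purely illustrative.
feature_cols = ['Close', 'Volume', 'Returns', 'MA50', 'MA200', 'Volume_MA']
# Split the raw rows chronologically first, then fit the scaler on the training slice only
split_row = int(len(stock_data) * 0.8)
train_scaler = MinMaxScaler(feature_range=(0, 1))
train_scaler.fit(stock_data[feature_cols].iloc[:split_row])
# Transform the whole history using statistics learned from the training period alone
scaled_all = train_scaler.transform(stock_data[feature_cols])
# Build train and test sequences separately; test windows may use the last 60 training rows as context,
# but every training target lies strictly before the split point
X_train_nl, y_train_nl = create_sequences(scaled_all[:split_row], sequence_length)
X_test_nl, y_test_nl = create_sequences(scaled_all[split_row - sequence_length:], sequence_length)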
9.4.2 Enhanced LSTM Architecture
In this step, we will engineer an advanced and robust LSTM architecture, incorporating multiple layers and implementing dropout techniques for effective regularization. This enhanced design aims to capture complex temporal dependencies in the time series data while mitigating overfitting issues.
By strategically adding depth to our network and introducing dropout layers, we seek to improve the model's ability to generalize from the training data and make more accurate predictions on unseen stock price patterns. The sophisticated architecture we'll construct will balance the trade-off between model complexity and generalization capability, potentially leading to superior forecasting performance in our stock price prediction task.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam
def build_improved_lstm_model(input_shape):
    model = Sequential([
        LSTM(100, return_sequences=True, input_shape=input_shape),
        BatchNormalization(),
        Dropout(0.2),
        LSTM(100, return_sequences=True),
        BatchNormalization(),
        Dropout(0.2),
        LSTM(100),
        BatchNormalization(),
        Dropout(0.2),
        Dense(50, activation='relu'),
        Dense(1)
    ])
    model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error')
    return model
model = build_improved_lstm_model((X_train.shape[1], X_train.shape[2]))
model.summary()
Here's a breakdown:
- Imports: The necessary Keras modules are imported to build the model.
- Model Architecture: The build_improved_lstm_model() function creates a Sequential model with the following layers:
- Three LSTM layers with 100 units each, with the first two returning sequences
- BatchNormalization layers after each LSTM layer for normalizing activations
- Dropout layers (20% rate) for regularization to prevent overfitting
- A Dense layer with 50 units and ReLU activation
- A final Dense layer with 1 unit for output prediction
- Model Compilation: The model is compiled using the Adam optimizer with a learning rate of 0.001 and mean squared error as the loss function.
- Model Creation: An instance of the model is created using the input shape from the training data.
- Model Summary: The model.summary() call prints out the structure of the model, showing the layers and the number of parameters.
This architecture aims to capture complex temporal dependencies in the stock price data while using techniques like dropout and batch normalization to improve generalization and training stability.
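As a quick sanity check, the model can be called on a dummy batch to confirm that its input and output shapes line up with the (60, 6) sequences produced earlier. This is a minimal sketch assuming the model built above and TensorFlow imported as tf; the batch size of 4 is arbitrary.
import tensorflow as tf
# A dummy batch of 4 sequences, each 60 timesteps long with 6 features
dummy_batch = tf.zeros((4, sequence_length, 6))
dummy_output = model(dummy_batch, training=False)
print(dummy_output.shape)  # Expected: (4, 1) -- one predicted (scaled) closing price per sequence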
9.4.3 Training with Early Stopping and Learning Rate Scheduling
To enhance the training process and optimize model performance, we will implement two key techniques: early stopping and learning rate scheduling. Early stopping helps prevent overfitting by halting the training process when the model's performance on the validation set stops improving. This ensures that we capture the model at its peak generalization ability.
Learning rate scheduling, on the other hand, dynamically adjusts the learning rate during training. This adaptive approach allows the model to make larger updates in the early stages of training and finer adjustments as it converges, potentially leading to faster convergence and better overall performance.
By incorporating these advanced training strategies, we aim to achieve a more efficient training process and a model that generalizes well to unseen data, ultimately improving our stock price predictions.
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
early_stopping = EarlyStopping(monitor='val_loss', patience=20, restore_best_weights=True)
lr_scheduler = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10, min_lr=0.00001)
history = model.fit(
X_train, y_train,
epochs=200,
batch_size=32,
validation_split=0.2,
callbacks=[early_stopping, lr_scheduler],
verbose=1
)
Here's a breakdown of the code:
- Importing Callbacks: The code imports EarlyStopping and ReduceLROnPlateau from Keras callbacks.
- Early Stopping: This technique stops training when the model's performance on the validation set stops improving. The parameters are:
- monitor='val_loss': It watches the validation loss
- patience=20: It will wait for 20 epochs before stopping if no improvement is seen
- restore_best_weights=True: It will restore the model weights from the epoch with the best value of the monitored quantity
- Learning Rate Scheduler: This adjusts the learning rate during training. The parameters are:
- monitor='val_loss': It watches the validation loss
- factor=0.5: It will reduce the learning rate by half when triggered
- patience=10: It will wait for 10 epochs before reducing the learning rate
- min_lr=0.00001: The minimum learning rate
- Model Training: The model.fit() function trains the model with these parameters:
- epochs=200: Maximum number of training epochs
- batch_size=32: Number of samples per gradient update
- validation_split=0.2: 20% of the training data will be used for validation
- callbacks=[early_stopping, lr_scheduler]: The early stopping and learning rate scheduler are applied during training
- verbose=1: This will show progress bars during training
These techniques aim to improve the training process, prevent overfitting, and potentially lead to better model performance.
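To see how these callbacks behaved during training, it is common to plot the loss curves stored in the history object returned by model.fit(). The following is a minimal sketch, assuming matplotlib is available.
import matplotlib.pyplot as plt
# Plot training and validation loss to inspect convergence and the effect of early stopping
plt.figure(figsize=(10, 5))
plt.plot(history.history['loss'], label='Training loss')
plt.plot(history.history['val_loss'], label='Validation loss')
plt.title('Training History')
plt.xlabel('Epoch')
plt.ylabel('Mean Squared Error')
plt.legend()
plt.show()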
9.4.4 Model Evaluation and Visualization
To thoroughly assess the model's performance and gain deeper insights into its predictions, we will implement a comprehensive evaluation strategy. This approach will include various quantitative metrics to measure accuracy and error, as well as visual representations of the model's predictions compared to actual values. By combining these methods, we can better understand the strengths and limitations of our LSTM model in forecasting stock prices.
Our evaluation will encompass the following key components:
- Calculation of standard regression metrics such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R2) score
- Time series plots comparing predicted values against actual stock prices
- Residual analysis to identify any patterns in prediction errors
- Rolling window evaluation to assess model performance over different time periods (both sketched after the code breakdown below)
This multi-faceted evaluation approach will provide a nuanced understanding of our model's predictive capabilities and help identify areas for potential improvement in future iterations.
import matplotlib.pyplot as plt
# Make predictions
train_predictions = model.predict(X_train)
test_predictions = model.predict(X_test)
# Inverse transform predictions
train_predictions = scaler.inverse_transform(np.concatenate((train_predictions, np.zeros((len(train_predictions), 5))), axis=1))[:, 0]
test_predictions = scaler.inverse_transform(np.concatenate((test_predictions, np.zeros((len(test_predictions), 5))), axis=1))[:, 0]
y_train_actual = scaler.inverse_transform(np.concatenate((y_train.reshape(-1, 1), np.zeros((len(y_train), 5))), axis=1))[:, 0]
y_test_actual = scaler.inverse_transform(np.concatenate((y_test.reshape(-1, 1), np.zeros((len(y_test), 5))), axis=1))[:, 0]
# Visualize predictions
plt.figure(figsize=(15, 6))
plt.plot(y_test_actual, label='Actual')
plt.plot(test_predictions, label='Predicted')
plt.title('LSTM Model: Actual vs Predicted Stock Prices')
plt.xlabel('Time')
plt.ylabel('Stock Price')
plt.legend()
plt.show()
# Evaluate model performance
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
mse = mean_squared_error(y_test_actual, test_predictions)
mae = mean_absolute_error(y_test_actual, test_predictions)
r2 = r2_score(y_test_actual, test_predictions)
print(f'Mean Squared Error: {mse}')
print(f'Mean Absolute Error: {mae}')
print(f'R-squared Score: {r2}')
Here's a breakdown of what the code does:
- Predictions: The model makes predictions on both the training and test datasets.
- Inverse Transformation: The predictions and actual values are inverse transformed to convert them back to their original scale. This is necessary because the data was initially scaled during preprocessing.
- Visualization: A plot is created to compare the actual stock prices with the predicted ones for the test set. This visual representation helps in understanding how well the model's predictions align with the real data.
- Performance Metrics: The code calculates three key performance metrics:
- Mean Squared Error (MSE): Measures the average squared difference between predicted and actual values.
- Mean Absolute Error (MAE): Measures the average absolute difference between predicted and actual values.
- R-squared Score (R2): Indicates the proportion of the variance in the dependent variable that is predictable from the independent variable(s).
These metrics provide a quantitative assessment of the model's performance, helping to evaluate its accuracy and predictive power in forecasting stock prices.
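The residual analysis and rolling-window evaluation mentioned at the start of this section can be added with a few extra lines. The sketch below assumes the y_test_actual and test_predictions arrays from the code above; the 30-day window size is an arbitrary illustrative choice.
# Residual analysis: errors should ideally look like unstructured noise around zero
residuals = y_test_actual - test_predictions
plt.figure(figsize=(15, 4))
plt.plot(residuals)
plt.axhline(0, color='black', linewidth=0.8)
plt.title('Prediction Residuals Over Time')
plt.xlabel('Time')
plt.ylabel('Actual - Predicted')
plt.show()
# Rolling-window evaluation: MAE computed over consecutive 30-day windows of the test period
window = 30
rolling_mae = pd.Series(np.abs(residuals)).rolling(window).mean()
plt.figure(figsize=(15, 4))
plt.plot(rolling_mae)
plt.title(f'{window}-Day Rolling Mean Absolute Error')
plt.xlabel('Time')
plt.ylabel('MAE')
plt.show()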
9.4.5 Feature Importance Analysis
To gain deeper insights into our model's decision-making process, we will implement a comprehensive feature importance analysis. This crucial step will help us understand which features contribute most significantly to the predictions, allowing us to:
- Identify the most influential factors in stock price movements
- Potentially refine our feature selection for future iterations
- Provide valuable insights to stakeholders about key drivers of stock price changes
We'll use permutation importance, a model-agnostic method that measures the increase in prediction error after permuting each feature. This approach will give us a clear picture of each feature's impact on our LSTM model's performance.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.inspection import permutation_importance
feature_names = ['Close', 'Volume', 'Returns', 'MA50', 'MA200', 'Volume_MA']
n_features = len(feature_names)
def reshape_features(X):
    """Reshape 3D sequence data (samples, timesteps, features) into 2D for feature importance analysis."""
    return X.reshape((X.shape[0], -1))
# Reshape X_test for permutation importance analysis
X_test_reshaped = reshape_features(X_test)
# Minimal wrapper exposing fit()/predict() so the Keras model can be used with scikit-learn's permutation_importance
class LSTMRegressorWrapper:
    def __init__(self, model, seq_length, n_features):
        self.model = model
        self.seq_length = seq_length
        self.n_features = n_features
    def fit(self, X, y=None):
        # The underlying Keras model is already trained
        return self
    def predict(self, X):
        # Reshape the flattened 2D input back to the 3D shape the LSTM expects
        X_3d = X.reshape((-1, self.seq_length, self.n_features))
        return self.model.predict(X_3d, verbose=0).flatten()
wrapped_model = LSTMRegressorWrapper(model, sequence_length, n_features)
# Compute permutation importance (note: permuting all 60 x 6 flattened columns 10 times each is slow)
r = permutation_importance(wrapped_model, X_test_reshaped, y_test, n_repeats=10,
                           random_state=42, scoring='neg_mean_squared_error')
# Expanded feature names match the flattened column order: timestep-major, feature-minor
feature_names_expanded = [f"{feature}_t{t}" for t in range(sequence_length) for feature in feature_names]
feature_importance = pd.DataFrame({'feature': feature_names_expanded, 'importance': r.importances_mean})
# Aggregate importance scores for each original feature (strip the "_t{t}" timestep suffix before grouping)
feature_importance['base_feature'] = feature_importance['feature'].str.rsplit('_t', n=1).str[0]
feature_importance = feature_importance.groupby('base_feature')['importance'].mean().sort_values(ascending=False)
# Plot feature importance
plt.figure(figsize=(10, 6))
plt.bar(feature_importance.index, feature_importance.values)
plt.title('Feature Importance (Permutation Importance)')
plt.xlabel('Features')
plt.ylabel('Importance')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Here's a breakdown of what the code does:
- Imports permutation_importance from scikit-learn. This function evaluates how much each feature contributes to model predictions by randomly shuffling feature values and measuring the impact on prediction error.
- Defines reshape_features() to flatten the 3D sequential input (samples, timesteps, features) into a 2D format (samples, timesteps × features). This is necessary because permutation_importance expects a 2D array as input.
- Reshapes X_test using reshape_features(X_test), ensuring the test data has the correct format for the analysis.
- Defines a small wrapper class, LSTMRegressorWrapper, with fit() and predict() methods so the trained Keras model behaves like a scikit-learn estimator. Since LSTMs expect 3D input (samples, timesteps, features), its predict() method reshapes the 2D data back to 3D before calling the model.
- Calculates permutation importance using the wrapped model, the reshaped test data, the test labels (y_test), and n_repeats=10, meaning each column is shuffled and scored 10 times for stability. This step is computationally heavy, since every one of the 360 flattened columns is permuted 10 times.
- Generates expanded feature names so that features from different timesteps are differentiated (e.g., Close_t0, Close_t1, ...), in the same timestep-major order produced by the reshape.
- Creates a DataFrame that maps the expanded feature names to their importance scores, strips the timestep suffix (e.g., Close_t0 through Close_t59 map back to Close), averages the scores per original feature, and sorts them in descending order.
- Creates a bar plot of the aggregated importance scores; the tallest bars indicate which inputs have the highest impact on the stock price predictions.
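A cheaper and often less noisy alternative, not part of the pipeline above but worth sketching, is to permute an entire feature across all of its timesteps at once and measure the resulting increase in test MSE directly. The helper below is an illustrative sketch assuming the X_test, y_test, model, and feature_names variables defined earlier; the function name and n_repeats value are arbitrary.
from sklearn.metrics import mean_squared_error
def grouped_permutation_importance(model, X, y, feature_names, n_repeats=5, seed=42):
    """Permute one feature (all timesteps at once) and report the average increase in MSE."""
    rng = np.random.default_rng(seed)
    baseline = mean_squared_error(y, model.predict(X, verbose=0).flatten())
    importances = {}
    for j, name in enumerate(feature_names):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle the j-th feature across samples, keeping its within-sequence structure intact
            X_perm[:, :, j] = X_perm[rng.permutation(X_perm.shape[0]), :, j]
            score = mean_squared_error(y, model.predict(X_perm, verbose=0).flatten())
            increases.append(score - baseline)
        importances[name] = np.mean(increases)
    return importances
grouped_importances = grouped_permutation_importance(model, X_test, y_test, feature_names)
print(sorted(grouped_importances.items(), key=lambda kv: kv[1], reverse=True))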
9.4.6 Ensemble Method
To enhance the robustness and accuracy of our predictions, we will implement an ensemble of LSTM models. This approach involves training multiple LSTM models independently and then combining their predictions. By leveraging the collective wisdom of multiple models, we can potentially achieve more stable and accurate forecasts.
The ensemble method can help mitigate individual model biases and reduce the impact of overfitting, leading to improved overall performance in stock price prediction. This technique is particularly valuable in the context of financial forecasting, where small improvements in accuracy can translate to significant real-world implications.
def create_ensemble(n_models, input_shape):
    models = []
    for _ in range(n_models):
        model = build_improved_lstm_model(input_shape)
        models.append(model)
    return models

n_models = 3
ensemble = create_ensemble(n_models, (X_train.shape[1], X_train.shape[2]))
# Train each model in the ensemble
for i, model in enumerate(ensemble):
    print(f"Training model {i+1}/{n_models}")
    model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2,
              callbacks=[early_stopping, lr_scheduler], verbose=0)
# Make ensemble predictions
ensemble_predictions = np.mean([model.predict(X_test) for model in ensemble], axis=0)
# Inverse transform ensemble predictions
ensemble_predictions = scaler.inverse_transform(np.concatenate((ensemble_predictions, np.zeros((len(ensemble_predictions), 5))), axis=1))[:, 0]
# Evaluate ensemble performance
ensemble_mse = mean_squared_error(y_test_actual, ensemble_predictions)
ensemble_mae = mean_absolute_error(y_test_actual, ensemble_predictions)
ensemble_r2 = r2_score(y_test_actual, ensemble_predictions)
print(f'Ensemble Mean Squared Error: {ensemble_mse}')
print(f'Ensemble Mean Absolute Error: {ensemble_mae}')
print(f'Ensemble R-squared Score: {ensemble_r2}')
Here's a code breakdown:
- Create Ensemble Function: The create_ensemble() function creates multiple LSTM models, each with the same architecture but potentially different initializations.
- Ensemble Creation: An ensemble of 3 models is created using the input shape of the training data.
- Model Training: Each model in the ensemble is trained independently on the same training data, using early stopping and learning rate scheduling for optimization.
- Ensemble Predictions: Predictions are made by averaging the outputs of all models in the ensemble.
- Inverse Transformation: The ensemble predictions are inverse transformed to convert them back to their original scale.
- Performance Evaluation: The ensemble's performance is evaluated using Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R2) score.
This ensemble approach aims to improve prediction accuracy and robustness by leveraging multiple models, potentially mitigating individual model biases and reducing overfitting.
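To verify that averaging actually helps, it can be useful to compare each member's test error with the ensemble's. This is a short illustrative sketch assuming the ensemble, scaler, y_test_actual, and ensemble_mse variables from the code above.
# Compare each individual model's test MSE against the ensemble's
for i, member in enumerate(ensemble):
    member_pred = member.predict(X_test, verbose=0)
    # Undo the scaling the same way as for the ensemble predictions
    member_pred = scaler.inverse_transform(
        np.concatenate((member_pred, np.zeros((len(member_pred), 5))), axis=1))[:, 0]
    member_mse = mean_squared_error(y_test_actual, member_pred)
    print(f"Model {i+1} MSE: {member_mse:.4f}")
print(f"Ensemble MSE: {ensemble_mse:.4f}")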
9.4.7 Conclusion
This improved project demonstrates several enhancements to the original LSTM-based time series forecasting task. We've implemented a more sophisticated data preprocessing pipeline, including additional features and proper scaling. The LSTM architecture has been improved with multiple layers, batch normalization, and dropout for better regularization.
We've also incorporated advanced training techniques such as early stopping and learning rate scheduling. The evaluation process now includes comprehensive metrics and visualizations, providing deeper insights into the model's performance. Additionally, we've introduced feature importance analysis to understand the impact of different inputs on the predictions.
Finally, an ensemble method has been implemented to potentially improve prediction accuracy and robustness. These improvements provide a more robust and insightful approach to time series forecasting, particularly in the context of stock price prediction.
9.4 Project 4: Time Series Forecasting with LSTMs (Improved)
Time series forecasting plays a pivotal role across numerous domains, including but not limited to financial analysis, meteorological predictions, and demand estimation in supply chain management. This project delves into the application of Long Short-Term Memory (LSTM) networks, a sophisticated type of recurrent neural network, for the purpose of predicting future values within a time series. Our specific focus lies in the realm of stock price prediction, a challenging and economically significant application of time series forecasting.
Building upon our original project, we aim to implement a series of enhancements designed to significantly boost both the performance and robustness of our model. These improvements encompass various aspects of the machine learning pipeline, from data preprocessing and feature engineering to model architecture and training methodologies. By incorporating these advancements, we seek to create a more accurate, reliable, and interpretable forecasting system that can effectively capture the complex patterns and dependencies inherent in stock price movements.
Through this enhanced approach, we not only aim to improve predictive accuracy but also to gain deeper insights into the underlying factors driving stock price fluctuations. This project serves as a comprehensive exploration of state-of-the-art techniques in time series forecasting, demonstrating the potential of advanced machine learning methods to tackle real-world financial prediction challenges.
9.4.1 Data Collection and Preprocessing
To enhance the robustness of our dataset, we will implement comprehensive data collection and preprocessing steps. This expansion involves gathering a wider range of historical data, incorporating additional relevant features, and applying advanced preprocessing techniques.
By doing so, we aim to create a more comprehensive and informative dataset that captures the nuanced patterns and relationships within the stock price movements. This improved dataset will serve as a solid foundation for our LSTM model, potentially leading to more accurate and reliable predictions.
import pandas as pd
import numpy as np
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
# Fetch more historical data and additional features
stock_data = yf.download('GOOGL', start='2000-01-01', end='2023-12-31')
stock_data['Returns'] = stock_data['Close'].pct_change()
stock_data['MA50'] = stock_data['Close'].rolling(window=50).mean()
stock_data['MA200'] = stock_data['Close'].rolling(window=200).mean()
stock_data['Volume_MA'] = stock_data['Volume'].rolling(window=20).mean()
stock_data.dropna(inplace=True)
# Normalize the data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(stock_data[['Close', 'Volume', 'Returns', 'MA50', 'MA200', 'Volume_MA']])
# Create sequences
def create_sequences(data, seq_length):
X, y = [], []
for i in range(len(data) - seq_length):
X.append(data[i:(i + seq_length), :])
y.append(data[i + seq_length, 0])
return np.array(X), np.array(y)
sequence_length = 60
X, y = create_sequences(scaled_data, sequence_length)
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Here's a breakdown:
- Data Collection: The code uses the yfinance library to download historical stock data for Google (GOOGL) from January 1, 2000, to December 31, 2023.
- Feature Engineering: Several new features are created:
- Returns: Percentage change in closing price
- MA50: 50-day moving average of closing price
- MA200: 200-day moving average of closing price
- Volume_MA: 20-day moving average of trading volume
- Data Normalization: The MinMaxScaler is used to scale all features to a range between 0 and 1, which is important for neural network training.
- Sequence Creation: A function create_sequences() is defined to generate input sequences and corresponding target values. It uses a sliding window approach with a sequence length of 60 days.
- Data Splitting: The dataset is split into training and testing sets, with 20% of the data reserved for testing.
This preprocessing pipeline creates a robust dataset that captures various aspects of stock price movements, providing a solid foundation for the LSTM model to learn from.
9.4.2 Enhanced LSTM Architecture
In this step, we will engineer a advanced and robust LSTM architecture, incorporating multiple layers and implementing dropout techniques for effective regularization. This enhanced design aims to capture complex temporal dependencies in the time series data while mitigating overfitting issues.
By strategically adding depth to our network and introducing dropout layers, we seek to improve the model's ability to generalize from the training data and make more accurate predictions on unseen stock price patterns. The sophisticated architecture we'll construct will balance the trade-off between model complexity and generalization capability, potentially leading to superior forecasting performance in our stock price prediction task.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam
def build_improved_lstm_model(input_shape):
model = Sequential([
LSTM(100, return_sequences=True, input_shape=input_shape),
BatchNormalization(),
Dropout(0.2),
LSTM(100, return_sequences=True),
BatchNormalization(),
Dropout(0.2),
LSTM(100),
BatchNormalization(),
Dropout(0.2),
Dense(50, activation='relu'),
Dense(1)
])
model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error')
return model
model = build_improved_lstm_model((X_train.shape[1], X_train.shape[2]))
model.summary()
Here's a breakdown:
- Imports: The necessary Keras modules are imported to build the model.
- Model Architecture: The
build_improved_lstm_model
function creates a Sequential model with the following layers:- Three LSTM layers with 100 units each, with the first two returning sequences
- BatchNormalization layers after each LSTM layer for normalizing activations
- Dropout layers (20% rate) for regularization to prevent overfitting
- A Dense layer with 50 units and ReLU activation
- A final Dense layer with 1 unit for output prediction
- Model Compilation: The model is compiled using the Adam optimizer with a learning rate of 0.001 and mean squared error as the loss function.
- Model Creation: An instance of the model is created using the input shape from the training data.
- Model Summary: The
model.summary()
call prints out the structure of the model, showing the layers and the number of parameters.
This architecture aims to capture complex temporal dependencies in the stock price data while using techniques like dropout and batch normalization to improve generalization and training stability.
9.4.3 Training with Early Stopping and Learning Rate Scheduling
To enhance the training process and optimize model performance, we will implement two key techniques: early stopping and learning rate scheduling. Early stopping helps prevent overfitting by halting the training process when the model's performance on the validation set stops improving. This ensures that we capture the model at its peak generalization ability.
Learning rate scheduling, on the other hand, dynamically adjusts the learning rate during training. This adaptive approach allows the model to make larger updates in the early stages of training and finer adjustments as it converges, potentially leading to faster convergence and better overall performance.
By incorporating these advanced training strategies, we aim to achieve a more efficient training process and a model that generalizes well to unseen data, ultimately improving our stock price predictions.
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
early_stopping = EarlyStopping(monitor='val_loss', patience=20, restore_best_weights=True)
lr_scheduler = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10, min_lr=0.00001)
history = model.fit(
X_train, y_train,
epochs=200,
batch_size=32,
validation_split=0.2,
callbacks=[early_stopping, lr_scheduler],
verbose=1
)
Here's a breakdown of the code:
- Importing Callbacks: The code imports EarlyStopping and ReduceLROnPlateau from Keras callbacks.
- Early Stopping: This technique stops training when the model's performance on the validation set stops improving. The parameters are:
- monitor='val_loss': It watches the validation loss
- patience=20: It will wait for 20 epochs before stopping if no improvement is seen
- restore_best_weights=True: It will restore the model weights from the epoch with the best value of the monitored quantity
- Learning Rate Scheduler: This adjusts the learning rate during training. The parameters are:
- monitor='val_loss': It watches the validation loss
- factor=0.5: It will reduce the learning rate by half when triggered
- patience=10: It will wait for 10 epochs before reducing the learning rate
- min_lr=0.00001: The minimum learning rate
- Model Training: The model.fit() function trains the model with these parameters:
- epochs=200: Maximum number of training epochs
- batch_size=32: Number of samples per gradient update
- validation_split=0.2: 20% of the training data will be used for validation
- callbacks=[early_stopping, lr_scheduler]: The early stopping and learning rate scheduler are applied during training
- verbose=1: This will show progress bars during training
These techniques aim to improve the training process, prevent overfitting, and potentially lead to better model performance.
9.4.4 Model Evaluation and Visualization
To thoroughly assess the model's performance and gain deeper insights into its predictions, we will implement a comprehensive evaluation strategy. This approach will include various quantitative metrics to measure accuracy and error, as well as visual representations of the model's predictions compared to actual values. By combining these methods, we can better understand the strengths and limitations of our LSTM model in forecasting stock prices.
Our evaluation will encompass the following key components:
- Calculation of standard regression metrics such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R2) score
- Time series plots comparing predicted values against actual stock prices
- Residual analysis to identify any patterns in prediction errors
- Rolling window evaluation to assess model performance over different time periods
This multi-faceted evaluation approach will provide a nuanced understanding of our model's predictive capabilities and help identify areas for potential improvement in future iterations.
import matplotlib.pyplot as plt
# Make predictions
train_predictions = model.predict(X_train)
test_predictions = model.predict(X_test)
# Inverse transform predictions
train_predictions = scaler.inverse_transform(np.concatenate((train_predictions, np.zeros((len(train_predictions), 5))), axis=1))[:, 0]
test_predictions = scaler.inverse_transform(np.concatenate((test_predictions, np.zeros((len(test_predictions), 5))), axis=1))[:, 0]
y_train_actual = scaler.inverse_transform(np.concatenate((y_train.reshape(-1, 1), np.zeros((len(y_train), 5))), axis=1))[:, 0]
y_test_actual = scaler.inverse_transform(np.concatenate((y_test.reshape(-1, 1), np.zeros((len(y_test), 5))), axis=1))[:, 0]
# Visualize predictions
plt.figure(figsize=(15, 6))
plt.plot(y_test_actual, label='Actual')
plt.plot(test_predictions, label='Predicted')
plt.title('LSTM Model: Actual vs Predicted Stock Prices')
plt.xlabel('Time')
plt.ylabel('Stock Price')
plt.legend()
plt.show()
# Evaluate model performance
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
mse = mean_squared_error(y_test_actual, test_predictions)
mae = mean_absolute_error(y_test_actual, test_predictions)
r2 = r2_score(y_test_actual, test_predictions)
print(f'Mean Squared Error: {mse}')
print(f'Mean Absolute Error: {mae}')
print(f'R-squared Score: {r2}')
Here's a breakdown of what the code does:
- Predictions: The model makes predictions on both the training and test datasets.
- Inverse Transformation: The predictions and actual values are inverse transformed to convert them back to their original scale. This is necessary because the data was initially scaled during preprocessing.
- Visualization: A plot is created to compare the actual stock prices with the predicted ones for the test set. This visual representation helps in understanding how well the model's predictions align with the real data.
- Performance Metrics: The code calculates three key performance metrics:
- Mean Squared Error (MSE): Measures the average squared difference between predicted and actual values.
- Mean Absolute Error (MAE): Measures the average absolute difference between predicted and actual values.
- R-squared Score (R2): Indicates the proportion of the variance in the dependent variable that is predictable from the independent variable(s).
These metrics provide a quantitative assessment of the model's performance, helping to evaluate its accuracy and predictive power in forecasting stock prices.
9.4.5 Feature Importance Analysis
To gain deeper insights into our model's decision-making process, we will implement a comprehensive feature importance analysis. This crucial step will help us understand which features contribute most significantly to the predictions, allowing us to:
- Identify the most influential factors in stock price movements
- Potentially refine our feature selection for future iterations
- Provide valuable insights to stakeholders about key drivers of stock price changes
We'll use permutation importance, a model-agnostic method that measures the increase in prediction error after permuting each feature. This approach will give us a clear picture of each feature's impact on our LSTM model's performance.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.inspection import permutation_importance
def reshape_features(X):
"""Reshape 3D sequence data (samples, timesteps, features) into 2D for feature importance analysis."""
return X.reshape((X.shape[0], -1))
# Reshape X_test for permutation importance analysis
X_test_reshaped = reshape_features(X_test)
# Define a wrapper function for Keras model predictions
def model_predict(X):
X = X.reshape((-1, sequence_length, X.shape[1] // sequence_length)) # Reshape back to 3D
return model.predict(X, verbose=0).flatten()
# Compute permutation importance
r = permutation_importance(model_predict, X_test_reshaped, y_test, n_repeats=10, random_state=42, scoring='neg_mean_squared_error')
# Adjust feature names for the reshaped input
feature_names_expanded = [f"{feature}_t{t}" for t in range(sequence_length) for feature in ['Close', 'Volume', 'Returns', 'MA50', 'MA200', 'Volume_MA']]
feature_importance = pd.DataFrame({'feature': feature_names_expanded, 'importance': r.importances_mean})
# Aggregate importance scores for each original feature
feature_importance = feature_importance.groupby(lambda x: feature_importance['feature'][x].split('_')[0]).mean()
feature_importance = feature_importance.sort_values('importance', ascending=False)
# Plot feature importance
plt.figure(figsize=(10, 6))
plt.bar(feature_importance.index, feature_importance['importance'])
plt.title('Feature Importance (Permutation Importance)')
plt.xlabel('Features')
plt.ylabel('Importance')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Here's a breakdown of what the code does:
- Imports
permutation_importance
fromscikit-learn
.- This function helps evaluate how much each feature contributes to model predictions by randomly shuffling feature values and measuring the impact on accuracy.
- Defines
reshape_features()
to flatten 3D sequential input (samples, timesteps, features
) into a 2D format (samples, features × timesteps
).- This is necessary because
permutation_importance
expects a 2D array as input.
- This is necessary because
- Reshapes
X_test
usingreshape_features(X_test)
.- This step ensures that the test data has the correct format for permutation importance analysis.
- Defines
model_predict()
to adapt the LSTM model'spredict()
method to work withpermutation_importance
.- Since LSTMs expect 3D input (
samples, timesteps, features
), this function reshapes the data back to 3D before making predictions.
- Since LSTMs expect 3D input (
- Calculates permutation importance using:
- The trained LSTM model
- The reshaped test data
- The test labels (
y_test
) n_repeats=10
for stability, meaning the importance calculation is repeated 10 times.
- Generates expanded feature names to reflect multiple timesteps in the sequential input.
- Each feature name is appended with its timestep index (e.g.,
Close_t0
,Close_t1
, ...). - This ensures that features from different timesteps are differentiated in the importance analysis.
- Each feature name is appended with its timestep index (e.g.,
- Creates a DataFrame that:
- Maps feature names to their importance scores.
- Groups by original feature names (e.g., aggregating
Close_t0
toClose_t59
intoClose
). - Averages importance scores per feature and sorts them in descending order.
- Creates a bar plot to visualize feature importance scores.
- The most important features appear at the top, helping identify which factors have the highest impact on stock price predictions.
9.4.6 Ensemble Method
To enhance the robustness and accuracy of our predictions, we will implement an ensemble of LSTM models. This approach involves training multiple LSTM models independently and then combining their predictions. By leveraging the collective wisdom of multiple models, we can potentially achieve more stable and accurate forecasts.
The ensemble method can help mitigate individual model biases and reduce the impact of overfitting, leading to improved overall performance in stock price prediction. This technique is particularly valuable in the context of financial forecasting, where small improvements in accuracy can translate to significant real-world implications.
def create_ensemble(n_models, input_shape):
models = []
for _ in range(n_models):
model = build_improved_lstm_model(input_shape)
models.append(model)
return models
n_models = 3
ensemble = create_ensemble(n_models, (X_train.shape[1], X_train.shape[2]))
# Train each model in the ensemble
for i, model in enumerate(ensemble):
print(f"Training model {i+1}/{n_models}")
model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2,
callbacks=[early_stopping, lr_scheduler], verbose=0)
# Make ensemble predictions
ensemble_predictions = np.mean([model.predict(X_test) for model in ensemble], axis=0)
# Inverse transform ensemble predictions
ensemble_predictions = scaler.inverse_transform(np.concatenate((ensemble_predictions, np.zeros((len(ensemble_predictions), 5))), axis=1))[:, 0]
# Evaluate ensemble performance
ensemble_mse = mean_squared_error(y_test_actual, ensemble_predictions)
ensemble_mae = mean_absolute_error(y_test_actual, ensemble_predictions)
ensemble_r2 = r2_score(y_test_actual, ensemble_predictions)
print(f'Ensemble Mean Squared Error: {ensemble_mse}')
print(f'Ensemble Mean Absolute Error: {ensemble_mae}')
print(f'Ensemble R-squared Score: {ensemble_r2}')
Here's a code breakdown:
- Create Ensemble Function: The
create_ensemble()
function creates multiple LSTM models, each with the same architecture but potentially different initializations. - Ensemble Creation: An ensemble of 3 models is created using the input shape of the training data.
- Model Training: Each model in the ensemble is trained independently on the same training data, using early stopping and learning rate scheduling for optimization.
- Ensemble Predictions: Predictions are made by averaging the outputs of all models in the ensemble.
- Inverse Transformation: The ensemble predictions are inverse transformed to convert them back to their original scale.
- Performance Evaluation: The ensemble's performance is evaluated using Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R2) score.
This ensemble approach aims to improve prediction accuracy and robustness by leveraging multiple models, potentially mitigating individual model biases and reducing overfitting.
9.4.7 Conclusion
This improved project demonstrates several enhancements to the original LSTM-based time series forecasting task. We've implemented a more sophisticated data preprocessing pipeline, including additional features and proper scaling. The LSTM architecture has been improved with multiple layers, batch normalization, and dropout for better regularization.
We've also incorporated advanced training techniques such as early stopping and learning rate scheduling. The evaluation process now includes comprehensive metrics and visualizations, providing deeper insights into the model's performance. Additionally, we've introduced feature importance analysis to understand the impact of different inputs on the predictions.
Finally, an ensemble method has been implemented to potentially improve prediction accuracy and robustness. These improvements provide a more robust and insightful approach to time series forecasting, particularly in the context of stock price prediction.
9.4 Project 4: Time Series Forecasting with LSTMs (Improved)
Time series forecasting plays a pivotal role across numerous domains, including but not limited to financial analysis, meteorological predictions, and demand estimation in supply chain management. This project delves into the application of Long Short-Term Memory (LSTM) networks, a sophisticated type of recurrent neural network, for the purpose of predicting future values within a time series. Our specific focus lies in the realm of stock price prediction, a challenging and economically significant application of time series forecasting.
Building upon our original project, we aim to implement a series of enhancements designed to significantly boost both the performance and robustness of our model. These improvements encompass various aspects of the machine learning pipeline, from data preprocessing and feature engineering to model architecture and training methodologies. By incorporating these advancements, we seek to create a more accurate, reliable, and interpretable forecasting system that can effectively capture the complex patterns and dependencies inherent in stock price movements.
Through this enhanced approach, we not only aim to improve predictive accuracy but also to gain deeper insights into the underlying factors driving stock price fluctuations. This project serves as a comprehensive exploration of state-of-the-art techniques in time series forecasting, demonstrating the potential of advanced machine learning methods to tackle real-world financial prediction challenges.
9.4.1 Data Collection and Preprocessing
To enhance the robustness of our dataset, we will implement comprehensive data collection and preprocessing steps. This expansion involves gathering a wider range of historical data, incorporating additional relevant features, and applying advanced preprocessing techniques.
By doing so, we aim to create a more comprehensive and informative dataset that captures the nuanced patterns and relationships within the stock price movements. This improved dataset will serve as a solid foundation for our LSTM model, potentially leading to more accurate and reliable predictions.
import pandas as pd
import numpy as np
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
# Fetch more historical data and additional features
stock_data = yf.download('GOOGL', start='2000-01-01', end='2023-12-31')
stock_data['Returns'] = stock_data['Close'].pct_change()
stock_data['MA50'] = stock_data['Close'].rolling(window=50).mean()
stock_data['MA200'] = stock_data['Close'].rolling(window=200).mean()
stock_data['Volume_MA'] = stock_data['Volume'].rolling(window=20).mean()
stock_data.dropna(inplace=True)
# Normalize the data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(stock_data[['Close', 'Volume', 'Returns', 'MA50', 'MA200', 'Volume_MA']])
# Create sequences
def create_sequences(data, seq_length):
X, y = [], []
for i in range(len(data) - seq_length):
X.append(data[i:(i + seq_length), :])
y.append(data[i + seq_length, 0])
return np.array(X), np.array(y)
sequence_length = 60
X, y = create_sequences(scaled_data, sequence_length)
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Here's a breakdown:
- Data Collection: The code uses the yfinance library to download historical stock data for Google (GOOGL) from January 1, 2000, to December 31, 2023.
- Feature Engineering: Several new features are created:
- Returns: Percentage change in closing price
- MA50: 50-day moving average of closing price
- MA200: 200-day moving average of closing price
- Volume_MA: 20-day moving average of trading volume
- Data Normalization: The MinMaxScaler is used to scale all features to a range between 0 and 1, which is important for neural network training.
- Sequence Creation: A function create_sequences() is defined to generate input sequences and corresponding target values. It uses a sliding window approach with a sequence length of 60 days.
- Data Splitting: The dataset is split into training and testing sets, with 20% of the data reserved for testing.
This preprocessing pipeline creates a robust dataset that captures various aspects of stock price movements, providing a solid foundation for the LSTM model to learn from.
9.4.2 Enhanced LSTM Architecture
In this step, we will engineer a advanced and robust LSTM architecture, incorporating multiple layers and implementing dropout techniques for effective regularization. This enhanced design aims to capture complex temporal dependencies in the time series data while mitigating overfitting issues.
By strategically adding depth to our network and introducing dropout layers, we seek to improve the model's ability to generalize from the training data and make more accurate predictions on unseen stock price patterns. The sophisticated architecture we'll construct will balance the trade-off between model complexity and generalization capability, potentially leading to superior forecasting performance in our stock price prediction task.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam
def build_improved_lstm_model(input_shape):
model = Sequential([
LSTM(100, return_sequences=True, input_shape=input_shape),
BatchNormalization(),
Dropout(0.2),
LSTM(100, return_sequences=True),
BatchNormalization(),
Dropout(0.2),
LSTM(100),
BatchNormalization(),
Dropout(0.2),
Dense(50, activation='relu'),
Dense(1)
])
model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error')
return model
model = build_improved_lstm_model((X_train.shape[1], X_train.shape[2]))
model.summary()
Here's a breakdown:
- Imports: The necessary Keras modules are imported to build the model.
- Model Architecture: The
build_improved_lstm_model
function creates a Sequential model with the following layers:- Three LSTM layers with 100 units each, with the first two returning sequences
- BatchNormalization layers after each LSTM layer for normalizing activations
- Dropout layers (20% rate) for regularization to prevent overfitting
- A Dense layer with 50 units and ReLU activation
- A final Dense layer with 1 unit for output prediction
- Model Compilation: The model is compiled using the Adam optimizer with a learning rate of 0.001 and mean squared error as the loss function.
- Model Creation: An instance of the model is created using the input shape from the training data.
- Model Summary: The
model.summary()
call prints out the structure of the model, showing the layers and the number of parameters.
This architecture aims to capture complex temporal dependencies in the stock price data while using techniques like dropout and batch normalization to improve generalization and training stability.
9.4.3 Training with Early Stopping and Learning Rate Scheduling
To enhance the training process and optimize model performance, we will implement two key techniques: early stopping and learning rate scheduling. Early stopping helps prevent overfitting by halting the training process when the model's performance on the validation set stops improving. This ensures that we capture the model at its peak generalization ability.
Learning rate scheduling, on the other hand, dynamically adjusts the learning rate during training. This adaptive approach allows the model to make larger updates in the early stages of training and finer adjustments as it converges, potentially leading to faster convergence and better overall performance.
By incorporating these advanced training strategies, we aim to achieve a more efficient training process and a model that generalizes well to unseen data, ultimately improving our stock price predictions.
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
early_stopping = EarlyStopping(monitor='val_loss', patience=20, restore_best_weights=True)
lr_scheduler = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10, min_lr=0.00001)
history = model.fit(
X_train, y_train,
epochs=200,
batch_size=32,
validation_split=0.2,
callbacks=[early_stopping, lr_scheduler],
verbose=1
)
Here's a breakdown of the code:
- Importing Callbacks: The code imports EarlyStopping and ReduceLROnPlateau from Keras callbacks.
- Early Stopping: This technique stops training when the model's performance on the validation set stops improving. The parameters are:
- monitor='val_loss': It watches the validation loss
- patience=20: It will wait for 20 epochs before stopping if no improvement is seen
- restore_best_weights=True: It will restore the model weights from the epoch with the best value of the monitored quantity
- Learning Rate Scheduler: This adjusts the learning rate during training. The parameters are:
- monitor='val_loss': It watches the validation loss
- factor=0.5: It will reduce the learning rate by half when triggered
- patience=10: It will wait for 10 epochs before reducing the learning rate
- min_lr=0.00001: The minimum learning rate
- Model Training: The model.fit() function trains the model with these parameters:
- epochs=200: Maximum number of training epochs
- batch_size=32: Number of samples per gradient update
- validation_split=0.2: 20% of the training data will be used for validation
- callbacks=[early_stopping, lr_scheduler]: The early stopping and learning rate scheduler are applied during training
- verbose=1: This will show progress bars during training
These techniques aim to improve the training process, prevent overfitting, and potentially lead to better model performance.
9.4.4 Model Evaluation and Visualization
To thoroughly assess the model's performance and gain deeper insights into its predictions, we will implement a comprehensive evaluation strategy. This approach will include various quantitative metrics to measure accuracy and error, as well as visual representations of the model's predictions compared to actual values. By combining these methods, we can better understand the strengths and limitations of our LSTM model in forecasting stock prices.
Our evaluation will encompass the following key components:
- Calculation of standard regression metrics such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R2) score
- Time series plots comparing predicted values against actual stock prices
- Residual analysis to identify any patterns in prediction errors
- Rolling window evaluation to assess model performance over different time periods
This multi-faceted evaluation approach will provide a nuanced understanding of our model's predictive capabilities and help identify areas for potential improvement in future iterations.
import matplotlib.pyplot as plt
# Make predictions
train_predictions = model.predict(X_train)
test_predictions = model.predict(X_test)
# Inverse transform predictions
train_predictions = scaler.inverse_transform(np.concatenate((train_predictions, np.zeros((len(train_predictions), 5))), axis=1))[:, 0]
test_predictions = scaler.inverse_transform(np.concatenate((test_predictions, np.zeros((len(test_predictions), 5))), axis=1))[:, 0]
y_train_actual = scaler.inverse_transform(np.concatenate((y_train.reshape(-1, 1), np.zeros((len(y_train), 5))), axis=1))[:, 0]
y_test_actual = scaler.inverse_transform(np.concatenate((y_test.reshape(-1, 1), np.zeros((len(y_test), 5))), axis=1))[:, 0]
# Visualize predictions
plt.figure(figsize=(15, 6))
plt.plot(y_test_actual, label='Actual')
plt.plot(test_predictions, label='Predicted')
plt.title('LSTM Model: Actual vs Predicted Stock Prices')
plt.xlabel('Time')
plt.ylabel('Stock Price')
plt.legend()
plt.show()
# Evaluate model performance
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
mse = mean_squared_error(y_test_actual, test_predictions)
mae = mean_absolute_error(y_test_actual, test_predictions)
r2 = r2_score(y_test_actual, test_predictions)
print(f'Mean Squared Error: {mse}')
print(f'Mean Absolute Error: {mae}')
print(f'R-squared Score: {r2}')
Here's a breakdown of what the code does:
- Predictions: The model makes predictions on both the training and test datasets.
- Inverse Transformation: The predictions and actual values are inverse transformed to convert them back to their original scale. This is necessary because the data was initially scaled during preprocessing.
- Visualization: A plot is created to compare the actual stock prices with the predicted ones for the test set. This visual representation helps in understanding how well the model's predictions align with the real data.
- Performance Metrics: The code calculates three key performance metrics:
- Mean Squared Error (MSE): Measures the average squared difference between predicted and actual values.
- Mean Absolute Error (MAE): Measures the average absolute difference between predicted and actual values.
- R-squared Score (R2): Indicates the proportion of the variance in the dependent variable that is predictable from the independent variable(s).
These metrics provide a quantitative assessment of the model's performance, helping to evaluate its accuracy and predictive power in forecasting stock prices.
9.4.5 Feature Importance Analysis
To gain deeper insights into our model's decision-making process, we will implement a comprehensive feature importance analysis. This crucial step will help us understand which features contribute most significantly to the predictions, allowing us to:
- Identify the most influential factors in stock price movements
- Potentially refine our feature selection for future iterations
- Provide valuable insights to stakeholders about key drivers of stock price changes
We'll use permutation importance, a model-agnostic method that measures the increase in prediction error after permuting each feature. This approach will give us a clear picture of each feature's impact on our LSTM model's performance.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.inspection import permutation_importance

def reshape_features(X):
    """Reshape 3D sequence data (samples, timesteps, features) into 2D for feature importance analysis."""
    return X.reshape((X.shape[0], -1))

# Reshape X_test for permutation importance analysis
X_test_reshaped = reshape_features(X_test)

# Wrap the trained Keras model so scikit-learn's permutation_importance, which
# expects an estimator exposing fit() and predict(), can score it on the 2D input
class KerasRegressorWrapper:
    def __init__(self, model, seq_length, n_features):
        self.model = model
        self.seq_length = seq_length
        self.n_features = n_features

    def fit(self, X, y=None):
        # The underlying model is already trained; nothing to do here
        return self

    def predict(self, X):
        # Reshape the flattened input back to 3D (samples, timesteps, features) before predicting
        X = X.reshape((-1, self.seq_length, self.n_features))
        return self.model.predict(X, verbose=0).flatten()

wrapped_model = KerasRegressorWrapper(model, sequence_length, X_test.shape[2])

# Compute permutation importance (each permuted column triggers a full predict pass, so this is slow)
r = permutation_importance(wrapped_model, X_test_reshaped, y_test,
                           n_repeats=10, random_state=42,
                           scoring='neg_mean_squared_error')

# Expand feature names to match the flattened (timestep-major) column order
feature_names = ['Close', 'Volume', 'Returns', 'MA50', 'MA200', 'Volume_MA']
feature_names_expanded = [f"{feature}_t{t}" for t in range(sequence_length) for feature in feature_names]
feature_importance = pd.DataFrame({'feature': feature_names_expanded, 'importance': r.importances_mean})

# Aggregate importance per original feature by stripping only the timestep suffix
feature_importance['base_feature'] = feature_importance['feature'].str.rsplit('_', n=1).str[0]
mean_importance = (feature_importance.groupby('base_feature')['importance']
                   .mean()
                   .sort_values(ascending=False))

# Plot feature importance
plt.figure(figsize=(10, 6))
plt.bar(mean_importance.index, mean_importance.values)
plt.title('Feature Importance (Permutation Importance)')
plt.xlabel('Features')
plt.ylabel('Importance')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Here's a breakdown of what the code does:
- Imports permutation_importance from scikit-learn. This function evaluates how much each feature contributes to the model's predictions by randomly shuffling that feature's values and measuring the resulting drop in the score.
- Defines reshape_features() to flatten the 3D sequential input (samples, timesteps, features) into a 2D format (samples, timesteps × features). This is necessary because permutation_importance expects a 2D array as input.
- Reshapes X_test using reshape_features(X_test) so the test data has the correct format for the analysis.
- Defines KerasRegressorWrapper, a minimal estimator-like wrapper around the trained LSTM. Its predict() method reshapes the flattened input back to 3D (samples, timesteps, features) before calling the Keras model, and its fit() method is a no-op because the model is already trained; scikit-learn only requires that both methods exist.
- Calculates permutation importance using the wrapped model, the reshaped test data, the test labels (y_test), and n_repeats=10, meaning each feature is shuffled and re-scored 10 times for stability.
- Generates expanded feature names that append a timestep index to each feature (e.g., Close_t0, Close_t1, ...), matching the column order of the flattened input so that features from different timesteps are differentiated.
- Creates a DataFrame mapping the expanded names to their importance scores, strips the timestep suffix to recover the original feature name (e.g., Close_t0 through Close_t59 all map back to Close), averages the scores per feature, and sorts them in descending order.
- Creates a bar plot of the aggregated scores. The most important features appear first, helping identify which factors have the highest impact on stock price predictions.
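Before aggregating, the per-timestep scores can also be inspected directly. The short sketch below assumes the feature_importance DataFrame from the block above (prior to aggregation) is still in scope and plots the importance of the closing price at each of the 60 timesteps; one would generally expect the most recent timesteps to carry the most weight, while a flat or noisy profile suggests the model is not exploiting the full window.
# Inspect how the importance of 'Close' varies across the 60 input timesteps
close_by_timestep = feature_importance[feature_importance['base_feature'] == 'Close']

plt.figure(figsize=(10, 4))
plt.plot(range(sequence_length), close_by_timestep['importance'].values, marker='.')
plt.title("Permutation Importance of 'Close' by Timestep")
plt.xlabel('Timestep within the 60-day input window')
plt.ylabel('Importance')
plt.tight_layout()
plt.show()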
9.4.6 Ensemble Method
To enhance the robustness and accuracy of our predictions, we will implement an ensemble of LSTM models. This approach involves training multiple LSTM models independently and then combining their predictions. By leveraging the collective wisdom of multiple models, we can potentially achieve more stable and accurate forecasts.
The ensemble method can help mitigate individual model biases and reduce the impact of overfitting, leading to improved overall performance in stock price prediction. This technique is particularly valuable in the context of financial forecasting, where small improvements in accuracy can translate to significant real-world implications.
def create_ensemble(n_models, input_shape):
    models = []
    for _ in range(n_models):
        model = build_improved_lstm_model(input_shape)
        models.append(model)
    return models
n_models = 3
ensemble = create_ensemble(n_models, (X_train.shape[1], X_train.shape[2]))
# Train each model in the ensemble
for i, model in enumerate(ensemble):
    print(f"Training model {i+1}/{n_models}")
    model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2,
              callbacks=[early_stopping, lr_scheduler], verbose=0)
# Make ensemble predictions
ensemble_predictions = np.mean([model.predict(X_test) for model in ensemble], axis=0)
# Inverse transform ensemble predictions
ensemble_predictions = scaler.inverse_transform(np.concatenate((ensemble_predictions, np.zeros((len(ensemble_predictions), 5))), axis=1))[:, 0]
# Evaluate ensemble performance
ensemble_mse = mean_squared_error(y_test_actual, ensemble_predictions)
ensemble_mae = mean_absolute_error(y_test_actual, ensemble_predictions)
ensemble_r2 = r2_score(y_test_actual, ensemble_predictions)
print(f'Ensemble Mean Squared Error: {ensemble_mse}')
print(f'Ensemble Mean Absolute Error: {ensemble_mae}')
print(f'Ensemble R-squared Score: {ensemble_r2}')
Here's a code breakdown:
- Create Ensemble Function: The create_ensemble() function creates multiple LSTM models, each with the same architecture but potentially different weight initializations.
- Ensemble Creation: An ensemble of 3 models is created using the input shape of the training data.
- Model Training: Each model in the ensemble is trained independently on the same training data, using early stopping and learning rate scheduling for optimization.
- Ensemble Predictions: Predictions are made by averaging the outputs of all models in the ensemble.
- Inverse Transformation: The ensemble predictions are inverse transformed to convert them back to their original scale.
- Performance Evaluation: The ensemble's performance is evaluated using Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R2) score.
This ensemble approach aims to improve prediction accuracy and robustness by leveraging multiple models, potentially mitigating individual model biases and reducing overfitting.
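To make the comparison with the single model from Section 9.4.4 explicit, the two sets of metrics can be printed side by side. This is a small sketch that assumes the mse, mae, r2 and ensemble_mse, ensemble_mae, ensemble_r2 values computed earlier are still in scope:
# Compare the single LSTM against the 3-model ensemble on the same test set
comparison = pd.DataFrame(
    {
        'Single LSTM': [mse, mae, r2],
        'Ensemble (3 models)': [ensemble_mse, ensemble_mae, ensemble_r2],
    },
    index=['MSE', 'MAE', 'R2'],
)
print(comparison)
If the ensemble does not improve on the single model, the three members are probably converging to very similar solutions; varying the random seed, the number of LSTM units, or the training subset per member is a common way to increase diversity.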
9.4.7 Conclusion
This improved project demonstrates several enhancements to the original LSTM-based time series forecasting task. We've implemented a more sophisticated data preprocessing pipeline, including additional features and proper scaling. The LSTM architecture has been improved with multiple layers, batch normalization, and dropout for better regularization.
We've also incorporated advanced training techniques such as early stopping and learning rate scheduling. The evaluation process now includes comprehensive metrics and visualizations, providing deeper insights into the model's performance. Additionally, we've introduced feature importance analysis to understand the impact of different inputs on the predictions.
Finally, an ensemble method has been implemented to potentially improve prediction accuracy and robustness. These improvements provide a more robust and insightful approach to time series forecasting, particularly in the context of stock price prediction.