Data Analysis Foundations with Python

Project 2: Predicting House Prices

Model Building and Evaluation

Having engineered some useful features for our dataset, we're now ready for the grand finale: building the predictive model itself. Exciting, right? Let's dive in.

Data Splitting

The first order of business is to divide our dataset into training and testing sets. This way, we can evaluate how well our model performs on unseen data.

from sklearn.model_selection import train_test_split

# Features and target variable
X = df.drop('House_Price', axis=1)
y = df['House_Price']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
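To see what the split actually produces, here is a minimal, self-contained sketch on a small synthetic DataFrame. The columns Square_Feet and Bedrooms and the pricing formula are hypothetical stand-ins for your engineered features; only House_Price and the train_test_split call come from the code above. With test_size=0.2, 20% of the 100 rows end up in the test set, and reusing the same random_state reproduces the same split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for the real house-price DataFrame
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Square_Feet": rng.integers(500, 3500, size=100),
    "Bedrooms": rng.integers(1, 6, size=100),
})
df["House_Price"] = (
    150 * df["Square_Feet"] + 10_000 * df["Bedrooms"] + rng.normal(0, 5_000, size=100)
)

X = df.drop("House_Price", axis=1)
y = df["House_Price"]

# 80/20 split; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (80, 2) (20, 2)

# Re-running with the same random_state selects exactly the same rows
X_train2, _, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.index.equals(X_train2.index))  # True
```

Fixing random_state matters when you compare models later: every model is then scored on the same held-out rows.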

Model Selection

Because house price is a continuous quantity, this is a regression problem. Let's start with a simple Linear Regression model as a baseline that any more complex model should beat.

from sklearn.linear_model import LinearRegression

# Initialize the model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)
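A linear regression learns one coefficient per feature plus an intercept, and fit estimates them by ordinary least squares. The sketch below uses hypothetical, noise-free synthetic data where the true relationship is price = 150 × square feet + 10,000 × bedrooms + 20,000, so you can check that the fitted coef_ and intercept_ attributes recover those numbers. On real data the coefficients are only estimates, and they are easier to compare across features when the features are on similar scales.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data: price = 150*sqft + 10_000*bedrooms + 20_000
X_train = pd.DataFrame({
    "Square_Feet": [800, 1200, 1500, 2000, 2500, 3000],
    "Bedrooms": [1, 2, 3, 3, 4, 5],
})
y_train = 150 * X_train["Square_Feet"] + 10_000 * X_train["Bedrooms"] + 20_000

model = LinearRegression()
model.fit(X_train, y_train)

# Pair each learned coefficient with its feature name
for name, coef in zip(X_train.columns, model.coef_):
    print(f"{name}: {coef:.1f}")
print(f"Intercept: {model.intercept_:.1f}")
```

Inspecting coefficients this way is a quick sanity check: a coefficient with an implausible sign or size often points to a data or feature-engineering problem.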

Model Evaluation

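After training, it's critical to assess how well the model performs on the held-out test set. We'll use two metrics: R-squared (R2), the proportion of the variance in house prices that the model explains, and Root Mean Squared Error (RMSE), the typical size of a prediction error, expressed in the same units as the price.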

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Predict on test data
y_pred = model.predict(X_test)

# Evaluate the model
r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print(f'R2 Score: {r2}')
print(f'RMSE: {rmse}')
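Both metrics are simple formulas. R2 is 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the true prices from their mean; 1.0 is a perfect fit and 0.0 is no better than always predicting the mean. RMSE is the square root of the mean squared residual. The sketch below computes both by hand on a few made-up prices (in thousands, purely for illustration) and checks them against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up test-set prices and predictions, in thousands
y_test = np.array([200.0, 250.0, 300.0, 350.0])
y_pred = np.array([210.0, 240.0, 310.0, 340.0])

residuals = y_test - y_pred
ss_res = np.sum(residuals ** 2)                 # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2)  # total sum of squares

r2_manual = 1 - ss_res / ss_tot
rmse_manual = np.sqrt(np.mean(residuals ** 2))

print(f"R2:   {r2_manual:.4f}  (sklearn: {r2_score(y_test, y_pred):.4f})")
print(f"RMSE: {rmse_manual:.4f}  (sklearn: {np.sqrt(mean_squared_error(y_test, y_pred)):.4f})")
```

Here every prediction is off by exactly 10, so RMSE is 10 and R2 is 1 - 400 / 12500 = 0.968.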

Fine-Tuning

If the results are not satisfactory, consider adding regularization (for example, Ridge or Lasso regression) or trying more flexible models such as Random Forest or Gradient Boosting.
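Regularization adds a penalty on large coefficients. One common choice is scikit-learn's Ridge (an L2 penalty whose strength is set by alpha); Lasso uses an L1 penalty instead and can drive some coefficients to exactly zero. Because the penalty depends on coefficient size, features are usually standardized first. The sketch below assumes hypothetical synthetic data with two nearly duplicate size columns, a situation where plain least-squares coefficients become unstable, and compares an unregularized pipeline with a Ridge pipeline. The value alpha=1.0 is an arbitrary starting point, not a tuned choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two almost identical size columns (strongly collinear)
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 3500, size=200)
X = np.column_stack([sqft, sqft + rng.normal(0, 1, size=200)])
y = 150 * sqft + rng.normal(0, 20_000, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize first so the L2 penalty treats both features on the same scale
ols = make_pipeline(StandardScaler(), LinearRegression()).fit(X_train, y_train)
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X_train, y_train)

# Both models see standardized features, so their coefficients are comparable
print("OLS coefficients:  ", ols.named_steps["linearregression"].coef_)
print("Ridge coefficients:", ridge.named_steps["ridge"].coef_)
print(f"Ridge test R2: {ridge.score(X_test, y_test):.3f}")
```

On data like this the unregularized coefficients are typically huge and of opposite sign, while Ridge splits the effect roughly evenly between the two columns. Putting the scaler inside the pipeline also ensures it is fitted only on the training data.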

from sklearn.ensemble import RandomForestRegressor

# Initialize the Random Forest model
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)

# Train the model
rf_model.fit(X_train, y_train)

# Evaluate the model
rf_y_pred = rf_model.predict(X_test)
rf_r2 = r2_score(y_test, rf_y_pred)
rf_rmse = np.sqrt(mean_squared_error(y_test, rf_y_pred))

print(f'Random Forest R2 Score: {rf_r2}')
print(f'Random Forest RMSE: {rf_rmse}')
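Fine-tuning usually means searching over hyperparameters. A common approach is scikit-learn's GridSearchCV, which tries every combination in a grid using k-fold cross-validation on the training set, so the test set stays untouched until the final check. The sketch below applies it to GradientBoostingRegressor, the other model mentioned above, on hypothetical synthetic data (the Age column and its quadratic effect on price are invented for illustration). The grid values are arbitrary examples, not recommendations.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical data with a non-linear (quadratic) effect of Age on price
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "Square_Feet": rng.uniform(500, 3500, size=300),
    "Age": rng.uniform(0, 80, size=300),
})
y = 150 * X["Square_Feet"] - 50 * X["Age"] ** 2 + rng.normal(0, 10_000, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Small illustrative grid: 2 x 2 x 2 = 8 combinations, each scored with 5-fold CV
param_grid = {
    "n_estimators": [100, 200],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",  # negated RMSE: higher is better
)
search.fit(X_train, y_train)

# best_estimator_ is refitted on the whole training set with the best parameters
best_model = search.best_estimator_
test_rmse = np.sqrt(mean_squared_error(y_test, best_model.predict(X_test)))

print("Best parameters:", search.best_params_)
print(f"Cross-validated RMSE: {-search.best_score_:.0f}")
print(f"Test RMSE: {test_rmse:.0f}")
```

The same pattern works for RandomForestRegressor; just swap the estimator and use its own parameter names (for example n_estimators and max_depth).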

Exporting the Trained Model

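import joblib

# Pick whichever trained model performed best on the test set
# (here we assume the Random Forest did; use `model` otherwise)
final_model = rf_model

# Save the model to a binary file on disk
joblib.dump(final_model, 'house_price_predictor.pkl')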

After all your hard work training and fine-tuning, you'll probably want to save the final model for future use. Exporting it with joblib lets you reload it later with joblib.load and make predictions on new data without retraining.
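Here is a minimal round trip: save a model, load it back, and predict on a new house. The tiny LinearRegression and its data are hypothetical stand-ins for your tuned model, and a temporary directory is used so the sketch leaves no files behind. Two cautions from the scikit-learn documentation on model persistence apply: joblib files are pickles, so only load files from sources you trust, and load them with the same scikit-learn version that saved them.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical tiny model standing in for your final, tuned model
X = np.array([[1000.0], [1500.0], [2000.0], [2500.0]])
y = np.array([200_000.0, 275_000.0, 350_000.0, 425_000.0])  # price = 150*sqft + 50_000
final_model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "house_price_predictor.pkl")
    joblib.dump(final_model, path)    # serialize the fitted model to disk
    loaded_model = joblib.load(path)  # later, e.g. in a separate script

# The reloaded model predicts without any retraining
new_house = np.array([[1800.0]])
print(loaded_model.predict(new_house))
```

The prediction for 1,800 square feet comes out at about 320,000, identical to what the original model returns.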

And voilà! You've gone all the way from gathering data to building and evaluating a model. Working through this project should help you understand the essentials of machine learning and how to apply it to real-world problems like predicting house prices.

Remember, machine learning is both an art and a science. It's an iterative process that requires a lot of fine-tuning and experimentation. So don't be discouraged if your first model isn't perfect. With practice, you'll become more adept at knowing which features to engineer, which models to use, and how to fine-tune them.

Thanks for following along! 
