Uber Lyft Fare Prediction

2021-07-21 | Category: Regression, Web App | GitHub: Click to View GitHub Repository

Uber and Lyft EDA and Price Rate Prediction

Project Summary

This was my very first personal project in Python, and the first one I had the courage to do entirely on my own. It uses the Uber and Lyft Boston, MA dataset, which contains both ride and weather information. Starting from the raw data, the project follows the data science process: exploratory data analysis (EDA) first, then machine learning to predict the fare rate of Uber and Lyft rides in Boston. The core objectives are:

  1. To conduct exploratory analysis on the features of the dataset using statistics and visualisations.
  2. To compare the overall performance of different models.

Results

The correlation heatmap shows that most numeric features are only weakly correlated with the target, price per mile. The exceptions are obvious features such as price and distance, which is expected because price per mile is derived from those two values.
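
As a minimal sketch of how the target and a Pearson correlation can be computed, assume each ride record has `price`, `distance` and `surge_multiplier` fields. The rows below are made up for illustration and are not values from the Kaggle dataset.

```python
import math

# Made-up example rides for illustration only; NOT taken from the dataset.
rides = [
    {"price": 10.0, "distance": 2.0, "surge_multiplier": 1.0},
    {"price": 7.5, "distance": 1.5, "surge_multiplier": 1.0},
    {"price": 26.0, "distance": 4.0, "surge_multiplier": 1.5},
    {"price": 9.0, "distance": 3.0, "surge_multiplier": 1.0},
]


def add_price_per_mile(rows):
    """Return copies of the rows with a derived price_per_mile target.

    Rows with a non-positive distance are skipped to avoid dividing by zero.
    """
    return [
        {**r, "price_per_mile": r["price"] / r["distance"]}
        for r in rows
        if r["distance"] > 0
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sd_x * sd_y)


rows = add_price_per_mile(rides)
target = [r["price_per_mile"] for r in rows]

# One column of a correlation heatmap: each feature against the target.
for feature in ("price", "distance", "surge_multiplier"):
    values = [r[feature] for r in rows]
    print(feature, round(pearson(values, target), 3))
```

A heatmap is simply this calculation repeated for every pair of numeric columns and drawn as a colour grid.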

Using the PyCaret library for machine learning, the model comparison ranked the CatBoost regressor first, with an RMSE of 0.9168 and an R-squared of 0.9857, followed by the XGBoost regressor and the Light Gradient Boosting Machine (LightGBM). After grid-search hyperparameter tuning, the CatBoost regressor improved to an RMSE of 0.9020 and an R-squared of 0.9860.
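
RMSE and R-squared are the two metrics quoted above. RMSE = sqrt(mean((y - y_pred)^2)), and R-squared = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations from the mean. The following is a minimal standard-library sketch of both. The numbers in it are made up and are not the project's predictions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Made-up price-per-mile values, for illustration only.
actual = [5.0, 4.0, 6.5, 3.0]
predicted = [4.5, 4.0, 7.0, 3.5]
print(round(rmse(actual, predicted), 4), round(r_squared(actual, predicted), 4))
```

Lower RMSE and higher R-squared are better. That is why the tuned scores (0.9020 and 0.9860) count as an improvement over the untuned ones (0.9168 and 0.9857).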

Model deployment

After saving the best model, I built an interactive prototype web application as an experimental data product for further exploration. The app, built with the Streamlit library and deployed on Heroku PaaS, predicts fare rates from user inputs. Although it is only a prototype, it demonstrates that the model can serve predictions to any user.
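
The save-then-serve flow can be sketched with the standard library alone. This is not the project's code. PyCaret provides its own `save_model` and `load_model` functions for persisting pipelines, and the "model" here is a hypothetical stand-in that only learns the average price per mile for each `cab_type`. The training rows are made up.

```python
import pickle
from collections import defaultdict


def fit_rate_table(rows):
    """Hypothetical stand-in model: mean price per mile for each cab type."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        totals[r["cab_type"]] += r["price_per_mile"]
        counts[r["cab_type"]] += 1
    return {cab: totals[cab] / counts[cab] for cab in totals}


def predict_rate(model, user_input):
    """Look up the predicted fare rate for the cab type the user picked."""
    cab = user_input["cab_type"]
    if cab not in model:
        raise KeyError(f"unknown cab type: {cab!r}")
    return model[cab]


# Made-up training rows, for illustration only.
training_rows = [
    {"cab_type": "Uber", "price_per_mile": 4.0},
    {"cab_type": "Uber", "price_per_mile": 6.0},
    {"cab_type": "Lyft", "price_per_mile": 5.5},
]

# "Save" the fitted model to bytes, then "load" it back as an app would at startup.
saved = pickle.dumps(fit_rate_table(training_rows))
loaded_model = pickle.loads(saved)

# In a web app, this dict would be filled from the input widgets.
user_input = {"cab_type": "Uber"}
print(predict_rate(loaded_model, user_input))
```

In a deployed app the saved model would normally be written to a file shipped with the app and loaded once, rather than refitted on every request.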

Note: The Heroku server is fairly unstable and the app may stop responding, so you may need to refresh a few times. Apologies for the inconvenience!

Deployed web app: Gradio app on Heroku

Dataset source: Kaggle dataset link

Dataset: Uber and Lyft Dataset Boston, MA
Source of Data: Kaggle Open Dataset
Completion Date: February 15, 2021
Main Objective: Regression

SO WHAT DO YOU THINK?

I’m planning to add more projects to my portfolio in the future when I have time.
If you like my work and would like to chat, feel free to contact me or connect with me on LinkedIn!
