Predicting hourly boarding demand of bus passengers using
imbalanced smart-card records: a deep learning approach
Under the esteemed Guidance of
- Dr. [Link]
Professor, CSE Department, GRIET
Team Members
[Link] - 21241A05L7
[Link] - 21241A05Q0
[Link] - 21241A05Q8
[Link] - 21241A05Q9
[Link] - 21241A05S0
ABSTRACT
Reference: Predicting hourly boarding demand of bus passengers using imbalanced records from smart cards.
This project applies deep learning methods to forecast bus passengers' hourly boarding demand from smart-card
data, which is frequently imbalanced: positive cases, in which a passenger boards at a given stop and time, are
far fewer than negative cases, in which no boarding takes place. This imbalance makes predictive models less
accurate. To counter it, the reference paper proposes balancing the dataset by generating synthetic boarding
instances with a Deep Generative Adversarial Network (Deep-GAN). A deep neural network (DNN) is then trained on
the balanced, partly synthetic dataset, which improves its predictive accuracy. Additionally, by capturing
individual travel patterns, which are frequently missed by models built on aggregated data, this method offers
deeper insight into passenger behavior. The study emphasizes how important model choice and data quality are
to accurately estimating demand for public transportation. Overall, the results indicate that Deep-GAN can be
used to tackle complex data imbalance problems, giving transportation authorities more trustworthy and useful
information.
Keywords: Deep Neural Network, Deep-GAN, Imbalanced data, Smart-card data
PROBLEM DESCRIPTION:
This project addresses the inherently imbalanced problem of estimating hourly boarding demand at bus
stops using smart-card data. Because negative examples (non-boarding events) far outnumber positive
ones (actual boarding events), machine learning models trained on such data typically perform poorly,
especially when predicting rare events such as a boarding at a specific time and place. Conventional
oversampling methods, including SMOTE and ADASYN, are susceptible to noise and outliers, while
undersampling strategies risk discarding important data. To overcome these difficulties, the proposed
approach uses Deep-GAN to create synthetic boarding data. This improves the quality and balance of the
training dataset, which in turn raises the accuracy and reliability of the prediction models.
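To make the imbalance concrete, the following minimal sketch builds one labeled example per (card, stop, hour) combination and counts positives against negatives. The record layout, the card and stop identifiers, and the 06:00 to 22:00 service window are assumptions for illustration, not the project's actual data schema.

```python
from collections import Counter
from itertools import product

# Hypothetical smart-card records: (card_id, stop_id, hour_of_day).
records = [
    ("card_A", "stop_1", 8),
    ("card_A", "stop_3", 18),
    ("card_B", "stop_1", 8),
    ("card_C", "stop_2", 9),
]

cards = sorted({r[0] for r in records})
stops = sorted({r[1] for r in records})
hours = range(6, 23)  # assumed service hours, 06:00 to 22:00

boarded = set(records)

# One example per (card, stop, hour); the label is 1 only if a boarding was recorded.
labels = {key: int(key in boarded) for key in product(cards, stops, hours)}

counts = Counter(labels.values())
imbalance_ratio = counts[0] / counts[1]
print(f"positives={counts[1]} negatives={counts[0]} ratio={imbalance_ratio:.1f}:1")
```

Even with three cards and three stops, negatives outnumber positives by a wide margin, and the gap widens as more cards, stops, and hours are added.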
The project also looks at how data imbalance affects public transportation planning, highlighting the
need for robust systems that can handle complex real-world situations. Data-driven methods must be
improved so that they represent real passenger behavior more accurately. By resolving these issues,
transit authorities can make better-informed decisions that reflect actual demand patterns.
DOMAIN EXPLORATION:
The domain of this project lies at the intersection of public transport management and machine learning.
Smart-card data has emerged as a crucial tool for understanding passenger behaviors and planning public
transport operations. However, the inherent imbalance and complexity of the data pose significant
challenges. This research explores machine learning approaches, particularly deep learning models like
Fully Connected Networks (FCN), Recurrent Neural Networks (RNN), and Long Short-Term Memory
networks (LSTM), to predict passenger boarding behavior.
The project not only addresses the data imbalance issue but also evaluates how these models capture both
the temporal (when passengers board) and spatial (where they board) aspects of ridership, contributing to a
more refined understanding of travel demand dynamics. By integrating advanced data analytics with
transport management, the study opens new avenues for optimizing public transport systems. This domain
exploration highlights the transformative potential of data-driven solutions in enhancing the efficiency and
reliability of public transportation. Furthermore, it aims to provide actionable insights for policy makers
and transit authorities to better align services with actual demand patterns, ultimately improving overall
passenger satisfaction and system sustainability.
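One way to connect individual boarding predictions to the temporal and spatial view of ridership is to sum per-passenger boarding probabilities for each stop and hour. The sketch below assumes the model emits one probability per (card, stop, hour); the function name, tuple layout, and sample values are hypothetical.

```python
from collections import defaultdict

def aggregate_demand(predictions):
    """Sum per-passenger boarding probabilities into expected boardings
    for each (stop, hour) pair."""
    demand = defaultdict(float)
    for _card_id, stop_id, hour, prob in predictions:
        demand[(stop_id, hour)] += prob
    return dict(demand)

# Hypothetical model outputs: (card_id, stop_id, hour, boarding probability).
predictions = [
    ("card_A", "stop_1", 8, 0.9),
    ("card_B", "stop_1", 8, 0.6),
    ("card_C", "stop_1", 8, 0.1),
    ("card_A", "stop_2", 9, 0.2),
    ("card_C", "stop_2", 9, 0.7),
]

hourly_demand = aggregate_demand(predictions)
for (stop_id, hour), expected in sorted(hourly_demand.items()):
    print(f"{stop_id} {hour:02d}:00 expected boardings = {expected:.2f}")
```

Summing probabilities gives an expected count per stop and hour while the underlying predictions remain available at the individual level.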
SCOPE OF IMPROVEMENT
1. Enhancing Data Quality and Variety: Incorporating additional data sources such as GPS data
from buses, weather conditions, event schedules, and real-time traffic information can provide a
more comprehensive view of factors influencing boarding demand.
2. Refining Model Architecture and Training Techniques: Experimenting with more advanced or
hybrid architectures, such as integrating attention mechanisms or Transformer-based models, could
enhance the model's ability to capture complex patterns in the data.
3. Improving Real-Time Prediction Capabilities: Developing a real-time prediction module that
continuously updates as new data streams in (e.g., real-time smart-card swipes, GPS data) could
greatly enhance the model’s responsiveness.
4. Enhancing Model Interpretability and User Integration: Existing models do not provide output with
high accuracy and precision. The proposed system aims to provide more accurate and precise output
that transit planners can interpret and use directly.
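For the real-time capability in item 3, a first building block could be a counter that updates hourly boarding totals as each swipe arrives. This is only a sketch: the class name, the in-memory dictionary, and the sample swipes are assumptions, and a deployed module would also need persistence and a link to the prediction model.

```python
from collections import defaultdict
from datetime import datetime

class HourlyBoardingCounter:
    """Keeps running boarding counts per (stop, hour bucket)."""

    def __init__(self):
        self.counts = defaultdict(int)

    def update(self, stop_id, timestamp):
        # Truncate the swipe time to the start of its hour.
        bucket = timestamp.replace(minute=0, second=0, microsecond=0)
        self.counts[(stop_id, bucket)] += 1
        return self.counts[(stop_id, bucket)]

counter = HourlyBoardingCounter()

# Hypothetical incoming swipes: (stop_id, swipe time).
swipes = [
    ("stop_1", datetime(2024, 1, 8, 8, 5)),
    ("stop_1", datetime(2024, 1, 8, 8, 40)),
    ("stop_1", datetime(2024, 1, 8, 9, 2)),
]
for stop_id, ts in swipes:
    counter.update(stop_id, ts)

for (stop_id, bucket), n in sorted(counter.counts.items()):
    print(stop_id, bucket.isoformat(), n)
```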
FEASIBILITY ANALYSIS
1. EXECUTIVE SUMMARY
Overview: This project aims to develop a predictive model using deep learning techniques, specifically
Deep Generative Adversarial Networks (Deep-GAN), to address the data imbalance in smart-card
records and improve the accuracy of hourly boarding demand predictions for bus passengers. The goal
is to create a reliable, data-driven tool that helps public transport authorities optimize scheduling,
resource allocation, and overall service planning.
2. TECHNICAL FEASIBILITY
Tools: Python, PyTorch, TensorFlow, MySQL
Data: High-quality smart-card data with detailed boarding records, timestamps, and locations
Skills: Generative Adversarial Networks, machine learning frameworks
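As a rough indication of what the listed tools and skills involve, the following is a minimal GAN sketch in PyTorch. It is not the Deep-GAN architecture of the reference paper: the layer sizes, the 8-dimensional feature encoding, the random placeholder data, and all variable names are assumptions made for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

FEATURES = 8   # assumed size of an encoded boarding record
NOISE = 4      # assumed size of the generator's noise input
BATCH = 64

generator = nn.Sequential(
    nn.Linear(NOISE, 16), nn.ReLU(),
    nn.Linear(16, FEATURES), nn.Sigmoid(),    # features scaled to [0, 1]
)
discriminator = nn.Sequential(
    nn.Linear(FEATURES, 16), nn.LeakyReLU(0.2),
    nn.Linear(16, 1),                          # raw logit: real vs. synthetic
)

loss_fn = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

# Placeholder standing in for real minority-class (boarding) feature vectors.
real_boardings = torch.rand(BATCH, FEATURES)
real_target = torch.ones(BATCH, 1)
fake_target = torch.zeros(BATCH, 1)

for step in range(50):
    # Discriminator step: separate real boardings from generated ones.
    fake = generator(torch.randn(BATCH, NOISE)).detach()
    d_loss = (loss_fn(discriminator(real_boardings), real_target)
              + loss_fn(discriminator(fake), fake_target))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: produce samples the discriminator labels as real.
    g_loss = loss_fn(discriminator(generator(torch.randn(BATCH, NOISE))), real_target)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

# Draw synthetic boarding records to add to the minority class.
with torch.no_grad():
    synthetic_boardings = generator(torch.randn(100, NOISE))
print(synthetic_boardings.shape)
```

In practice the placeholder tensor would be replaced by encoded boarding records, and the number of synthetic samples would be chosen to bring positives closer to the number of negatives.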
3. OPERATIONAL FEASIBILITY
Data Collection: Gather and preprocess smart-card data, ensuring it is cleaned, balanced, and ready for
model training.
Model Development: Develop and train the Deep-GAN model to generate synthetic boarding data, then train the DNN predictor on the balanced dataset.
Validation: Evaluate the model on a separate test set of smart-card data to ensure its accuracy and
robustness.
Deployment: Develop a prototype of the prediction tool that can be integrated into public transport
management systems.
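For the validation step above, plain accuracy can be misleading on imbalanced data, so precision, recall, and F1 on the boarding class are worth reporting as well. The sketch below uses a hypothetical test set of 5 boardings among 100 examples; the function name and the sample predictions are illustrative only.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical test set: 5 boardings among 100 examples.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100                         # never predicts a boarding
model_output = [1, 1, 1, 0, 0] + [1, 1] + [0] * 93  # 3 hits, 2 misses, 2 false alarms

baseline = classification_metrics(y_true, always_negative)
model = classification_metrics(y_true, model_output)
print("baseline:", baseline)
print("model:   ", model)
```

The always-negative baseline reaches 95% accuracy while finding no boardings at all, which is why recall and F1 on the positive class are the more informative checks here.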
CONCLUSION
This project illustrates how Deep Generative Adversarial Networks (Deep-GAN) can be used to address
data imbalance, a central obstacle to predicting hourly boarding demand in public transportation
systems. Compared with conventional resampling strategies, the proposed approach improves the
accuracy and reliability of prediction models by producing synthetic data that closely mimics
real-world boarding events. This methodology not only deepens the understanding of individual
passenger behavior but also provides practical insights for public transportation planning and
management. The findings highlight the ability of modern deep learning methods to solve challenging
data problems, opening the door to more precise and effective transport demand forecasting models.
By using Deep-GAN to better align services with actual demand, transit authorities can reduce
operating costs and improve customer satisfaction. The study's findings emphasize how important
continued innovation in data analytics is to meeting the changing demands of urban transportation.
As public transit systems become more data-driven, techniques like Deep-GAN will play an
increasingly important role in shaping urban transportation.
REFERENCES
1. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014).
"Generative Adversarial Nets." Advances in Neural Information Processing Systems, 27, 2672-2680.
2. Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). "SMOTE: Synthetic Minority Over-sampling
Technique." Journal of Artificial Intelligence Research, 16, 321-357.
3. Liu, X., He, Z., & Jin, H. (2019). "A Deep Learning Framework for Traffic Sign Detection and Analysis." IEEE
Transactions on Intelligent Transportation Systems, 20(2), 630-641.
4. Sun, Y., Wong, G., & Axhausen, K. W. (2015). "Modeling Public Transport Usage: The Role of Mode Choice Models in
Predicting Ridership." Transportation Research Part A: Policy and Practice, 78, 240-252.
5. Zhang, J., Zheng, Y., & Qi, D. (2017). "Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows
Prediction." AAAI Conference on Artificial Intelligence, 1655-1661.
6. Zhou, X., & Li, X. (2018). "Urban Public Transport Big Data and Travel Behavior Analysis Using Smart Card Data."
Transportation Research Part C: Emerging Technologies, 86, 474-489.