Research Article 2026-04-20 under-review v1

Benchmarking Transformer-Based NLP Models for Multi-Platform Public Engagement Analysis in Transportation Projects

Alireza Shamshiri, The University of Texas at Arlington
Mahdi Jaberizadeh, The University of Texas at Arlington
Shah Salah Uddin Chowdhury, The University of Texas at Arlington
Mahdis Hamisi, The University of Texas at Arlington
Kyeong Rok Ryu, The University of Texas at Arlington
Jiseul Kim, The University of Texas at Arlington

Abstract

Meaningful public engagement is central to transportation planning, yet agencies face challenges in synthesizing large volumes of unstructured comments from hearings, news media, and social platforms. Although natural language processing methods are increasingly used for this purpose, clear guidance is lacking on which models are most suitable for different data characteristics and analytical goals. This study compares transformer-based and classical approaches for sentiment analysis and topic modeling in transportation contexts. A curated multi-source corpus from the North Houston Highway Improvement Project was developed, including Facebook posts, news articles, and public hearing documents. Sentiment classification using Bidirectional Encoder Representations from Transformers (BERT) models, specifically DistilBERT and RoBERTa, was benchmarked against lexicon-based approaches, while topic discovery using BERTopic was compared with probabilistic and matrix factorization models. Model performance was evaluated using classification accuracy and F1-Micro scores, topic coherence and interpretability, and cross-platform consistency. Transformer-based methods outperformed classical approaches, particularly in informal and context-rich settings where lexicon-based tools struggled with nuanced language and mixed sentiment. In addition, BERTopic produced more coherent and transferable topic structures across heterogeneous datasets, while lexicon-based methods remained useful for rapid screening. These findings show that model selection should be guided by data characteristics and analytical objectives rather than reliance on a single technique. The study introduces a method selection framework that provides practical guidance for transportation agencies.

Citation Information

@article{alirezashamshiri2026,
  title={Benchmarking Transformer-Based NLP Models for Multi-Platform Public Engagement Analysis in Transportation Projects},
  author={Alireza Shamshiri and Mahdi Jaberizadeh and Shah Salah Uddin Chowdhury and Mahdis Hamisi and Kyeong Rok Ryu and Jiseul Kim},
  journal={Data Science for Transportation},
  year={2026},
  doi={10.21203/rs.3.rs-9351360/v1}
}