People share real-time updates on social media platforms (e.g., Twitter) when a disaster occurs. This information is very valuable to disaster relief and response teams, as it can alert them immediately and help them prioritize tasks. Text mining and machine learning algorithms can scan the huge volume of unstructured data generated on social media platforms such as Twitter to spot such information through keywords and phrases that refer to disasters. One challenge such an algorithm faces is determining whether a tweet describes a real disaster or uses those keywords metaphorically, which can lead to large-scale mislabeling of tweets. Hence, this research aims to use Natural Language Processing (NLP) and classification models to distinguish between real and fake disaster tweets. The dataset was acquired from the Kaggle website and contains tweets that relate to real disasters as well as tweets that refer to fake disasters. Using RStudio, exploratory data analysis (EDA), feature selection, and data cleaning were performed prior to modeling, and two different training-to-testing splits were tested. Four classifiers were built: SVM, KNN, Naïve Bayes, and XGBoost. The best accuracies were achieved with an 80/20 split and with the whole dataset rather than a sample: SVM and XGBoost performed well, with accuracies of 80% and 78% respectively, while KNN suffered from overfitting (99% accuracy) and Naïve Bayes performed poorly (65%).
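The keyword-ambiguity problem and the 80/20 split described in the abstract can be illustrated with a minimal sketch. It is written in Python using only the standard library, not the R workflow the thesis used, and it is not the author's method. The keyword list, the toy tweets, and the helper names (`tokenize`, `keyword_flag`, `train_test_split`) are all assumptions made up for illustration.

```python
import random
import re

# Hypothetical keyword list and toy tweets, for illustration only.
DISASTER_KEYWORDS = {"fire", "flood", "earthquake", "explosion"}


def tokenize(text):
    """Lowercase a tweet and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_flag(text):
    """Return True if any disaster keyword appears in the tweet."""
    return any(tok in DISASTER_KEYWORDS for tok in tokenize(text))


def train_test_split(rows, train_ratio=0.8, seed=0):
    """Shuffle a copy of rows and split it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# (text, label) pairs: 1 = real disaster, 0 = not a real disaster
tweets = [
    ("Forest fire spreading near the highway, evacuate now", 1),
    ("That new album is fire", 0),
    ("Flood waters rising in the city centre", 1),
    ("My inbox is a flood of spam today", 0),
    ("Lovely weather for a picnic", 0),
]

# Keyword matching alone flags metaphorical uses as disasters.
flags = [keyword_flag(text) for text, _ in tweets]
false_positives = [
    text for (text, label), flagged in zip(tweets, flags)
    if flagged and label == 0
]

# 80/20 split of the labelled tweets.
train, test = train_test_split(tweets, train_ratio=0.8)
```

In this toy data, the two metaphorical tweets are flagged alongside the two real ones, which is the mislabeling a trained classifier is meant to reduce.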
Professional Studies (MS)
Department, Program, or Center
Graduate Programs & Research (Dubai)
Alhammadi, Humaid, "Using Machine Learning in Disaster Tweets Classification" (2022). Thesis. Rochester Institute of Technology. Accessed from