Abstract

People share real-time updates on social media platforms (e.g., Twitter) when a disaster occurs. This information is valuable to disaster relief and response teams because it can alert them immediately and help them prioritize tasks. Text mining and machine learning algorithms can scan the large volume of unstructured data generated on platforms such as Twitter and spot such information through keywords and phrases that refer to disasters. One challenge is determining whether a tweet describes a real disaster or uses those keywords metaphorically, which can lead to widespread mislabeling of tweets. Hence, this research uses Natural Language Processing (NLP) and classification models to distinguish between real and fake disaster tweets. The dataset was acquired from Kaggle and contains tweets that relate to real disasters as well as tweets that refer to fake disasters. Using RStudio, exploratory data analysis (EDA), feature selection, and data cleaning were performed prior to modeling, and two different training-to-testing splits were tested. Four classifiers were built: SVM, KNN, Naïve Bayes, and XGBoost. The best accuracies were achieved with an 80/20 split and with the whole dataset rather than a sample. SVM and XGBoost performed well, with accuracies of 80% and 78% respectively, while KNN suffered from overfitting (99% accuracy) and Naïve Bayes performed poorly (65%).
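
The metaphor problem described in the abstract can be illustrated with a minimal sketch. This is not the project's method (the project was carried out in RStudio with trained classifiers); it is a plain keyword matcher written in Python for illustration only. The keyword list, the example tweets, their labels, and the names `keyword_flag` and `false_positives` are all assumptions made up for this example and are not taken from the Kaggle dataset.

```python
import re

# Hypothetical keyword list; the project's actual features are not given in the abstract.
DISASTER_KEYWORDS = {"fire", "flood", "earthquake", "ablaze", "evacuation"}


def keyword_flag(tweet: str) -> bool:
    """Return True if the tweet contains at least one disaster keyword."""
    tokens = set(re.findall(r"[a-z]+", tweet.lower()))
    return bool(tokens & DISASTER_KEYWORDS)


# Made-up tweets with hand-assigned labels (1 = real disaster, 0 = not a disaster).
EXAMPLES = [
    ("Forest fire spreading near the highway, evacuation ordered", 1),
    ("That new album is fire, on repeat all day", 0),
    ("Flood warning issued for the river valley tonight", 1),
    ("My inbox is a flood of memes this morning", 0),
]


def false_positives(examples):
    """Return the tweets flagged by keyword matching that are labeled as non-disasters."""
    return [text for text, label in examples if keyword_flag(text) and label == 0]


if __name__ == "__main__":
    for text in false_positives(EXAMPLES):
        print("Mislabeled by keyword matching:", text)
```

In this toy setting, keyword matching flags the metaphorical tweets along with the literal ones, which is the kind of mislabeling that motivates training a classifier on the full tweet text instead.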

Publication Date

4-2022

Document Type

Master's Project

Student Type

Graduate

Degree Name

Professional Studies (MS)

Department, Program, or Center

Graduate Programs & Research (Dubai)

Advisor

Sanjay Modak

Advisor/Committee Member

Ehsan Warriach

Campus

RIT Dubai
