The growing use of the internet has led to new websites emerging every day (Total number of Websites - Internet Live Stats, 2020). Web surfing has become important for everyone, regardless of occupation, age, or location. However, as internet use increases, so does the vulnerability to malware attacks through malicious websites (Softpedia, 2016). Identifying and dealing with such malicious websites has been difficult in the past because it is challenging to separate good websites from bad ones. By applying machine learning algorithms to large datasets, however, it is now possible to detect such websites beforehand. Classifiers trained using algorithms such as logistic regression and Support Vector Machine (SVM) can detect malicious websites so that users can be warned of the risk before they visit them. This project uses a variety of classification algorithms to determine whether a website is malicious, using the Kaggle Malicious and Benign Website Dataset. We show that it is possible to detect malicious websites with a reasonable degree of certainty using machine learning models (90% of the 75 malicious websites in the test set were identified). We also determine the features that were critical in predicting the likelihood of a website being malicious. Most of our key features are easily available (URL length, number of special characters, country, age of website).
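As a minimal illustration of two ideas in the abstract, the sketch below derives two of the easily available features (URL length and number of special characters) from a URL string, and computes the share of malicious sites that a classifier identifies (recall). It is not the thesis's implementation. The function names `url_features` and `recall` are hypothetical, and the definition of "special character" as any character other than an ASCII letter or digit is an assumption; the dataset may define it differently.

```python
from typing import Dict, Sequence


def url_features(url: str) -> Dict[str, int]:
    """Return two lexical features for a URL string.

    Assumption: a "special character" is any character that is not an
    ASCII letter or digit. The dataset's own definition may differ.
    """
    special = sum(1 for ch in url if not (ch.isascii() and ch.isalnum()))
    return {"url_length": len(url), "special_chars": special}


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of truly malicious sites (label 1) that were predicted as 1."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    if actual_pos == 0:
        raise ValueError("recall is undefined without positive examples")
    return true_pos / actual_pos
```

In this framing, the reported detection rate corresponds to `recall` evaluated on the test-set labels and the model's predictions, with 1 marking a malicious website.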
Professional Studies (MS)
Department, Program, or Center
Graduate Programs & Research (Dubai)
Al Tamimi, Saeed Ahmad, "Detecting Malicious Websites Using Machine Learning" (2020). Thesis. Rochester Institute of Technology. Accessed from