Abstract

In the world of cybersecurity, intrusion detection systems (IDS) have leveraged the power of artificial intelligence for the efficient detection of attacks. This is done by applying supervised machine learning (ML) techniques on labeled datasets. A growing body of literature has been devoted to the use of BoT-IoT dataset for IDS based ML frameworks. A few number of related works have recognized the need for a balanced dataset and applied techniques to alleviate the issue of imbalance. However, a significant amount of related research works failed to treat the imbalance in the BoT-IoT dataset. A lack of unanimity was observed in the literature towards the definition of taxonomy for balancing techniques. The study presented here seeks to explore the degree to which the imbalance of the dataset has been treated and to determine the taxonomy of techniques used. In this thesis, a comparison analysis is performed by using a small subset of an entire dataset to determine the threshold sample limit at which the model achieves the highest accuracy. In addition to this analysis, a study was conducted to determine the extent to which each feature of the dataset has an impact on the threshold performance. The study is implemented on the BoT-IoT dataset using three supervised ML classifiers: K-nearest Neighbor, Random Forest, and Logistic Regression. The four principal findings of this thesis are: existing taxonomies are not understood and imbalance of the dataset is not treated; high performance across all metrics is achieved on a highly imbalanced dataset; model is able to achieve the threshold performance using a small subset of samples; certain features had varying impact on the threshold value using different techniques.

Publication Date

1-2021

Document Type

Thesis

Student Type

Graduate

Degree Name

Networking and System Administration (MS)

Advisor

Ali Raza

Advisor/Committee Member

Muhieddin Amer

Campus

RIT Dubai

Share

COinS