Abstract

SOC (Security Operation Center) analysts historically struggled to keep up with the growing sophistication and daily prevalence of cyber attackers. To aid in the detection of cyber threats, many tools like IDS’s (Intrusion Detection Systems) are utilized to monitor cyber threats on a network. However, a common problem with these tools is the volume of the logs generated is extreme and does not stop, further increasing the chance for an adversary to go unnoticed until it’s too late. Typically, the initial evidence of an attack is not an isolated event but a part of a larger attack campaign describing prior events that the attacker took to reach their final goal. If an analyst can quickly identify each step of an attack campaign, a timely response can be made to limit the impact of the attack or future attacks. In this work, we ask the question “Given IDS alerts, can we extract out the cyber-attack kill chain for an observed threat that is meaningful to the analyst?” We present HeAT-PATRL, an IDS attack campaign extractor that leverages multiple deep machine learning techniques, network-agnostic feature engineering, and the analyst’s knowledge of potential threats to extract out cyber-attack campaigns from IDS alert logs. HeAT-PATRL is the culmination of two works. Our first work “PATRL” (Pseudo-Active Transfer Learning), translates the complex alert signature description to the Action-Intent Framework (AIF), a customized set of attack stages. PATRL employs a deep language model with cyber security texts (CVE’s, C-Sec Blogs, etc.) and then uses transfer learning to classify alert descriptions. To further leverage the cyber-context learned in the language model, we develop Pseudo-Active learning to self-label unknown unlabeled alerts to use as additional training data. We show PATRL classifying the entire Suricata database (~70k signatures) with a top-1 of 87\% and top-3 of 99\% with less than 1,200 manually labeled signatures. The final work, HeAT (Heated Alert Triage), captures the analyst’s domain knowledge and opinion of the contribution of IDS events to an attack campaign given a critical IoC (indicator of compromise). We developed network-agnostic features to characterize and generalize attack campaign contributions so that prior triages can aid in identifying attack campaigns for other attack types, new attackers, or network infrastructures. With the use of cyber-attack competition data (CPTC) and data from a real SOC operation, we demonstrate that the HeAT process can identify campaigns reflective of the analysts thinking while greatly reducing the number of actions to be assessed by the analyst. HeAT has the unique ability to uncover attack campaigns meaningful to the analyst across drastically different network structures while maintaining the important attack campaign relationships defined by the analyst.

Library of Congress Subject Headings

Cyberterrorism--Prevention; Computer networks--Security measures; Transfer learning (Machine learning)

Publication Date

12-2021

Document Type

Dissertation

Student Type

Graduate

Degree Name

Engineering (Ph.D.)

Department, Program, or Center

Engineering (KGCOE)

Advisor

Shanchieh Jay Yang

Advisor/Committee Member

Michael E. Kuhl

Advisor/Committee Member

Andres Kwasinski

Campus

RIT – Main Campus

Plan Codes

ENGR-PHD

Share

COinS