Abstract

Machine learning models have been found to be vulnerable to adversarial attacks that apply small perturbations to input samples to cause misclassification. Attacks that search for and apply these perturbations operate in either white-box or black-box settings, depending on the information available to the attacker about the target. In black-box attacks, the attacker can only query the target with specially crafted inputs and observe the outputs returned by the model. These outputs are then used to guide the perturbations and create adversarial examples that the model misclassifies.

Current black-box attacks on API-based malware classifiers rely solely on feature insertion when applying perturbations. This restriction is imposed to ensure that no changes are introduced to the malware's originally intended functionality. Additionally, the API calls inserted into the malware are null or no-op APIs that have no functional effect, avoiding any unintentional impact on malware behavior. Due to the nature of these API calls, they can be easily detected through non-ML techniques by analyzing their arguments and return values.

In this dissertation, we explore attacks on API-based malware detection models that are not restricted to feature addition. Specifically, we explore feature replacement as a possible avenue for creating adversarial malware examples. To retain the malware's original functionality, we replace API calls with other functionally equivalent API calls. We find these API alternatives using a hierarchical unsupervised learning approach on the APIs' documentation. Our attack, which we call AdversarialPSO, uses Particle Swarm Optimization to guide the perturbations according to the available function alternatives. Results show that creating adversarial malware examples by feature replacement is possible even under the more restrictive search space of limited function alternatives.
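The dissertation's actual implementation is not reproduced here; the following is only a minimal sketch of how PSO can guide feature replacement over a discrete set of function alternatives. The names `score_malicious` (a black-box query returning the classifier's maliciousness score) and `alternatives` (a mapping from each API call to its functionally equivalent candidates) are hypothetical placeholders, not interfaces from the dissertation.

```python
# Minimal sketch: PSO over per-call replacement choices (hypothetical interface).
import random

def adversarial_pso(api_sequence, alternatives, score_malicious,
                    n_particles=10, n_iters=50, w=0.7, c1=1.5, c2=1.5):
    """Search for a replacement assignment that lowers the maliciousness score.

    api_sequence    : list of API call names extracted from the sample
    alternatives    : dict mapping an API name to functionally equivalent APIs
    score_malicious : black-box query returning P(malicious) for a sequence
    """
    dims = len(api_sequence)

    def candidates(i):
        # Index 0 keeps the original call; the rest are its equivalents.
        return [api_sequence[i]] + alternatives.get(api_sequence[i], [])

    def decode(pos):
        # Round the continuous position to a candidate index for each call.
        return [candidates(i)[int(round(pos[i])) % len(candidates(i))]
                for i in range(dims)]

    particles = [[random.uniform(0, len(candidates(i)) - 1) for i in range(dims)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dims for _ in range(n_particles)]
    pbest = [p[:] for p in particles]
    pbest_score = [score_malicious(decode(p)) for p in particles]
    g = min(range(n_particles), key=lambda k: pbest_score[k])
    gbest, gbest_score = pbest[g][:], pbest_score[g]

    for _ in range(n_iters):
        for k in range(n_particles):
            for i in range(dims):
                r1, r2 = random.random(), random.random()
                velocities[k][i] = (w * velocities[k][i]
                                    + c1 * r1 * (pbest[k][i] - particles[k][i])
                                    + c2 * r2 * (gbest[i] - particles[k][i]))
                particles[k][i] += velocities[k][i]
            s = score_malicious(decode(particles[k]))
            if s < pbest_score[k]:
                pbest[k], pbest_score[k] = particles[k][:], s
                if s < gbest_score:
                    gbest, gbest_score = particles[k][:], s
    return decode(gbest), gbest_score
```

Because every candidate in `alternatives` is assumed to be functionally equivalent to the original call, any assignment the search returns preserves the malware's behavior; the optimization only decides which equivalents to swap in to reduce the classifier's score.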

Unlike the malware domain, which lacks benchmark datasets and publicly available classification models, image classification has multiple benchmarks for testing new attacks. Therefore, to evaluate the efficacy and wide applicability of AdversarialPSO, we re-implement the attack in the image classification domain, where we create adversarial examples by adding small, often imperceptible perturbations to input images. As a result of these perturbations, highly accurate models misclassify the inputs, resulting in a drastic drop in their accuracy. We evaluate this attack against both defended and undefended models and show that AdversarialPSO performs comparably to state-of-the-art adversarial attacks.
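For the image domain, the same swarm-based search can be framed over continuous, bounded pixel perturbations. The sketch below is an illustrative approximation, not the dissertation's implementation; `predict` is a hypothetical black-box function returning class probabilities for an image in [0, 1], and the L-infinity bound `eps` is an assumed constraint.

```python
# Minimal sketch: PSO-style black-box image attack under an L-infinity bound.
import numpy as np

def adversarial_pso_image(image, true_label, predict, eps=0.05,
                          n_particles=20, n_iters=100, w=0.7, c1=1.5, c2=1.5):
    """Search for a bounded perturbation that lowers confidence in true_label."""
    rng = np.random.default_rng(0)

    def fitness(delta):
        adv = np.clip(image + delta, 0.0, 1.0)
        return predict(adv)[true_label]          # lower is better

    # Each particle is a perturbation sampled inside the epsilon ball.
    pos = rng.uniform(-eps, eps, size=(n_particles,) + image.shape)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    g = int(np.argmin(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(n_iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, -eps, eps)      # stay inside the L-inf ball
        for k in range(n_particles):
            f = fitness(pos[k])
            if f < pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k].copy(), f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[k].copy(), f
    return np.clip(image + gbest, 0.0, 1.0), gbest_fit
```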

Library of Congress Subject Headings

Computer security; Machine learning--Security measures; Learning classifier systems--Security measures; Mathematical optimization; Swarm intelligence; Malware (Computer software)

Publication Date

4-24-2020

Document Type

Dissertation

Student Type

Graduate

Degree Name

Computing and Information Sciences (Ph.D.)

Department, Program, or Center

Computer Science (GCCIS)

Advisor

Yin Pan

Advisor/Committee Member

Xumin Liu

Advisor/Committee Member

Bo Yuan

Campus

RIT – Main Campus

Plan Codes

COMPIS-PHD
