Abstract

Statistical machine learning uses data to model the relationship between a set of explanatory variables and a response variable. The adaptive boosting (AdaBoost) algorithm is a machine learning method for modeling such relationships in classification data. It combines many applications of a weak base learner to improve the accuracy of predicting the correct response class from a set of variables. Because the base learner satisfies the weak-learnability condition, adaptive boosting yields an exponentially decreasing empirical error, from which an empirical error bound can be derived. This bound raises the question of whether a corresponding generalized error bound exists and, if so, what form it takes. Evidence from boosting several real datasets shows that the generalized error follows the same shape as the empirical error, suggesting that a shift of the empirical error bound can produce a generalized error bound. By simulating random datasets and varying their characteristics according to criteria that appear to affect this shift, we can boost them and derive a function by which to shift the empirical error bound. We record the test error of the boosted simulated datasets and build a regression model with that error as the response and the varying characteristics of the datasets as the explanatory variables. The final regression model predicts the difference between the generalized error and the empirical error, enabling us to derive the suggested generalized error bound.
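The exponentially decreasing empirical error mentioned above is the classical AdaBoost training-error bound: after T rounds, the empirical error is at most the product of 2*sqrt(eps_t*(1 - eps_t)) over the rounds, where eps_t is the weighted error of the base learner at round t. The following is a minimal sketch of this behavior, not the thesis's implementation: it runs AdaBoost with exhaustively searched decision stumps (an assumed base learner) on a synthetic two-feature dataset and compares the resulting training error against the bound.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data with labels in {-1, +1}
# (illustrative only; the thesis uses real and simulated datasets).
n = 200
X = rng.normal(size=(n, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

def best_stump(X, y, w):
    """Search axis-aligned decision stumps; return the stump (feature,
    threshold, sign) with minimum weighted error, and that error."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] > thr, 1, -1)
                err = np.sum(w[pred != y])
                if err < best_err:
                    best, best_err = (j, thr, sign), err
    return best, best_err

def stump_predict(stump, X):
    j, thr, sign = stump
    return sign * np.where(X[:, j] > thr, 1, -1)

T = 20
w = np.full(n, 1.0 / n)   # uniform initial example weights
F = np.zeros(n)           # running weighted vote of the stumps
bound = 1.0               # product of 2*sqrt(eps_t*(1 - eps_t))
for t in range(T):
    stump, eps = best_stump(X, y, w)
    eps = max(eps, 1e-12)                    # guard against eps = 0
    alpha = 0.5 * np.log((1 - eps) / eps)    # AdaBoost stump weight
    pred = stump_predict(stump, X)
    F += alpha * pred
    bound *= 2 * np.sqrt(eps * (1 - eps))
    w *= np.exp(-alpha * y * pred)           # reweight examples
    w /= w.sum()

train_err = np.mean(np.sign(F) != y)
print(f"empirical error: {train_err:.4f}, bound: {bound:.6f}")
```

Because each round's alpha is chosen to minimize the normalizing constant of the weight update, the printed empirical error never exceeds the product bound, and the bound itself shrinks geometrically whenever the stumps do better than random guessing.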

Library of Congress Subject Headings

Machine learning--Mathematical models

Publication Date

8-11-2016

Document Type

Thesis

Student Type

Graduate

Degree Name

Applied Statistics (MS)

Department, Program, or Center

School of Mathematical Sciences (COS)

Advisor

Ernest Fokoué

Advisor/Committee Member

Steven LaLonde

Advisor/Committee Member

Mei Nagappan

Comments

Physical copy available from RIT's Wallace Library at Q325.5 .H68 2016

Campus

RIT – Main Campus
