Abstract

Historically it has been difficult to measure the deviation in the notion of a concept. Several schemes have been proposed to attack this challenging problem [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. The central notion of all these efforts is to detect the change point where the data mining model deviates significantly with respect to the data characteristics that it was trained or built on. The process of detecting such change points is often termed as concept drift. Current state of algorithms assume attribute independence, view the problem as a supervised learning problem and also need tagged data. The proposed algorithm does not make any assumption among attribute independence and uses the covariance summary to detect concept drift in an unsupervised setting. The algorithm proposed in this thesis monitors the underlying characteristics of the input data, maintains data summaries of the various snapshots in time and utilizes effective distance metrics to determine when concept drifts. The technique was evaluated against synthetic and real data sets.

Library of Congress Subject Headings

Data mining; Concepts; Machine learning

Publication Date

2006

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science (GCCIS)

Advisor

Ankur Teredesai

Advisor/Committee Member

Roger Gaborski

Advisor/Committee Member

Hans-Peter Bischop

Comments

Physical copy available from RIT's Wallace Library at QA76.9.D343 C33 2006

Campus

RIT – Main Campus

Share

COinS