Abstract

Mining evolving data is a relatively new field, and evolutionary clustering is among its most recent developments. Evolutionary algorithms currently exist for k-means, agglomerative hierarchical, and spectral clustering, and they have demonstrated that using evolving data snapshots yields better clustering results. In each of these algorithms, the key step in converting from static to evolving data handling is the addition of a historical cost function. The cost function determines whether instances should move from one cluster to another between time-steps, based on the historical cuts made between the instances in the dataset. Because the tolerance for shifts in cluster membership is tunable, these cost functions are the mechanism by which evolutionary clustering provides smooth transitions. It also means that the transitions that do occur between clusters become much more significant. For example, if an author-word matrix were clustered over ten years and an author changed clusters partway through the timeline, the change would likely indicate that the author had changed research topics. Methods for mining evolving data have not yet expanded into co-clustering; for this reason, I have contributed a new algorithm for co-clustering evolving data. The algorithm uses spectral co-clustering to cluster the instances and features at each time-step. Returning to the previous example, a cluster change for a feature (a word) in an author-word matrix is significant in that it may indicate a change in the word's meaning. This contribution provides an avenue for further development of evolutionary co-clustering algorithms.
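The role of the historical cost function can be illustrated with a minimal sketch. It is not the algorithm developed in the thesis: it assumes a cut-based cost in which a candidate partition is scored against both the current and the previous similarity matrix, combined with a hypothetical weight `alpha`. The names `cut_weight`, `evolutionary_cost`, and `alpha`, and the toy similarity values, are all assumptions made for illustration.

```python
from itertools import combinations


def cut_weight(similarity, labels):
    """Sum of similarity over every pair of instances placed in different clusters."""
    pairs = combinations(range(len(labels)), 2)
    return sum(similarity[i][j] for i, j in pairs if labels[i] != labels[j])


def evolutionary_cost(sim_now, sim_prev, labels, alpha=0.5):
    """Blend the current snapshot cost with the historical cost.

    alpha is an assumed tuning weight: 0 ignores history entirely,
    values near 1 strongly resist changes in cluster membership.
    """
    snapshot = cut_weight(sim_now, labels)   # how well the partition fits the current data
    history = cut_weight(sim_prev, labels)   # how well it fits the previous time-step
    return (1 - alpha) * snapshot + alpha * history


# Toy data (assumed): instance 1 was close to instance 0 at the previous
# time-step and has drifted part of the way toward instances 2 and 3.
SIM_PREV = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.9],
    [0.1, 0.1, 0.9, 0.0],
]
SIM_NOW = [
    [0.0, 0.5, 0.1, 0.1],
    [0.5, 0.0, 0.3, 0.3],
    [0.1, 0.3, 0.0, 0.9],
    [0.1, 0.3, 0.9, 0.0],
]

KEEP = [0, 0, 1, 1]  # instance 1 stays with instance 0
MOVE = [0, 1, 1, 1]  # instance 1 joins instances 2 and 3


def best_partition(alpha):
    """Return whichever candidate partition has the lower blended cost."""
    candidates = {"keep": KEEP, "move": MOVE}
    return min(
        candidates,
        key=lambda name: evolutionary_cost(SIM_NOW, SIM_PREV, candidates[name], alpha),
    )
```

With these toy values, `best_partition(0.0)` selects the partition that moves instance 1, while `best_partition(0.5)` keeps it in place. A membership change that survives a larger `alpha` therefore reflects a stronger shift in the data, which is the sense in which transitions become more significant.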

Library of Congress Subject Headings

Data mining; Cluster analysis; Algorithms

Publication Date

2010

Document Type

Thesis

Department, Program, or Center

Computer Science (GCCIS)

Advisor

Rege, Manjeet

Comments

Note: imported from RIT’s Digital Media Library, running on DSpace, to RIT Scholar Works. A physical copy is available through The Wallace Library at RIT at: QA76.9.D343 G74 2010

Campus

RIT – Main Campus
