The field of mining evolving data is relatively new, and evolutionary clustering is among the most recent developments in this area. Existing algorithms cover evolutionary k-means, agglomerative hierarchical, and spectral clustering. They have demonstrated the benefits of using successive snapshots of evolving data to obtain better clusterings. In each of these algorithms, the key step in moving from static to evolving data is the addition of a historical cost function. This cost function determines whether an instance should move from one cluster to another between time-steps, based on the historical cuts made between instances in the dataset. Because its tolerance for shifts in cluster membership is tunable, the cost function is how evolutionary clustering achieves smooth transitions over time. It also makes the transitions that do occur more meaningful. For example, if an author-word matrix were clustered over ten years and an author changed clusters partway through the timeline, the change would likely indicate that the author had changed research topics. Methods for mining evolving data have not yet been extended to co-clustering; for this reason, I contribute a new algorithm for co-clustering evolving data. The algorithm uses spectral co-clustering to cluster both the instances and the features at each time-step. In the author-word example, a change in the cluster membership of a feature (a word) is also significant, since it may indicate a change in the word's meaning. This contribution provides an avenue for further development of evolutionary co-clustering algorithms.
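The abstract does not specify how the spectral co-clustering step and the historical cost are combined, so the following is only a minimal sketch under stated assumptions, not the thesis's algorithm. Each snapshot is co-clustered with Dhillon-style bipartite spectral co-clustering: form the normalized matrix D1^{-1/2} A D2^{-1/2}, take singular vectors 2 through l+1 with l = ceil(log2 k), and run k-means jointly on the scaled row and column vectors. As a hypothetical stand-in for the historical cost, the current normalized matrix is blended with the previous snapshot's using a tolerance parameter `alpha`, in the spirit of the "preserving cluster quality" framework for evolutionary spectral clustering. The function names, `alpha`, the farthest-first k-means seeding, and the permutation-based label matching are all illustrative assumptions.

```python
import numpy as np
from itertools import permutations


def normalize(A):
    """Return D1^{-1/2} A D2^{-1/2} with row degrees d1 and column degrees d2.
    Assumes A is nonnegative with no all-zero rows or columns."""
    d1 = A.sum(axis=1)
    d2 = A.sum(axis=0)
    return A / np.sqrt(np.outer(d1, d2)), d1, d2


def embed(A_now, A_prev, k, alpha):
    """Joint row/column spectral embedding of one snapshot.
    alpha=1 ignores history; smaller alpha weights the previous snapshot more."""
    An, d1, d2 = normalize(A_now)
    # Hypothetical blend standing in for the historical cost (an assumption).
    M = An if A_prev is None else alpha * An + (1 - alpha) * normalize(A_prev)[0]
    U, _, Vt = np.linalg.svd(M)
    l = int(np.ceil(np.log2(k)))  # Dhillon-style: singular vectors 2..l+1
    U_l, V_l = U[:, 1:l + 1], Vt.T[:, 1:l + 1]
    # Rows first, then columns, scaled by the current snapshot's degrees.
    return np.vstack([U_l / np.sqrt(d1)[:, None], V_l / np.sqrt(d2)[:, None]])


def kmeans(Z, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-first seeding."""
    centers = [Z[0]]
    for _ in range(1, k):
        dist = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((Z[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def evolutionary_cocluster(snapshots, k, alpha):
    """Co-cluster each snapshot; returns a list of (row_labels, col_labels)."""
    results, prev = [], None
    for A in snapshots:
        labels = kmeans(embed(A, prev, k, alpha), k)
        results.append((labels[:A.shape[0]], labels[A.shape[0]:]))
        prev = A
    return results


def membership_changes(prev, curr, k):
    """Indices of rows and columns whose co-cluster changed between two steps.
    Cluster ids are arbitrary, so `curr` is first relabelled by the permutation
    that best agrees with `prev` (brute force, fine for small k)."""
    old, new = np.concatenate(prev), np.concatenate(curr)
    best = max(permutations(range(k)),
               key=lambda p: int(np.sum(np.asarray(p)[new] == old)))
    moved = np.asarray(best)[new] != old
    m = len(prev[0])
    return np.flatnonzero(moved[:m]).tolist(), np.flatnonzero(moved[m:]).tolist()


# Toy author-word snapshots: two topics; author 2 switches topic at t=1.
eps = 0.01
A0 = np.full((6, 6), eps)
A0[:3, :3] = 1.0
A0[3:, 3:] = 1.0
A1 = A0.copy()
A1[2, :3], A1[2, 3:] = eps, 1.0

for alpha in (1.0, 0.3):
    steps = evolutionary_cocluster([A0, A1], k=2, alpha=alpha)
    print(alpha, membership_changes(steps[0], steps[1], k=2))
```

With these toy matrices, `alpha = 1.0` (no history) should flag author 2 as having moved while no words move, whereas `alpha = 0.3` lets the previous snapshot outweigh the change and reports no moves. This mirrors the tunable tolerance for membership shifts described above. For larger k, the brute-force permutation search could be replaced by an assignment solver such as `scipy.optimize.linear_sum_assignment`.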
Library of Congress Subject Headings
Data mining; Cluster analysis; Algorithms
Department, Program, or Center
Computer Science (GCCIS)
Green, Nathan S., "Evolutionary spectral co-clustering" (2010). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus