Abstract

Traditional clustering algorithms are inapplicable to many real-world problems in which limited knowledge from domain experts is available, because they have no way to use that knowledge. Incorporating domain knowledge can guide a clustering algorithm and consequently improve the quality of the clustering. In this paper, we propose SS-NMF, a Semi-Supervised Non-negative Matrix Factorization framework for data clustering. In SS-NMF, users can supervise the clustering by specifying pairwise constraints on a few data objects, stating whether they "must" or "cannot" be clustered together. Using an iterative algorithm, we perform a symmetric tri-factorization of the data similarity matrix to infer the clusters. Theoretically, we show the correctness and convergence of SS-NMF. Moreover, we show that SS-NMF provides a general framework for semi-supervised clustering, of which existing approaches can be considered special cases. Through extensive experiments on publicly available datasets, we demonstrate the superior clustering performance of SS-NMF.
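The abstract describes the method only at a high level, so the following is a minimal sketch of the general idea under our own assumptions, not the paper's exact algorithm. Pairwise constraints are folded into the similarity matrix by adding a hypothetical `reward` to must-link pairs and subtracting a hypothetical `penalty` from cannot-link pairs (clipped at zero so everything stays nonnegative). The constrained matrix Ã is then approximated as Ã ≈ G S Gᵀ with nonnegative G (n × k) and S (k × k), using multiplicative updates derived from the gradient of ‖Ã − G S Gᵀ‖²_F, with the G step damped by a fourth root. Cluster labels are read off as the row-wise argmax of G. All function names are illustrative, and no convergence guarantee is claimed for these particular updates.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain-Python matrix product X @ Y."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def transpose(X: Matrix) -> Matrix:
    return [list(col) for col in zip(*X)]


def constrained_similarity(A: Matrix,
                           must_link: Sequence[Tuple[int, int]],
                           cannot_link: Sequence[Tuple[int, int]],
                           reward: float = 1.0,
                           penalty: float = 1.0) -> Matrix:
    """Fold pairwise constraints into a copy of the symmetric similarity matrix A.

    Assumed (hypothetical) weighting: must-link pairs gain `reward`; cannot-link
    pairs lose `penalty` and are clipped at 0 so the matrix stays nonnegative.
    """
    At = [row[:] for row in A]
    for i, j in must_link:
        At[i][j] += reward
        At[j][i] = At[i][j]  # keep the matrix symmetric
    for i, j in cannot_link:
        At[i][j] = max(0.0, At[i][j] - penalty)
        At[j][i] = At[i][j]
    return At


def tri_factorize(At: Matrix, k: int, iters: int = 300, seed: int = 0,
                  eps: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """Approximate At ~= G S G^T with nonnegative G (n x k) and symmetric S (k x k)."""
    n = len(At)
    rng = random.Random(seed)
    G = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    S = [[1.0 if a == b else 0.1 for b in range(k)] for a in range(k)]
    for _ in range(iters):
        Gt = transpose(G)
        GtG = matmul(Gt, G)
        # S <- S * (G^T At G) / (G^T G S G^T G)
        num = matmul(matmul(Gt, At), G)
        den = matmul(matmul(GtG, S), GtG)
        S = [[S[a][b] * num[a][b] / (den[a][b] + eps) for b in range(k)]
             for a in range(k)]
        # G <- G * [(At G S) / (G S G^T G S)]^(1/4), a damped multiplicative step
        GS = matmul(G, S)
        num = matmul(At, GS)
        den = matmul(GS, matmul(Gt, GS))  # (G S)(G^T G S) = G S G^T G S
        G = [[G[i][c] * (num[i][c] / (den[i][c] + eps)) ** 0.25 for c in range(k)]
             for i in range(n)]
    return G, S


def ss_nmf_sketch(A: Matrix, k: int, must_link=(), cannot_link=(), **kw):
    """Constrain the similarity matrix, factorize it, and label each object."""
    At = constrained_similarity(A, must_link, cannot_link)
    G, S = tri_factorize(At, k, **kw)
    # Each object goes to the cluster with the largest membership weight in G.
    labels = [max(range(k), key=lambda c: row[c]) for row in G]
    return labels, G, S


if __name__ == "__main__":
    A = [[1.0, 0.9, 0.8, 0.1, 0.0, 0.1],
         [0.9, 1.0, 0.7, 0.0, 0.1, 0.0],
         [0.8, 0.7, 1.0, 0.2, 0.1, 0.0],
         [0.1, 0.0, 0.2, 1.0, 0.9, 0.8],
         [0.0, 0.1, 0.1, 0.9, 1.0, 0.7],
         [0.1, 0.0, 0.0, 0.8, 0.7, 1.0]]
    labels, G, S = ss_nmf_sketch(A, k=2, must_link=[(0, 2)], cannot_link=[(2, 3)])
    print(labels)
```

Folding the constraints into the similarity matrix keeps the factorization step identical to unsupervised symmetric tri-factorization, which is one way to read the abstract's claim that SS-NMF generalizes existing approaches; the reward and penalty weights here are arbitrary placeholders.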

Publication Date

2008

Comments

The original publication is available at www.springerlink.com. Note: imported from RIT’s Digital Media Library running on DSpace to RIT Scholar Works in February 2014.

Document Type

Article

Department, Program, or Center

Center for Advancing the Study of CyberInfrastructure

Campus

RIT – Main Campus
