Traditional clustering algorithms are inapplicable to many real-world problems where limited knowledge from domain experts is available. Incorporating this domain knowledge can guide a clustering algorithm and consequently improve the quality of clustering. In this paper, we propose SS-NMF, a Semi-Supervised Non-negative Matrix Factorization framework for data clustering. In SS-NMF, users can supervise clustering by providing pairwise constraints on a few data objects, specifying whether they "must" or "cannot" be clustered together. Through an iterative algorithm, we perform symmetric tri-factorization of the data similarity matrix to infer the clusters. Theoretically, we show the correctness and convergence of SS-NMF. Moreover, we show that SS-NMF provides a general framework for semi-supervised clustering, of which existing approaches can be considered special cases. Through extensive experiments conducted on publicly available datasets, we demonstrate the superior performance of SS-NMF for clustering.
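The idea of the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the update rules or the way constraints enter the similarity matrix, so everything below is an assumption. Here a hypothetical `adjust_similarity` helper raises the similarity of "must" pairs and lowers it for "cannot" pairs by an assumed `weight`, and a hypothetical `sym_tri_factorize` helper approximates the adjusted matrix A as G S Gᵀ using generic multiplicative updates of the kind commonly used for non-negative factorization.

```python
import numpy as np

def adjust_similarity(A, must_link, cannot_link, weight=1.0):
    """Fold pairwise constraints into a similarity matrix (assumed scheme)."""
    A_adj = A.astype(float).copy()
    for i, j in must_link:          # reward pairs that must share a cluster
        A_adj[i, j] += weight
        A_adj[j, i] += weight
    for i, j in cannot_link:        # penalize pairs that cannot share a cluster
        A_adj[i, j] -= weight
        A_adj[j, i] -= weight
    return np.clip(A_adj, 0.0, None)  # keep the matrix non-negative

def sym_tri_factorize(A, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate symmetric non-negative A by G @ S @ G.T.

    Uses generic multiplicative updates (an assumption, not the
    update rules from the paper).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    G = rng.random((n, k)) + eps      # cluster indicator matrix, n x k
    S = rng.random((k, k)) + eps      # cluster association matrix, k x k
    S = (S + S.T) / 2                 # start symmetric; updates preserve this
    for _ in range(n_iter):
        GtG = G.T @ G
        S *= (G.T @ A @ G) / (GtG @ S @ GtG + eps)
        numer = A @ G @ S
        denom = G @ (S @ (G.T @ G) @ S) + eps
        G *= np.sqrt(numer / denom)   # square root damps the step
    return G, S
```

Under these assumptions, a cluster label for each object can be read off as `G.argmax(axis=1)` after calling `sym_tri_factorize(adjust_similarity(A, must_link, cannot_link), k)`.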
Department, Program, or Center
Center for Advancing the Study of CyberInfrastructure
Knowledge and Information Systems Journal, vol. 17, no. 3, December 2008
RIT – Main Campus