Document clustering without any prior knowledge or background information is a challenging problem. In this paper, we propose SS-NMF, a semi-supervised non-negative matrix factorization framework for document clustering. In SS-NMF, users can supervise the clustering by supplying pairwise constraints on a few documents, specifying whether each pair "must" or "cannot" be clustered together. Through an iterative algorithm, we perform symmetric tri-factorization of the document-document similarity matrix to infer the document clusters. Theoretically, we show that SS-NMF provides a general framework for semi-supervised clustering and that existing approaches can be considered special cases of SS-NMF. Through extensive experiments on publicly available data sets, we demonstrate the superior performance of SS-NMF for clustering documents.
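The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that must-link pairs are encoded by adding a reward `alpha` to the corresponding similarity entries, that cannot-link pairs subtract a penalty `beta` (clipped at zero), and that the factors of the approximation A ≈ G S Gᵀ are updated with the standard multiplicative rules for orthogonality-constrained symmetric tri-factorization. The function name `ss_nmf_sketch`, its parameters, and the toy similarity matrix are all hypothetical. Plain lists keep the example dependency-free; a practical version would use a numerical library.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def ss_nmf_sketch(A, must_link, cannot_link, k,
                  alpha=1.0, beta=1.0, iters=200, seed=0):
    """Cluster n documents from an n x n non-negative similarity matrix A."""
    n = len(A)
    eps = 1e-9  # guards against division by zero
    # Assumed constraint encoding: reward must-link pairs, penalize
    # cannot-link pairs, and clip at zero so every entry stays non-negative.
    At = [row[:] for row in A]
    for i, j in must_link:
        At[i][j] += alpha
        At[j][i] += alpha
    for i, j in cannot_link:
        At[i][j] = max(At[i][j] - beta, 0.0)
        At[j][i] = max(At[j][i] - beta, 0.0)

    rng = random.Random(seed)
    G = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    S = [[1.0 if a == b else 0.1 for b in range(k)] for a in range(k)]

    for _ in range(iters):
        Gt = transpose(G)
        # S <- S * (G^T A G) / (G^T G S G^T G), element-wise
        num = matmul(matmul(Gt, At), G)
        GtG = matmul(Gt, G)
        den = matmul(matmul(GtG, S), GtG)
        S = [[S[a][b] * num[a][b] / (den[a][b] + eps) for b in range(k)]
             for a in range(k)]
        # G <- G * sqrt((A G S) / (G G^T A G S)), element-wise
        AGS = matmul(matmul(At, G), S)
        den = matmul(G, matmul(Gt, AGS))
        G = [[G[i][c] * (AGS[i][c] / (den[i][c] + eps)) ** 0.5
              for c in range(k)] for i in range(n)]

    # Each document is assigned to the cluster with the largest weight in G.
    labels = [max(range(k), key=lambda c: row[c]) for row in G]
    return labels, G, S

# Toy similarity matrix: documents 0-2 and 3-5 form two loose groups.
A = [
    [1.0, 0.9, 0.8, 0.1, 0.0, 0.1],
    [0.9, 1.0, 0.9, 0.0, 0.1, 0.0],
    [0.8, 0.9, 1.0, 0.1, 0.0, 0.1],
    [0.1, 0.0, 0.1, 1.0, 0.9, 0.8],
    [0.0, 0.1, 0.0, 0.9, 1.0, 0.9],
    [0.1, 0.0, 0.1, 0.8, 0.9, 1.0],
]
labels, G, S = ss_nmf_sketch(A, must_link=[(0, 1)], cannot_link=[(0, 5)], k=2)
print(labels)
```

In this sketch the rows of `G` act as soft cluster memberships and `S` captures how strongly the clusters relate to one another. Setting `alpha` and `beta` to zero reduces it to an unsupervised symmetric factorization of the similarity matrix.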
Date of creation, presentation, or exhibit
Department, Program, or Center
Computer Science (GCCIS)
Incorporating User provided Constraints into Document Clustering, Proceedings of IEEE International Conference on Data Mining. Held in Omaha, NE: 28-31 October 2007.
RIT – Main Campus