Web image clustering has recently drawn significant attention in the research community. However, little work has been done on using multi-modal information to cluster Web images. In this paper, we address the problem of Web image clustering by simultaneously integrating visual and textual features from a graph partitioning perspective. In particular, we model visual features, images, and words from the text surrounding the images using a tripartite graph. This graph is treated as a fusion of two bipartite graphs that are partitioned simultaneously by the proposed Consistent Isoperimetric High-order Co-clustering (CIHC) framework. Although a similar approach has been adopted before, the main contribution of this work lies in the computational efficiency, clustering quality, and scalability to large image repositories that CIHC achieves. We demonstrate this through experiments on real Web images.
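
As an illustration of the graph model described in the abstract, the following minimal sketch builds a tripartite graph over visual features, images, and words as the fusion of two bipartite graphs that share the image nodes. It is not the CIHC partitioning step itself. The toy data, the binary edge weights, and the names `bipartite_matrix` and `tripartite_adjacency` are assumptions made for this example only; a real system would presumably use weighted edges derived from feature and term statistics.

```python
# Toy data (hypothetical): visual features and surrounding-text words per image.
image_features = {
    "img1": {"f_red", "f_round"},
    "img2": {"f_red"},
    "img3": {"f_blue"},
}
image_words = {
    "img1": {"apple"},
    "img2": {"apple", "fruit"},
    "img3": {"sky"},
}

def bipartite_matrix(rows, cols, edges):
    """Return a 0/1 matrix with one row per item in rows and one column per item in cols."""
    return [[1 if (r, c) in edges else 0 for c in cols] for r in rows]

# Fixed node orderings for the three node types.
images = sorted(image_features)
features = sorted(set().union(*image_features.values()))
words = sorted(set().union(*image_words.values()))

# Edge sets of the two bipartite graphs; both share the image nodes.
feature_image_edges = {(f, i) for i, fs in image_features.items() for f in fs}
image_word_edges = {(i, w) for i, ws in image_words.items() for w in ws}

A = bipartite_matrix(features, images, feature_image_edges)  # features x images
B = bipartite_matrix(images, words, image_word_edges)        # images x words

def tripartite_adjacency(A, B):
    """Fuse two bipartite graphs into one symmetric adjacency matrix.

    Node order is features, then images, then words. Features and words
    are never connected directly; they are linked only through images.
    """
    nf, ni, nw = len(A), len(B), len(B[0])
    n = nf + ni + nw
    M = [[0] * n for _ in range(n)]
    for f in range(nf):
        for i in range(ni):
            M[f][nf + i] = M[nf + i][f] = A[f][i]
    for i in range(ni):
        for w in range(nw):
            M[nf + i][nf + ni + w] = M[nf + ni + w][nf + i] = B[i][w]
    return M

M = tripartite_adjacency(A, B)
```

In this sketch the feature-word block of `M` stays zero, which reflects the tripartite structure: any partition of the fused graph has to be consistent across both bipartite graphs because they meet only at the image nodes.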

© ACM, 2009. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in the Proceedings of ACM Multimedia 2007.

Document Type

Conference Proceeding

Department, Program, or Center

Computer Science (GCCIS)


RIT – Main Campus