With the explosive growth of Web and the recent development in digital media technology, the number of images on the Web has grown tremendously. Consequently,Web image clustering has emerged as an important application. Some of the initial efforts along this direction revolved around clustering Web images based on the visual features of images or textual features by making use of the text surrounding the images. However, not much work has been done in using multimodal information for clustering Web images. In this paper, we propose a graph theoretical framework for simultaneously integrating visual and textual features for efficient Web image clustering. Specifically, we model visual features, images and words from surrounding text using a tripartite graph. Partitioning this graph leads to clustering of the Web images. Although, graph partitioning approach has been adopted before, the main contribution of this work lies in a new algorithm that we propose - Consistent Isoperimetric High-order Co-clustering (CIHC), for partitioning the tripartite graph. Computationally, CIHC is very quick as it requires a simple solution to a sparse system of linear equations. Our theoretical analysis and extensive experiments performed on real Web images demonstrate the performance of CIHC in terms of the quality, efficiency and scalability in partitioning the visual feature-image-word tripartite graph.
Date of creation, presentation, or exhibit
Department, Program, or Center
Computer Science (GCCIS)
Paper in the 17th International Conference on World Wide Web held on April 21-25, 2008 in Beijing, China. ACM.
RIT – Main Campus