In this thesis, we propose a new technique for initializing Convolutional K-means. We introduce Visual Similarity Sampling (VSS), which collects $8\times8$ sample patches from images for convolutional feature learning. The algorithm uses within-class and between-class cosine similarity/dissimilarity measures to collect samples from both foreground and background. VSS treats the local frequency of shapes within a character patch as a probability distribution for selecting samples. We show that initializing Convolutional K-means from samples with high between-class and within-class similarity produces a discriminative codebook, which we evaluate on text detection in natural scenes. Using each sample's within- and between-class representativeness as its probability of being selected as an initial cluster center yields discriminative cluster centers, which we use as feature maps. An advantage of our approach is that it is not problem-dependent, so it can be applied to sample collection in other pattern recognition problems. The proposed algorithm improved the detection rate and simplified the learning process in both convolutional feature learning and text detection training.
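The initialization idea above can be sketched in code. The following is a minimal illustration, not the thesis implementation: it scores each flattened $8\times8$ patch by its within-class cosine similarity to its class centroid (a simplified stand-in for the full within-/between-class measure described in the abstract), normalizes the scores into a probability distribution, samples the initial cluster centers from that distribution, and then runs plain Lloyd-style K-means to produce the codebook. All function and variable names here are hypothetical.

```python
import numpy as np

def cosine_sim(rows, vec):
    # cosine similarity between each row of `rows` and the vector `vec`
    rows_n = rows / (np.linalg.norm(rows, axis=1, keepdims=True) + 1e-8)
    vec_n = vec / (np.linalg.norm(vec) + 1e-8)
    return rows_n @ vec_n

def vss_init(patches, labels, k, seed=None):
    """Pick k initial centers, sampling each patch with probability
    proportional to its similarity to its own class centroid.
    (Simplified: the thesis combines within- AND between-class measures.)"""
    rng = np.random.default_rng(seed)
    score = np.empty(len(patches))
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        centroid = patches[idx].mean(axis=0)
        # clip so every patch keeps a small positive selection probability
        score[idx] = np.clip(cosine_sim(patches[idx], centroid), 1e-6, None)
    prob = score / score.sum()
    chosen = rng.choice(len(patches), size=k, replace=False, p=prob)
    return patches[chosen].copy()

def kmeans(patches, centers, iters=10):
    # plain Lloyd iterations on flattened patches
    for _ in range(iters):
        dist = ((patches[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(axis=1)
        for j in range(len(centers)):
            members = assign == j
            if members.any():
                centers[j] = patches[members].mean(axis=0)
    return centers

# toy demo: 200 random "patches" (8x8 flattened to 64-dim), two fake classes
rng = np.random.default_rng(0)
patches = rng.normal(size=(200, 64))
labels = rng.integers(0, 2, size=200)
centers = vss_init(patches, labels, k=16, seed=0)
codebook = kmeans(patches, centers)
print(codebook.shape)  # (16, 64): 16 learned filters usable as feature maps
```

Each row of the resulting codebook can be reshaped back to $8\times8$ and used as a convolutional filter, which is how the learned cluster centers serve as feature maps in the detection pipeline.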
Degree: Computer Science (MS)
Department, Program, or Center: Computer Science (GCCIS)
Aziz, Kardo Othman, "Better Text Detection through Improved K-means-based Feature Learning" (2017). Thesis. Rochester Institute of Technology.
Campus: RIT – Main Campus