In this thesis, we propose a different technique for initializing Convolutional K-means. We propose Visual Similarity Sampling (VSS) to collect $8\times8$ sample patches from images for convolutional feature learning. The algorithm uses within-class and between-class cosine similarity/dissimilarity measures to collect samples from both foreground and background. VSS thus uses the local frequency of shapes within a character patch as a probability distribution for selecting them. We also show that initializing Convolutional K-means from samples with high between-class and within-class similarity produces a discriminative codebook. We test the codebook by detecting text in natural scenes. We show that using each sample's representativeness within and between classes as the probability of selecting it as an initial cluster center helps achieve discriminative cluster centers, which we use as feature maps. One advantage of our work is that, since it is not problem dependent, it can be applied to sample collection in other pattern recognition problems. The proposed algorithm helped improve the detection rate and simplify the learning process in both convolutional feature learning and text detection training.
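
The sampling idea described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the way the within-class and between-class similarities are combined (a plain average), and the shift to non-negative scores are all assumptions made for illustration. Patches are assumed to be flattened $8\times8$ arrays (64 values each) with a foreground/background label per patch.

```python
import numpy as np

def cosine_similarity_matrix(patches):
    """Pairwise cosine similarity between flattened patches (one per row)."""
    norms = np.linalg.norm(patches, axis=1, keepdims=True)
    unit = patches / np.maximum(norms, 1e-12)  # guard against all-zero patches
    return unit @ unit.T

def selection_probabilities(patches, labels):
    """Turn within-class and between-class similarity into a sampling distribution.

    Assumption: the two similarities are averaged; the abstract does not
    specify how they are combined.
    """
    sim = cosine_similarity_matrix(patches)
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)              # exclude self-similarity
    diff = labels[:, None] != labels[None, :]
    within = (sim * same).sum(axis=1) / np.maximum(same.sum(axis=1), 1)
    between = (sim * diff).sum(axis=1) / np.maximum(diff.sum(axis=1), 1)
    score = (within + between) / 2.0
    score = score - score.min() + 1e-12        # shift so every score is positive
    return score / score.sum()

def sample_initial_centers(patches, labels, k, seed=0):
    """Draw k distinct patches as initial cluster centers, weighted by score."""
    rng = np.random.default_rng(seed)
    p = selection_probabilities(patches, labels)
    idx = rng.choice(len(patches), size=k, replace=False, p=p)
    return patches[idx]
```

Under these assumptions, calling `sample_initial_centers(patches, labels, k)` with an `(n, 64)` array of patches and an `(n,)` label array returns a `(k, 64)` array that could seed a K-means run in place of uniformly random initial centers.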

Library of Congress Subject Headings

Optical character recognition; Machine learning; Pattern recognition systems

Publication Date


Document Type


Student Type


Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science (GCCIS)


Richard Zanibbi

Advisor/Committee Member

Leo Reznik

Advisor/Committee Member

Matthew Fluet


Physical copy available from RIT's Wallace Library at TA1640 .A94 2017


RIT – Main Campus

Plan Codes