Abstract

Clustering is the task of partitioning data into groups with similar characteristics. One category is spectral clustering, in which data points are represented as vertices of a graph connected by weighted edges that signify similarity based on distance. The longest leg path distance (LLPD) has shown promise when used in spectral clustering, but it is sensitive to noisy data and therefore requires a denoising procedure to achieve good performance. Previous denoising techniques have involved identifying and removing noisy data points; however, this is not a desirable pre-clustering step for data sets with a specific structure, such as images. Image segmentation, the process of partitioning an image into regions of similar features, can be represented as a clustering problem by defining the vector of intensity and spatial information at each pixel as a data point. We therefore propose pre-cluster denoising as the basis of a robust LLPD clustering framework. By creating a fine clustering of approximately equal-sized groups and averaging each group, we obtain a reduced set of data points that represents the relevant information of the original data set while locally averaging out the influence of noise. We can then construct a smaller graph representation of the data based on the LLPD between the reduced data points and identify the spectral embedding coordinates for each reduced point. An out-of-sample extension procedure is then used to compute spectral embedding coordinates at each of the original data points, after which a simple (k-means) clustering is performed to compute the final cluster labels. In the context of image segmentation, computing superpixels provides a natural structure for this type of pre-clustering.
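The pre-cluster averaging step described above can be illustrated with a minimal sketch. It assumes a grayscale image and uses a regular grid of square blocks as a simple stand-in for superpixels; the thesis itself uses superpixels, and the function name `block_reduce_features` and the `block` parameter are hypothetical names chosen for this illustration only.

```python
import numpy as np

def block_reduce_features(image, block=4):
    """Average (row, col, intensity) features over square pixel blocks.

    Grid blocks are an assumed stand-in for superpixels, not the
    pre-clustering used in the thesis.
    """
    image = np.asarray(image, dtype=float)
    rows, cols = np.indices(image.shape)
    # One data point per pixel: spatial coordinates plus intensity.
    features = np.stack([rows.ravel(), cols.ravel(), image.ravel()], axis=1)

    # Label each pixel by the grid block that contains it.
    n_block_cols = -(-image.shape[1] // block)  # ceiling division
    labels = ((rows // block) * n_block_cols + (cols // block)).ravel()

    # Average the feature vectors within each block.
    n_groups = int(labels.max()) + 1
    sums = np.zeros((n_groups, features.shape[1]))
    np.add.at(sums, labels, features)
    counts = np.bincount(labels, minlength=n_groups)
    return sums / counts[:, None], labels
```

Calling `block_reduce_features(image, block=2)` on a 4-by-4 image returns four reduced points, one per block, together with the block label of every pixel. The labels are what a later step would need in order to carry embedding coordinates from the reduced points back to the original pixels.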
We show how the above LLPD framework can be carried out in the context of image segmentation, and show that a simple, computationally efficient spatial interpolation procedure can be used in place of the out-of-sample extension to extend the embedding in a way that yields better segmentation performance with respect to ground truth on a publicly available data set. Similar experiments are also performed using the standard Euclidean distance in place of the LLPD to demonstrate the proficiency of the LLPD for image segmentation.
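To make the contrast between the Euclidean distance and the LLPD concrete, the following sketch computes the LLPD between all pairs of points, taking the LLPD to be the smallest possible value of the longest edge over all paths joining two points in the complete Euclidean graph. It uses a minimax variant of the Floyd-Warshall recurrence, which is practical only for small point sets such as a reduced data set; it is an illustrative assumption, not the computation used in the thesis, and the name `llpd_matrix` is hypothetical.

```python
import numpy as np

def llpd_matrix(points):
    """Pairwise longest-leg path distances on the complete Euclidean graph.

    Illustrative O(n^3) minimax Floyd-Warshall; not the thesis's method.
    """
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # Euclidean distances
    for k in range(len(points)):
        # Routing through point k costs the longer of the two legs;
        # keep whichever option has the smaller longest leg.
        via_k = np.maximum(dist[:, k:k + 1], dist[k:k + 1, :])
        dist = np.minimum(dist, via_k)
    return dist
```

For the one-dimensional points 0, 1, 2 and 10, the Euclidean distance from 0 to 2 is 2, while the LLPD is 1 because the path through 1 uses only unit-length legs. The LLPD from 0 to 10 is 8, since every path must cross the gap between 2 and 10.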

Library of Congress Subject Headings

Image analysis; Image processing--Digital techniques; Pattern recognition systems; Source separation (Signal processing)

Publication Date

8-16-2018

Document Type

Thesis

Student Type

Graduate

Degree Name

Applied and Computational Mathematics (MS)

Department, Program, or Center

School of Mathematical Sciences (COS)

Advisor

Nathan Cahill

Advisor/Committee Member

Nathaniel S. Barlow

Advisor/Committee Member

Kara L. Maki

Campus

RIT – Main Campus

Plan Codes

ACMTH-MS
