To help understand how genes are affected by different disease conditions in a biological system, clustering is typically performed on gene expression data. In this paper, we propose to solve the clustering problem using a graph-theoretical approach, and apply a novel graph partitioning model, Isoperimetric Graph Partitioning (IGP), to group biological samples from gene expression data. The IGP algorithm has several advantages over the well-established Spectral Graph Partitioning (SGP) model. First, IGP requires only the solution of a sparse system of linear equations, rather than the eigenproblem that must be solved in the SGP model. Second, IGP avoids degenerate cases produced by the spectral approach and thereby achieves partitions with higher accuracy. Moreover, we integrate unsupervised gene selection into the proposed approach through two-way ordering of gene expression data, so that irrelevant or redundant genes can be eliminated from the data and an improved clustering result obtained. We evaluate our approach on several well-known problems involving gene expression profiles of colon cancer and leukemia subtypes. Our experimental results demonstrate that IGP consistently outperforms SGP and produces results that are closer to the original labeling of the sample sets provided by domain experts. Furthermore, the clustering accuracy improves significantly when IGP is integrated with unsupervised gene (feature) selection.
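To illustrate the first advantage named in the abstract, the following is a minimal sketch of an isoperimetric bipartition as it is commonly formulated, not necessarily the exact variant used in the paper. It assumes a connected, undirected, weighted sample-similarity graph without self-loops; a ground node (here the highest-degree node, an assumption) is fixed at potential zero; the reduced Laplacian system L0 x0 = d0 is solved with conjugate gradients; and the potentials are thresholded at the cut that minimizes cut weight divided by the smaller side's total degree. The function name `isoperimetric_bipartition`, the degree-based volume, and the toy edge weights are all hypothetical choices made for illustration.

```python
import math

def isoperimetric_bipartition(n, edges, ground=None, tol=1e-10, max_iter=1000):
    """Split nodes 0..n-1 in two. `edges` maps (i, j) -> positive weight.

    Assumes a connected graph with no self-loops (illustrative sketch only).
    """
    # Sparse adjacency: one dict of neighbours per node.
    adj = [dict() for _ in range(n)]
    for (i, j), w in edges.items():
        adj[i][j] = adj[i].get(j, 0.0) + w
        adj[j][i] = adj[j].get(i, 0.0) + w
    deg = [sum(nb.values()) for nb in adj]
    if ground is None:
        # Assumed heuristic: ground the node with the largest weighted degree.
        ground = max(range(n), key=lambda i: deg[i])
    free = [i for i in range(n) if i != ground]

    def laplacian_times(v):
        # Multiply by the Laplacian with the ground row/column removed.
        out = [0.0] * n
        for i in free:
            out[i] = deg[i] * v[i] - sum(w * v[j] for j, w in adj[i].items())
        return out

    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    # Conjugate gradients for L0 x0 = d0; the ground entry stays at zero.
    x = [0.0] * n
    r = [deg[i] if i != ground else 0.0 for i in range(n)]
    p = list(r)
    rs = dot(r, r)
    rs0 = rs
    for _ in range(max_iter):
        if rs <= tol * tol * rs0:
            break
        ap = laplacian_times(p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new

    # Sweep thresholds over the sorted potentials, tracking the cut incrementally.
    order = sorted(range(n), key=lambda i: x[i])
    total_vol = sum(deg)
    in_s = [False] * n
    vol = cut = 0.0
    best_ratio, best_k = math.inf, 0
    for k, i in enumerate(order[:-1]):
        for j, w in adj[i].items():
            cut += -w if in_s[j] else w  # edge leaves or enters the cut
        in_s[i] = True
        vol += deg[i]
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_ratio:
            best_ratio, best_k = cut / denom, k + 1
    return set(order[:best_k]), best_ratio, x

# Toy graph: two tightly connected triangles joined by one weak edge.
toy_edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
             (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
             (2, 3): 0.1}
part, ratio, potentials = isoperimetric_bipartition(6, toy_edges)
print(sorted(part), ratio)
```

On this toy graph the sweep should separate the two triangles, since cutting the single weak edge gives the smallest ratio. In a gene expression setting the nodes would be samples and the edge weights some similarity between expression profiles; how those weights are built is not specified here and is left out of the sketch.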
Department, Program, or Center
Computer Science (GCCIS)
Paper in Proceedings of the International Joint Conference on Neural Networks, Orlando, Florida, USA, August 12-17, 2007. IEEE.
RIT – Main Campus