We perform discriminant analysis together with principal component analysis (PCA) for dialect and accent recognition. Because the data matrix has the high-dimension, low-sample-size (HDLSS) property, we compute the principal components and the score matrix in the dual space. On the resulting score matrix, linear discriminant analysis (LDA) does not fit the data well, while quadratic discriminant analysis (QDA), which performs better than LDA, can fail when a large number of principal components is required. Using the Gaussian radial basis function (RBF) kernel, we instead compute the kernel matrix and perform LDA directly on it. Compared with the LDA-PCA method, this reduces the in-sample prediction error rate of LDA by more than 20% on average.
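The pipeline described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`dual_pca_scores`, `rbf_kernel`, `lda_fit`, `lda_predict`), the ridge term added to the pooled covariance, the bandwidth `gamma = 1/p`, the choice to compute the kernel on the raw data, and the use of the rows of the kernel matrix as LDA features are all hypothetical choices made for the example. The dual-space step relies on the identity that if the centered data are $X_c = U S V^\top$, then the scores $X_c V = U S$ can be read off the eigendecomposition of the $n \times n$ Gram matrix $X_c X_c^\top = U S^2 U^\top$, which is cheap when $p \gg n$.

```python
import numpy as np


def dual_pca_scores(X, k):
    """First k PCA scores via the n x n Gram matrix (the dual space).

    When p >> n, eigendecomposing Xc @ Xc.T (n x n) is far cheaper than
    the p x p covariance matrix. If Xc = U S V^T, the scores are Xc V = U S.
    """
    Xc = X - X.mean(axis=0)                 # column-center the data
    evals, evecs = np.linalg.eigh(Xc @ Xc.T)
    top = np.argsort(evals)[::-1][:k]       # indices of the k largest eigenvalues
    return evecs[:, top] * np.sqrt(np.clip(evals[top], 0.0, None))


def rbf_kernel(A, B, gamma):
    """Gaussian RBF kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.clip(sq, 0.0, None))


def lda_fit(Z, y, ridge=1e-3):
    """LDA with a pooled covariance; the ridge term is an assumption."""
    classes = np.unique(y)
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    resid = Z - means[np.searchsorted(classes, y)]
    S = resid.T @ resid / (len(y) - len(classes))
    S += ridge * np.eye(S.shape[0])         # keeps S invertible when d >= n
    W = np.linalg.solve(S, means.T).T       # row k is (S^-1 mu_k)^T
    b = -0.5 * np.sum(W * means, axis=1) + np.log(priors)
    return classes, W, b


def lda_predict(model, Z):
    """Assign each row to the class with the largest linear discriminant score."""
    classes, W, b = model
    return classes[np.argmax(Z @ W.T + b, axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_per, p = 20, 500                      # synthetic HDLSS data: n = 60 << p = 500
    y = np.repeat([0, 1, 2], n_per)
    X = rng.normal(size=(3 * n_per, p))
    X[:, :10] += 1.5 * y[:, None]           # class signal in the first 10 coordinates

    scores = dual_pca_scores(X, k=5)
    err_pca = np.mean(lda_predict(lda_fit(scores, y), scores) != y)

    K = rbf_kernel(X, X, gamma=1.0 / p)     # hypothetical bandwidth choice
    err_kernel = np.mean(lda_predict(lda_fit(K, y), K) != y)
    print(f"in-sample error  PCA-LDA: {err_pca:.3f}  kernel-LDA: {err_kernel:.3f}")
```

The data here are synthetic and only exercise the code path; they say nothing about the error rates reported for the dialect data. Solving with `np.linalg.solve` rather than forming an explicit inverse is a standard numerical choice, and some regularization of the pooled covariance is needed whenever the number of features (here, the $n$ columns of the kernel matrix) is not small relative to $n$.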
Department, Program, or Center: The John D. Hromi Center for Quality and Applied Statistics (KGCOE)
Fokoue, Ernest and Ma, Zichen, "Modern Multivariate Methods for Accurate Dialect Classification" (2013). Technical Report. Accessed from
RIT – Main Campus