Abstract

In this thesis we explore different Spectral Regression Estimators for solving the regression problem in which multiple columns of the design matrix X are linearly dependent, or nearly so. We explore two scenarios:

• Scenario 1: p << n, where there exist at least two columns, xj and xk, that are nearly linearly dependent. This indicates collinearity, and X⊤X becomes nearly singular.

• Scenario 2: n << p, where there are more predictors than observations, so some columns must be linear combinations of other columns. This indicates exact linear dependence, and X⊤X is singular.
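The two scenarios can be illustrated numerically. The following is a minimal sketch on simulated Gaussian data, not taken from the thesis; the sample sizes, the perturbation scale `1e-6`, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scenario 1: p << n, with one column nearly a copy of another (assumed setup).
n, p = 200, 5
X1 = rng.normal(size=(n, p))
X1[:, 1] = X1[:, 0] + 1e-6 * rng.normal(size=n)  # x_k is almost equal to x_j
cond1 = np.linalg.cond(X1.T @ X1)  # a very large value signals near singularity

# Scenario 2: n << p, so rank(X) <= n < p and X^T X cannot have full rank.
n2, p2 = 10, 50
X2 = rng.normal(size=(n2, p2))
rank2 = np.linalg.matrix_rank(X2)

print(f"Scenario 1: condition number of X^T X = {cond1:.3e}")
print(f"Scenario 2: rank of X = {rank2}, number of predictors p = {p2}")
```

In the first case the condition number of X⊤X is very large, and in the second the rank of X is bounded by n, so the p × p matrix X⊤X has no inverse.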

Both scenarios yield an ill-conditioned X⊤X matrix when solving the normal equations: under collinearity the matrix is singular or nearly singular, which makes the least squares estimate unstable or impossible to compute. In this thesis, we explore different methods (variable selection, regularization, compression and dimensionality reduction) that address this issue. For variable selection, we use Stepwise Selection regression as well as Best Subset Selection regression. Two approaches to Stepwise Selection regression are assessed: Forward Selection and Backward Elimination. The regression models are assessed with criterion-based procedures such as AIC, BIC, R², adjusted R², and Mallows's Cp statistic. In chapter three we introduce the concepts of General Regularization and Ridge Regression, as well as subsequent shrinkage methods such as the Lasso, Bayesian Lasso and the Elastic Net. Chapter five looks at compression and dimensionality reduction procedures, which are outlined via SVD (Singular Value Decomposition) and eigenvector decomposition. Hard thresholding is subsequently introduced via SPCA (Sparse Principal Component Analysis) and a novel approach using RPCA (Robust Principal Component Analysis). Furthermore, we show how RPCA can aid with data and image compression. The study concludes with an empirical exploration of all the methods outlined above, using several performance indicators on simulated and real data sets. The methods are assessed on these data sets via cross-validation: we determine the optimal values of the tuning settings and then evaluate predictive and explanatory performance.
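To make the link between Ridge Regression and the SVD concrete, here is a minimal sketch, assuming the standard ridge estimator (X⊤X + λI)⁻¹X⊤y. It is not the thesis's implementation; the function name `ridge_via_svd`, the simulated data, and the value λ = 1 are hypothetical choices for illustration.

```python
import numpy as np

def ridge_via_svd(X, y, lam):
    """Ridge coefficients computed from the thin SVD X = U diag(s) V^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Each spectral component is shrunk by the factor s / (s^2 + lam),
    # which stays bounded even when a singular value s is close to zero.
    shrink = s / (s**2 + lam)
    return Vt.T @ (shrink * (U.T @ y))

rng = np.random.default_rng(1)
n, p = 10, 50  # n << p, so ordinary least squares is not uniquely defined
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]  # assumed sparse signal for the simulation
y = X @ beta_true + 0.1 * rng.normal(size=n)

lam = 1.0  # assumed penalty; in practice it would be chosen by cross-validation
beta_svd = ridge_via_svd(X, y, lam)

# Reference computation straight from the penalized normal equations.
beta_direct = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.allclose(beta_svd, beta_direct))
```

Both routes give the same coefficients. The SVD form shows directly how the penalty damps the directions associated with small singular values, which are the source of the instability described above.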

Library of Congress Subject Headings

Regression analysis; Spectral analysis (Mathematics); Estimation theory

Publication Date

5-3-2019

Document Type

Thesis

Student Type

Graduate

Degree Name

Applied Statistics (MS)

Department, Program, or Center

School of Mathematical Sciences (COS)

Advisor

Ernest Fokoue

Advisor/Committee Member

Robert Parody

Advisor/Committee Member

Joseph Voelkel

Recommended Citation

Hassan, Nawal, "Some Statistical Properties of Spectral Regression Estimators" (2019). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/10045

Campus

RIT – Main Campus

Plan Codes

APPSTAT-MS


Some Statistical Properties of Spectral Regression Estimators
