Variable selection is of utmost importance in aviation safety, where the data contain a large number of highly correlated predictors and flight safety has to be predicted accurately. Variable selection methods have not been encouraged in medical research, where subject-matter knowledge is limited. For this reason, Genell, Nemes, Steineck, and Dickman (2010) conducted a simulation study comparing Bayesian Model Averaging and stepwise regression, to motivate medical researchers to apply automatic variable selection to their regression models and take advantage of it. In this era of data science and machine learning, we extend this comparative study to machine learning algorithms. Various studies have shown that the recursive feature elimination (RFE) algorithm reduces the effect of correlation on the variable importance measure and results in minimal prediction error. In this study, we compare RFE with random forests (RFE-RF), RFE with support vector machines (RFE-SVM), and Bayesian Model Averaging (BMA) on simulated data in the presence of correlation, varying the sample size (n = 30, 300) for p = 45 variables so that both cases, n < p and n > p, are covered. Our results show that the percentage of true predictors selected is highest for RFE-RF among the three methods. However, although RFE-RF selects the highest overall percentage of true predictors, the estimated probability of selecting correlated true predictors is better for BMA than for the other methods.
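The kind of experiment described above can be sketched in a few lines. The following is a minimal illustration only, not the thesis code: it assumes scikit-learn's `RFE` and `RandomForestRegressor` as the RFE-RF implementation, and the simulation design (5 true predictors, an equicorrelated block with correlation 0.7, unit coefficients, 50 trees, elimination step of 5) is hypothetical, since this record does not state those settings. The helper names `simulate` and `rfe_rf_select` are likewise invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE


def simulate(n, p=45, n_true=5, rho=0.7, seed=0):
    """Draw n rows of p Gaussian predictors; the first n_true drive the response.

    Assumed design (not taken from the thesis): the first 2 * n_true
    predictors share pairwise correlation rho, so every true predictor is
    correlated with other true predictors and with some noise predictors.
    """
    rng = np.random.default_rng(seed)
    cov = np.eye(p)
    block = 2 * n_true
    cov[:block, :block] = rho      # equicorrelated block
    np.fill_diagonal(cov, 1.0)     # restore unit variances
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    beta = np.zeros(p)
    beta[:n_true] = 1.0            # hypothetical effect sizes
    y = X @ beta + rng.normal(size=n)
    return X, y, np.arange(n_true)


def rfe_rf_select(X, y, n_keep=5, seed=0):
    """Run RFE with a random forest and return the indices of kept predictors."""
    forest = RandomForestRegressor(n_estimators=50, random_state=seed)
    # Each round refits the forest and drops the 5 least important predictors.
    selector = RFE(forest, n_features_to_select=n_keep, step=5)
    selector.fit(X, y)
    return np.flatnonzero(selector.support_)


if __name__ == "__main__":
    for n in (30, 300):            # n < p and n > p for p = 45
        X, y, truth = simulate(n)
        chosen = rfe_rf_select(X, y)
        hit_rate = np.isin(truth, chosen).mean()
        print(f"n={n}: selected={chosen.tolist()}, "
              f"share of true predictors recovered={hit_rate:.2f}")
```

A single run like this gives one selection per sample size; estimating the percentage of true predictors selected, as the abstract reports, would require repeating the simulation over many seeds and averaging the hit rate.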
Applied Statistics (MS)
Department, Program, or Center
School of Mathematical Sciences (COS)
Rumao, Sailee, "Exploration of Variable Importance and Variable selection techniques in presence of correlated variables" (2019). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus