Loan applications at banks are often long, requiring the applicant to provide large amounts of data. Is all of it necessary? Can we save the applicant some frustration and the bank some expense by using only a subset of the requested variables? To answer these questions, I have attempted to model the current loan approval process at a particular bank.
I have used several model selection techniques for logistic regression, including stepwise regression, Occam's Window, Markov Chain Monte Carlo Model Composition (Raftery, Madigan, and Hoeting, 1993), and Bayesian Random Searching. The resulting models largely agree on a subset containing only one-third of the original variables.
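
As a rough illustration of the simplest of these techniques, forward stepwise selection for logistic regression might be sketched as below. This is not the code used in the study. The BIC stopping rule, the synthetic data (nine candidate variables, three of which drive the outcome), and the function names `log_likelihood`, `bic`, and `forward_stepwise` are all assumptions made for the example.

```python
import numpy as np

def log_likelihood(design, y, n_iter=50, tol=1e-8):
    """Maximised log-likelihood of a logistic regression, fitted by Newton-Raphson."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        w = p * (1.0 - p)
        grad = design.T @ (y - p)
        # Tiny ridge term keeps the Hessian invertible (an assumption of this sketch).
        hess = design.T @ (design * w[:, None]) + 1e-8 * np.eye(design.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = design @ beta
    # Bernoulli log-likelihood: sum of y*eta - log(1 + exp(eta)).
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

def bic(X, y, subset):
    """BIC of the model with an intercept plus the columns of X listed in `subset`."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in subset])
    return -2.0 * log_likelihood(design, y) + design.shape[1] * np.log(len(y))

def forward_stepwise(X, y):
    """Add one variable at a time, stopping when no addition lowers the BIC."""
    selected, remaining = [], list(range(X.shape[1]))
    best = bic(X, y, selected)
    while remaining:
        score, j = min((bic(X, y, selected + [j]), j) for j in remaining)
        if score >= best:
            break
        best = score
        selected.append(j)
        remaining.remove(j)
    return selected, best

# Synthetic stand-in for loan data: only columns 0, 1, and 2 affect approval.
rng = np.random.default_rng(0)
n, k = 600, 9
X = rng.normal(size=(n, k))
eta = 0.5 + 1.5 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

selected, score = forward_stepwise(X, y)
print("selected columns:", sorted(selected))
```

The Bayesian methods named above differ in that they average over many plausible models rather than committing to one, so a single greedy path like this is only the baseline against which they would be compared.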