HCclassification

This archive contains a Matlab implementation of the HC-Classification algorithm described in the paper

    D. Donoho and J. Jin (2008). Higher criticism thresholding: Optimal feature selection when useful features are rare and weak.

Current version: V1.1

SHORT DOCUMENTATION

For more information, type "help functionname" at the Matlab command line.

There are two functions in this package: HCclassification, which finds the classifier, and HCclassification_fit, which applies the classifier to new data.

Usage:
    [weight, stats] = HCclassification(TrainX, TrainY, threshold, alpha, sdflag, muflag)

Inputs:
    TrainX      N-by-P matrix of predictors for the training data set, with one row per observation and one column per predictor
    TrainY      N-by-1 matrix of class labels, "1" or "0"

Inputs (optional):
    threshold   Choice of thresholding function: 'clip', 'soft', or 'hard'; default 'hard'
    alpha       Proportion of features with the smallest p-values used to calculate the HC threshold; default 0.2
    sdflag      Type of proxy sd: 0 - std; 1 - add median; 2 - floored at median; default 1
    muflag      Type of proxy mu: 0 - std; 1 - average of the two means; default 1

Outputs:
    weight      P-by-1 vector giving the weight for each feature
    stats       Struct with four fields, holding the normalization parameters for test data and the HC statistics
    stats.xbar  P-by-1 chosen mu proxy
    stats.s     P-by-1 chosen standard deviation proxy
    stats.HCT   Chosen threshold
    stats.HC    P-by-1 HC score for each feature

Usage:
    [label, score] = HCclassification_fit(weight, xbar, s, Test)

Inputs:
    weight      P-by-1 vector giving the weight for each predictor
    xbar        P-by-1 vector, estimated mean from the HCclassification function
    s           P-by-1 vector, estimated standard deviation from the HCclassification function
    Test        M-by-P matrix of predictors for the data set to be predicted

Outputs:
    label       M-by-1 vector of estimated labels, "1" or "0"
    score       M-by-1 vector giving the classification score for the test data

LICENSE

This program is free software: you can redistribute it and/or modify it under the
terms of the GNU General Public License as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see .

If you use this code for your publication, please include a reference to the paper "Higher criticism thresholding: Optimal feature selection when useful features are rare and weak".

CONTACT

For any problem, please contact

    Jiashun Jin
    Department of Statistics
    Carnegie Mellon University
    Email: jiashun@stat.cmu.edu

    Wanjie Wang
    Department of Statistics
    University of Pennsylvania
    Email: wanjiew@wharton.upenn.edu
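
EXAMPLE

The two usage lines in the short documentation can be combined into a train-then-predict workflow. The sketch below is only an illustration, not part of the package: the synthetic data, the sizes N, M, and P, and the choice of ten shifted features are assumptions made here, and it presumes both package functions are on the Matlab path. The optional arguments are written out with their documented default values.

```matlab
% Hypothetical synthetic data (not shipped with the package):
% N training rows, M test rows, P features.
rng(0);                                    % fix the seed for repeatability
N = 40; M = 10; P = 200;
TrainY = [ones(N/2, 1); zeros(N/2, 1)];    % class labels "1" or "0"
TrainX = randn(N, P);
% Assumption for illustration: shift the first 10 features in class 1
% so that a few features carry signal.
TrainX(TrainY == 1, 1:10) = TrainX(TrainY == 1, 1:10) + 1;

% Find the classifier, passing the documented defaults explicitly:
% threshold = 'hard', alpha = 0.2, sdflag = 1, muflag = 1.
[weight, stats] = HCclassification(TrainX, TrainY, 'hard', 0.2, 1, 1);

% Apply the classifier to new data, reusing the normalization
% parameters returned in stats.
Test = randn(M, P);
[label, score] = HCclassification_fit(weight, stats.xbar, stats.s, Test);

% Inspect the results.
fprintf('HC threshold: %g\n', stats.HCT);
fprintf('Features with nonzero weight: %d of %d\n', nnz(weight), P);
disp([label score]);                       % one row per test observation
```

Passing stats.xbar and stats.s to HCclassification_fit reuses the normalization estimated on the training data for the test data, which is how the documentation describes those two fields.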