Course Description:

Loosely defined as "learning without labels", unsupervised learning focuses on summarizing, organizing, and characterizing the distribution of a (usually) large set of feature vectors. We might be interested in estimating properties of a density (e.g. the location of high-density convex regions), discovering frequent patterns in a database, or identifying a lower-dimensional representative manifold within the feature space. After a look at discovering multivariate structure (e.g. principal components, projection pursuit, multidimensional scaling, association rules for binary data), we will turn to traditional and nontraditional clustering topics: approaches (algorithmic, spectral, parametric, nonparametric); dissimilarity/distance measures and coefficients; and diagnostics, visualization, and validation ("goodness-of-fit", deviation from unimodality, bootstrap, prediction strength, the stability of a clustering solution). Additional topics may include (re)fractionation (clustering with meta-observations), diffusion maps, graph and text mining, and biclustering.


Class Meetings: Mondays and Wednesdays, 10:30am-11:50am, Porter Hall A19C