Course Description:
Loosely defined as "learning without labels", unsupervised learning
focuses on summarizing, organizing, and characterizing the distribution of
a (usually) large set of feature vectors. We might be interested in
estimating properties of a density (e.g. the location of high-density
convex regions), discovering frequent patterns in a database, or
identifying a lower-dimensional representative manifold within the feature
space. After a look at discovering multivariate structure
(e.g. principal components, projection pursuit, multidimensional
scaling, association rules for binary data), we will turn to
traditional and nontraditional clustering topics: approaches
(algorithmic, spectral, parametric, nonparametric);
dissimilarity/distance measures and coefficients; and diagnostics,
visualization, and validation ("goodness-of-fit", deviation from
unimodality, bootstrap, prediction strength, the stability of a
clustering solution). Additional topics may include (re)fractionation
- clustering with meta-observations, diffusion maps, graphs/text
mining, and biclustering.
Class Meetings: Mondays and Wednesdays, 10:30am-11:50am, Porter
Hall A19C