36-462 Data Mining

Data mining is the science of discovering structure and making predictions in data sets (typically large ones). It spans the fields of statistics and computer science. Since this is a course in statistics, we will adopt a statistical perspective for most of the course. Data mining also involves a good deal of both applied work (programming, problem solving, data analysis) and theoretical work (learning, understanding, and evaluating methodologies). We will try to maintain a balance between the two.

Upon completing this course, you should be able to tackle new data mining problems by:

  • selecting the appropriate methods and justifying your choices;
  • implementing these methods programmatically (using, say, the R programming language) and evaluating your results;
  • explaining your results to a researcher outside of statistics or computer science.

Syllabus

The syllabus provides information on grading, class policies, etc.

Lecture Notes and Annotated Slides