36-462/36-662, Data Mining

Cosma Shalizi

Lecture 1, 18 January 2022 — Welcome to the course

Agenda for today

What is statistical learning?

Course mechanics

Class meetings

In-class exercises

Reading

Reading: Textbook

Principles of Data Mining

Reading: Textbook

The Ethical Algorithm: The Science of Socially Aware Algorithm Design

Reading: Textbook

Statistical Learning from a Regression Perspective

Homework

Grading

Time expectations

Cheating, collaboration & plagiarism

Homework format

What are we going to learn about?

So many things!

Nearest neighbors

Prediction and decision trees

Nonlinear features and kernels

Dimension reduction

Clustering

Mathematical tools to make all this work

Checking our guesses

Applications

Recommendation engines

Fairness in prediction

Waste, fraud and abuse

Where did this come from?

What will you need to know?

Next time: The truth about linear regression

Backup: Where did this really come from?
