- Li Liu (lliu1 at andrew dot cmu dot edu): Wednesdays 4-5pm, FMS 320
- Cong Lu (congl at andrew dot cmu dot edu): Fridays 11am-12pm, Wean 8110
- Jack Rae (jwr at andrew dot cmu dot edu): Wednesdays 11am-12pm, Wean 8110
- Michael Vespe (mvespe at andrew dot cmu dot edu): Mondays 5-6pm, FMS 320


- Introduction to data mining

- Information retrieval
- Slides
- R files: 02-ir.R, docs.Rdata

- PageRank

- Clustering 1: K-means and K-medoids
- Slides, marked slides
- R files: 04-clus1.R, kclust.R

- Clustering 2: Hierarchical clustering
- Slides, marked slides
- R files: 05-clus2.R

- Clustering 3: Hierarchical clustering (continued); choosing the number of clusters
- Slides, marked slides
- R files: 06-clus3.R

- Dimension reduction 1: Principal component analysis
- Slides, marked slides
- R files: 07-dim1.R

- Dimension reduction 2: Principal component analysis (continued)
- Slides, marked slides
- R files: 08-dim2.R, playerstats.Rdata

- Dimension reduction 3: Nonlinear dimension reduction

- Correlation analysis 1: Canonical correlation analysis
- Slides, marked slides
- R files: 10-cor1.R

- Correlation analysis 2: Measures of correlation

- Correlation analysis 3: Measures of correlation (continued)

- Regression 1: Different perspectives

- Regression 2: More perspectives, shortcomings
- Slides, marked slides
- R files: 14-reg2.R

- Regression 3: More perspectives, shortcomings (continued)
- Slides, marked slides (same as lecture 14)
- R files: 14-reg2.R (same as lecture 14)

- Modern regression 1: Ridge regression
- Slides, marked slides
- R files: 16-modr1.R

- Modern regression 2: The lasso

- Model selection and validation 1: Cross-validation

- Model selection and validation 2: Model assessment, more cross-validation
- Slides, marked slides
- R files: 19-val2.R, splines.Rdata

- Classification 1: Linear regression of indicators, linear discriminant analysis
- Slides, marked slides
- R files: 20-clas1.R

- Classification 2: Linear discriminant analysis (continued); logistic regression
- Slides, marked slides
- R files: 21-clas2.R

- Classification 3: Logistic regression (continued); model-free classification

- Tree-based methods for classification and regression

- Bagging

- Boosting

- Homework 1, due February 5

R files: hw1prob1.Rdata, hw1prob3.Rdata

- Homework 2, due February 19

R files: hw2prob1.Rdata, hw2prob3.Rdata, plot.digit.R

- Homework 3, due March 7

R files: hw3prob3.Rdata, smoother.R

- Homework 4, due March 28

R files: hw4prob3.R, plotfuns.R, bstar.Rdata

- Homework 5, due April 11

R files: zip.014.Rdata

- Homework 6, due April 25

R files: hw6prob1.Rdata

- Final project, due May 9/10 (in two parts)

R files: neighbor.Rdata

| Date | Lecture | Assignments |
|---|---|---|
| Tues Jan 15 | 1. Introduction to data mining | |
| Thurs Jan 17 | 2. Information retrieval | |
| Tues Jan 22 | 3. PageRank | Hw 1 out |
| Thurs Jan 24 | 4. Clustering 1 | |
| Tues Jan 29 | 5. Clustering 2 | |
| Thurs Jan 31 | 6. Clustering 3 | |
| Tues Feb 5 | 7. Dimension reduction 1 | Hw 1 in, Hw 2 out |
| Thurs Feb 7 | 8. Dimension reduction 2 | |
| Tues Feb 12 | 9. Dimension reduction 3 | |
| Thurs Feb 14 | 10. Correlation analysis 1 | |
| Tues Feb 19 | 11. Correlation analysis 2 | Hw 2 in, Hw 3 out |
| Thurs Feb 21 | 12. Correlation analysis 3 | |
| Tues Feb 26 | Midterm 1 | |
| Thurs Feb 28 | 13. Regression 1 | |
| Tues Mar 5 | 14. Regression 2 | |
| Thurs Mar 7 | 15. Regression 3 | Hw 3 in, Hw 4 out |
| Tues Mar 12 | (Spring break, no class) | |
| Thurs Mar 14 | (Spring break, no class) | |
| Tues Mar 19 | 16. Regularized regression 1 | |
| Thurs Mar 21 | 17. Regularized regression 2 | |
| Tues Mar 26 | 18. Model selection and validation 1 | |
| Thurs Mar 28 | 19. Model selection and validation 2 | Hw 4 in, Hw 5 out |
| Tues Apr 2 | 20. Classification 1 | |
| Thurs Apr 4 | 21. Classification 2 | |
| Tues Apr 9 | 22. Classification 3 | |
| Thurs Apr 11 | 23. Trees and boosting 1 | Hw 5 in, Hw 6 out |
| Tues Apr 16 | Midterm 2 | |
| Thurs Apr 18 | (Spring carnival, no class) | |
| Tues Apr 23 | 24. Trees and boosting 2 | |
| Thurs Apr 25 | 25. Trees and boosting 3 | Hw 6 in |
| Tues Apr 30 | Work on final projects | |
| Thurs May 2 | Work on final projects | |
| Fri May 10 5:30‐8:30pm | Final presentations | Final project in |


Sign up for a slot at the start of lecture. When choosing a slot, please keep in mind that examples related to the material we are currently covering are preferred; e.g., if we are in the middle of our clustering sequence of lectures, examples about clustering are highly encouraged.
