36-601: Perspectives on Statistical Practice: Intro to SAS Programming


Documents

601 Syllabus

Lectures

Lecture 1: Introduction
Lecture 2: Inputting Data
Lecture 2 (Cont'd): The Data Step
Lecture 3: Procedures (PROCs)
Lecture 4: Macros
Lecture 5: Graphing
Lecture 6: Reports
Lecture 7: Logistic Regression
Lecture 8: Data Management
Lecture 9: SQL
Lecture 10: PROC COMPARE
Lecture 11: ANOVA
Lecture 12: ANOVA and Power
Lecture 13: Simulation and Power
Lecture 13: In-class assignment solution
Lecture 14: The Bootstrap
Lecture 15: Advanced Macros

Lab 1: Regression

Lab 1 Instructions
Blood Level Data
Sample Code

Lab 2: Data Management

Lab 2 Instructions
Lab 2 Data
  1. demog.sas7bdat
  2. exposure.sas7bdat
  3. labs.sas7bdat
My Code

Lab 3: Data Management with PROC SQL

Lab 3 Instructions
Lab 3 Data
  1. rat_exposure.sas7bdat
  2. rat_labs.sas7bdat
My Code

Lab 4: Principal Components/Factor Analysis

protein.dat
track.dat
  1. Data: USArrests.dat
  2. Standardize the dataset BEFORE doing PCA using the appropriate PROC
  3. Perform PCA
  4. Plot each state by the first two principal components using appropriate labels
  5. Interpret results
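Steps 2 through 4 can be sketched as follows. This is only a minimal sketch, not the lab solution. It assumes the data have already been read into a dataset called usarrests with an identifier variable state and four numeric variables murder, assault, urbanpop, and rape; all of these names are assumptions, since the layout of USArrests.dat is not shown here. PROC STANDARD is one PROC that can standardize variables; PROC STDIZE is another.

```sas
/* Assumed names: usarrests, state, murder, assault, urbanpop, rape */

/* Step 2: rescale each variable to mean 0 and standard deviation 1 */
proc standard data=usarrests mean=0 std=1 out=usarrests_std;
    var murder assault urbanpop rape;
run;

/* Step 3: PCA; OUT= saves the component scores as Prin1, Prin2, ... */
proc princomp data=usarrests_std out=pc_scores;
    var murder assault urbanpop rape;
run;

/* Step 4: plot the first two components, labeling each point by state */
proc sgplot data=pc_scores;
    scatter x=prin1 y=prin2 / datalabel=state;
run;
```

PROC PRINCOMP analyzes the correlation matrix by default, so standardizing first should not change the components here; it would matter if the COV option were used.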

Lab 5: Logistic Regression/LDA/QDA

spamclass.sas7bdat
emails.sas7bdat


Homework

Hw 1:
  1. Read in the keyboard study data for the four schools (will be emailed to you)
  2. Create one large dataset with all four schools
  3. Add a variable signifying the school
  4. Create four school dummy variables
  5. Create numeric pre-test and post-test variables from the character variables (use the numeric part for observations with < and >)
  6. Create two variables, one with the change in score from pre to post, the other with percent change in score
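Steps 4 through 6 can be sketched in a single DATA step. This is a minimal sketch under stated assumptions, not the assignment solution: the dataset name all_schools, the variable names school, pretest_c, and posttest_c, and the school codes A through D are all hypothetical, since the actual data layout is not shown here.

```sas
/* Assumed names: all_schools, school, pretest_c, posttest_c; codes 'A'-'D' */
data keyboard;
    set all_schools;

    /* Step 4: a logical comparison evaluates to 1 (true) or 0 (false) */
    school_a = (school = 'A');
    school_b = (school = 'B');
    school_c = (school = 'C');
    school_d = (school = 'D');

    /* Step 5: remove any < or > characters, then read the rest as a number */
    pretest  = input(compress(pretest_c,  '<>'), 12.);
    posttest = input(compress(posttest_c, '<>'), 12.);

    /* Step 6: change and percent change from pre to post */
    change = posttest - pretest;
    /* Avoid dividing by a missing or zero pre-test score */
    if not missing(pretest) and pretest ne 0 then
        pct_change = 100 * change / pretest;
run;
```

Observations with a missing or zero pre-test score are left with a missing pct_change rather than triggering a division-by-zero note in the log.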

Hw 2:
  1. Perform EDA on your selected tests for all four schools, but do NOT do any formal statistical analyses.
  2. Write up the results of your EDA. It should include a small summary about the study and the specific test(s) you analyzed along with a summary of your EDA. Your report should include relevant tables and figures. Also include what you would expect to find if you did perform a formal statistical analysis based on the results of the EDA. It should be no longer than 2 pages including tables and figures.

Hw 3: Fully comment the following code: hw3.sas

Hw 4:
  1. Part I.
    1. Write and call a macro that reads in the keyboard study data for each school. In addition to reading in data, the macro should create a variable SCHOOL that takes on the value of the school from which the data came and also create a dummy variable for the school. HINT: The Excel workbook sheet, the value of SCHOOL, and the dummy variable should all have the same name.
    2. Create one large dataset with all four schools. "Fix" the school dummy variables created by the macro.
    3. Create numeric pretest, posttest, change, and percent change variables as in Hw 1.
    4. Read in the demographics data and MERGE it with the test data.
    5. Create an AGE variable equal to the age of the student in years on January 15, 2010.
  2. Part II.
    1. Perform a one-sample t-test for each of the four schools on the CHANGE variable.
    2. Perform a paired t-test of the PRETEST and POSTTEST variables for each of the four schools.
    3. Perform a two-sample t-test comparing the appropriate schools.
    4. Perform two linear regressions, regressing CHANGE on the appropriate paired schools using dummy variables. Make sure to perform the appropriate diagnostics and interpret your results.
    5. Repeat the above analyses using covariates. You may need to convert character variables to numeric dummy variables.
    6. Compare the results of all your analyses.
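Part I steps 4 and 5 and Part II steps 1 through 3 can be sketched as follows. This is a rough sketch under stated assumptions, not the assignment solution. The dataset names keyboard_all and demog, the key variable id, the SAS date variable birthdate, and the school codes A and B are all hypothetical, and age is taken here to mean completed whole years. Which schools are the "appropriate" ones to compare depends on the study design, which is not described here.

```sas
/* Assumed names: keyboard_all, demog, id, birthdate, school,
   pretest, posttest, change; school codes 'A' and 'B' */

/* MERGE with BY requires both datasets sorted by the key */
proc sort data=keyboard_all; by id; run;
proc sort data=demog;        by id; run;

data analysis;
    merge keyboard_all (in=in_test) demog;
    by id;
    if in_test;                      /* keep only students with test data */

    /* Completed years of age on January 15, 2010: count month boundaries,
       subtract one if the birthday's day of month has not yet been reached */
    ref_date = '15JAN2010'd;
    age = floor((intck('month', birthdate, ref_date)
                 - (day(ref_date) < day(birthdate))) / 12);
    drop ref_date;
run;

/* BY-group processing requires the data sorted by SCHOOL */
proc sort data=analysis; by school; run;

/* One-sample t-test of mean CHANGE = 0, separately for each school */
proc ttest data=analysis h0=0;
    by school;
    var change;
run;

/* Paired t-test of POSTTEST vs PRETEST, separately for each school */
proc ttest data=analysis;
    by school;
    paired posttest*pretest;
run;

/* Two-sample t-test of CHANGE for one (assumed) pair of schools */
proc ttest data=analysis;
    where school in ('A', 'B');
    class school;
    var change;
run;
```

The one-sample test on CHANGE and the paired test on POSTTEST and PRETEST should give the same t statistic for each school, which is a useful check when comparing results in step 6.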


Data

smoker.sas7bdat
smoker.csv
smoker.txt
smoker_flat.txt
smoker.xls
Prien-logit.dat
p_dat.csv
p_gender.csv


Readings

SQL Paper

Final Project

Instructions: final-project.pdf

Data for the project:
arm.sas7bdat
schedule.sas7bdat
demog.sas7bdat
treatment.sas7bdat