Mondays and Wednesdays, 3:00--4:20 in Baker Hall 235A

Office hours, Tuesdays 11--12 in Baker Hall 229C

This course is a rapid introduction to the statistical modeling of social, biological and technological networks. Emphasis will be on statistical methodology and subject-matter-agnostic models, rather than on the specifics of different application areas. No prior experience with networks is expected, but familiarity with statistical modeling is essential.

There are no formal pre-requisites, but it is intended for those who already know some statistics and want to learn about networks. (Reasonable background would be at the level of Davison's Statistical Models or Wasserman's All of Statistics; or Casella and Berger's Statistical Inference, plus some actual experience with data.) Others are welcome to try, but are warned that we will make little effort to fill in background statistical knowledge.

Undergraduates wishing to take the course should register
for **36-491**.

Auditors are welcome, provided there is space for them in the classroom.

There are two

- M. E. J. Newman, Networks: An Introduction (Oxford University Press, 2010)
- Eric D. Kolaczyk and Gábor Csárdi, Statistical Analysis of Network Data with R (Springer, 2014; available electronically through SpringerLink)

Class will meet for lecture twice a week. You are expected to attend every lecture. Every student is expected to act as the "scribe" for two lectures, taking notes and producing a (rough) version in LaTeX by the next week. (Most lectures will have two scribes, who will be each produce rough versions.) I will randomly select the scribes at the start of each lecture. (If you must miss a lecture, let me know, so I don't call on you.) My evaluation of the job you do as scribe will be the participation portion your grade, and count for 10% of your total grade.

There will be three homework assignments, due roughly every other week. Homework will be 40% of your grade, evenly split across the assignments.

There will also be a final project, which will by default will be a data
analysis of a real data set. If you want to use an outside data set for the
final project, or to tackle a theoretical or methodological question, you will
need
*prior* permission from the professor; otherwise, a data set and agenda
will be provided to you. The final project will be 50% of your grade.

There will be no curve, and the threshold for an A- will be 90/100.

- Awesome Network Analysis (compiled by François Briatte)
- Stanford Network Analysis Project (SNAP)

- 29 August, Lecture 1: Introduction to the course
- Basic concepts; elementary graph theory
- Lecture notes (based on scribed notes by Brendan McVeigh)
*Reading:*Kolaczyk, chapters 1 and 2*Optional reading:*- Newman, chapters 1--6

- Homework 1: assignment,
`ckm_network.dat`data file - 31 August, Lecture 2: Data collection and sampling
- Lecture notes (based on scribed notes by Cristobal De La Maza and Valerie Yuan)
*Reading:*Kolaczyk, sections 3.1--3.3, 3.5, chapter 5*Optional readings:*- Nesreen K. Ahmed, Jennifer Neville, Ramana Kompella, "Reconsidering the Foundations of Network Sampling" [PDF preprint]
- Mark S. Handcock and Krista J. Gile, "Modeling Social Networks from Sampled Data", Annals of Applied Statistics
**4**(2010): 5--25, arxiv:1010.0891 - Gueorgi Kossinets, "Effects of Missing Data in Social Networks",
Social Networks
**28**(2006): 247--268, arxiv:cond-mat/0306335 - Grant Schoenebeck, "Potential Networks, Contagious Communities, and Understanding Social Network Structure", WWW 2013, arxiv:1304.1845
- Michael P. H. Stumpf and Carsten Wiuf, "Sampling properties
of random graphs: the degree
distribution", Physical
Review E
**72**(2005): 036118, cond-mat/0507345 - Michael P. H. Stumpf, Carsten Wiuf and Robert M. May,
"Subnets of scale-free networks are not scale-free: Sampling properties of
networks", PNAS
**102**(2005): 4221--4224

- 7 September, Lecture 3: Visualization and descriptive statistics
*Scribed lecture notes:*by Momin Malik and Neil Spencer*Reading:*Kolaczyk, section 3.4, chapter 4*Optional reading:*- Newman, chapters 6--8

- 12 September, Lecture 4: Random graphs
- The Erdos-Renyi model and its properties
*Scribed lecture notes:*by Ciaran Evans and by Jacqueline Mauro*Reading:*Kolaczyk, sections 6.1--6.2*Optional readings:*- Newman, chapters 12 and 13
- M. E. J. Newman, S. H. Strogatz, D. J. Watts, "Random graphs with arbitrary degree distributions and their applications", Physical Review E
**64**(2001): 026118, arxiv:cond-mat/0007235

- 14 September, Lecture 5: Block models and stochastic block models
- Structural equivalence; regular equivalence; regular, stochastic equivalence; latent variables
*Scribed lecture notes:*by Zongge Liu and Brendan McVeigh*Reading:*TBA*Optional readings:*- Stephen E. Fienberg and Stanley Wasserman, "Categorical Data Analysis of Single Sociometric Relations", pp. 156--192 in Samuel Leinhardt (ed.), Sociological Methodology 1981 (San Francisco: Jossey-Bass, 1981) [JSTOR]
- Paul W. Holland, Kathryn Blackmond Laskey and Samuel Leinhardt,
"Stochastic blockmodels: First steps",
Social Networks
**5**(1983): 109--137 - Stephen E. Fienberg, Michael M. Meyer and Stanley S. Wasserman,
"Statistical Analysis of Multiple Sociometric Relations",
Journal of the
American Statistical Association
**80**(1985): 51--67 - Tom A. B. Snijders and Krzysztof Nowicki, "Estimation and Prediction for Stochastic Blockmodels for Graphs with Latent Block Structure",
Journal of Classification
**14**(1997): 75--100 - Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg and Eric P. Xing, "Mixed Membership Stochastic Blockmodels",
Journal of Machine Learning Research
**9**(2008): 1981--2014 - Jörg Reichardt and Douglas R. White, "Role models for complex networks", arxiv:0708.0958 [Discussion]

- Homework 1 due
- Homework 2: assignment,
`fj.txt`data file - 19 September, Lecture 6: Community discovery
- Stochastic block models; modularity
*Reading:*Lecture canceled due to instructor back injury; see under lecture 7 for slides*Optional readings:*- Newman, chapter 11
*Included by reference:*Community Discovery Methods for Complex Networks

- 21 September, Lecture 7: Latent space models
- Continuous latent spaces; Euclidean and non-Euclidean models
*Reading:*Slides (.Rmd source file)*Optional readings*(including references from the slides):- Aurelien Decelle, Florent Krzakala, Cristopher Moore and Lenka
Zdeborova
- "Phase transition in the detection of modules in sparse networks",
Physical Review Letters
**107**(2011): 065701, arxiv:1102.1182 - "Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications", Physical Review E
**84**(2011): 066106, arxiv:1109.3041

- "Phase transition in the detection of modules in sparse networks",
Physical Review Letters
- Peter Hoff, "Modeling homophily and stochastic equivalence in symmetric relational data", NIPS 2007
- Peter D. Hoff, Adrian E. Raftery and Mark S. Handcock, "Latent Space Approaches to Social Network Analysis", Journal of the American Statistical Association
**97**(2002): 1090--1098 [PDF preprint] - Dmitri Krioukov, Fragkiskos Papadopoulos, Maksim Kitsak, Amin Vahdat, Marian Boguna, "Hyperbolic Geometry of Complex Networks", Physical Review E
**82**(2010): 036106, arxiv:1006.5169 - Andreas Noack, "Modularity clustering is force-directed
layout", Physical Review E
**79**(2009): 026102, arxiv:0807.4052

- Aurelien Decelle, Florent Krzakala, Cristopher Moore and Lenka
Zdeborova
- Homework 2 due
~~Homework 3 assigned~~- 26 September, Lecture 8: Cumulative advantage, preferential attachment, and vertex-copying models
- Skew distributions from positive feedback
*Scribed lectures notes:*by Matthew Babcock and Mo Li*Reading:*Kolaczyk, sections 6.3--6.4*Optional reading:*- Newman, chapter 14 and section 15.1
- Derek J. de Solla Price, "Networks of Scientific Papers", Science
**149**(1965): 510--515 - Price, "A General Theory of Bibliometric and Other Cumulative Advantage Processes", Journal of the American Society for Information Science
**27**(1976): 292--306 - William J. Reed and Barry D. Hughes, "From Gene Families and Genera to Incomes and Internet File Sizes: Why Power Laws are so Common in Nature", Physical Review E
**66**(2002): 067103 - Herbert A. Simon, "On a Class of Skew Distribution Functions", Biometrika
**42**(1955): 425--440 - Réka Albert and Albert-László Barabási", "Statistical Mechanics of Networks", Reviews of Modern Physics
**74**(2002): 47--97, arxiv:cond-mat/0106096 - Carsten Wiuf, Markus Brameier, Oskar Hagberg and Michael P. H. Stumpf, "A likelihood approach to analysis of network data", Proceedings of the National Academy of Sciences (USA)
**103**(2006): 7566--7570 [Commentary]

- 28 September, Lecture 9: The scale-free network controversy
- Drawing straight lines on log-log plots makes the baby Gauss cry.
*Scribed lecture notes:*by Alan Mishler*Reading:*Clauset, Shalizi and Newman, "Power-law Distributions in Empirical Data", SIAM Review**51**(2009): 661--703, arxiv:0706.1062*Optional reading:*- Raya Khanin and Ernst Wit, "How Scale-Free Are Biological
Networks?",
Journal of Computational Biology
**13**(2006): 810--818 - Michael Mitzenmacher, "A Brief History of Generative Models for
Power Law and Lognormal Distributions", Internet Mathematics
**1**(2003): 226--251 [PDF]

- Raya Khanin and Ernst Wit, "How Scale-Free Are Biological
Networks?",
Journal of Computational Biology
- 3 October, Lecture 10: Exponential-family random graph models I: the good news
*Scribed lecture notes*: by Nicolas Kim and Shengming Luo*Reading:*Kolaczyk, section 6.5*Optional reading:*- Newman, section 15.2
- Mark S. Handcock, David R. Hunter, Carter T. Butts, Steven M. Goodreau, and Martina Morris (eds.), "Statistical Modeling of Social Networks with 'statnet'", special volume (24) of the Journal of Statistical Software (2008)
- Steven M. Goodreau, James A. Kitts and Martina Morris,
"Birds of a Feather, Or Friend of a Friend?: Using Exponential Random Graph Models to Investigate Adolescent Social Networks", Demography
**46**(2009): 103--125 - Arun Chandrasekhar and Matthew O. Jackson, "Tractable and Consistent Random Graph Models", arxiv:1210.7375
- Juyong Park and Mark E. J. Newman, "The Statistical Mechanics of Networks", Physical Review E
**70**(2004): 066117, arxiv:cond-mat/0405566

- 5 October, Lecture 11: Exponential-family random graph models II: the bad news
*Scribed lecture notes*: by Abulhair Saparov*Reading:*- Mark S. Handcock, "Assessing Degeneracy in Statistical Models of Social Networks", Center for Statistics and the Social Sciences, University of Washington, Working Paper 39 (2003)
- Alessandro Rinaldo, Stephen E. Fienberg and Yi Zhou, "On the Geometry of Discrete Exponential Families with Application to Exponential Random Graph Models", Electronic Journal of Statistics
**3**(2009): 446--484 - Cosma Rohilla Shalizi and Alessandro
Rinaldo, "Consistency under Sampling of Exponential Random Graph
Models", Annals of
Statistics
**41**(2013): 508--535, arxiv:1111.3054

*Optional reading:*- Tom A. B. Snijders, "Conditional Marginalization for Exponential Random Graph Models", Journal of Mathematical Sociology
**34**(2010): 239--252 [Reprint]

- Tom A. B. Snijders, "Conditional Marginalization for Exponential Random Graph Models", Journal of Mathematical Sociology
~~Homework 3 due~~- Final project begins: assignment
- 10 October, Lecture 12: Network models are statistical models
- Goodness-of-fit, link prediction, model comparison
*Scribed lecture notes*: by Lynn Kaack*Reading*: Kolaczyk, section 7.2*Optional reading:*- Beau Dabbs and Brian Junker, "Comparison of Cross-Validation Methods for Stochastic Block Models", arxiv:1605.03000
- Kieran Healy, "The Performativity of Networks" [PDF]
- David R. Hunter,
Steven M. Goodreau and Mark
S. Handcock, "Goodness of Fit of Social Network Models", Journal of
the American Statistical Association
**103**(2008): 248--258 [PDF] - Jesse C. Shore and Benjamin Lubin, "Spectral goodness of fit
for network
models", Social
Networks
**43**(2015): 16–-27, arxiv:1407.7247

- 12 October: NO LECTURE
- 17 October, Lecture 13: Dynamic processes on networks I: basic models
- Diffusion; percolation; epidemics; coupled oscillators; regression on neighbors
*Notes for this lecture:*Many Cheerful Facts about the Laplacian*Scribed lecture notes:*by Xiaoying Tu*Optional readings:*- Newman, chapters 16--18
- Mark E. J. Newman, "The spread of epidemic disease on networks", Physical Review E
**66**(2002): 016128, arxiv:cond-mat/0205009 - Eben Kenah and James M. Robins, "Second look at the spread of epidemics on networks", Physical Review E
**76**(2007): 036113, arxiv:q-bio/0610057 - Stephen Judd, Michael Kearns, and Yevgeniy Vorobeychik, "Behavioral dynamics and influence in networked coloring and consensus", Proceedings of the National Academy of Sciences
**107**(2010): 14978--14982 - Seth A. Marvel, Travis Martin, Charles R. Doering, David Lusseau, M. E. J. Newman, "The small-world effect is a modern phenomenon", arxiv:1310.2636
- Laura M. Smith, Kristina Lerman, Cristina Garcia-Cardona, Allon G. Percus and Rumi Ghosh, "Spectral Clustering with Epidemic Diffusion",
Physical Review E
**88**(2013): 042813, arxiv:1303.2663

- 19 October, Lecture 14: Dynamic processes on networks II: homophily vs. influence
- Or, "A social network is a machine for creating selection bias."
*Scribed lecture notes:*by Shengming Luo and by Jacqueline Mauro*Reading*: Cosma Rohilla Shalizi and Andrew C. Thomas, "Homophily and Contagion Are Generically Confounded in Observational Social Network Studies", Sociological Methods and Research**40**(2011): 211--239, arxiv:1004.4704*Optional reading*:- Thomas W. Valente, "Network Models and Methods for Studying the Diffusion of Innovations", pp. 98--116 in Carrington, Scott and Wasserman (eds.), Models and Methods in Social Network Analysis (Cambridge University Press, 2005)
- Nicholas A. Christakis and James H. Fowler, "The Spread of Obesity in a Large Social Network over 32 Years", The New England Journal of Medicine
**357**(2007): 370--379 - Russell Lyons, "The Spread of Evidence-Poor Medicine via Flawed
Social-Network
Analysis", Statistics, Politics, and Policy
**2**(2011): 1, arxiv:1007.2876 - Greg Ver Steeg and Aram Galstyan, "Statistical Tests for Contagion in Observational Social Network Studies", AISTATS 2013, arxiv:1211.4889
- Cosma Rohilla Shalizi and Edward McFowland III, "Controlling for Latent Homophily in Social Networks through Inferring Latent Locations", arxiv:1607.06565

- 20 October: Final projects due

If you are unsure about what is or is not appropriate, please ask me before submitting anything; there will be no penalty for asking.

All of us benefit from support during times of struggle. You are not alone. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help.

If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:

CaPS: 412-268-2922

Re:solve Crisis Network: 888-796-8226

If the situation is life threatening, call the police:

On campus: CMU Police: 412-268-2323

Off campus: 911

If you have questions about this or your coursework, please let me know.