# Statistics of Inequality and Discrimination

## 36-313, Fall 2022

Cosma Shalizi
Tuesdays and Thursdays, 1:25 -- 2:45 pm, Wean Hall (WEH) 5409

Many social questions about inequality, injustice and unfairness are, in part, questions about evidence, data, and statistics. This class lays out the statistical methods which let us answer questions like "Does this employer discriminate against members of that group?", "Is this standardized test biased against that group?", "Is this decision-making algorithm biased, and what does that even mean?" and "Did this policy which was supposed to reduce this inequality actually help?" We will also look at inequality within groups, and at different ideas about how to explain inequalities between and within groups. The class will interweave discussion of concrete social issues with the relevant statistical concepts.

#### Pre-requisites

36-202 ("Methods for Statistics and Data Science") (and so also 36-200, "Reasoning with Data")

#### Learning Objectives (accreditation bureaucrats look here)

By the end of the course, students will be able to calculate, adjust, and communicate standard statistical measures of inequality within and between groups, and discuss their relation to social concepts of discrimination and disparity. More specifically, students will learn to
• Calculate standard measures of within-group inequality, such as the Gini index, apply them to data sets, and interpret the results
• Recognize, describe and estimate heavy-tailed distributions of income, wealth, etc., and articulate how heavy tails relate to levels of inequality
• Calculate standard measures of between-group inequality, apply them to data sets, calculate the statistical significance and uncertainty of these measures, and interpret the results
• Adjust measures of between-group inequality for differences in the characteristics of various groups (using regression or other, related techniques), and explain the pros and cons of doing so
• Relate statistical measures of between-group inequality to legal concepts of "disparate treatment", "disparate impact" and discrimination
• Calculate standard measures of intergenerational mobility, and explain their implications for the persistence of inequality
• Calculate and interpret standard measures of spatial segregation
• Calculate, distinguish and apply widely-used concepts of "fairness" and "bias" for decision-making algorithms and for mental tests
• Assess studies of social programs against commonly-accepted standards of experimental and observational study design
• Understand models of social dynamics in which categorical inequalities form and/or perpetuate themselves, even in the absence of prejudice, and distinguish situations where these models do apply from those where they do not

## Course Mechanics

#### Lectures

Lectures will be used to amplify the readings, provide examples and demos, and answer questions and generally discuss the material. You will usually find the readings more rewarding if you do them before lecture, rather than after (or during).

No Recordings: I will not be recording lectures. This is because the value of class meetings lies precisely in your chance to ask questions, discuss, and generally interact. (Otherwise, you could just read a book.) Recordings interfere with this in two ways:

1. They tempt you to skip class and/or to zone out and/or try to multi-task during it. (Nobody, not even you, is really any good at multi-tasking.) Even if you do watch the recording later, you will not learn as much from it as if you had attended in the first place.
2. People are understandably reluctant to participate when they know they're being recorded. (It's only too easy to manipulate recordings to make anyone seem dumb and/or obnoxious.) Maybe this doesn't bother you; it doesn't bother me, much, because I'm protected by academic freedom and by tenure, but a good proportion of your classmates won't participate if they're being recorded, and that diminishes the value of the class for everyone.

Recording someone without their permission is illegal in many places, and more importantly is unethical everywhere, so don't make your own recordings of the class.

(Taking notes during class is fine and I strongly encourage it; taking notes forces you to think about what you are hearing and how to organize it, which helps you understand and remember the content.)

#### No textbooks, lots of readings

There is no one textbook which covers the material we'll go over at the required level. You will, instead, get very detailed lecture notes after each lecture. There will also be a lot of readings from various books and articles. (I will not agree with every reading I assign.)

You will see, when you look at the class schedule below, that there are usually no more than two (shorter) readings per class. There are, however, also a lot of optional readings. I don't expect you to do all those readings, but they do give you pathways to go deeper into particular subjects, to explore the history of ideas about some matter, or point you at related topics. You may notice that lots of the readings aren't about statistics; this is because doing good statistics about any subject requires knowing lots about the subject-matter.

#### Assignments

There are three reasons you will get assignments in this course. In order of decreasing importance:
1. Practice. Practice is essential to developing the skills you are learning in this class. It also actually helps you learn, because some things which seem murky clarify when you actually do them, and sometimes trying to do something shows you what you only thought you understood.
2. Feedback. By seeing what you can and cannot do, and what comes easily and what you struggle with, I can help you learn better, by giving advice and, if need be, adjusting the course.
3. Evaluation. The university is, in the end, going to stake its reputation (and that of its faculty) on assuring the world that you have mastered the skills and learned the material that goes with your degree. Before doing that, it requires an assessment of how well you have, in fact, mastered the material and skills being taught in this course.

To serve these goals, there will be two kinds of assignment in this course.

After-class comprehension questions and exercises
Following every lecture, there will be a brief set of questions about the material covered in lecture. Sometimes these will be about specific points in the lecture, sometimes about specific aspects of the reading assigned to go with the lecture. These will be done electronically, and will be due the day after each lecture. These should take no more than 10 minutes, but will be untimed (so no accommodations for extra time are necessary). If the questions ask you to do any math (and not all of them will!), a scan or photograph of hand-written math is OK, so long as the picture is clearly legible. (Black ink or dark pencil on unlined white paper helps.)
Homework
Most weeks will have a homework assignment, divided into a series of questions or problems. These will have a common theme, and will usually build on each other. Each problem set will involve some combination of (very basic) statistical theory, (possibly less basic) calculations using the theory we've gone over, and analysis of real data sets using the methods discussed in class.
All homework will be submitted electronically through Gradescope. Most weeks, homework will be due at 6:00 pm on Thursdays (Pittsburgh time). Any exceptions will be clearly noted on the syllabus and at the beginning of the assignment. When this results in less than seven days between an assignment's due date and the previous due date, the homework will be shortened.

#### Time Expectations

You should expect to spend 5--7 hours on assignments every week, averaging over the semester. (This follows from the university's rules about how course credits translate into hours of student time.) If you find yourself spending significantly more time than that on the class, please come to talk to me.

Grades will be broken down as follows:

• Homework: 90%. All homeworks will have equal weight. Your lowest 3 homework grades will be dropped, no questions asked. If you turn in all homework assignments on time, for a grade of at least 60% (each), your lowest four homework grades will be dropped. For every homework, you will get a 24-hour late period in which you can still turn it in after the deadline, but at a 10% penalty. After that 24-hour period, late homework will not be accepted for any reason.
• After-class questions: 10%. All sets of questions will have equal weight. The lowest 5 will be dropped, no questions asked, with the lowest 6 dropped if you turn in every set of questions with a minimum grade of 60%.
You can submit assignments as many times as you like; the last version you submit is the one that will be graded. Submit early, submit often.

Grade boundaries will be as follows:

| Grade | Score range |
|-------|-------------|
| A | [90, 100] |
| B | [80, 90) |
| C | [70, 80) |
| D | [60, 70) |
| R | < 60 |

To be fair to everyone, these boundaries will be held to strictly.

As a final word of advice about grading, "what is the least amount of work I need to do in order to get the grade I want?" is a much worse way to approach higher education than "how can I learn the most from this class and from my teachers?".

Homework will be submitted electronically through Gradescope. Canvas will be used as a calendar showing all assignments and their due-dates, to distribute some readings, and as the official gradebook.

We will be using Piazza for question-answering. You will receive an invitation within the first week of class. Anonymous-to-other-students posting of questions and replies will be allowed, at least initially. Anonymity will go away for everyone if it is abused. During Piazza office hours, someone will be online to respond to questions (and follow-ups) in real time. You are welcome to post at any time, but outside of normal working hours you should expect that the instructors have lives.

#### Office Hours

TBD
During Piazza office hours, I'll be checking the site continually, and responding ASAP, so you can get very quick feedback, and there's a record which you (and others in the class) can consult later.

#### Collaboration, Cheating and Plagiarism

Except for explicit group exercises, everything you turn in for a grade must be your own work, or a clearly acknowledged borrowing from an approved source; this includes all mathematical derivations, computer code and output, figures, and text. Any use of permitted sources must be clearly acknowledged in your work, with citations letting the reader verify your source. You are free to consult the textbooks and recommended class texts, lecture slides and demos, any resources provided through the class website, solutions provided to this semester's previous assignments in this course, books and papers in the library, or legitimate online resources, though again, all use of these sources must be acknowledged in your work. (Websites which compile course materials are not legitimate online resources.)

In general, you are free to discuss homework with other students in the class, though not to share or compare work; such conversations must be acknowledged in your assignments. You may not discuss the content of assignments with anyone other than current students, the instructors, or your teachers in other current classes at CMU, until after the assignments are due. (Exceptions can be made, with prior permission, for approved tutors.) You are, naturally, free to complain, in general terms, about any aspect of the course, to whomever you like.

Any use of solutions provided for any assignment in this course, or in other courses, in previous semesters is strictly prohibited. This prohibition applies even to students who are re-taking the course. Do not copy the old solutions (in whole or in part), do not "consult" them, do not read them, do not ask your friend who took the course last year if they "happen to remember" or "can give you a hint". Doing any of these things, or anything like these things, is cheating, it is easily detected cheating, and those who thought they could get away with it in the past have failed the course. Even more importantly: doing any of those things means that the assignment doesn't give you a chance to practice; it makes any feedback you get meaningless; and of course it makes any evaluation based on that assignment unfair.

If you are unsure about what is or is not appropriate, please ask me before submitting anything; there will never be a penalty for asking. If you do violate these policies but then think better of it, it is your responsibility to tell me as soon as possible to discuss how to rectify matters. Otherwise, violations of any sort will lead to severe, formal disciplinary action, under the terms of the university's policy on academic integrity.

On the first day of class, you will be assigned a "homework 0" on the content of these policies. This assignment will not factor into your grade, but you must complete it before you can get any credit for any other assignment.

#### Accommodations for Students with Disabilities

If you need accommodations for physical and/or learning disabilities, please contact the Office of Disability Resources, via their website, www.cmu.edu/disability-resources. They will help you work out an official written accommodation plan, and help coordinate with me.

#### Inclusion and Respectful Participation

The university is a community of scholars, that is, of people seeking knowledge. All of our accumulated knowledge has to be re-learned by every new generation of scholars, and re-tested, which requires debate and discussion. Everyone enrolled in the course has a right to participate in the class discussions. This doesn't mean that everything everyone says is equally correct or equally important, but does mean that everyone needs to be treated with respect as persons, and criticism and debate should be directed at ideas and not at people. Don't dismiss (or credit) anyone in the course because of where they come from, and don't use your participation in the class as a way of shutting up others. Don't be rude, and don't go looking for things to be offended by. Statistical methods don't usually lead to heated debate, but the subjects to which we'll apply the methods notoriously do. If someone else is saying something you think is really wrong-headed, and you think it's important to correct it, address why it doesn't make sense, and listen if they give a counter-argument.

The classroom is not a democracy; as the teacher, I have the right and the responsibility to guide the discussion in what I judge are productive directions. This may include shutting down discussions which are not helping us learn about statistics, even if those discussions might be important to have elsewhere. (You can have them elsewhere.) I will do my best to guide the course in a way which respects everyone's dignity as a human being, as a scholar, and as a member of the university.

## Detailed course calendar

Links to lecture notes, assignments, etc., will go here as they become relevant.

Readings will be finalized a week before each course meeting. Links on readings point to electronic versions accessible through the university library. (You may need to authenticate yourself with the library and/or use the VPN, if you're trying to access them from off campus.) Optional readings really are optional, but the non-optional ones really are not optional. Readings marked with one or more stars (*) are, as it were, especially optional, because of some combination of being long, difficult, old, etc.

The order of topics after about October 15 is currently somewhat tentative. The due dates for assignments, however, are fixed.

#### Lecture 1 (Tuesday, 30 August): Introduction to the course

• Overview of course topics, goals and mechanics. Lightning review of essential probability and statistics: populations, distribution within a population, distribution functions, models for distributions, comparison of distributions across populations or sub-populations, samples and inference from samples.
• Notes for lecture 1 (.Rmd file, showing how to do all the calculations and make all the figures using R Markdown)
• Homework:

#### Lecture 2 (Thursday, 1 September): Describing income and wealth inequality within a single population

• What does the distribution of income and wealth look like within a population? How do we describe population distributions, especially when there is an extreme range of values (a big difference between the rich and poor)? Measures of central tendency (median, mode, mean), of dispersion and of skew. Trends over time in median vs. mean income and wealth. Where do you fall in the distribution if you make \$10,000 a year, or \$50,000, or \$1,000,000? How much do you need to make to be better off than 50% of the population? Than 90%? Than 99%? Where did the idea of "the 1%" wealthy elite come from? Trends over time in typical values vs. high percentiles. Measures of concentration of income and wealth: ratios, the Lorenz curve, the Gini coefficient. Time permitting: other summary measures of concentration and inequality besides the Gini coefficient.
• Notes (.Rmd)
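
As a rough illustration of the Lorenz curve and the Gini coefficient mentioned above, here is a minimal sketch. It is written in Python rather than the R Markdown used for the course notes, and the function names and toy incomes are invented for illustration, not taken from the notes. The Gini coefficient is computed as one minus twice the area under the Lorenz curve, with the area found by the trapezoid rule.

```python
def lorenz_curve(incomes):
    """Return (population_share, income_share) points of the Lorenz curve."""
    xs = sorted(incomes)
    total = sum(xs)
    n = len(xs)
    pop, inc = [0.0], [0.0]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        pop.append(i / n)            # share of people at or below this income
        inc.append(running / total)  # share of all income they receive
    return pop, inc


def gini(incomes):
    """Gini coefficient: 1 minus twice the area under the Lorenz curve."""
    pop, inc = lorenz_curve(incomes)
    area = sum(
        (pop[i] - pop[i - 1]) * (inc[i] + inc[i - 1]) / 2
        for i in range(1, len(pop))
    )
    return 1 - 2 * area


# Hypothetical toy populations of four people each.
equal = [50_000, 50_000, 50_000, 50_000]
unequal = [0, 0, 0, 200_000]

print(gini(equal))    # about 0: everyone has the same income
print(gini(unequal))  # about 0.75: one person has everything
```

With one person holding all the income in a population of n, this version gives 1 - 1/n rather than exactly 1, which is a known small-sample feature of the trapezoid definition.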

#### Lecture 3 (Tuesday, 6 September): Income and wealth inequality: modeling I

• The concept of "heavy tails", where the largest values in a population are orders of magnitude larger than typical values. Specific kinds of distributions adapted to heavy-tailed data; log-normal and power law (Pareto, Zipf) distributions. Calculating measures of inequality from theoretical distributions.
• Notes (.Rmd)
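
To make "calculating measures of inequality from theoretical distributions" concrete, here is a small Python sketch (not from the course notes, which use R) of the standard closed-form Gini coefficients for two heavy-tailed families. It assumes the Pareto distribution is parameterized by its tail exponent, so that P(X > x) = (x / x_min)^(-alpha), and the log-normal by the standard deviation sigma of log-income. The function names are made up for illustration.

```python
import math


def gini_pareto(alpha):
    """Gini coefficient of a Pareto distribution with tail exponent alpha,
    where P(X > x) = (x / x_min) ** (-alpha). Requires alpha > 1."""
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1; this formula does not apply")
    return 1.0 / (2.0 * alpha - 1.0)


def gini_lognormal(sigma):
    """Gini coefficient of a log-normal distribution whose logarithm has
    standard deviation sigma. Equals 2 * Phi(sigma / sqrt(2)) - 1 = erf(sigma / 2)."""
    return math.erf(sigma / 2.0)


# Heavier tails (smaller alpha, larger sigma) mean more inequality.
for alpha in (1.5, 2.0, 3.0):
    print("Pareto alpha =", alpha, "Gini =", gini_pareto(alpha))
for sigma in (0.5, 1.0, 2.0):
    print("log-normal sigma =", sigma, "Gini =", gini_lognormal(sigma))
```

Note that neither formula depends on the scale (x_min, or the mean of log-income): the Gini coefficient is unchanged when every income is multiplied by the same constant.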

#### Lecture 4 (Thursday, 8 September): Modeling wealth and income distributions II

• Fitting distributions to data, using summary statistics and/or maximum likelihood. Checking goodness of fit.
• Notes (.Rmd file generating the notes)
• (*) Aaron Clauset, Cosma Rohilla Shalizi and M. E. J. Newman, "Power-law Distributions in Empirical Data", SIAM Review 51 (2009): 661--703, arxiv:0706.1062
• Homework:
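
A hedged sketch of what fitting by maximum likelihood and checking goodness of fit can look like for a Pareto tail, in Python rather than the R of the course notes. It assumes x_min is known and that the tail is parameterized as P(X > x) = (x / x_min)^(-alpha); under that assumption the maximum likelihood estimate is n divided by the sum of log(x_i / x_min). (Parameterizing by the exponent of the density instead, as some papers do, adds 1 to this estimate.) The simulated data, seed, and function names are all illustrative.

```python
import math
import random


def sample_pareto(n, alpha, x_min, rng):
    """Draw n Pareto variates by inverting the CDF; 1 - random() lies in (0, 1]."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def fit_pareto_alpha(xs, x_min):
    """Maximum likelihood estimate of the tail exponent, given x_min."""
    tail = [x for x in xs if x >= x_min]
    return len(tail) / sum(math.log(x / x_min) for x in tail)


def ks_distance(xs, alpha, x_min):
    """Kolmogorov-Smirnov distance between the data and the fitted Pareto CDF."""
    tail = sorted(x for x in xs if x >= x_min)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        cdf = 1.0 - (x / x_min) ** (-alpha)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


rng = random.Random(313)  # arbitrary seed, for reproducibility only
data = sample_pareto(20_000, alpha=2.0, x_min=1.0, rng=rng)
alpha_hat = fit_pareto_alpha(data, x_min=1.0)
print("estimated alpha:", alpha_hat)
print("KS distance:", ks_distance(data, alpha_hat, x_min=1.0))
```

A small KS distance only says the fit is not obviously bad; turning it into a p-value requires simulating from the fitted model, because alpha was estimated from the same data.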

#### Lecture 5 (Tuesday, 13 September): Speed-run through social and economic stratification CANCELED

• Canceled due to a family medical emergency on the part of the professor.

#### Lecture 6 (Thursday, 15 September): Speed-run through social and economic stratification

• Reminders (?) about social-scientific concepts used to describe differences between people, and how people describe others and themselves. Types of qualitative, categorical differences: class, order, caste, race, ethnicity, nationality, citizenship, sex, gender. More-or-less dimensions of differentiation: age, status, prestige, income, consumption, wealth, other resources --- "human capital" or skills, "cultural capital", "social capital". The compound measure "socioeconomic status". Education. The distinction between "ascribed" and "attained" social conditions. The legal notion of "protected categories" or "protected attributes". Individual and inter-generational mobility (briefly; we'll come back to the topic later). "Endogamy" (= marrying within the group) and how it creates informative, but uninfluential, genetic differences that let us predict social outcomes from genes.
• Notes (.Rmd file)
• Lisa A. Keister and Darby E. Southgate, "Social stratification and opportunities", ch. 1 in Inequality: A Contemporary Approach to Race, Class, and Gender (Cambridge, England: Cambridge University Press, 2012)
• Kwame Anthony Appiah, The Lies That Bind: Rethinking Identity (New York: W. W. Norton, 2018)
• (*) Kwame Anthony Appiah, Lines of Descent: W. E. B. Du Bois and the Emergence of Identity (Cambridge, Massachusetts: Harvard University Press, 2014)
• Luigi L. Cavalli-Sforza, Genes, Peoples, and Languages (New York: North Point Press, 2000)
• (*) Luigi L. Cavalli-Sforza, Paolo Menozzi and Alberto Piazza, "Demic Expansions and Human Evolution", Science 259 (1993): 639--646
• (*) John Dollard, Caste and Class in a Southern Town (3rd edition, Garden City, New York: Doubleday Anchor, 1957; first edition New Haven, Connecticut: Yale University Press, 1937), especially ch. V ("Caste and Class in Southertown")
• Ernest Gellner, Nations and Nationalism (Ithaca, New York: Cornell University Press, 1983)
• (*) Anna C. F. Lewis, Santiago J. Molina, Paul S. Appelbaum, Bege Dauda, Anna Di Rienzo, Agustin Fuentes, Stephanie M. Fullerton, Nanibaa' A. Garrison, Nayanika Ghosh, Evelynn M. Hammonds, David S. Jones, Eimear E. Kenny, Peter Kraft, Sandra S.-J. Lee, Madelyn Mauro, John Novembre, Aaron Panofsky, Mashaal Sohail, Benjamin M. Neale, and Danielle S. Allen, "Getting genetic ancestry right for science and society", Science 376 (2022): 250--252, arxiv:2110.05987
• Alondra Nelson, The Social Life of DNA: Race, Reparations, and Reconciliation After the Genome (Boston: Beacon Press, 2016)
• (*) W. G. Runciman, The Social Animal (London: HarperCollins, 1998)
• (*) Charles Tilly, Durable Inequality (Berkeley: University of California Press, 1998)
• (*) Adam P. Van Arsdale, "Population Demography, Ancestry, and the Biological Concept of Race", Annual Review of Anthropology 48 (2019): 227--241
• (*) W. Lloyd Warner, "American Caste and Class", American Journal of Sociology 42 (1936): 234--237
• Homework:
• Homework 2 due
• Homework 3: assignment; the data file and descriptive "codebook" are on Canvas, and shouldn't be shared outside this class.

#### Lecture 7 (Tuesday, 20 September): Income (and wealth) disparities: comparing central tendencies and typical values

• How does income (and wealth) differ across groups? How do we compare average or typical values? Permutation tests for differences in mean (and other measures of the average). Resampling (the bootstrap) for finding the range of differences compatible with the data, and/or margins of error around an estimate of the difference.
• Notes (.Rmd, containing code you may find useful in the homework)
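
For readers who want a feel for the two resampling ideas in this lecture, here is a minimal Python sketch (the course's own code is in R, and this is not it). The permutation test shuffles group labels to see how large a difference in means arises by chance alone; the bootstrap resamples within each group to get a percentile interval for the difference. The incomes, seeds, and function names are invented for illustration.

```python
import random
from statistics import fmean


def permutation_p_value(a, b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Count the observed labelling itself, so the p-value is never exactly 0.
    return (hits + 1) / (n_perm + 1)


def bootstrap_ci(a, b, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for mean(b) - mean(a)."""
    rng = random.Random(seed)
    diffs = sorted(
        fmean(rng.choices(b, k=len(b))) - fmean(rng.choices(a, k=len(a)))
        for _ in range(n_boot)
    )
    lo = diffs[int((1 - level) / 2 * n_boot)]
    hi = diffs[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Hypothetical incomes, in thousands of dollars.
group_a = [30, 35, 40, 45, 50, 55, 60, 65]
group_b = [50, 55, 60, 65, 70, 75, 80, 85]

p_value = permutation_p_value(group_a, group_b)
ci_low, ci_high = bootstrap_ci(group_a, group_b)
print("difference in means:", fmean(group_b) - fmean(group_a))
print("permutation p-value:", p_value)
print("bootstrap 95% interval:", (ci_low, ci_high))
```

Replacing `fmean` with the median (or any other summary) gives the corresponding test for "other measures of the average" without changing the logic.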

#### Lecture 8 (Thursday, 22 September): Income (and wealth) disparities: Comparing whole distributions

• The "analysis of variance" method: decomposing the over-all variance into variance within groups versus variance between groups. Some ANOVA models, i.e., linear regressions with only categorical predictor variables. Fitting ANOVA models by least squares. Comparison of ANOVA for income and for log-income. Trends in global income inequality, between-country and within-country inequality. Extending comparisons beyond central tendencies (like means) and measures of dispersion (like variances). Q-Q plots for comparing distributions. Stochastic dominance. The "relative distribution" method of comparing populations.
• Notes (.Rmd)
• Homework:
• Homework 3 due
• Homework 4: assignment (same data set as last week)
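
The variance decomposition at the heart of ANOVA can be sketched in a few lines. This is an illustrative Python version (not the course's R code) with made-up numbers: the total sum of squares around the grand mean splits exactly into a within-group part and a between-group part.

```python
from statistics import fmean


def variance_decomposition(groups):
    """groups maps a group name to a list of values.
    Returns (total_ss, within_ss, between_ss), with total = within + between."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = fmean(all_values)
    total_ss = sum((x - grand_mean) ** 2 for x in all_values)
    within_ss = 0.0
    between_ss = 0.0
    for values in groups.values():
        group_mean = fmean(values)
        within_ss += sum((x - group_mean) ** 2 for x in values)
        between_ss += len(values) * (group_mean - grand_mean) ** 2
    return total_ss, within_ss, between_ss


# Hypothetical incomes (thousands of dollars) for two groups.
incomes = {"A": [20, 30, 40], "B": [50, 60, 70]}
total_ss, within_ss, between_ss = variance_decomposition(incomes)
print("total:", total_ss, "within:", within_ss, "between:", between_ss)
print("share of variance between groups:", between_ss / total_ss)
```

Running the same function on log-incomes instead of incomes generally changes the between-group share, which is the point of the comparison of ANOVA for income and for log-income.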

#### Lecture 9 (Tuesday, 27 September): Explaining, or explaining away, inequality I

• To what extent can differences in outcomes between groups be explained by differences in their attributes (e.g., explaining differences in incomes by differences in marketable skills)? How should we go about making such adjustments? Is it appropriate to treat discrimination as the "residual" left unexplained? Using regression models to "control for" or "adjust for" multiple variables when comparing mean outcomes. Kitagawa (and related) "decompositions" of group differences into "similar people have different experiences" vs. "different groups have different types of people".
• Notes (.Rmd)
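
A small Python sketch of a two-group Kitagawa-style decomposition, under the common symmetric convention of weighting each term by the average of the two groups (other weightings exist, and the course notes may use a different one). The strata, shares, and mean incomes are invented. The "rate" term corresponds to "similar people have different experiences", and the "composition" term to "different groups have different types of people".

```python
def kitagawa(shares_a, rates_a, shares_b, rates_b):
    """Split the gap in overall means (group A minus group B) into a rate
    effect and a composition effect, using averaged weights.

    shares_*: fraction of each group in each stratum (each list sums to 1).
    rates_*:  mean outcome within each stratum, for each group.
    """
    strata = list(zip(shares_a, rates_a, shares_b, rates_b))
    rate_effect = sum((ra - rb) * (sa + sb) / 2 for sa, ra, sb, rb in strata)
    composition_effect = sum((sa - sb) * (ra + rb) / 2 for sa, ra, sb, rb in strata)
    return rate_effect, composition_effect


# Hypothetical example with two education strata (less schooling, more schooling).
shares_a, rates_a = [0.5, 0.5], [40.0, 80.0]  # group A: shares and mean incomes
shares_b, rates_b = [0.8, 0.2], [35.0, 70.0]  # group B: shares and mean incomes

overall_a = sum(s * r for s, r in zip(shares_a, rates_a))
overall_b = sum(s * r for s, r in zip(shares_b, rates_b))
rate_effect, composition_effect = kitagawa(shares_a, rates_a, shares_b, rates_b)

print("overall gap:", overall_a - overall_b)
print("rate effect:", rate_effect)
print("composition effect:", composition_effect)
```

The two effects always add up to the overall gap, but how the gap is split depends on which variables define the strata, which is one reason the interpretation of such adjustments is contested.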

#### Lecture 11 (Tuesday, 4 October): Detecting and interpreting inequalities in hiring, admissions, etc.

• Do employers hire members of different groups at different rates? (Or: do schools admit members of different groups at different rates?) How can we tell? Statistical tests for differences in proportions or probabilities. Comparing hiring rates per applicant by group vs. comparing those hired to some reference population. Audit studies. When are differences in hiring rates evidence for discrimination? How do statistical perspectives on this question line up with legal criteria for "disparate treatment" and "disparate impact"? The economists' concepts of "taste-based" and "statistical" discrimination.
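
One standard test for a difference in hiring rates is the pooled two-proportion z-test, sketched below in Python. This is an illustration under the usual large-sample normal approximation, not necessarily the test the lecture uses; the applicant counts and the function name are hypothetical.

```python
import math


def two_proportion_z_test(hired_a, applied_a, hired_b, applied_b):
    """Pooled z-test for equal hiring probabilities in two groups.
    Returns (z, two-sided p-value) under the normal approximation."""
    p_a = hired_a / applied_a
    p_b = hired_b / applied_b
    pooled = (hired_a + hired_b) / (applied_a + applied_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / applied_a + 1 / applied_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical data: 40 of 100 applicants hired from group A, 25 of 100 from group B.
z, p_value = two_proportion_z_test(40, 100, 25, 100)
print("z =", z, "p =", p_value)
```

A small p-value says the difference in rates is hard to attribute to chance; it does not by itself say why the rates differ, which is where the legal and economic concepts above come in.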

#### Lecture 14 (Thursday, 13 October): Measuring Spatial Segregation and Its Consequences

• What do we mean by "segregation"? Segregation in law ("de jure") and segregation in fact ("de facto"). Different ways of measuring de facto segregation. Trends in de facto racial segregation since the end of de jure racial segregation. Why different measures of segregation give different results. Segregation by income. Segregation by political partisanship. Consequences of segregation. Inter-generational transmission again.
• (**) Jeremy E. Fiel, "Decomposing School Resegregation: Social Closure, Racial Imbalance, and Racial Isolation", American Sociological Review 78 (2013): 828--848
• Salim Furth, "Is Diversity 'Segregation'?", Market Urbanism 5 July 2021 (Commentary on an application of the work of Roberto, below)
• (**) Douglas S. Massey and Nancy A. Denton, "The Dimensions of Residential Segregation", Social Forces 67 (1988): 281--316 [also JSTOR]
• (*) Elizabeth Roberto, "The Divergence Index: A Decomposable Measure of Segregation and Inequality", arxiv:1508.01167
• (*) Robert J. Sampson, Great American City: Chicago and the Enduring Neighborhood Effect (Chicago: University of Chicago Press, 2011)
• (*) Patrick Sharkey, Stuck in Place: Urban Neighborhoods and the End of Progress toward Racial Equality (Chicago: University of Chicago Press, 2013)
• (**) Henri Theil and Anthony J. Finizza, "A note on the measurement of racial integration of schools by means of informational concepts", The Journal of Mathematical Sociology 1 (1971): 187--193
• William Julius Wilson, When Work Disappears: The World of the New Urban Poor (New York: Alfred A. Knopf, 1996)
• Homework:
• Homework 6 due
• Homework 7 assigned (due on 27 October, but of the usual length to accommodate fall break): assignment, mobility.csv data file
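
As one example of a de facto segregation measure, here is a Python sketch of the index of dissimilarity, one of the classic measures (the lecture covers several, and this sketch does not claim to match its notation). The index is half the sum, over neighborhoods, of the absolute difference between each group's share of its own total population living there. The neighborhood counts are invented.

```python
def dissimilarity_index(counts_a, counts_b):
    """Index of dissimilarity between two groups across areas.
    counts_a[i], counts_b[i]: number of people from each group in area i.
    0 means both groups are spread identically; 1 means no area is shared."""
    total_a = sum(counts_a)
    total_b = sum(counts_b)
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(counts_a, counts_b)
    )


# Hypothetical two-neighborhood cities.
print(dissimilarity_index([100, 0], [0, 100]))   # complete segregation
print(dissimilarity_index([50, 50], [10, 10]))   # identical spread, unequal group sizes
print(dissimilarity_index([80, 20], [20, 80]))   # partial segregation
```

The value can be read as the fraction of either group that would have to move to make the two distributions match, and it depends on how the areas are drawn, which is one reason different measures and different geographies give different results.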

#### Tuesday, 18 October and Thursday, 20 October: NO CLASS

Enjoy fall break!

#### Lecture 15 (Tuesday, 25 October): Algorithmic Bias and/or Fairness

• Notions of "fair" prediction or automated decision-making: not using "protected categories" or features; parity of error rates across groups; "calibration" of prediction. "Inference" issues: using features that carry information about protected features. A brief look at the COMPAS controversy.
• Notes (.Rmd), after-class exercise
• Sam Corbett-Davies and Sharad Goel, "The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning", arxiv:1808.00023
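
To make "parity of error rates" and "calibration" concrete, here is a hedged Python sketch with made-up data. For binary predictions it reports, per group, the false positive rate, the false negative rate, and the positive predictive value (the fraction of predicted positives that are truly positive), using the last as a crude stand-in for calibration. The data are constructed so that the two groups have the same positive predictive value but different error rates, which can happen when base rates differ.

```python
def group_metrics(groups, y_true, y_pred):
    """Per-group false positive rate, false negative rate and positive
    predictive value for binary (0/1) outcomes and predictions."""
    out = {}
    for g in sorted(set(groups)):
        pairs = [(t, p) for gi, t, p in zip(groups, y_true, y_pred) if gi == g]
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        negatives = sum(1 for t, _ in pairs if t == 0)
        positives = sum(1 for t, _ in pairs if t == 1)
        out[g] = {
            "fpr": fp / negatives,
            "fnr": fn / positives,
            "ppv": tp / (tp + fp),
        }
    return out


# Hypothetical outcomes and predictions for two groups of six people each.
groups = ["a"] * 6 + ["b"] * 6
y_true = [1, 1, 0, 0, 0, 0] + [1, 1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0] + [1, 1, 0, 1, 1, 0]

metrics = group_metrics(groups, y_true, y_pred)
for g, m in metrics.items():
    print(g, m)
```

This sketch assumes every group contains both outcomes and at least one positive prediction; real data would need guards against division by zero.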

#### Lecture 16 (Thursday, 27 October): Algorithmic Fairness continued

• Trade-offs between different forms of fairness. Trade-offs between forms of fairness and accuracy. Techniques for mitigating algorithmic unfairness: changing the estimation procedure; changing the data. Some critiques of these notions of "fairness".
• Notes: see notes from last time.
• Robyn M. Dawes, House of Cards: Psychology and Psychotherapy Built on Myth
• Robyn M. Dawes, David Faust and Paul E. Meehl, "Clinical Versus Actuarial Judgment", Science 243 (1989): 1668--1674
• Homework:

#### Lecture 18 (Thursday, 3 November): Intelligence tests

• Are intelligence tests biased? Before that: How do we measure latent attributes? How do we know the latent attributes even exist? What would it mean for there to be such a thing as "general intelligence", that could be measured by tests? Correlations between scores on tests of particular abilities or skills; factor models as explanations for correlations; estimating factor values from tests; alternatives to factor models. What, if anything, do intelligence tests measure? What do rising intelligence test results (the Flynn Effect) tell us?
• Homework:

#### Tuesday, 8 November: NO CLASS

• ... due to the professor dealing with a family medical situation. (Also, it's election day --- go vote!)
• Strictly optional readings on inequality in political participation and influence:

#### Lecture 19 (Thursday, 10 November): Measuring attitudes and prejudice

• Explicit attitude measures, a.k.a. asking people what they feel and think. The need to make people answer in stylized ways, e.g., "Likert scales". Difficulties with explicit measures, especially people lying for public approval, a.k.a. "desirability bias". Not-quite-so-explicit measures, like the "modern racism" or "racial resentment" scales. Justifications for these scales. Controversies over what these scales measure (and whether it's changed over time). Implicit measures, especially the implicit association test (IAT). Difficulties with the IAT: it's unclear what (if anything) it measures, it's very noisy, it doesn't predict behavior well, and changes in IAT scores don't seem to lead to changes in behavior.
• Notes (.Rmd)
• Homework:
• Homework 9 due
• Homework 10: assignment; there is a PDF reading (a chapter from The Nature of Prejudice) on Canvas

#### Lecture 20 (Tuesday, 15 November): Evaluating inequality-reducing interventions I

• How do we investigate the effectiveness of interventions intended to reduce inequalities? How do we design a good study of an intervention? Principles of experimental design, when we can apply them. Principles of observational studies, when we can't do the experiment. How do we pool information from multiple studies ("meta-analysis")? Do implicit bias interventions change behavior? Does having a chief diversity officer increase faculty diversity?
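
The simplest way to pool information from multiple studies is inverse-variance (fixed-effect) weighting, sketched here in Python with invented numbers. It assumes every study estimates the same underlying effect and reports an honest standard error; random-effects models relax the first assumption, and this sketch does not claim to be the method used in the lecture.

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance weighted average of study estimates.
    Returns (pooled_estimate, pooled_standard_error)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect estimates from two studies of the same intervention.
estimates = [0.2, 0.4]
std_errors = [0.1, 0.2]

pooled, pooled_se = fixed_effect_pool(estimates, std_errors)
print("pooled estimate:", pooled)
print("pooled standard error:", pooled_se)
```

The more precise study dominates the pooled estimate, and the pooled standard error is smaller than either study's own, which is only trustworthy if the studies really are measuring the same thing.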

#### Thursday, 24 November: NO CLASS

Happy Thanksgiving!

#### Lecture 24 (Thursday, 1 December): Statistics and its history

• The development of statistics in the 19th and early 20th century was intimately tied to the eugenics movement, which was deeply racist and even more deeply classist, but also often anti-sexist. The lecture will cover this history, and explain how many of the intellectual tools we have gone over to document, and perhaps to help combat, inequality and discrimination were invented by people who wanted to use them for quite different purposes. The twin learning objectives are for students to grasp something of this history, and to grasp why the "genetic fallacy", of judging ideas by where they come from (their "genesis") is, indeed, foolish and wrong.
• Homework:
• Homework 11 due
• Homework 12 assigned

#### Lecture 25 (Tuesday, 6 December): How do we know what we do about inequalities?

• Social data-collection systems and institutions. Measurement again, and measurement as a social process. Difficulties in reducing social reality to data; the case of race in the US as an example. What systematic data collection leaves out.
• Margo Anderson and Stephen E. Fienberg, "Race and Ethnicity and the Controversy over the US Census", Current Sociology 48 (2000): 87--110
• Richard Alba, The Great Demographic Illusion: Majority, Minority, and the Expanding American Mainstream (Princeton, New Jersey: Princeton University Press, 2020)
• Howard S. Becker, Evidence (Chicago: University of Chicago Press, 2017)
• Patrick J. Egan, "Identity as Dependent Variable: How Americans Shift Their Identities to Align with Their Politics", American Journal of Political Science 64 (2020): 699--716
• Roberto Franzosi, From Words to Numbers: Narrative, Data, and Social Science (Cambridge, England: Cambridge University Press, 2004)
• Kenneth Prewitt, What Is Your Race? The Census and Our Flawed Efforts to Classify Americans (Princeton, New Jersey: Princeton University Press, 2013)

#### Lecture 26 (Thursday, 8 December): Review of the course

• What have we learned?
• Homework:
• Homework 12 due
• Good luck on your exams in other classes!

(Most of the illustrations are from the great German-American artist George Grosz, via ARTSTOR. Clicking on any of the images will take you to its source.)