Recommender systems are everywhere these days

Lots of big, famous businesses (e.g., Amazon, Netflix, music streaming…)
Even more smaller businesses (e.g., lots of clothing brands)
Some non-profits (e.g., LibraryThing, some actual libraries)
Some things that don’t immediately look like recommender systems:
- News websites
- “Stories to read” from Google
- Feeds in social media (Facebook, Twitter, etc.)
- “Up next” on YouTube
We’ve seen how they work

What happens now that they’re everywhere?

Homogenization?
Rabbit holes?
Feedback loops?
Discrimination?
Getting what you wish for?
Do they actually make a difference?

Homogenization (I)

If the system recommends item \(i\) because it’s popular, won’t that just make \(i\) more popular?
- (Experimentally, people do pay attention to most-popular rankings, and artificially manipulating them can indeed make things more popular: Salganik, Dodds, and Watts (2006), Salganik and Watts (2008))
If everybody’s getting similar recommendations, won’t that make everybody’s consumption and likes more similar?
Potential response: This is why recommendation systems personalize!

Homogenization (II)

Back in the Olden Days, it was very easy to just not run across something Just Like the Sort of Thing You Liked
- Minor authors, obscure bands, cult movies…
- Maybe you had a friend who shared your taste and would introduce you
- …Or you had access to a whole sub-culture…
- Or you knew where to find specialist publications and could track things down
Result: even if two people had similar tastes, there was often relatively little overlap in the details of what they consumed
- (With obvious exceptions for when there wasn’t much choice)

Homogenization (II), cont’d

Recommender systems change this situation
- Nearest neighbors: people you don’t know can be close to you in item-feature space, and then you get recommended what they indicate you’ll probably like
- Factor models: similar effect
- Entirely intentional: Shardanand and Maes (1995) was about “algorithms for automating ‘word of mouth’”!

Homogenization (II), cont’d

Each individual will be exposed to a broader array of Stuff Like What They Like than before
\(\Rightarrow\) more diversity within individuals
Similar individuals will be exposed to more of the same things
\(\Rightarrow\) less diversity across individuals
- Finance analogy: if you diversify your portfolio, the assets are less correlated within your portfolio
- … but if I also diversify my portfolio, our portfolios become more correlated
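A toy simulation can make the two kinds of diversity concrete. This is only a sketch under made-up assumptions: a niche of 100 items, people who stumble on 5 of them at random without a recommender and are shown 50 of them with one. The function names and all the numbers are illustrative, not taken from any real system.

```python
import random


def jaccard(a, b):
    """Overlap between two sets of items: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_overlap(n_items, n_seen, n_pairs=500, seed=0):
    """Average overlap between pairs of like-minded people who each
    encounter n_seen of the n_items in their shared niche at random."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        first = set(rng.sample(range(n_items), n_seen))
        second = set(rng.sample(range(n_items), n_seen))
        total += jaccard(first, second)
    return total / n_pairs


N_ITEMS = 100      # assumed size of the niche
SEEN_WITHOUT = 5   # assumed: items found by luck, friends, specialist press
SEEN_WITH = 50     # assumed: items surfaced by a recommender

overlap_without = mean_overlap(N_ITEMS, SEEN_WITHOUT)
overlap_with = mean_overlap(N_ITEMS, SEEN_WITH)

print(f"items per person: {SEEN_WITHOUT} -> {SEEN_WITH}")
print(f"mean overlap between similar people: "
      f"{overlap_without:.3f} -> {overlap_with:.3f}")
```

Each person sees more of the niche (more diversity within individuals), and for exactly that reason any two similar people share far more of what they see (less diversity across individuals).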

Rabbit holes, feedback loops, echo chambers

Amplification of small initial differences:
- Start with a small preference for (say) Pashto rap over other kinds of music
- System connects you to lovers of Pashto rap and recommends it
- You follow recommendations and rate more Pashto rap
- The system recommends even more Pashto rap
- Eventually everything it recommends is just Pashto rap
Tastes change with experience, but we don’t even need that effect to get this outcome
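The amplification story can be sketched as a small deterministic simulation. Everything here is an assumption made for illustration: five genres, a user who likes 55% of recommended items in genre 0 and 50% in the others, and a hypothetical `sharpen` exponent above 1 standing in for a system that concentrates on whatever currently scores highest. The user's tastes are held fixed throughout.

```python
def simulate_feedback(taste, rounds=300, sharpen=2.0):
    """Return the share of recommendations going to genre 0 in each round.

    taste[g] : fixed chance that the user likes a recommended item of genre g
    sharpen  : assumed exponent > 1, standing in for a system that
               concentrates on whatever currently scores highest
    """
    likes = [1.0] * len(taste)  # start with identical rating histories
    share_of_genre_0 = []
    for _ in range(rounds):
        weights = [count ** sharpen for count in likes]
        total = sum(weights)
        recommended = [w / total for w in weights]
        share_of_genre_0.append(recommended[0])
        # The user's tastes never change; they just like a fixed fraction
        # of whatever is put in front of them.
        likes = [count + rec * t
                 for count, rec, t in zip(likes, recommended, taste)]
    return share_of_genre_0


# Assumed tastes: a slight preference for genre 0 over four other genres.
TASTE = [0.55, 0.50, 0.50, 0.50, 0.50]
shares = simulate_feedback(TASTE)
for r in (0, 50, 100, 200, 299):
    print(f"round {r:3d}: {shares[r]:.2f} of recommendations are genre 0")
```

Genre 0 starts with one fifth of the recommendations and ends with most of them, purely because each round's likes feed the next round's recommendations.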

Feedback loops, echo chambers

Music, who cares?
- You like some news stories about how political party X is awful
- The system connects you to other X-haters
- The system recommends more X-is-awful stories
- You like some of them, so you’re more similar to X-haters
- Eventually everything it recommends is about how X is awful

Discrimination

Some recommender systems use covariates in their predictions
- Few have rules like “Don’t recommend information about [apartments in good neighborhoods / educational opportunities / good jobs / …] to [women / Jews / Muslims / blacks / Chinese / Uighurs / immigrants / non-Brahmins / gun owners …]”
- So it’s unlikely that recommender systems will be blatantly biased towards demographic groups, especially ones defined by legally protected categories
- BUT lots of other variables are very good proxies for these things
  - In the USA, ZIP code is a very good predictor of race, pretty good for income and education
  - ZIP+4 is an even better predictor

Discrimination (cont’d)

Suppose we leave out all covariates and just use item ratings
Lots of items are really good predictors of demographic variables, especially in combination
- Education, age, race, sex, sexual orientation, income…
We’ll see examples of predicting these from which websites you visit (Goel, Hofman, and Sirer 2012)
You could definitely do this from videos watched, or books read, or music liked (with varying error levels)
- E.g., some movie actors are wildly more popular among certain groups than others
- Same for musical genres, musicians, book genres, …
Recommending things liked by people who like the things you already like is, in part, recommending things already common in your demographic group
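To see why items predict demographics "especially in combination", here is a small worked calculation with Bayes' rule. The numbers are hypothetical: two equally sized groups, and items each liked by 60% of group A but only 40% of group B. It also assumes likes are independent given the group (the naive-Bayes assumption), which real ratings will not satisfy exactly.

```python
def posterior_group_a(n_liked, like_rate_a=0.6, like_rate_b=0.4, prior_a=0.5):
    """Posterior probability of belonging to group A after liking n_liked
    items, each liked by like_rate_a of group A and like_rate_b of group B.

    Assumes likes are conditionally independent given the group.
    """
    prior_odds = prior_a / (1.0 - prior_a)
    likelihood_ratio = (like_rate_a / like_rate_b) ** n_liked
    odds = prior_odds * likelihood_ratio
    return odds / (1.0 + odds)


for n in (1, 3, 5, 10):
    print(f"{n:2d} liked items -> P(group A) = {posterior_group_a(n):.3f}")
```

Under these assumptions one such item only moves the probability from 50% to 60%, but ten of them push it above 98%, without any demographic covariate ever entering the system.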

Why might this be a problem?

Lack of exposure to valuable information
- Maybe not an issue with music recommendations
- What about the news, or job ads, or how-to-apply-for-financial-aid guides?
  - Which how-to-apply-for-financial-aid guides?
Fragmentation of the public: different groups experience the same world very differently, because different filters are (inadvertently) applied
- Bad because there’s no common knowledge?
  - But we were just worrying about homogenization!
- Good because isolation helps each culture grow in its own way?
  - Grow how?
- Better than the alternative of recommendation engineers deciding for everyone what they should see?
  - But the engineers are already doing that…
These are old, old questions of politics and ethics
- Paternalism vs. liberty, integration vs. diversity, …
- It’s unlikely a team of software engineers will solve them…
  - … or even a team of software engineers and data scientists…
  - … or even a team of software engineers, data scientists and professional ethicists
- … but when you’re dealing with political and ethical issues, it’s good to recognize that, and think about what lessons there might be from history, and not pretend it’s all just technical optimization

What does the system actually optimize for?

What are we really trying to predict?
- Ratings/likes?
- Clicks?
- “Engagement”?
- Purchases?
What are we really trying to maximize?
- Prediction accuracy?
- Utility to users?
  - How do we measure that?
- Revenue / profit?
  - Where does the money come from — sales, ads, subscriptions?

What does the system actually optimize for?

Usually, the system’s owners want to make money
Purchase recommendations
- If the system recommends item \(k\) to person \(i\), the probability that they want it enough to buy it is \(p_{ik}\)
- The price is \(r_k\) and the system owner gets a cut, \(q r_k\), for purchases
EXERCISE:

What’s the expected revenue for recommending item \(k\)?
When will the system owner prefer to recommend item \(k\) rather than item \(l\), even though \(p_{ik} < p_{il}\)?

What does the system actually optimize for?

Solutions:

Expected revenue is \(q r_k p_{ik}\)
Prefers item \(k\) to item \(l\) when expected revenue is higher, \[ q r_k p_{ik} > q r_l p_{il} ~ \Rightarrow ~ \frac{r_k}{r_l} > \frac{p_{il}}{p_{ik}} \]

Even if \(p_{il}=1 \gg p_{ik} \approx 0\), might still recommend \(k\) if \(r_k \gg r_l\)
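A numerical instance of the solution, as a minimal sketch. The prices, probabilities and the 10% cut are invented for illustration, and the `Item` class is a hypothetical helper; probability of wanting the item is used here as a crude stand-in for utility to the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    price: float            # r_k
    buy_probability: float  # p_ik for one fixed user i


def expected_revenue(item, cut):
    """Operator's expected revenue q * r_k * p_ik from recommending item."""
    return cut * item.price * item.buy_probability


CUT = 0.10  # q, an assumed 10% commission
item_k = Item("k", price=500.0, buy_probability=0.05)  # pricey long shot
item_l = Item("l", price=10.0, buy_probability=0.90)   # cheap near-certainty

best_for_operator = max((item_k, item_l),
                        key=lambda it: expected_revenue(it, CUT))
best_for_user = max((item_k, item_l), key=lambda it: it.buy_probability)

for it in (item_k, item_l):
    print(f"item {it.name}: expected revenue {expected_revenue(it, CUT):.2f}")
print("operator recommends:", best_for_operator.name)
print("most likely to be wanted:", best_for_user.name)
```

Here \(r_k / r_l = 50\) exceeds \(p_{il} / p_{ik} = 18\), so the operator prefers \(k\) even though the user is eighteen times more likely to want \(l\).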
Economically: the recommendation engine’s economic mechanism is not “incentive compatible”
- Maximizing revenue for the operator \(\neq\) maximizing utility for the users
- Proverbially: “If you’re not paying for the service, then you’re the product being sold”
Ways out:
- The simple capitalist solution would be to pay for recommendations
- …but free services out-compete
- An alternative: professionalism
  - Like an old-style newspaper with a “wall” between the advertising department and the journalists
  - How credibly could anyone make this promise? How could the recommenders measure utility to the users?

Do recommendation systems do anything?

We recommend \(k\) to \(i\) and \(i\) says “Yes! I love \(k\)!”
Cross-validation says: triumph!
Cynicism says: Our recommendation has changed nothing
A recommendation is more of an action than a prediction

What would a successful recommendation look like?

Without the recommendation, user \(i\) wouldn’t try item \(k\)
- (Or would be very unlikely to try it, etc.)
With the recommendation, \(i\) tries \(k\) and likes it
The real question isn’t \(\mathbb{E}\left[X_{ik} \mid \mathrm{predictors}\right]\)
It’s \(\mathbb{E}\left[X_{ik} \mid \mathrm{predictors}, do(\mathrm{recommend})\right] - \mathbb{E}\left[X_{ik} \mid \mathrm{predictors}, do(\mathrm{no\ recommendation})\right]\)
This is a causal question

How do we answer causal questions?

Ignore the issue
- Bad answer but common so you’re at least failing conventionally
Observational causal inference
- Try to match people who got the recommendation to those who didn’t
- Need to control for the variables that lead to recommendations and choices
  - Don’t control for anything “downstream” from the recommendation
- Very clear example for marketing in general (rather than recommendation engines strictly speaking): Rubin and Waterman (2006)
Experiments
- Randomize who gets recommendations
- Works if you can measure the outcome without the recommendation
  - Better for purchases than for ratings
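The experimental approach can be sketched with simulated users. All numbers are assumptions chosen to mimic the cynical scenario: users have a 60% chance of trying and liking the item with no recommendation at all, and the recommendation adds 5 percentage points. The function `run_experiment` is illustrative, not any particular platform's procedure.

```python
import random


def run_experiment(n_users=40_000, baseline=0.60, lift=0.05, seed=1):
    """Randomly assign users to get a recommendation, then compare outcomes.

    baseline : assumed chance a user tries and likes the item anyway
    lift     : assumed extra chance caused by the recommendation
    Returns (like rate among recommended, like rate among controls).
    """
    rng = random.Random(seed)
    liked = {True: 0, False: 0}
    count = {True: 0, False: 0}
    for _ in range(n_users):
        recommended = rng.random() < 0.5  # coin-flip assignment
        p_like = baseline + (lift if recommended else 0.0)
        count[recommended] += 1
        liked[recommended] += rng.random() < p_like
    return liked[True] / count[True], liked[False] / count[False]


rate_recommended, rate_control = run_experiment()
estimated_effect = rate_recommended - rate_control

print(f"like rate when recommended:     {rate_recommended:.3f}")
print(f"like rate without recommending: {rate_control:.3f}")
print(f"estimated causal effect:        {estimated_effect:.3f}")
```

The like rate among recommended users looks impressive on its own, but the randomized control group shows that most of it would have happened anyway; only the difference between the two rates estimates what the recommendation caused.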

Summing up

Recommendation systems are deeply embedded into modern, online life
They have generally-unintended effects:
- Homogenization within groups
- Positive feedback loops leading to amplification and echo chambers
- Reinforcing existing group differences
System owners (often) want to maximize something different from users
Measuring whether they make a difference is another tricky statistical problem

Backup: “Engagement”

[https://twitter.com/GabrielRossman/status/1169234703414484992]

(Prof. Rossman is joking, but he’s also an excellent sociologist of mass media and social diffusion, so this isn’t entirely a joke)

Backup: Increasing returns to scale

The more users who go into each prediction, the better
- See Figure 7 of Shardanand and Maes (1995)

\(\therefore\) Systems with more users, and with more diverse users, will make better predictions
\(\therefore\) A sensible user will prefer to join a larger system than a smaller one
- … all else being equal…
- … what might not be equal?
\(\therefore\) Tendency to segment to a few large systems which are “natural monopolies” for segments of users

(If you want to learn to think this way, Shapiro and Varian (1998) is old but still excellent)

References (in addition to the background reading on the course homepage)

Goel, Sharad, Jake M. Hofman, and M. Irmak Sirer. 2012. “Who Does What on the Web: A Large-Scale Study of Browsing Behavior.” In Sixth International AAAI Conference on Weblogs and Social Media [ICWSM 2012], edited by John G. Breslin, Nicole B. Ellison, James G. Shanahan, and Zeynep Tufekci. AAAI Press. https://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4660.

Rubin, Donald B., and Richard P. Waterman. 2006. “Estimating the Causal Effects of Marketing Interventions Using Propensity Score Methodology.” Statistical Science 21:206–22. https://doi.org/10.1214/088342306000000259.

Salganik, Matthew J., Peter S. Dodds, and Duncan J. Watts. 2006. “Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market.” Science 311:854–56. http://www.princeton.edu/~mjs3/musiclab.shtml.

Salganik, Matthew J., and Duncan J. Watts. 2008. “Leading the Herd Astray: An Experimental Study of Self-Fulfilling Prophecies in an Artificial Cultural Market.” Social Psychological Quarterly 71:338–55. http://www.princeton.edu/~mjs3/salganik_watts08.pdf.

Shapiro, Carl, and Hal R. Varian. 1998. Information Rules: A Strategic Guide to the Network Economy. 1st ed. Boston: Harvard Business School Press.

Recommender Systems II — So, What’s Not to Love?
