Recommender systems are everywhere these days

Lots of big, famous businesses (e.g., Amazon, Netflix, music streaming…)
Even more smaller businesses (e.g., lots of clothing brands)
Some non-profits (e.g., LibraryThing, some actual libraries)
Some things that don’t immediately look like recommender systems:
- Feeds in social media (Facebook, Twitter, etc.)
- “Up next” on YouTube
- News websites, “Discover” in Google’s mobile app, etc.
We’ve seen how they work

What happens now that they’re everywhere?

Homogenization?
Rabbit holes?
Feedback loops?
Discrimination?
Getting what you wish for?
Do they actually make a difference?

Homogenization (I)

If the system recommends item \(i\) because it’s popular, won’t that just make \(i\) more popular?
- (Experimentally, people do pay attention to most-popular rankings, and artificially manipulating them can indeed make things more or less popular: Salganik, Dodds, and Watts (2006); Salganik and Watts (2008))
If everybody’s getting similar recommendations, won’t that make everybody’s consumption and likes more similar?
Potential response: This is why recommendation systems personalize!
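
A minimal sketch of the popularity loop, under assumptions that are not in the cited experiments: all items are equally appealing, the "recommendation" is simply the current most-chosen item, and `follow` (a hypothetical parameter) is the chance that a user takes it rather than choosing uniformly at random. Any concentration that shows up comes from the recommendation alone.

```python
import random

def top_item_share(follow, n_users=5000, n_items=10, seed=0):
    """Share of all choices that go to the single most-chosen item.

    follow: probability that a user takes the currently most popular item;
            otherwise the user picks uniformly at random.
    """
    rng = random.Random(seed)
    counts = [0] * n_items
    for _ in range(n_users):
        if rng.random() < follow:
            # Recommend the current leader (ties go to the lowest index).
            choice = max(range(n_items), key=counts.__getitem__)
        else:
            choice = rng.randrange(n_items)
        counts[choice] += 1
    return max(counts) / n_users

for follow in (0.0, 0.5, 1.0):
    print(follow, round(top_item_share(follow), 3))
```

With `follow = 1.0` the first item to lead takes every later choice; with `follow = 0.0` choices stay spread roughly evenly over the items.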

Homogenization (II)

Back in the Olden Days, it was very easy to just not run across something Just Like the Sort of Thing You Liked
- Minor authors, obscure bands, cult movies…
- Maybe you had a friend who shared your taste and would introduce you
- …Or you had access to a whole sub-culture…
- Or you knew where to find specialist publications and could track things down
Result: even if two people had similar tastes, there was often relatively little overlap in the details of what they consumed
- (With obvious exceptions for when there wasn’t much choice, like only 3 TV networks, etc.)

Homogenization (II), cont’d

Recommender systems change this situation
- Nearest neighbors: people you don’t know can be close to you in item-feature space, and then you get recommended what they indicate you’ll probably like
- Factor models: similar effect, more complex geometry
- Entirely intentional: Shardanand and Maes (1995) was about “algorithms for automating ‘word of mouth’”!
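
The nearest-neighbor mechanism can be sketched as follows. This is an illustration only: the ratings are made up, cosine similarity over co-rated items is one choice among several, and the names `cosine` and `recommend` are hypothetical.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not real data.
ratings = {
    "irene": {"A": 5, "B": 4, "C": 1},
    "joey":  {"A": 1, "B": 2, "C": 5, "D": 4},
    "kim":   {"A": 5, "B": 5, "C": 1, "E": 5},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, k=1):
    """Items the user has not rated, taken from the k most similar users."""
    sims = [(cosine(ratings[user], r), name)
            for name, r in ratings.items() if name != user]
    neighbors = [name for _, name in sorted(sims, reverse=True)[:k]]
    scores = {}
    for name in neighbors:
        for item, rating in ratings[name].items():
            if item not in ratings[user]:
                scores[item] = max(scores.get(item, 0), rating)
    # Highest neighbor rating first.
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("irene", ratings))
```

Here Irene never needs to know Kim: their ratings on A, B, and C are close, so Kim's other favorite is passed along.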

Homogenization (II), cont’d

Each individual will be exposed to a broader array of Stuff Like What They Like than before
\(\Rightarrow\) more diversity within individuals
Similar individuals will be exposed to more of the same things
\(\Rightarrow\) less diversity across individuals
- Finance analogy: if you diversify your portfolio, the assets are less correlated within your portfolio
- … but if I also diversify my portfolio, our portfolios become more correlated
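
A deliberately extreme toy version of this argument, with hypothetical users and items: before, two users with the same taste have each found a different corner of a niche; after, the recommender shows the whole niche to both. Breadth within each user goes up, and so does the overlap between them (measured here by Jaccard similarity).

```python
def jaccard(a, b):
    """Overlap between two sets: size of intersection over size of union."""
    return len(a & b) / len(a | b)

# Hypothetical niche of six items that both users would like.
niche = {"item1", "item2", "item3", "item4", "item5", "item6"}

# Before: each user has stumbled on a different part of the niche.
before = {"user_a": {"item1", "item2"}, "user_b": {"item3", "item4"}}

# After: the recommender surfaces the whole niche to both of them.
after = {name: seen | niche for name, seen in before.items()}

for label, world in (("before", before), ("after", after)):
    breadth = {name: len(seen) for name, seen in world.items()}
    overlap = jaccard(world["user_a"], world["user_b"])
    print(label, breadth, overlap)
```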

Rabbit holes, feedback loops, echo chambers

Amplification of small initial differences:
- Irene starts with a small preference for (say) Pashto rap over Dari death-metal, Joey with the reverse
- System connects Irene to lovers of Pashto rap and recommends those songs, but connects Joey to lovers of Dari death-metal and recommends that instead
- Irene and Joey both follow the recommendations, so Irene rates more Pashto rap and Joey rates more Dari death-metal
- The system recommends even more Pashto rap to Irene and even more Dari death-metal to Joey
- Eventually everything it recommends to Irene is just Pashto rap, and Joey only gets recommended Dari death-metal
Tastes change with experience, but don’t even need that effect
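
The loop above can be sketched in a few lines. As a simplifying assumption, the neighbor-matching step is collapsed into "recommend the genre this user has rated most", and the user always rates one more item from the recommended genre. Tastes never change in this sketch; only the rating history does.

```python
def simulate(counts, rounds=100):
    """Each round: recommend the user's most-rated genre; the user rates one more item from it."""
    counts = dict(counts)  # copy, so the starting profile is left untouched
    for _ in range(rounds):
        top = max(counts, key=counts.get)
        counts[top] += 1
    return counts

def share(counts, genre):
    """Fraction of the user's ratings that fall in the given genre."""
    return counts[genre] / sum(counts.values())

# Hypothetical starting profiles: a one-rating difference in each direction.
irene_start = {"pashto_rap": 6, "dari_death_metal": 5}
joey_start = {"pashto_rap": 5, "dari_death_metal": 6}

irene = simulate(irene_start)
joey = simulate(joey_start)

print(share(irene_start, "pashto_rap"), share(irene, "pashto_rap"))
print(share(joey_start, "pashto_rap"), share(joey, "pashto_rap"))
```

A one-rating head start is enough: after 100 rounds nearly all of Irene's ratings are Pashto rap and nearly all of Joey's are Dari death-metal.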

Feedback loops, echo chambers

Music, who cares?
- Irene clicks on some news stories about how political party X is awful
- The system connects Irene to other X-haters
- The system recommends more X-is-awful stories to Irene
- Irene clicks on some of them, so she’s even more similar to X-haters
- Eventually everything it recommends to Irene is about how X is awful

Discrimination

Some recommender systems use covariates in their predictions
Few have explicit rules like “Don’t recommend information about [apartments in good neighborhoods / educational opportunities / good jobs / …] to [women / Jews / Muslims / blacks / Chinese / Uighurs / immigrants / non-Brahmins / gun owners …]”
So it’s unlikely that recommender systems will be blatantly biased against prominent social groups, especially ones defined by legally protected categories
- Of course legal protections vary by country
BUT lots of other variables are very good proxies for social categories
- In the USA, ZIP code is a very good predictor of race, pretty good for income and education
  - ZIP+4 is an even better predictor
  - We’ll see some detailed examples next time

Discrimination (cont’d)

Suppose we leave out all user attributes and just use item ratings
Lots of items are really good predictors of demographic attributes, especially in combination
- Education, age, race, sex, sexual orientation, income…
We’ll see an example next time of predicting these attributes from which websites you visit (Goel, Hofman, and Sirer 2012)
You could definitely do this from videos watched, or books read, or music liked (with varying error levels)
- E.g., some movie actors are wildly more popular among certain groups than others
- Same for musical genres, musicians, book genres, …
Recommending things liked by people who like the things you already like is, in part, recommending things already common in your demographic group
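
Why "especially in combination"? A small Bayes-rule sketch, with entirely hypothetical numbers and abstract groups `g1` and `g2`: each item is only twice as popular in one group as in the other, but treating the likes as independent given the group (a naive-Bayes assumption, made here for simplicity) lets the evidence multiply.

```python
# Hypothetical like-probabilities per item: (P(like | g1), P(like | g2)).
like_prob = {
    "item1": (0.6, 0.3),
    "item2": (0.5, 0.25),
    "item3": (0.4, 0.2),
}

def posterior_g1(liked, prior=0.5):
    """P(user is in g1 | liked items), assuming likes are independent given the group."""
    odds = prior / (1 - prior)
    for item in liked:
        p1, p2 = like_prob[item]
        odds *= p1 / p2  # each like multiplies the odds by its likelihood ratio
    return odds / (1 + odds)

print(posterior_g1(["item1"]))
print(posterior_g1(["item1", "item2", "item3"]))
```

One weakly informative item moves the probability from 1/2 to 2/3; three of them together move it to 8/9.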

Why might this be a problem?

Lack of exposure to valuable information
- Maybe not an issue with music recommendations
- What about the news, or job ads, or how-to-apply-for-financial-aid guides?
  - Which how-to-apply-for-financial-aid guides? (Many are basically scams.)
Fragmentation of the public: different groups experience the same world very differently, because different filters are (inadvertently) applied
- Bad because there’s no common knowledge?
  - But we were just worrying about homogenization!
- Good because isolation helps each culture grow in its own way?
  - Grow how?
- Better than the alternative of recommendation engineers deciding for everyone what they should see?
  - But the engineers are already doing that…

Why might this be a problem? (cont’d.)

These are old, old questions of politics and ethics
- Paternalism vs. liberty, integration vs. diversity, …
It’s unlikely a team of software engineers will solve them…
- … or even a team of software engineers and data scientists…
  - … or even a team of software engineers, data scientists and self-declared AI ethicists
… but when you’re dealing with political and ethical issues, it’s good to recognize that, think about what lessons history might offer, and not pretend it’s all just technical optimization

What does the system actually optimize for?

What are we really trying to predict?
- Ratings/likes?
- Clicks?
- “Engagement”?
- Purchases?
What are we really trying to maximize?
- Prediction accuracy?
- Utility to users?
  - How do we measure that?
- Revenue/profit?
  - Where does the money come from — sales, ads, subscriptions?

What does the system actually optimize for?

Usually, the system’s owners want to make money
Purchase recommendations
- If the system recommends item \(j\) to person \(i\), the probability of \(i\)’s buying it (through the system) is \(p_{ij}\)
- The price is \(r_j\) and the system owner gets a fixed share \(q\) of every purchase, so purchasing \(j\) is worth \(q r_j\) to the owner
EXERCISE:

What’s the expected revenue for recommending item \(j\)?
When will the system owner prefer to recommend item \(j\) rather than item \(k\), even though \(p_{ij} < p_{ik}\)?

What does the system actually optimize for?

Solutions:

Expected revenue is \(q r_j p_{ij}\)
Prefers item \(j\) to item \(k\) when expected revenue is higher, \[ q r_j p_{ij} > q r_k p_{ik} ~ \Rightarrow ~ \frac{r_j}{r_k} > \frac{p_{ik}}{p_{ij}} \]

Even if \(p_{ik}=1 \gg p_{ij} \approx 0\), might still recommend \(j\) if \(r_j \gg r_k\)
In economic terms: the recommendation engine’s mechanism is not “incentive compatible”
- Maximizing revenue for the operator \(\neq\) maximizing utility for the users
- Proverbially: “If you’re not paying for the service, then you’re the product being sold”
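
A worked instance of the exercise, with hypothetical numbers for the share \(q\), the prices \(r_j, r_k\), and the purchase probabilities \(p_{ij}, p_{ik}\): the user would almost certainly buy the cheap item \(k\), yet the owner's expected revenue is higher from recommending the expensive item \(j\).

```python
def expected_revenue(q, price, p_buy):
    """Owner's expected revenue from one recommendation: q * r_j * p_ij."""
    return q * price * p_buy

q = 0.10                    # owner's share of each sale (hypothetical)
r_j, p_ij = 500.0, 0.02     # expensive item the user probably won't buy
r_k, p_ik = 5.0, 0.90       # cheap item the user almost surely would buy

rev_j = expected_revenue(q, r_j, p_ij)
rev_k = expected_revenue(q, r_k, p_ik)

# The two forms of the condition from the solution should agree.
owner_prefers_j = rev_j > rev_k
price_ratio_beats_prob_ratio = (r_j / r_k) > (p_ik / p_ij)

print(rev_j, rev_k, owner_prefers_j, price_ratio_beats_prob_ratio)
```

The price ratio here is 100 and the probability ratio is 45, so the owner prefers \(j\) even though the user is 45 times more likely to buy \(k\).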

Making recommender systems more aligned with users’ interests

The simple capitalist solution would be to pay for recommendations
- …but free-to-the-user services have obvious competitive advantages
- And systems with more users genuinely deliver better recommendations (see backup)
An alternative: professionalism
- Like an old-style newspaper with a “wall” between the advertising department and the journalists
- How credibly could anyone make this promise?
How could the recommenders measure utility to the users?

Do recommendation systems do anything?

We recommend \(j\) to \(i\) and \(i\) says “Yes! I love \(j\)!”
Cross-validation says: triumph!
Cynicism says: Our recommendation has changed nothing
A recommendation is more of an action than a prediction

What would a successful recommendation look like?

Without the recommendation, user \(i\) wouldn’t consume item \(j\)
- (Or would be very unlikely to try it, etc.)
With the recommendation, \(i\) tries \(j\) and likes it
The real question isn’t \(\mathbb{E}\left[X_{ij} \mid \mathrm{predictors}\right]\)
It’s \(\mathbb{E}\left[X_{ij} \mid \mathrm{predictors}, do(\mathrm{recommend})\right] - \mathbb{E}\left[X_{ij} \mid \mathrm{predictors}, do(\mathrm{no\ recommendation})\right]\)
This is a causal question
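
A toy illustration of the gap between the two quantities, using made-up potential outcomes for five users and one item. Each pair says whether the user would consume the item with and without the recommendation; in real data only one member of each pair is ever observed, which is what makes this hard.

```python
# Hypothetical potential outcomes for five users and one item:
# (consumes if recommended, consumes if not recommended).
potential = [(1, 1), (1, 1), (1, 1), (1, 0), (0, 0)]

n = len(potential)
rate_if_recommended = sum(y1 for y1, _ in potential) / n
rate_if_not = sum(y0 for _, y0 in potential) / n

# Average causal effect of recommending: mean of the per-user differences.
causal_effect = sum(y1 - y0 for y1, y0 in potential) / n

print(rate_if_recommended, rate_if_not, causal_effect)
```

Four of five recommended users consume the item, which looks like a strong prediction, but three of them would have done so anyway; the recommendation itself changes the outcome for only one user in five.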

How do we answer causal questions?

Ignore the issue
- Bad answer, but common, so you’re at least failing conventionally
Observational causal inference
- Try to match people who got the recommendation to those who didn’t
- Need to control for the variables that lead to recommendations and choices
  - Don’t control for anything “downstream” from the recommendation
- Very clear example for marketing in general (rather than recommendation engines strictly speaking): Rubin and Waterman (2006)
Experiments
- Randomize who gets recommendations
- Works if you can measure the outcome without the recommendation
- Often easier to measure for purchases than for ratings
Natural experiments which exploit unrelated variation in what people look at or buy
- Sharma, Hofman, and Watts (2015) use this approach to estimate that \(\approx 75\)% of clicks that happen through Amazon’s recommendations “would likely occur in the absence of recommendations”
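
A simulated sketch of why randomization helps. All numbers are assumptions for illustration: half of the users have high affinity for the item (baseline purchase probability 0.6, versus 0.1 for the rest), and a recommendation adds 0.1 to the purchase probability for everyone. Under the "targeted" policy the system recommends only to high-affinity users, as a real recommender tends to; under the "randomized" policy a coin flip decides.

```python
import random

def estimated_effect(policy, n=100_000, effect=0.1, seed=0):
    """Difference in purchase rates, recommended minus not recommended."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        high_affinity = rng.random() < 0.5
        base = 0.6 if high_affinity else 0.1
        if policy == "randomized":
            recommended = rng.random() < 0.5   # coin flip, unrelated to affinity
        else:
            recommended = high_affinity        # "targeted": recommend to likely buyers
        p_buy = base + (effect if recommended else 0.0)
        bought = 1 if rng.random() < p_buy else 0
        (treated if recommended else control).append(bought)
    return sum(treated) / len(treated) - sum(control) / len(control)

print("targeted  :", round(estimated_effect("targeted"), 3))
print("randomized:", round(estimated_effect("randomized"), 3))
```

The targeted comparison mixes the effect of the recommendation with the difference in baseline affinity, so it overstates the effect badly; the randomized comparison recovers something close to the assumed 0.1.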

Summing up

Recommendation systems are deeply embedded into modern, online life
They have generally-unintended effects:
- Homogenization within groups
- Positive feedback loops leading to amplification and echo chambers
- Reinforcing existing group differences
System owners (often) want to maximize something different from users
Measuring whether they make a difference is another tricky statistical problem

Backup: “Engagement”

[https://twitter.com/GabrielRossman/status/1169234703414484992]

(Prof. Rossman is joking, but he’s also an excellent sociologist of mass media and social diffusion, so this isn’t entirely a joke)

Backup: Increasing returns to scale

The more users who go into each prediction, the better
- See Figure 7 of Shardanand and Maes (1995)

\(\therefore\) Systems with more users, and with more diverse users, will make better predictions
\(\therefore\) A sensible user will prefer to join a larger system than a smaller one
- … all else being equal…
- … what might not be equal?
\(\therefore\) Tendency to condense into a few large systems which are “natural monopolies” for segments of users
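
A rough simulation of the first step in this argument, under a strong simplifying assumption that is not from Shardanand and Maes: each neighbor's rating is the true rating plus independent Gaussian noise, and the prediction is the plain average of the neighbors' ratings. The parameter names are hypothetical.

```python
import random

def mean_abs_error(n_neighbors, trials=2000, true_rating=3.5, noise_sd=1.0, seed=0):
    """Average absolute error when predicting by averaging n noisy neighbor ratings."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        neighbor_ratings = [rng.gauss(true_rating, noise_sd) for _ in range(n_neighbors)]
        prediction = sum(neighbor_ratings) / n_neighbors
        total += abs(prediction - true_rating)
    return total / trials

for k in (1, 10, 100):
    print(k, round(mean_abs_error(k), 3))
```

Under these assumptions the error shrinks roughly like one over the square root of the number of neighbors, so a system that can find more comparable users for each prediction does better.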

(If you want to learn to think this way, Shapiro and Varian (1998) is old but still excellent)

References (in addition to the background reading on the course homepage)

Goel, Sharad, Jake M. Hofman, and M. Irmak Sirer. 2012. “Who Does What on the Web: A Large-Scale Study of Browsing Behavior.” In Sixth International AAAI Conference on Weblogs and Social Media [ICWSM 2012], edited by John G. Breslin, Nicole B. Ellison, James G. Shanahan, and Zeynep Tufekci. AAAI Press. https://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4660.

Rubin, Donald B., and Richard P. Waterman. 2006. “Estimating the Causal Effects of Marketing Interventions Using Propensity Score Methodology.” Statistical Science 21:206–22. https://doi.org/10.1214/088342306000000259.

Salganik, Matthew J., Peter S. Dodds, and Duncan J. Watts. 2006. “Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market.” Science 311:854–56. http://www.princeton.edu/~mjs3/musiclab.shtml.

Salganik, Matthew J., and Duncan J. Watts. 2008. “Leading the Herd Astray: An Experimental Study of Self-Fulfilling Prophecies in an Artificial Cultural Market.” Social Psychological Quarterly 71:338–55. http://www.princeton.edu/~mjs3/salganik_watts08.pdf.

Shapiro, Carl, and Hal R. Varian. 1998. Information Rules: A Strategic Guide to the Network Economy. First edition. Boston: Harvard Business School Press.

Sharma, Amit, Jake M. Hofman, and Duncan J. Watts. 2015. “Estimating the Causal Impact of Recommendation Systems from Observational Data.” In Proceedings of the Sixteenth ACM Conference on Economics and Computation [EC ’15], edited by Michal Feldman, Michael Schwarz, and Tim Roughgarden, 453–70. New York: The Association for Computing Machinery. https://doi.org/10.1145/2764468.2764488.

Recommender Systems II — So, What’s Not to Love?
