*Diversity Prediction Theorem*, which is sometimes taken to explain why crowds are wiser, on average, than the individuals who compose them. The theorem was originally proved by Anders Krogh and Jesper Vedelsby, but it has entered the literature on social epistemology through the work of Scott E. Page. In this post, I'll generalize this result.

The Diversity Prediction Theorem concerns a situation in which a number of different individuals estimate a particular quantity -- in the original example, it is the weight of an ox at a local fair. Take the crowd's estimate of the quantity to be the average of the individual estimates. Then the theorem shows that the distance from the crowd's estimate to the true value is never greater than the average distance from the individual estimates to the true value; moreover, the difference between the two is exactly the average distance from the individual estimates to the crowd's estimate (which, when distance is measured by squared error, is the variance of the individual estimates).

Let's make this precise. Suppose you have a group of $n$ individuals. They each provide an estimate for a real-valued quantity. The $i^\mathrm{th}$ individual gives the prediction $q_i$. The true value of this quantity is $\tau$. And we measure the distance from one estimate of a quantity to another, or to the true value of that quantity, using squared error. Then:

- The crowd's prediction of the quantity is $c = \frac{1}{n}\sum^n_{i=1} q_i$.
- The crowd's distance from the true quantity is $\mathrm{SqE}(c) = (c-\tau)^2$.
- The $i^\mathrm{th}$ individual's distance from the true quantity is $\mathrm{SqE}(q_i) = (q_i-\tau)^2$.
- The average individual distance from the true quantity is $\frac{1}{n} \sum^n_{i=1} \mathrm{SqE}(q_i) = \frac{1}{n} \sum^n_{i=1} (q_i - \tau)^2$.
- The average individual distance from the crowd's estimate is $v = \frac{1}{n}\sum^n_{i=1} (q_i - c)^2$.

**Diversity Prediction Theorem** $$\mathrm{SqE}(c) = \frac{1}{n} \sum^n_{i=1} \mathrm{SqE}(q_i) - v$$
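To see the identity at work on concrete numbers, here is a minimal Python sketch. The function name `diversity_prediction`, the four guesses, and the true weight are all made up for illustration; they are not taken from the original ox example.

```python
def diversity_prediction(estimates, truth):
    """Return (crowd error, average individual error, diversity) under squared error."""
    n = len(estimates)
    crowd = sum(estimates) / n                                # c, the crowd's estimate
    crowd_error = (crowd - truth) ** 2                        # SqE(c)
    avg_error = sum((q - truth) ** 2 for q in estimates) / n  # average of SqE(q_i)
    diversity = sum((q - crowd) ** 2 for q in estimates) / n  # v
    return crowd_error, avg_error, diversity


# Hypothetical guesses at an ox's weight (in pounds) and a hypothetical true weight.
guesses = [1150, 1200, 1250, 1300]
true_weight = 1198

crowd_error, avg_error, diversity = diversity_prediction(guesses, true_weight)
print(crowd_error, avg_error, diversity)
print(crowd_error == avg_error - diversity)
```

With these made-up numbers the crowd's squared error is 729, the average individual squared error is 3854, and the diversity term is 3125, so the identity reads 729 = 3854 - 3125.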

The theorem is easy enough to prove. You essentially just follow the algebra. However, following through the proof, you might be forgiven for thinking that the result says more about some quirk of squared error as a measure of distance than about the wisdom of crowds. And of course squared error is just one way of measuring the distance from an estimate of a quantity to the true value of that quantity, or from one estimate of a quantity to another. There are other such distance measures. So the question arises: Does the Diversity Prediction Theorem hold if we replace squared error with one of these alternative measures of distance? In particular, it is natural to take any of the so-called Bregman divergences $\mathfrak{d}$ to be a legitimate measure of distance from one estimate to another. I won't say much about Bregman divergences here, except to give their formal definition. To learn about their properties, have a look here and here. They were introduced by Bregman as a natural generalization of squared error.

**Definition (Bregman divergence)** A function $\mathfrak{d} : [0, \infty) \times [0, \infty) \rightarrow [0, \infty]$ is a *Bregman divergence* if there is a continuously differentiable, strictly convex function $\varphi : [0, \infty) \rightarrow [0, \infty)$ such that $$\mathfrak{d}(x, y) = \varphi(x) - \varphi(y) - \varphi'(y)(x-y)$$

Squared error is itself one of the Bregman divergences. It is the one generated by $\varphi(x) = x^2$. But there are many others, each generated by a different function $\varphi$.
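As a quick check of that claim, substitute $\varphi(x) = x^2$, and so $\varphi'(y) = 2y$, into the definition: $$\mathfrak{d}(x, y) = x^2 - y^2 - 2y(x - y) = x^2 - 2xy + y^2 = (x - y)^2$$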

Now, suppose we measure distance between estimates using a Bregman divergence $\mathfrak{d}$. Then:

- The crowd's prediction of the quantity is $c = \frac{1}{n}\sum^n_{i=1} q_i$.
- The crowd's distance from the true quantity is $\mathrm{E}(c) = \mathfrak{d}(c, \tau)$.
- The $i^\mathrm{th}$ individual's distance from the true quantity is $\mathrm{E}(q_i) = \mathfrak{d}(q_i, \tau)$.
- The average individual distance from the true quantity is $\frac{1}{n} \sum^n_{i=1} \mathrm{E}(q_i) = \frac{1}{n} \sum^n_{i=1} \mathfrak{d}(q_i, \tau)$.
- The average individual distance from the crowd's estimate is $v = \frac{1}{n}\sum^n_{i=1} \mathfrak{d}(q_i, c)$.

**Generalized Diversity Prediction Theorem** $$\mathrm{E}(c) = \frac{1}{n} \sum^n_{i=1} \mathrm{E}(q_i) - v$$

*Proof.*

\begin{eqnarray*}
& & \frac{1}{n} \sum^n_{i=1} \mathrm{E}(q_i) - v \\
& = & \frac{1}{n} \sum^n_{i=1} [ \mathfrak{d}(q_i, \tau) - \mathfrak{d}(q_i, c)] \\
& = & \frac{1}{n} \sum^n_{i=1} \left[ [\varphi(q_i) - \varphi(\tau) - \varphi'(\tau)(q_i - \tau)] - [\varphi(q_i) - \varphi(c) - \varphi'(c)(q_i - c)] \right] \\
& = & \frac{1}{n} \sum^n_{i=1} [\varphi(q_i)- \varphi(\tau) - \varphi'(\tau)(q_i - \tau) - \varphi(q_i)+ \varphi(c) + \varphi'(c)(q_i - c)] \\
& = & - \varphi(\tau) - \varphi'(\tau)\left(\left(\frac{1}{n} \sum^n_{i=1} q_i\right) - \tau\right) + \varphi(c) + \varphi'(c)\left(\left(\frac{1}{n} \sum^n_{i=1} q_i\right) - c\right) \\
& = & - \varphi(\tau) - \varphi'(\tau)(c - \tau) + \varphi(c) + \varphi'(c)(c - c) \\
& = & \varphi(c) - \varphi(\tau) - \varphi'(\tau)(c - \tau) \\
& = & \mathfrak{d}(c, \tau) \\
& = & \mathrm{E}(c)
\end{eqnarray*}

as required.
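The generalized identity can also be checked numerically. The following is a sketch only: the helper names `bregman` and `generalized_terms`, the three generating functions, and the sample estimates are my own illustrative choices. Each generator is strictly convex, continuously differentiable, and non-negative on $[0, \infty)$, as the definition requires.

```python
import math


def bregman(phi, dphi):
    """Build d(x, y) = phi(x) - phi(y) - phi'(y)(x - y) from phi and its derivative dphi."""
    def d(x, y):
        return phi(x) - phi(y) - dphi(y) * (x - y)
    return d


def generalized_terms(estimates, truth, d):
    """Return (E(c), average of E(q_i), v) for the divergence d."""
    n = len(estimates)
    crowd = sum(estimates) / n                             # c
    crowd_error = d(crowd, truth)                          # E(c)
    avg_error = sum(d(q, truth) for q in estimates) / n    # average of E(q_i)
    diversity = sum(d(q, crowd) for q in estimates) / n    # v
    return crowd_error, avg_error, diversity


# Hypothetical generators (phi, phi') chosen for illustration.
generators = {
    "x^2": (lambda x: x ** 2, lambda x: 2 * x),
    "x^4": (lambda x: x ** 4, lambda x: 4 * x ** 3),
    "exp(x)": (math.exp, math.exp),
}

# Made-up estimates and true value.
estimates = [0.5, 1.0, 2.0, 3.5]
truth = 1.5

for name, (phi, dphi) in generators.items():
    crowd_error, avg_error, diversity = generalized_terms(
        estimates, truth, bregman(phi, dphi)
    )
    # The two sides should agree up to floating-point rounding.
    assert math.isclose(crowd_error, avg_error - diversity, rel_tol=1e-9, abs_tol=1e-9)
    print(name, crowd_error, avg_error - diversity)
```

The comparison uses `math.isclose` rather than `==` because the two sides are computed along different floating-point paths and may differ in the last few digits.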