## Friday, 17 March 2023

### The Robustness of the Diversity Prediction Theorem II: problems with asymmetry

There is a PDF of this post here.

Take a quantity whose value you wish to estimate---the basic reproduction number for a virus; the number of jelly beans in a jar at the school fête; the global temperature rise caused by doubling the concentration of CO2 in the atmosphere; the number of years before humanity goes extinct. Ask a group of people to provide their estimate of that value, and take the mean of their answers. The Diversity Prediction Theorem says that, if you measure distance as squared difference, so that, for example, the distance from $2$ to $5$ is $(2-5)^2$, the distance that the mean of the answers lies from the true value will be equal to the average distance from the answers to the true value less a quantity that measures the diversity of the answers, namely, the average distance from the answers to the mean answer.
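As a quick sanity check, here is a minimal Python sketch of the identity just described, using squared difference. The function names and the jelly bean guesses are made up for illustration; nothing here comes from a particular library.

```python
def squared_difference(x, y):
    """Distance from x to y, measured as squared difference."""
    return (x - y) ** 2


def mean(values):
    return sum(values) / len(values)


def diversity_prediction_terms(estimates, truth, distance=squared_difference):
    """Return the three quantities related by the Diversity Prediction Theorem."""
    centre = mean(estimates)
    # Distance from the mean answer to the true value.
    collective_error = distance(centre, truth)
    # Average distance from the individual answers to the true value.
    average_error = mean([distance(a, truth) for a in estimates])
    # Diversity: average distance from the individual answers to the mean answer.
    diversity = mean([distance(a, centre) for a in estimates])
    return collective_error, average_error, diversity


# Hypothetical guesses at the number of jelly beans in a jar (assumed data).
estimates = [380.0, 420.0, 510.0, 290.0]
truth = 450.0

collective, average, diversity = diversity_prediction_terms(estimates, truth)
print("error of the mean answer:  ", collective)
print("average error - diversity: ", average - diversity)
```

The two printed numbers should agree, whatever estimates and true value you plug in.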

In a previous blogpost, I asked: to what extent is this result a quirk of squared difference? Extending a result due to David Pfau, I showed that it is true of exactly the Bregman divergences. But there's a problem. In the Diversity Prediction Theorem, we measure the diversity of estimates as the average distance from the individual answers to the mean answer. But why this, and not the average distance to the individual answers from the mean answer? Of course, if we use squared difference, then these values are the same, because squared difference is a symmetric measure of distance: the squared difference from one value to another is the same as the squared difference from the second value to the first. And so one set of answers will be more diverse than another according to one definition if and only if it is more diverse according to the other definition. But squared difference is in fact the only symmetric Bregman divergence (up to a positive scaling factor). So, for every other Bregman divergence, the two definitions of diversity come apart.
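To make the symmetry point concrete, here is a small Python sketch comparing squared difference with one non-symmetric Bregman divergence, the generalized Kullback-Leibler divergence that is defined later in this post. The function names `q` and `gen_kl` and the test values are my own choices for illustration, and `log` is assumed to be the natural logarithm.

```python
import math


def q(x, y):
    """Squared difference from x to y."""
    return (x - y) ** 2


def gen_kl(x, y):
    """Generalized Kullback-Leibler divergence from x to y (x, y > 0)."""
    return x * math.log(x / y) - x + y


x, y = 2.0, 5.0

# Squared difference gives the same value in both directions.
print("q(x, y) =", q(x, y), " q(y, x) =", q(y, x))

# The generalized KL divergence does not.
print("gen_kl(x, y) =", gen_kl(x, y), " gen_kl(y, x) =", gen_kl(y, x))
```

Because the second pair of values differs, "distance from the answers to the mean" and "distance from the mean to the answers" are genuinely different quantities for this divergence.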

This has an odd effect on the Diversity Prediction Theorem. One of the standard lessons the theorem is supposed to teach is that the mean of a more diverse group is more accurate than the mean of a less diverse group. In fact, even the original version of the theorem doesn't tell us that. It tells us that the mean of a more diverse group is more accurate than the mean of a less diverse group when the average distance from the truth is the same for both groups. But, if we use a non-symmetric distance measure, i.e., one of the Bregman divergences that isn't squared difference, and we use the alternative measure of diversity mentioned in the previous paragraph (that is, the average distance to the individual answers from the mean answer), then we can get a case in which the mean of a less diverse group is more accurate than the mean of a more diverse group, even though the average distance from the answers to the truth is the same for both groups. So it seems that we have three choices: (i) justify using squared difference only; (ii) justify the first of the two putative definitions of diversity in terms of average distance between mean answer and the answers; (iii) give up the apparent lesson of the Diversity Prediction Theorem that diversity leads to more accurate average answers. For my money, (iii) is the most plausible.

Let me finish off by providing an example. First, define two measures of distance:

Squared difference: $q(x, y) = (x-y)^2$

Generalized Kullback-Leibler divergence: $l(x, y) = x\log(x/y) - x + y$

Then the Diversity Prediction Theorem says that, for any $a_1, \ldots, a_n, t$, if $a^\star = \frac{1}{n}\sum^n_{i=1} a_i$,

$$q(a^\star, t) = \frac{1}{n}\sum^n_{i=1} q(a_i, t) - \frac{1}{n}\sum^n_{i=1} q(a_i, a^\star)$$

And the generalization I discussed in the previous blogpost entails that

$$l(a^\star, t) = \frac{1}{n}\sum^n_{i=1} l(a_i, t) - \frac{1}{n}\sum^n_{i=1} l(a_i, a^\star)$$

But drawing from this the conclusion that groups with the same average distance to the truth are more accurate if they're more diverse relies on defining the diversity of the estimates $a_1, \ldots, a_n$ to be $\frac{1}{n}\sum^n_{i=1} l(a_i, a^\star)$, rather than $\frac{1}{n}\sum^n_{i=1} l(a^\star, a_i)$. Suppose that we define it in the second way instead, and take two groups, each containing two individuals. Here are their estimates of a quantity (perhaps the R number of a virus):

$$\begin{array}{c|c|c|c|c|c} a_1 & a_2 & a^\star & b_1 & b_2 & b^\star \\ \hline 0.5 & 0.1 & 0.3 & 0.3 & 0.9 & 0.6 \end{array}$$

Then $a_1, a_2$ is less diverse than $b_1, b_2$ according to the original definition of diversity, but more diverse according to the second definition. That is,

$$\frac{l(0.5, 0.3) + l(0.1, 0.3)}{2} < \frac{l(0.3, 0.6) + l(0.9, 0.6)}{2}$$

and

$$\frac{l(0.3, 0.5) + l(0.3, 0.1)}{2} > \frac{l(0.6, 0.3) + l(0.6, 0.9)}{2}$$

What's more, if $t \approx 0.44994$, then the average distance to the truth is the same for both groups, up to rounding in $t$. That is,

$$\frac{l(0.5, 0.44994) + l(0.1, 0.44994)}{2} \approx \frac{l(0.3, 0.44994) + l(0.9, 0.44994)}{2}$$

But then it follows that the mean of $a_1, a_2$ is less accurate than the mean of $b_1, b_2$, even though, according to one seemingly legitimate definition of diversity, $a_1, a_2$ is more diverse than $b_1, b_2$:

$$l(0.3, 0.44994) > l(0.6, 0.44994)$$
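The arithmetic in this example is easy to check numerically. The following Python sketch does so under the assumption that $\log$ is the natural logarithm; the function names are hypothetical. The helper `equalising_truth` is not from the post: it comes from setting the two groups' average distances to the truth equal and solving for $t$, which gives $\log t = \frac{\overline{b \log b} - \overline{a \log a}}{b^\star - a^\star} - 1$, where the bars denote group averages.

```python
import math


def gen_kl(x, y):
    """Generalized Kullback-Leibler divergence l(x, y), for x, y > 0."""
    return x * math.log(x / y) - x + y


def mean(values):
    return sum(values) / len(values)


def diversity_from_answers(estimates):
    """Original definition: average distance from the answers to the mean."""
    centre = mean(estimates)
    return mean([gen_kl(a, centre) for a in estimates])


def diversity_from_mean(estimates):
    """Alternative definition: average distance from the mean to the answers."""
    centre = mean(estimates)
    return mean([gen_kl(centre, a) for a in estimates])


def average_error(estimates, truth):
    """Average distance from the answers to the true value."""
    return mean([gen_kl(a, truth) for a in estimates])


def equalising_truth(group_a, group_b):
    """Value of t at which both groups have the same average error.

    Derived by equating the two average errors and solving for t;
    requires the two group means to differ.
    """
    a_term = mean([a * math.log(a) for a in group_a])
    b_term = mean([b * math.log(b) for b in group_b])
    return math.exp((b_term - a_term) / (mean(group_b) - mean(group_a)) - 1)


group_a = [0.5, 0.1]
group_b = [0.3, 0.9]
truth = 0.44994

print("original diversity:   ", diversity_from_answers(group_a), diversity_from_answers(group_b))
print("alternative diversity:", diversity_from_mean(group_a), diversity_from_mean(group_b))
print("average error:        ", average_error(group_a, truth), average_error(group_b, truth))
print("error of the mean:    ", gen_kl(mean(group_a), truth), gen_kl(mean(group_b), truth))
print("equalising t:         ", equalising_truth(group_a, group_b))
```

Each printed line gives the value for $a_1, a_2$ followed by the value for $b_1, b_2$, so the four comparisons in the text can be read off directly.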