The Robustness of the Diversity Prediction Theorem II: problems with asymmetry

There is a PDF of this post here.

Take a quantity whose value you wish to estimate---the basic reproduction number for a virus; the number of jelly beans in a jar at the school fête; the global temperature rise caused by doubling the concentration of CO2 in the atmosphere; the number of years before humanity goes extinct. Ask a group of people to provide their estimate of that value, and take the mean of their answers. The Diversity Prediction Theorem says that, if you measure distance as squared difference, so that, for example, the distance from $2$ to $5$ is $(2-5)^2$, then the distance from the mean of the answers to the true value is equal to the average distance from the answers to the true value, less a quantity that measures the diversity of the answers, namely, the average distance from the answers to the mean answer.
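To make the identity concrete, here is a minimal numerical check in Python (my own illustration; the estimates and the true value are made up):

```python
import numpy as np

def sq(x, y):
    """Squared difference from x to y."""
    return (x - y) ** 2

estimates = np.array([1.2, 3.4, 2.8, 5.0, 4.1])  # hypothetical answers
truth = 3.0                                      # hypothetical true value

mean_answer = estimates.mean()

collective_error = sq(mean_answer, truth)      # distance of the mean from the truth
average_error = sq(estimates, truth).mean()    # average distance from the answers to the truth
diversity = sq(estimates, mean_answer).mean()  # average distance from the answers to the mean

# The theorem: collective error = average error - diversity.
assert np.isclose(collective_error, average_error - diversity)
```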

In a previous blogpost, I asked: to what extent is this result a quirk of squared difference? Extending a result due to David Pfau, I showed that it is true of exactly the Bregman divergences. But there's a problem. In the Diversity Prediction Theorem, we measure the diversity of the estimates as the average distance from the individual answers to the mean answer. But why this, and not the average distance to the individual answers from the mean answer? Of course, if we use squared difference, then these values are the same, because squared difference is a symmetric measure of distance: the squared difference from one value to another is the same as the squared difference from the second value to the first. And so one set of answers will be more diverse than another according to one definition if and only if it is more diverse according to the other. But squared difference is in fact the only symmetric Bregman divergence. So, for every other Bregman divergence, the two definitions of diversity come apart.
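As a reminder, a Bregman divergence is generated by a differentiable, strictly convex function $\varphi$ by setting
$$d_\varphi(x, y) = \varphi(x) - \varphi(y) - \varphi'(y)(x - y)$$
Squared difference is the case $\varphi(x) = x^2$, and the generalized Kullback-Leibler divergence used below is the case $\varphi(x) = x\log x - x$.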

This has an odd effect on the Diversity Prediction Theorem. One of the standard lessons the theorem is supposed to teach is that the mean of a more diverse group is more accurate than the mean of a less diverse group. In fact, even the original version of the theorem doesn't tell us that. It tells us that the mean of a more diverse group is more accurate than the mean of a less diverse group when the average distance from the truth is the same for both groups. But, if we use a non-symmetric distance measure, i.e., one of the Bregman divergences other than squared difference, and we use the alternative measure of diversity mentioned in the previous paragraph---that is, the average distance to the individual answers from the mean answer---then we can get a case in which the mean of a less diverse group is more accurate than the mean of a more diverse group, even though the average distance from the answers to the truth is the same for both groups. So it seems that we have three choices: (i) justify using squared difference only; (ii) justify the first of the two putative definitions of diversity, namely, the average distance from the individual answers to the mean answer; (iii) give up the apparent lesson of the Diversity Prediction Theorem that diversity leads to more accurate average answers. For my money, (iii) is the most plausible.

Let me finish off by providing an example. First, define two measures of distance:

Squared difference: $q(x, y) = (x-y)^2$

Generalized Kullback-Leibler divergence: $l(x, y) = x\log(x/y) - x + y$, where $\log$ is the natural logarithm
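To see the asymmetry of the latter concretely, take the values $1$ and $2$ (my own toy example):
$$l(1, 2) = \log\tfrac{1}{2} - 1 + 2 = 1 - \log 2 \approx 0.307 \qquad\text{but}\qquad l(2, 1) = 2\log 2 - 2 + 1 \approx 0.386$$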

Then the Diversity Prediction Theorem says that, for any $a_1, \ldots, a_n, t$, if $a^\star = \frac{1}{n}\sum^n_{i=1} a_i$, then
$$q(a^\star, t) = \frac{1}{n}\sum^n_{i=1} q(a_i, t) - \frac{1}{n}\sum^n_{i=1} q(a_i, a^\star)$$
And the generalization I discussed in the previous blogpost entails that
$$l(a^\star, t) = \frac{1}{n}\sum^n_{i=1} l(a_i, t) - \frac{1}{n}\sum^n_{i=1} l(a_i, a^\star)$$
But drawing from this the conclusion that groups with the same average distance to the truth are more accurate if they're more diverse relies on defining the diversity of the estimates $a_1, \ldots, a_n$ to be $\frac{1}{n}\sum^n_{i=1} l(a_i, a^\star)$, rather than $\frac{1}{n}\sum^n_{i=1} l(a^\star, a_i)$. Suppose that we define it in the second way instead. And take two groups, each containing two individuals. Here are their estimates of a quantity (perhaps the R number of a virus):
$$\begin{array}{c|c|c|c|c|c} a_1 & a_2 & a^\star & b_1 & b_2 & b^\star \\ \hline 0.5 & 0.1 & 0.3 & 0.3 & 0.9 & 0.6 \end{array}$$
Then $a_1, a_2$ is less diverse than $b_1, b_2$ according to the original definition of diversity, but more diverse according to the second definition. That is,
$$\frac{l(0.5, 0.3) + l(0.1, 0.3)}{2} < \frac{l(0.3, 0.6) + l(0.9, 0.6)}{2}$$
and
$$\frac{l(0.3, 0.5) + l(0.3, 0.1)}{2} > \frac{l(0.6, 0.3) + l(0.6, 0.9)}{2}$$
What's more, if $t = 0.44994$, then the average distance to the truth is the same for both groups. That is,
$$\frac{l(0.5, 0.44994) + l(0.1, 0.44994)}{2} = \frac{l(0.3, 0.44994) + l(0.9, 0.44994)}{2}$$
But then it follows that the mean of $a_1, a_2$ is less accurate than the mean of $b_1, b_2$, even though, according to one seemingly legitimate definition of diversity, $a_1, a_2$ is more diverse than $b_1, b_2$:
$$l(0.3, 0.44994) > l(0.6, 0.44994)$$
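For anyone who wants to check these numbers, here is a short Python sketch (my own, with the example's values hard-coded) that reproduces each comparison:

```python
import numpy as np

def gkl(x, y):
    """Generalized Kullback-Leibler divergence from x to y (natural log)."""
    return x * np.log(x / y) - x + y

a, b = np.array([0.5, 0.1]), np.array([0.3, 0.9])
a_star, b_star = a.mean(), b.mean()  # 0.3 and 0.6
t = 0.44994                          # the true value in the example

# Original definition: average distance from the answers to the mean answer.
print(gkl(a, a_star).mean(), "<", gkl(b, b_star).mean())  # ~0.0728 < ~0.0785

# Second definition: average distance to the answers from the mean answer.
print(gkl(a_star, a).mean(), ">", gkl(b_star, b).mean())  # ~0.0882 > ~0.0863

# Average distance from the answers to the truth: the same for both groups.
print(gkl(a, t).mean(), "=", gkl(b, t).mean())            # both ~0.1011

# Yet the mean of the first group is farther from the truth than the mean of the second.
print(gkl(a_star, t), ">", gkl(b_star, t))                # ~0.0283 > ~0.0226
```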

