Aggregating for accuracy: another accuracy argument for linear pooling

A PDF of this blogpost is available here.

I don't have an estimate for how long it will be before the Greenland ice sheet collapses, and I don't have an estimate for how long it will be before the average temperature at Earth's surface rises more than 3°C above pre-industrial levels. But I know a bunch of people who do have such estimates, and I might hope that learning theirs would help me set mine. Unfortunately, each of these people has a different estimate for each of these two quantities. What should I do? Should I pick one of them at random and adopt their estimates as mine? Or should I pick some compromise between them? If the latter, which compromise?

[Figure: cat inaccurately estimates width of step]
The following fact gives a hint. An estimate of a quantity, such as the number of years until an ice sheet collapses or the number of years until the temperature rises by a certain amount, is better the closer it lies to the true value of the quantity and worse the further it lies from it. There are various ways to measure the distance between estimate and true value, but we'll stick with a standard one here, namely, squared error, which takes the distance to be the square of the difference between the two values. Then the following is simply a mathematical fact: taking the straight average of the group's estimates of each quantity as your estimate of that quantity is guaranteed to be better, in expectation, than picking a member of the group at random and simply deferring to them (strictly better, provided the group's estimates aren't all identical). This is sometimes called the Diversity Prediction Theorem, or it's a corollary of what goes by that name.
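To see the theorem in action, here is a minimal sketch in Python with made-up numbers; the estimates, the hypothetical true values, and the helper `squared_error` are all illustrative, not anything drawn from the theorem itself.

```python
# Made-up estimates from three people for two quantities:
# years until the ice sheet collapses, years until warming exceeds 3°C.
estimates = [(40.0, 25.0), (80.0, 30.0), (60.0, 50.0)]
truth = (55.0, 35.0)  # a hypothetical set of true values

def squared_error(estimate, truth):
    """Sum of squared differences between an estimate vector and the truth."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth))

m = len(estimates)
straight_average = tuple(sum(xs) / m for xs in zip(*estimates))  # (60.0, 35.0)

error_of_average = squared_error(straight_average, truth)            # 25.0
expected_error_of_random_pick = (
    sum(squared_error(x, truth) for x in estimates) / m              # ≈ 408.33
)

# The Diversity Prediction Theorem guarantees the first never exceeds the second.
assert error_of_average <= expected_error_of_random_pick
```

Picking a member at random and deferring to them has an expected squared error of about 408, while the straight average scores 25, whatever hypothetical truth you plug in, the average never does worse than the random pick in expectation.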

The result raises a natural question: Is it only by taking the straight average of the group's estimates as your own that you can be guaranteed to do better, in expectation, than by picking at random? Or is there another method for aggregating the estimates that also has this property? As I'll show, only straight averaging has this property. If you combine the group's estimates in any other way to give your own, there is a possible set of true values relative to which your estimates do worse than you would do, in expectation, were you to pick at random. The question is natural, and the answer is not difficult to prove, so I'm pretty confident this has been asked and answered before; but I haven't been able to find it, so I'd be grateful for a reference if anyone has one.

Let's make all of this precise. We have a group of $m$ individuals; each of them has an estimate for each of the quantities $Q_1, \ldots, Q_n$. We represent individual $j$ by the sequence $X_j = (x_{j1}, \ldots, x_{jn})$ of their estimates of these quantities. So $x_{ji}$ is the estimate of quantity $Q_i$ by individual $j$. Suppose $T = (t_1, \ldots, t_n)$ is the sequence of true values of these quantities. So $t_i$ is the true value of $Q_i$. Then the disvalue or badness of individual $j$'s estimates, as measured by squared error, is$$(x_{j1} - t_1)^2 + \ldots + (x_{jn} - t_n)^2$$The disvalue or badness of individual $j$'s estimate of quantity $Q_i$ is $(x_{ji} - t_i)^2$, and the disvalue or badness of their whole set of estimates is the sum of the disvalues or badnesses of their individual estimates. We write $\mathrm{SE}(X_j, T)$ for this sum. That is,$$\mathrm{SE}(X_j, T) = \sum_i (x_{ji} - t_i)^2$$Then the Diversity Prediction Theorem says that, for any $X_1, \ldots, X_m$ that are not all identical and any $T$,$$\mathrm{SE}\left (\frac{1}{m}X_1 + \ldots + \frac{1}{m}X_m, T \right ) < \frac{1}{m}\mathrm{SE}(X_1, T) + \ldots + \frac{1}{m}\mathrm{SE}(X_m, T)$$And we wish to prove a sort of converse, namely, if $V \neq \frac{1}{m}X_1 + \ldots + \frac{1}{m}X_m$, then there is a possible set of true values $T = (t_1, \ldots, t_n)$ such that$$\mathrm{SE}(V, T) > \frac{1}{m}\mathrm{SE}(X_1, T) + \ldots + \frac{1}{m}\mathrm{SE}(X_m, T)$$I'll give the proof below.
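As a quick sanity check on the displayed inequality (not a proof, just a randomized spot-check), here is a small sketch; the use of numpy, the array shapes, and the tolerance are my own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def se(x, t):
    """SE(X, T): the sum over quantities of the squared differences."""
    return float(np.sum((x - t) ** 2))

for _ in range(1000):
    m = rng.integers(2, 6)                           # group size
    n = rng.integers(1, 5)                           # number of quantities
    X = rng.normal(scale=10.0, size=(m, n))          # X[j, i]: person j's estimate of Q_i
    T = rng.normal(scale=10.0, size=n)               # the true values
    lhs = se(X.mean(axis=0), T)                      # SE of the straight average
    rhs = np.mean([se(X[j], T) for j in range(m)])   # average of the individual SEs
    assert lhs <= rhs + 1e-9                         # strict unless all rows of X coincide
```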

Why is this interesting? One question at the core of those parts of philosophy that deal with collectives and their attitudes is this: How should you aggregate the opinions of a group of individuals to give a single set of opinions? When the opinions come in numerical form, such as when they are estimates of quantities or when they are probabilities, there are a number of proposals. Taking the straight arithmetic average as we have done here is just one. How are we to decide which to use? Standard arguments proceed by identifying a set of properties that only one aggregation method boasts, and then arguing that the properties in the set are desirable given your purpose in doing the aggregation in the first place. The result we have just noted might be used to mount just such an argument: when we aggregate estimates, we might well want a method that is guaranteed to produce aggregate estimates that are better, in expectation, than picking at random, and straight averaging is the only method that does that. 

Finally, here's a slightly more general version of the result, which covers not just straight averages but weighted averages too; the proof below establishes this more general claim.

Proposition Suppose $\lambda_1, \ldots, \lambda_m$ is a set of weights, so that $0 \leq \lambda_j \leq 1$ for each $j$ and $\sum_j \lambda_j = 1$. Then, if $V \neq \lambda_1 X_1 + \ldots + \lambda_m X_m$, there is a possible set of true values $T = (t_1, \ldots, t_n)$ such that$$\mathrm{SE}(V, T) > \lambda_1\mathrm{SE}(X_1, T) + \ldots + \lambda_m\mathrm{SE}(X_m, T)$$

Proof. The left-hand side of the inequality is
$$
\mathrm{SE}(V, T) = \sum_i (v_i - t_i)^2 = \sum_i v_i^2 - 2\sum_i v_it_i + \sum_i t^2_i
$$The right-hand side of the inequality is
\begin{eqnarray*}
\sum_j \lambda_j \mathrm{SE}(X_j, T) & = & \sum_j \lambda_j \sum_i (x_{ji} - t_i)^2 \\
& = & \sum_j \lambda_j \sum_i \left ( x^2_{ji} - 2x_{ji}t_i + t_i^2 \right ) \\
& = & \sum_{i,j} \lambda_j x^2_{ji} - 2\sum_{i,j} \lambda_j x_{ji}t_i + \sum_i t_i^2
\end{eqnarray*}
So $\mathrm{SE}(V, T) > \sum_j \lambda_j \mathrm{SE}(X_j, T)$ iff$$
\sum_i v_i^2 - 2\sum_i v_it_i > \sum_{i,j} \lambda_j x^2_{ji} - 2\sum_{i,j} \lambda_j x_{ji}t_i
$$iff$$
  2\left ( \sum_i \left ( \sum_j \lambda_j x_{ji}- v_i \right) t_i \right ) > \sum_{i,j} \lambda_j x^2_{ji} - \sum_i v_i^2
$$And, if $(v_1, \ldots, v_n) \neq (\sum_j \lambda_j x_{j1}, \ldots, \sum_j \lambda_j x_{jn})$, there is $i$ such that $\sum_j \lambda_j x_{ji} - v_i \neq 0$, and so it is always possible to choose $T = (t_1, \ldots, t_n)$ so that the inequality holds: for instance, set $t_k = 0$ for every $k \neq i$ and choose $t_i$ large enough, with the same sign as $\sum_j \lambda_j x_{ji} - v_i$, that the left-hand side exceeds the fixed right-hand side, as required.
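That last step can be made concrete: pick a coordinate where $V$ and the weighted average disagree, set every other true value to zero, and push the true value at that coordinate far enough in the right direction. Here is a small Python sketch of that recipe; the function name `bad_truth_for` and the particular numbers are mine, chosen only for illustration.

```python
import numpy as np

def se(x, t):
    return float(np.sum((np.asarray(x, float) - np.asarray(t, float)) ** 2))

def bad_truth_for(V, X, weights):
    """Following the proof: return T with SE(V, T) > sum_j weights[j] * SE(X[j], T),
    assuming V is not the weighted average of the rows of X."""
    V = np.asarray(V, float)
    X = np.asarray(X, float)
    w = np.asarray(weights, float)
    d = w @ X - V                         # d_i = sum_j w_j x_ji - v_i
    i = int(np.argmax(np.abs(d)))         # a coordinate where they differ
    if d[i] == 0:
        raise ValueError("V is the weighted average; no such T exists.")
    rhs = float(np.sum(w @ (X ** 2)) - np.sum(V ** 2))  # sum_{i,j} w_j x_ji^2 - sum_i v_i^2
    T = np.zeros_like(V)
    T[i] = np.sign(d[i]) * (abs(rhs) / (2 * abs(d[i])) + 1.0)  # makes 2 * d[i] * T[i] > rhs
    return T

# Hypothetical numbers: two people, two quantities, equal weights.
X = [[40.0, 25.0], [80.0, 30.0]]
weights = [0.5, 0.5]
V = [70.0, 27.5]                          # not the weighted average, which is (60.0, 27.5)

T = bad_truth_for(V, X, weights)
assert se(V, T) > sum(w * se(x, T) for w, x in zip(weights, X))
```

The point of the construction is that the left-hand side of the final inequality in the proof grows without bound as $t_i$ grows, while the right-hand side stays fixed, so a single sufficiently extreme true value is enough to make any non-average $V$ do worse than picking at random.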
