Aggregating abstaining experts

In a series of posts a few months ago (here, here, and here), I explored a particular method by which we might aggregate expert credences when those credences are incoherent. The result was this paper, which is now forthcoming in Synthese. The method in question was called the coherent approximation principle (CAP), and it was introduced by Daniel Osherson and Moshe Vardi in this 2006 paper. CAP is based on what we might call the principle of minimal mutilation. We begin with a collection of credence functions, $c_1, \ldots, c_n$, one for each expert, some of which might be incoherent. What we want at the end is a single coherent credence function $c$ that is the aggregate of $c_1, \ldots, c_n$. The principle of minimal mutilation says that $c$ should be as close as possible to the $c_i$s -- when aggregating a collection of credence functions, you should change them as little as possible to obtain your aggregate.

We can spell this out more precisely by introducing a divergence $D$. We might think of this as a measure of how far one credence function lies from another. Thus, $D(c, c')$ measures the distance from $c$ to $c'$. We call these measures divergences rather than distances or metrics, since they do not have the usual features that mathematicians assume of a metric: we assume $D(c, c') \geq 0$ for any $c$, $c'$, and $D(c, c') = 0$ iff $c = c'$, but we do not assume that $D$ is symmetric nor that it satisfies the triangle inequality. In particular, we assume that $D$ is an additive Bregman divergence. The standard example of an additive Bregman divergence is squared Euclidean distance: if $c$, $c'$ are both defined on the set of propositions $\mathcal{F}$, then
$$\mathrm{SED}(c, c') = \sum_{X \in \mathcal{F}} |c(X) - c'(X)|^2$$
In fact, SED is symmetric, but it does not satisfy the triangle inequality. The details of this family of divergences needn't detain us here (but see here and here for more). Indeed, we will simply use SED throughout. But a more general treatment would look at other additive Bregman divergences, and I hope to do this soon.
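To fix ideas, here is a minimal Python sketch of SED for credence functions represented as dictionaries from propositions to credences. The code and names are mine, purely illustrative:

```python
# A minimal sketch: squared Euclidean distance between two credence
# functions, each represented as a dict from proposition labels to
# credences. Summing over the shared propositions anticipates the case,
# discussed below, where the two functions have different domains.

def sed(c, c_prime):
    """Sum of squared differences over the propositions both cover."""
    shared = c.keys() & c_prime.keys()
    return sum((c[x] - c_prime[x]) ** 2 for x in shared)

c = {"X1": 0.3, "X2": 0.6, "X3": 0.1}
c_prime = {"X1": 0.2, "X2": 0.6, "X3": 0.2}
print(sed(c, c_prime))  # 0.02
```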

Now, suppose $c_1, \ldots, c_n$ is a set of expert credence functions. And suppose $c_i$ is defined on the set of propositions $\mathcal{F}_i$. And suppose that $D$ is an additive Bregman divergence -- you might take it to be SED. Then how do we define the aggregate $c$ that is obtained from $c_1, \ldots, c_n$ by a minimal mutilation? We let $c$ be the coherent credence function such that the sum of the distances from $c$ to the $c_i$s is minimal. That is,
$$\mathrm{CAP}_D(c_1, \ldots, c_n) = \arg\min_{c \in \mathcal{P}_{\bigcup_i \mathcal{F}_i}} \sum_{i=1}^n D(c, c_i)$$
where $\mathcal{P}_{\bigcup_i \mathcal{F}_i}$ is the set of coherent credence functions over $\bigcup_i \mathcal{F}_i$. (Here, $D(c, c_i)$ sums only over the propositions in $\mathcal{F}_i$, on which both functions are defined.)
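As a rough illustration of how one might compute this aggregate numerically, here is a sketch using scipy's constrained optimiser. It assumes the simplest setting, in which $\bigcup_i \mathcal{F}_i$ is a single partition, so that coherence just means 'non-negative and summing to 1'; the function names are mine, not from the paper:

```python
import numpy as np
from scipy.optimize import minimize

def cap_sed(experts, m):
    """CAP with D = SED: find the coherent credence function on an m-cell
    partition minimising the total squared distance to the experts.
    Each expert is a dict from cell indices to credences, so an expert
    may have credences in only some cells of the partition."""
    def total_divergence(c):
        return sum((c[x] - v) ** 2 for ci in experts for x, v in ci.items())

    result = minimize(
        total_divergence,
        x0=np.full(m, 1.0 / m),  # start from the uniform distribution
        bounds=[(0.0, 1.0)] * m,
        constraints=[{"type": "eq", "fun": lambda c: np.sum(c) - 1.0}],
    )
    return result.x

# Two experts on the partition {X1, X2, X3}; the first is incoherent
# (credences sum to 1.1), and CAP repairs this in the aggregate.
experts = [{0: 0.3, 1: 0.6, 2: 0.2},
           {0: 0.2, 1: 0.6, 2: 0.2}]
print(cap_sed(experts, 3))  # ~ [0.2333, 0.5833, 0.1833]
```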

As we see in my paper linked above, if each of the credence functions is defined over the same set of propositions -- that is, if $\mathcal{F}_i = \mathcal{F}_j$ for all $1 \leq i, j \leq n$ -- then:
  • if $D$ is squared Euclidean distance, then this aggregate is the straight linear pool of the original credences; if each $c_i$ is defined on the partition $X_1, \ldots, X_m$, then the straight linear pool of $c_1, \ldots, c_n$ is this: $$c(X_j) = \frac{1}{n} c_1(X_j) + \ldots + \frac{1}{n} c_n(X_j)$$
  • if $D$ is the generalized Kullback-Leibler divergence, then the aggregate is the straight geometric pool of the originals; if each $c_i$ is defined on the partition $X_1, \ldots, X_m$, then the straight geometric pool of $c_1, \ldots, c_n$ is this: $$c(X_j) = \frac{1}{K}\left(c_1(X_j)^{\frac{1}{n}} \times \ldots \times c_n(X_j)^{\frac{1}{n}}\right)$$ where $K$ is a normalizing factor.
(For more on these types of aggregation, see here and here).
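For concreteness, here is a quick sketch of the two pooling rules on a common partition (again, illustrative code of mine):

```python
import numpy as np

def linear_pool(credences):
    """Straight linear pool: the cell-by-cell arithmetic mean."""
    return np.mean(credences, axis=0)

def geometric_pool(credences):
    """Straight geometric pool: the cell-by-cell geometric mean,
    divided by the normalising factor K so the result sums to 1."""
    pooled = np.prod(np.asarray(credences, dtype=float), axis=0) ** (1.0 / len(credences))
    return pooled / pooled.sum()

experts = np.array([[0.3, 0.6, 0.1],
                    [0.2, 0.6, 0.2]])
print(linear_pool(experts))     # [0.25 0.6  0.15]
print(geometric_pool(experts))  # ~ [0.248, 0.608, 0.143]
```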

In this post, I'm interested in cases where our agents have credences in different sets of propositions. For instance, the first agent has credences concerning the rainfall in Bristol tomorrow and the rainfall in Bath, but the second has credences concerning the rainfall in Bristol and the rainfall in Birmingham.

I want to begin by pointing to a shortcoming of CAP when it is applied to such cases. It fails to satisfy what we might think of as a basic desideratum of such procedures. To illustrate this desideratum, let's suppose that the three propositions $X_1$, $X_2$, and $X_3$ form a partition. And suppose that Amira has credences in $X_1$, $X_2$, and $X_3$, while Benito has credences only in $X_1$ and $X_2$. In particular:
  • Amira's credence function is: $c_A(X_1) = 0.3$, $c_A(X_2) = 0.6$, $c_A(X_3) = 0.1$.
  • Benito's credence function is: $c_B(X_1) = 0.2$, $c_B(X_2) = 0.6$.
Now, notice that, while Amira's credence function is defined on the whole partition, Benito's is not. But, nonetheless, Benito's credences uniquely determine a coherent credence function on the whole partition:
  • Benito's extended credence function is: $c^*_B(X_1) = 0.2$, $c^*_B(X_2) = 0.6$, $c^*_B(X_3) = 0.2$. (Coherence requires $c^*_B(X_3) = 1 - 0.2 - 0.6 = 0.2$.)
Thus, we might expect our aggregation procedure to give the same result whether we aggregate Amira's credence function with Benito's or with Benito's extended credence function. That is, we might expect the same result whether we aggregate $c_A$ with $c_B$ or with $c^*_B$. After all, $c^*_B$ is in some sense implicit in $c_B$. An agent with credence function $c_B$ is committed to the credences assigned by credence function $c^*_B$.

However, CAP does not do this. As mentioned above, if you aggregate $c_A$ and $c^*_B$ using SED, then the result is their linear pool: $\frac{1}{2}c_A + \frac{1}{2}c^*_B$. Thus, the aggregate credence in $X_1$ is 0.25; in $X_2$ it is 0.6; and in $X_3$ it is 0.15. The result is different if you aggregate $c_A$ and $c_B$ using SED: the aggregate credence in $X_1$ is 0.2625; in $X_2$ it is 0.6125; and in $X_3$ it is 0.125.
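For the record, both aggregates can be checked numerically with the `cap_sed` sketch given earlier (this is my verification code, not from the paper):

```python
# Reusing cap_sed from the sketch above.
c_A = {0: 0.3, 1: 0.6, 2: 0.1}       # Amira, on the whole partition
c_B = {0: 0.2, 1: 0.6}               # Benito, abstaining on X3
c_B_star = {0: 0.2, 1: 0.6, 2: 0.2}  # Benito's unique coherent extension

print(cap_sed([c_A, c_B_star], 3))  # ~ [0.25, 0.6, 0.15] -- the linear pool
print(cap_sed([c_A, c_B], 3))       # ~ [0.2625, 0.6125, 0.125] -- different!
```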

Now, it is natural to think that the problem arises here because Amira's credences are getting too much say in how far a potential aggregate lies from the agents, since she has credences in three propositions, while Benito has credences in only two. And, sure enough, $\mathrm{CAP}_{\mathrm{SED}}(c_A, c_B)$ lies closer to $c_A$ than to $c^*_B$, and it lies closer to $c_A$ than the aggregate of $c_A$ and $c^*_B$ does. And it is equally natural to try to solve this potential bias in favour of the agent with more credences by normalising. That is, we might define a new version of CAP:
$$\mathrm{CAP}^+_D(c_1, \ldots, c_n) = \arg\min_{c \in \mathcal{P}_{\bigcup_i \mathcal{F}_i}} \sum_{i=1}^n \frac{1}{|\mathcal{F}_i|} D(c, c_i)$$
However, this doesn't help. Using this definition, the aggregate of Amira's credence function $c_A$ and Benito's extended credence function $c^*_B$ remains the same; but the aggregate of Amira's credence function and Benito's original credence function changes -- the aggregate credence in $X_1$ is 0.25333; in $X_2$, it is 0.61333; and in $X_3$, it is 0.13333. Again, the two ways of aggregating disagree.
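A sketch of this normalised variant, which differs from `cap_sed` above only in the weight $1/|\mathcal{F}_i|$ in front of each summand (my code, purely illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def cap_plus_sed(experts, m):
    """CAP+ with D = SED: weight each expert's divergence by 1/|F_i|,
    where |F_i| is the number of propositions they have credences in."""
    def total_divergence(c):
        return sum(sum((c[x] - v) ** 2 for x, v in ci.items()) / len(ci)
                   for ci in experts)

    result = minimize(
        total_divergence,
        x0=np.full(m, 1.0 / m),
        bounds=[(0.0, 1.0)] * m,
        constraints=[{"type": "eq", "fun": lambda c: np.sum(c) - 1.0}],
    )
    return result.x

c_A = {0: 0.3, 1: 0.6, 2: 0.1}
c_B = {0: 0.2, 1: 0.6}
print(cap_plus_sed([c_A, c_B], 3))  # ~ [0.25333, 0.61333, 0.13333]
```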

So here is our desideratum in general:

Agreement with Coherent Commitments (ACC) Suppose $c_1, \ldots, c_n$ are coherent credence functions, with $c_i$ defined on $\mathcal{F}_i$, for each $1 \leq i \leq n$. And let $\mathcal{F} = \bigcup_{i=1}^n \mathcal{F}_i$. Now suppose that, for each $c_i$ defined on $\mathcal{F}_i$, there is a unique coherent credence function $c^*_i$ defined on $\mathcal{F}$ that extends $c_i$ -- that is, $c^*_i(X) = c_i(X)$ for all $X$ in $\mathcal{F}_i$. Then the aggregate of $c_1, \ldots, c_n$ should be the same as the aggregate of $c^*_1, \ldots, c^*_n$.

CAP does not satisfy ACC. Is there a natural aggregation rule that does? Here's a suggestion. Suppose you wish to aggregate a set of credence functions $c_1, \ldots, c_n$, where $c_i$ is defined on $\mathcal{F}_i$, as above. Then we proceed as follows.
  1. First, let $\mathcal{F} = \bigcup_{i=1}^n \mathcal{F}_i$.
  2. Second, for each $1 \leq i \leq n$, let $$C_i = \{c : c \text{ is coherent} \ \&\ c \text{ is defined on } \mathcal{F} \ \&\ c(X) = c_i(X) \text{ for all } X \text{ in } \mathcal{F}_i\}$$ That is, while $c_i$ represents a precise credal state defined on $\mathcal{F}_i$, $C_i$ represents an imprecise credal state defined on $\mathcal{F}$. It is the set of coherent credence functions on $\mathcal{F}$ that extend $c_i$ -- that is, the set of coherent credence functions on $\mathcal{F}$ that agree with $c_i$ on the propositions in $\mathcal{F}_i$. Thus, if, like Benito, your coherent credences on $\mathcal{F}_i$ uniquely determine your coherent credences on $\mathcal{F}$, then $C_i$ is just the singleton containing that unique extension. But if your credences over $\mathcal{F}_i$ do not uniquely determine your coherent credences over $\mathcal{F}$, then $C_i$ will contain more coherent credence functions.
  3. Finally, we take the aggregate of $c_1, \ldots, c_n$ to be the credence function $c$ that minimizes the total distance from $c$ to the $C_i$s. The problem is that there isn't a single natural definition of the distance from a point to a set of points, even when you have a definition of the distance between individual points. I adopt a very particular measure of such distances here; but it would be interesting to explore the alternative options in greater detail elsewhere. Suppose $c$ is a credence function and $C$ is a set of credence functions. Then $$D(c, C) = \frac{\min_{c' \in C} D(c, c') + \max_{c' \in C} D(c, c')}{2}$$ With this in hand, we can finally give our aggregation procedure (a numerical sketch follows this list): $$\mathrm{CAP}^*_D(c_1, \ldots, c_n) = \arg\min_{c \in \mathcal{P}_{\mathcal{F}}} \sum_{i=1}^n D(c, C_i)$$
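Here is a rough numerical sketch of $\mathrm{CAP}^*_{\mathrm{SED}}$ for the single-partition case. It exploits two facts special to that case: the minimum of SED over $C_i$ is a Euclidean projection onto the set of admissible extensions, and the maximum of a convex function over that polytope is attained at a vertex, i.e. at an extension putting all of an abstaining expert's leftover probability mass on a single cell. The code and names are mine, and the objective is only piecewise smooth, so a careful implementation would want a more robust solver:

```python
import numpy as np
from scipy.optimize import minimize

def project_to_slice(y, total):
    """Euclidean projection of y onto the slice {t : t >= 0, sum(t) = total}."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u) - total
    rho = np.nonzero(u * (np.arange(len(u)) + 1) > css)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(y - theta, 0.0)

def dist_to_extensions(c, expert, m):
    """D(c, C_i): the average of the min and max of SED(c, c') over the
    coherent extensions c' of the expert's credences to the full partition."""
    free = [x for x in range(m) if x not in expert]
    fixed = sum((c[x] - v) ** 2 for x, v in expert.items())
    if not free:
        return fixed  # no abstentions, so C_i is a singleton
    r = 1.0 - sum(expert.values())  # mass left over for the free cells
    # Minimum: project c's free coordinates onto {t >= 0, sum(t) = r}.
    t = project_to_slice(c[free], r) if r > 1e-12 else np.zeros(len(free))
    d_min = fixed + np.sum((c[free] - t) ** 2)
    # Maximum: attained at a vertex of the slice, where all the
    # leftover mass r sits on a single free cell j.
    d_max = fixed + max(np.sum(c[free] ** 2) - c[j] ** 2 + (c[j] - r) ** 2
                        for j in free)
    return 0.5 * (d_min + d_max)

def cap_star_sed(experts, m):
    """CAP* with D = SED on an m-cell partition."""
    result = minimize(
        lambda c: sum(dist_to_extensions(c, e, m) for e in experts),
        x0=np.full(m, 1.0 / m),
        bounds=[(0.0, 1.0)] * m,
        constraints=[{"type": "eq", "fun": lambda c: np.sum(c) - 1.0}],
    )
    return result.x
```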
The first thing to note about $\mathrm{CAP}^*$ is that, unlike the original CAP or $\mathrm{CAP}^+$, it automatically satisfies ACC.

Let's now see $\mathrm{CAP}^*$ in action.
  • Since $\mathrm{CAP}^*$ satisfies ACC, the aggregate for $c_A$ and $c_B$ is the same as the aggregate for $c_A$ and $c^*_B$, which is just their straight linear pool.
  • Next, suppose we wish to aggregate Amira with a third agent, Cleo, who has a credence only in $X_1$, which she assigns 0.5 -- that is, $c_C(X_1) = 0.5$. Then $\mathcal{F} = \{X_1, X_2, X_3\}$, and $$C_C = \{c : c(X_1) = 0.5,\ 0 \leq c(X_2) \leq 0.5,\ c(X_3) = 1 - c(X_1) - c(X_2)\}$$ So $$\mathrm{CAP}^*_D(c_A, c_C) = \arg\min_{c \in \mathcal{P}_{\mathcal{F}}} D(c, c_A) + D(c, C_C)$$ Working through the calculation for $D = \mathrm{SED}$, we obtain the following aggregate: $c(X_1) = 0.4$, $c(X_2) = 0.425$, $c(X_3) = 0.175$.
  • One interesting feature of $\mathrm{CAP}^*$ is that, unlike CAP, we can apply it to individual agents. Thus, for instance, suppose we wish to take Cleo's single credence in $X_1$ and 'fill in' her credences in $X_2$ and $X_3$. Then we can use $\mathrm{CAP}^*$ to do this. Her new credence function will be $$c^*_C = \mathrm{CAP}^*_{\mathrm{SED}}(c_C) = \arg\min_{c \in \mathcal{P}_{\mathcal{F}}} \mathrm{SED}(c, C_C)$$ That is, $c^*_C(X_1) = 0.5$, $c^*_C(X_2) = 0.25$, $c^*_C(X_3) = 0.25$. Rather unsurprisingly, $c^*_C$ is the midpoint of the line segment formed by the imprecise credal state $C_C$. Now, notice: the aggregate of Amira and Cleo given above is just the straight linear pool of Amira's credence function $c_A$ and Cleo's 'filled in' credence function $c^*_C$. I would conjecture that this is generally true: filling in credences using $\mathrm{CAP}^*_{\mathrm{SED}}$ and then aggregating using straight linear pooling always agrees with aggregating directly using $\mathrm{CAP}^*_{\mathrm{SED}}$. And perhaps this generalises beyond SED. (Both calculations in this example can be checked with the sketch after this list.)
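Assuming the `cap_star_sed` sketch above, the Cleo example, and the conjectured agreement with linear pooling, can be checked numerically:

```python
import numpy as np

# Reusing cap_star_sed from the sketch above.
c_A = {0: 0.3, 1: 0.6, 2: 0.1}  # Amira
c_C = {0: 0.5}                   # Cleo, with a credence only in X1

print(cap_star_sed([c_A, c_C], 3))  # ~ [0.4, 0.425, 0.175]
print(cap_star_sed([c_C], 3))       # 'filling in' Cleo: ~ [0.5, 0.25, 0.25]

# The conjectured agreement: linear pool of Amira with filled-in Cleo.
filled = cap_star_sed([c_C], 3)
print((np.array([0.3, 0.6, 0.1]) + filled) / 2)  # ~ [0.4, 0.425, 0.175]
```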

Comments

  1. Posted this on Facebook, then figured it might be more appropriate to post here.

    Interesting considerations. I wonder if there is anything useful to be said when there isn't a unique extension of the credences to the whole space but they merely impose a restriction on the remaining values on which they are undefined.

    For instance, I suspect that if Alice assigns a probability to X_1, X_2 and X_3 (which form a partition) and Bob assigns P(X_1)=.99, this highly restricts his assignment to X_2 and X_3; but I suspect that, for an appropriate credence function for Alice, merely aggregating these credences on X_1 will yield a result that couldn't be achieved by aggregating any coherent extension of Bob's credences to the whole space with Alice's. However, since there is no unique extension, one needs to do something more complex than merely use the unique extension. Perhaps some kind of average of all possible extensions of Bob's credence function, aggregated with Alice's credences, would meet this more demanding criterion.

    There might be an interesting theorem in here ... I'll have to think about it.
