Aggregating abstaining experts
In a series of posts a few months ago (here, here, and here), I explored a particular method by which we might aggregate expert credences when those credences are incoherent. The result was this paper, which is now forthcoming in Synthese. The method in question was called the coherent approximation principle (CAP), and it was introduced by Daniel Osherson and Moshe Vardi in this 2006 paper. CAP is based on what we might call the principle of minimal mutilation. We begin with a collection of credence functions, $c_1, \ldots, c_n$, one for each expert; some of these might be incoherent. What we want at the end is a single coherent credence function $c^*$ that is the aggregate of $c_1, \ldots, c_n$. The principle of minimal mutilation says that $c^*$ should be as close as possible to the $c_i$s -- when aggregating a collection of credence functions, you should change them as little as possible to obtain your aggregate.
We can spell this out more precisely by introducing a divergence $\mathfrak{D}$. We might think of this as a measure of how far one credence function lies from another. Thus, $\mathfrak{D}(c, c')$ measures the distance from $c$ to $c'$. We call these measures divergences rather than distances or metrics, since they do not have the usual features that mathematicians assume of a metric: we assume $\mathfrak{D}(c, c') \geq 0$ for any $c$, $c'$, and $\mathfrak{D}(c, c') = 0$ iff $c = c'$, but we do not assume that $\mathfrak{D}$ is symmetric, nor that it satisfies the triangle inequality. In particular, we assume that $\mathfrak{D}$ is an additive Bregman divergence. The standard example of an additive Bregman divergence is squared Euclidean distance (SED): if $c$, $c'$ are both defined on the set of propositions $\mathcal{F}$, then
$$\mathrm{SED}(c, c') = \sum_{X \in \mathcal{F}} (c(X) - c'(X))^2.$$
In fact, SED is symmetric, but it does not satisfy the triangle inequality. The details of this family of divergences needn't detain us here (but see here and here for more). Indeed, we will simply use SED throughout. But a more general treatment would look at other additive Bregman divergences, and I hope to do this soon.
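As a quick sanity check on those two properties, here is a minimal sketch in Python; the three one-proposition credence functions are hypothetical, chosen only to exhibit the failure of the triangle inequality.

```python
# Squared Euclidean distance between two credence functions defined on the
# same set of propositions, represented as dicts from propositions to credences.
def sed(c1, c2):
    return sum((c1[X] - c2[X]) ** 2 for X in c1)

# Three hypothetical credence functions over a single proposition X.
a = {"X": 0.0}
b = {"X": 0.5}
c = {"X": 1.0}

print(sed(a, c) == sed(c, a))             # True: SED is symmetric
print(sed(a, c) > sed(a, b) + sed(b, c))  # True: 1.0 > 0.25 + 0.25, so the
                                          # triangle inequality fails
```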
Now, suppose $c_1, \ldots, c_n$ is a set of expert credence functions, and suppose $c_i$ is defined on the set of propositions $\mathcal{F}_i$. And suppose that $\mathfrak{D}$ is an additive Bregman divergence -- you might take it to be SED. Then how do we define the aggregate $c^*$ that is obtained from $c_1, \ldots, c_n$ by a minimal mutilation? We let $c^*$ be the coherent credence function such that the sum of the distances from $c^*$ to the $c_i$s is minimal. That is, writing $c|_{\mathcal{F}_i}$ for the restriction of $c$ to $\mathcal{F}_i$,
$$\mathrm{CAP}:\quad c^* = \arg\min_{c \in \mathcal{P}_\mathcal{F}} \sum_{i=1}^n \mathfrak{D}(c|_{\mathcal{F}_i}, c_i)$$
where $\mathcal{F} = \mathcal{F}_1 \cup \ldots \cup \mathcal{F}_n$ and $\mathcal{P}_\mathcal{F}$ is the set of coherent credence functions over $\mathcal{F}$.
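To fix ideas, here is a minimal numerical sketch of CAP with SED. It assumes, for simplicity, that the propositions form a partition, so that a coherent credence function is just a probability vector over the cells; the two experts and their credences are hypothetical, purely for illustration.

```python
# A sketch of CAP with SED over a partition. Hypothetical experts: the second
# abstains on X3, i.e. her credence function is defined only on {X1, X2}.
import numpy as np
from scipy.optimize import minimize

cells = ["X1", "X2", "X3"]
experts = [
    {"X1": 0.5, "X2": 0.3, "X3": 0.2},
    {"X1": 0.1, "X2": 0.5},
]

def total_divergence(c):
    """Sum over experts of SED between c (restricted to F_i) and c_i."""
    return sum(
        (c[cells.index(X)] - credence) ** 2
        for expert in experts
        for X, credence in expert.items()
    )

# A coherent credence function on a partition is a probability vector:
# each credence lies in [0, 1] and the credences sum to 1.
result = minimize(
    total_divergence,
    x0=np.full(len(cells), 1 / len(cells)),
    bounds=[(0.0, 1.0)] * len(cells),
    constraints=[{"type": "eq", "fun": lambda c: c.sum() - 1.0}],
)
print({X: round(float(v), 3) for X, v in zip(cells, result.x)})
# ≈ {'X1': 0.325, 'X2': 0.425, 'X3': 0.25}
```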
As we see in my paper linked above, if each of the credence functions is defined over the same set of propositions -- that is, if $\mathcal{F}_i = \mathcal{F}$, for all $i$ -- then:
- if $\mathfrak{D}$ is squared Euclidean distance, then this aggregate is the straight linear pool of the original credences; if each $c_i$ is defined on the partition $X_1, \ldots, X_m$, then the straight linear pool of $c_1, \ldots, c_n$ is this: $$c^*(X_j) = \frac{1}{n}\sum_{i=1}^n c_i(X_j);$$
- if $\mathfrak{D}$ is the generalized Kullback-Leibler divergence, then the aggregate is the straight geometric pool of the originals; if each $c_i$ is defined on the partition $X_1, \ldots, X_m$, then the straight geometric pool of $c_1, \ldots, c_n$ is this: $$c^*(X_j) = \frac{1}{K}\prod_{i=1}^n c_i(X_j)^{1/n},$$ where $K$ is a normalizing factor.
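Here is a quick sketch of those two pooling rules for credence functions defined on the same partition; the credences are hypothetical.

```python
# Straight linear and straight geometric pooling over a partition.
import numpy as np

credences = np.array([
    [0.5, 0.3, 0.2],   # hypothetical expert 1 over X1, X2, X3
    [0.1, 0.5, 0.4],   # hypothetical expert 2 over X1, X2, X3
])

# Straight linear pool: the cell-by-cell arithmetic mean.
linear_pool = credences.mean(axis=0)

# Straight geometric pool: the cell-by-cell geometric mean, renormalised.
geometric_mean = np.prod(credences, axis=0) ** (1 / len(credences))
geometric_pool = geometric_mean / geometric_mean.sum()

print(linear_pool)      # ≈ [0.3 0.4 0.3]
print(geometric_pool)   # also sums to 1
```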
In this post, I'm interested in cases where our agents have credences in different sets of propositions. For instance, the first agent has credences concerning the rainfall in Bristol tomorrow and the rainfall in Bath, but the second has credences concerning the rainfall in Bristol and the rainfall in Birmingham.
I want to begin by pointing to a shortcoming of CAP when it is applied to such cases. It fails to satisfy what we might think of as a basic desideratum of such procedures. To illustrate this desideratum, let's suppose that the three propositions $X_1$, $X_2$, and $X_3$ form a partition. And suppose that Amira has credences in $X_1$, $X_2$, and $X_3$, while Benito has credences only in $X_1$ and $X_2$. In particular:
- Amira's credence function $c_A$ assigns credences to $X_1$, $X_2$, and $X_3$, and it is coherent.
- Benito's credence function $c_B$ assigns credences to $X_1$ and $X_2$ only, and it too is coherent.
- Benito's extended credence function $c_B^+$ is the unique coherent extension of $c_B$ to the whole partition: $c_B^+(X_1) = c_B(X_1)$, $c_B^+(X_2) = c_B(X_2)$, and $c_B^+(X_3) = 1 - c_B(X_1) - c_B(X_2)$.
Now, the desideratum I have in mind says that the result of aggregating should be the same whether we combine Amira's credence function $c_A$ with $c_B$ or with $c_B^+$. After all, $c_B^+$ is in some sense implicit in $c_B$. An agent with credence function $c_B$ is committed to the credences assigned by credence function $c_B^+$.
However, CAP does not do this. As mentioned above, if you aggregate $c_A$ and $c_B^+$ using SED, then the result is their straight linear pool. But if you aggregate $c_A$ and $c_B$ using SED, the result is different: the aggregate assigns different credences to $X_1$, $X_2$, and $X_3$ than the linear pool of $c_A$ and $c_B^+$ does.
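To see the failure numerically, here is a small sketch with illustrative credences for Amira and Benito, chosen so that the two aggregates visibly come apart. On a partition, minimising the summed SED subject only to the credences summing to one has a closed form via a Lagrange multiplier, which applies whenever the resulting optimum lies inside the probability simplex, as it does here; it agrees with the numerical optimisation in the sketch above.

```python
# CAP with SED over a partition, via the interior-solution closed form: for
# each cell, take the average credence of the experts who have a credence in
# it, then shift each cell (inversely to how many experts weigh in on it) so
# that the aggregate sums to 1. Hypothetical credences throughout.
import numpy as np

cells = ["X1", "X2", "X3"]
amira       = {"X1": 0.5, "X2": 0.3, "X3": 0.2}
benito      = {"X1": 0.1, "X2": 0.5}              # abstains on X3
benito_plus = {"X1": 0.1, "X2": 0.5, "X3": 0.4}   # his unique coherent extension

def cap_sed(experts):
    # Number of experts with a credence in each cell, and their average credence.
    w = np.array([float(sum(X in e for e in experts)) for X in cells])
    t = np.array([np.mean([e[X] for e in experts if X in e]) for X in cells])
    shift = (1 - t.sum()) / (1 / w).sum()   # Lagrange term enforcing sum-to-1
    return t + shift / w

print(cap_sed([amira, benito_plus]))   # the straight linear pool: ≈ [0.3 0.4 0.3]
print(cap_sed([amira, benito]))        # different:                ≈ [0.325 0.425 0.25]
```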
Now, it is natural to think that the problem arises here because Amira's credences are getting too much say in how far a potential aggregate lies from the agents, since she has credences in three propositions, while Benito only has credences in two. And, sure enough, the aggregate of $c_A$ and $c_B$ lies closer to $c_A$ than to $c_B^+$, and closer to $c_A$ than the aggregate of $c_A$ and $c_B^+$ lies. And it is equally natural to try to solve this potential bias in favour of the agent with more credences by normalising -- that is, by dividing each agent's contribution by the number of propositions over which her credence function is defined. That is, we might define a new version of CAP, which we'll call CAP$^*$:
$$\mathrm{CAP}^*:\quad c^* = \arg\min_{c \in \mathcal{P}_\mathcal{F}} \sum_{i=1}^n \frac{1}{|\mathcal{F}_i|}\,\mathfrak{D}(c|_{\mathcal{F}_i}, c_i)$$
However, this doesn't help. Using this definition, the aggregate of Amira's credence function and Benito's extended credence function remains the same as before; but the aggregate of Amira's credence function and Benito's original credence function changes -- it assigns yet other credences to $X_1$, $X_2$, and $X_3$. Again, the two ways of aggregating disagree.
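And here is the corresponding sketch for the normalised variant, with each expert's contribution divided by the number of propositions she has credences in; the credences are again hypothetical, and the closed form again assumes an interior optimum.

```python
# Normalised CAP (each expert's summed squared differences are divided by the
# number of propositions she has credences in), over a partition, using the
# interior-solution closed form. Hypothetical credences.
import numpy as np

cells = ["X1", "X2", "X3"]
amira       = {"X1": 0.5, "X2": 0.3, "X3": 0.2}
benito      = {"X1": 0.1, "X2": 0.5}
benito_plus = {"X1": 0.1, "X2": 0.5, "X3": 0.4}

def cap_star_sed(experts):
    # Per-cell total weight and weighted-average credence, where expert i's
    # weight is 1 / |F_i|.
    W = np.array([sum(1 / len(e) for e in experts if X in e) for X in cells])
    t = np.array([sum(e[X] / len(e) for e in experts if X in e) for X in cells]) / W
    shift = (1 - t.sum()) / (1 / W).sum()   # enforce sum-to-1
    return t + shift / W

print(cap_star_sed([amira, benito_plus]))  # still the linear pool: ≈ [0.3 0.4 0.3]
print(cap_star_sed([amira, benito]))       # still different:       ≈ [0.287 0.447 0.267]
```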
So here is our desideratum in general:
Agreement with Coherent Commitments (ACC) Suppose $c_1, \ldots, c_n$ are coherent credence functions, with $c_i$ defined on $\mathcal{F}_i$, for each $i = 1, \ldots, n$. And let $\mathcal{F} = \mathcal{F}_1 \cup \ldots \cup \mathcal{F}_n$. Now suppose that, for each $c_i$ defined on $\mathcal{F}_i$, there is a unique coherent credence function $c_i^+$ defined on $\mathcal{F}$ that extends $c_i$ -- that is, $c_i^+(X) = c_i(X)$ for all $X$ in $\mathcal{F}_i$. Then the aggregate of $c_1, \ldots, c_n$ should be the same as the aggregate of $c_1^+, \ldots, c_n^+$.
CAP does not satisfy ACC. Is there a natural aggregation rule that does? Here's a suggestion. Suppose you wish to aggregate a set of credence functions $c_1, \ldots, c_n$, where $c_i$ is defined on $\mathcal{F}_i$, as above. Then we proceed as follows.
- First, let $\mathcal{F} = \mathcal{F}_1 \cup \ldots \cup \mathcal{F}_n$.
- Second, for each $c_i$, let $$\mathcal{C}_i = \{c \in \mathcal{P}_\mathcal{F} : c(X) = c_i(X) \text{ for all } X \in \mathcal{F}_i\}.$$ That is, while $c_i$ represents a precise credal state defined on $\mathcal{F}_i$, $\mathcal{C}_i$ represents an imprecise credal state defined on $\mathcal{F}$. It is the set of coherent credence functions on $\mathcal{F}$ that extend $c_i$ -- that is, the set of coherent credence functions on $\mathcal{F}$ that agree with $c_i$ on the propositions in $\mathcal{F}_i$. Thus, if, like Benito, your coherent credences on $\mathcal{F}_i$ uniquely determine your coherent credences on $\mathcal{F}$, then $\mathcal{C}_i$ is just the singleton containing that unique extension. But if your credences over $\mathcal{F}_i$ do not uniquely determine your coherent credences over $\mathcal{F}$, then $\mathcal{C}_i$ will contain more coherent credence functions.
- Finally, we take the aggregate of $c_1, \ldots, c_n$ to be the credence function $c^*$ that minimizes the total distance from $c^*$ to the $\mathcal{C}_i$s. The problem is that there isn't a single natural definition of the distance from a point to a set of points, even when you have a definition of the distance between individual points. I adopt a very particular measure of such distances here, writing $\mathfrak{D}(c, \mathcal{C})$ for the distance from a credence function $c$ to a set of credence functions $\mathcal{C}$; but it would be interesting to explore the alternative options in greater detail elsewhere.

With this in hand, we can finally give our aggregation procedure, which we'll call CAP$^+$:
$$\mathrm{CAP}^+:\quad c^* = \arg\min_{c \in \mathcal{P}_\mathcal{F}} \sum_{i=1}^n \mathfrak{D}(c, \mathcal{C}_i)$$
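Here is a minimal sketch of this set-based procedure, once more assuming that the propositions form a partition and that each expert has credences in some of its cells, with hypothetical credences. I won't rely on the details of my preferred measure of distance-to-a-set here; purely for concreteness, the sketch uses the average SED from $c$ to the members of $\mathcal{C}_i$, which is just one of the available options. Since the average SED from $c$ to a set differs from the SED between $c$ and the set's centroid only by a constant, minimising the summed distances then amounts to linearly pooling the centroids of the $\mathcal{C}_i$s -- and on a partition, the centroid of $\mathcal{C}_i$ simply splits expert $i$'s unassigned probability equally over the cells on which she abstains.

```python
# A sketch of the set-based aggregation procedure over a partition, using the
# average SED from a credence function to the set C_i as the (assumed) measure
# of distance to a set. With that choice, the aggregate is the straight linear
# pool of the centroids of the C_i, and the centroid of C_i just spreads expert
# i's leftover probability evenly over the cells she abstains on.
import numpy as np

cells = ["X1", "X2", "X3"]
experts = [
    {"X1": 0.5, "X2": 0.3, "X3": 0.2},   # hypothetical expert with full credences
    {"X1": 0.4},                         # hypothetical expert who abstains on X2 and X3
]

def centroid_of_extensions(expert):
    """Centroid of the set of coherent extensions of `expert` to the partition."""
    free = [X for X in cells if X not in expert]
    leftover = 1.0 - sum(expert.values())
    full = dict(expert)
    full.update({X: leftover / len(free) for X in free})
    return np.array([full[X] for X in cells])

# Aggregate: the straight linear pool of the centroids.
aggregate = np.mean([centroid_of_extensions(e) for e in experts], axis=0)
print({X: round(float(v), 3) for X, v in zip(cells, aggregate)})
# ≈ {'X1': 0.45, 'X2': 0.3, 'X3': 0.25}
```

With this assumed measure, applying the same recipe to a single expert simply returns the centroid of her set of coherent extensions.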
One notable feature of CAP$^+$ is that, unlike the original CAP, or CAP$^*$, it automatically satisfies ACC.
Let's now see CAP$^+$ in action.
- Since CAP$^+$ satisfies ACC, the aggregate for $c_A$ and $c_B$ is the same as the aggregate for $c_A$ and $c_B^+$, which is just their straight linear pool.
- Next, suppose we wish to aggregate Amira with a third agent, Cleo, who has a credence only in $X_1$ -- that is, $c_C$ is defined on $\{X_1\}$ alone. Then $\mathcal{F} = \{X_1, X_2, X_3\}$, and $\mathcal{C}_C$ is the set of coherent credence functions on $\mathcal{F}$ that agree with Cleo about $X_1$ -- a line segment in the space of credence functions. Working through the calculation for SED gives the aggregate of Amira and Cleo.
- One interesting feature of CAP$^+$ is that, unlike CAP, we can apply it to individual agents. Thus, for instance, suppose we wish to take Cleo's single credence in $X_1$ and 'fill in' her credences in $X_2$ and $X_3$. Then we can use CAP$^+$ to do this. Rather unsurprisingly, her new credence function $c_C^+$ is the midpoint of the line formed by the imprecise probabilities $\mathcal{C}_C$: it retains her credence in $X_1$ and divides the remaining probability equally between $X_2$ and $X_3$. Now, notice: the aggregate of Amira and Cleo given above is just the straight linear pool of Amira's credence function $c_A$ and Cleo's 'filled in' credence function $c_C^+$. I would conjecture that this is true in general: filling in credences using CAP$^+$ and then aggregating using straight linear pooling always agrees with aggregating directly using CAP$^+$. And perhaps this generalises beyond SED.
Comments

Posted this on Facebook, then figured it might be more appropriate to post here.
Interesting considerations. I wonder if there is anything useful to be said when there isn't a unique extension of the credences to the whole space, but they merely impose a restriction on the remaining values on which they are undefined.
For instance, suppose Alice assigns a probability to X_1, X_2 and X_3 (which form a partition) and Bob assigns P(X_1)=.99. This highly restricts his assignment to X_2 and X_3, but I suspect that, for an appropriate credence function for Alice, merely aggregating these credences on X_1 will yield a result that couldn't be achieved by aggregating any coherent extension of Bob's credences to the whole space with Alice's. However, since there is no unique extension, one needs to do something more complex than merely use the unique extension. Perhaps considering some kind of average of all possible extensions of Bob's credence function with Alice's credence would meet this more demanding criterion.
There might be an interesting theorem in here ... I'll have to think about it.