## Sunday, 9 June 2013

### Accuracy-based arguments for the Principal Principle

So far in this series of blog posts, we've seen accuracy-based arguments for Probabilism and Conditionalization.  In this post and the next, I'd like to consider arguments for different sorts of norms, which we might call deference norms.  The most prominent deference norms are David Lewis' Principal Principle, which says that we should defer to the objective chances when we set our credences, and Bas van Fraassen's Reflection Principle, which says that we should defer to our future credences when we set our current ones.  As we will see, though, some of the accuracy-based arguments that purport to establish these deference norms also provide putative justifications for Probabilism and Conditionalization.  This brings us full circle.  In this post, I will consider norms that demand deference to chance, such as the Principal Principle.  In the next post, I will consider norms that demand deference to future selves, such as the Reflection Principle.

Recall the structure of the accuracy-based argument for Probabilism:
1. The cognitive value of a credence function is given by its proximity to the ideal credence function.  (a)  The ideal credence function at world $w$ is $v_w$. (b) Distance is measured by a Bregman divergence $D$.  Thus, the cognitive value of a credence function $c$ at a world $w$ is $-D(v_w, c)$.
2. Dominance
3. Theorem 1
4. Therefore, Probabilism

There are two different ways to argue for the deference norms.  On the first, we alter (1a).  That is, we give a different account of the ideal credence function:  we say it is the true objective chance function; or we say that it is one's future credence function.  On the second, we alter (2).  That is, we retain proximity to the omniscient credences as the correct measure of cognitive value.  But we replace the Dominance norm with a different norm, just as we replaced it with the Maximize Subjective Expected Utility norm in the argument for Conditionalization.
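Before turning to chance, here is a minimal sketch of how premises (1) and (2) work together in the argument for Probabilism recalled above. It assumes the squared Euclidean distance (the Brier score), which is only one Bregman divergence among many, and a hypothetical agent with credences in just $X$ and $\neg X$; the helper names are illustrative. The non-probabilistic credence function $(0.6, 0.6)$ is further from the omniscient credence function at both worlds than the probabilistic $(0.5, 0.5)$ is, so Dominance rules it out.

```python
# Illustrative sketch: helper names and numbers are hypothetical.


def sq_euclid(x, y):
    """Squared Euclidean distance: the Bregman divergence generated by sum(t**2)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def dominates(c_star, c, ideals, D=sq_euclid):
    """True if c_star is strictly closer than c to the ideal credences at every world."""
    return all(D(ideal, c_star) < D(ideal, c) for ideal in ideals)


# Credences are listed over the propositions (X, not-X).
# Omniscient credences v_w: at w1, X is true; at w2, X is false.
omniscient = [(1.0, 0.0), (0.0, 1.0)]

c = (0.6, 0.6)       # violates Probabilism: c(X) + c(not-X) = 1.2
c_star = (0.5, 0.5)  # satisfies Probabilism

print(dominates(c_star, c, omniscient))  # True
print(dominates(c, c_star, omniscient))  # False
```

Here `c_star` is simply the point of the probability simplex nearest to `c`; with a different Bregman divergence the dominating credence function may differ.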

#### Proximity to chance and the Principal Principle

Let's begin with the first way of arguing for the Principal Principle.

First, some terminology to help us state the principle:
• Given a possible world $w$, the ur-chance function at $w$ (written $ch_w$) is the probability function upon which one conditionalizes with the history of $w$ up to a time in order to obtain the chances at $w$ at that time.
• Given a probability function $ch$, let $C_{ch}$ be the proposition *The ur-chances are given by $ch$*.  That is, $C_{ch}$ is the proposition that is true at exactly those worlds at which the ur-chance function is $ch$.
• Let $\mathcal{C}$ be the set of possible ur-chance functions.
Throughout, we will assume that $\mathcal{C}$ is finite.  That is, there are only finitely many possible ur-chance functions about which an agent has an opinion.  Thus, only finitely many propositions of the form $C_{ch}$ occur in the algebra on which an agent's credence function is defined. We also assume that every ur-chance function is a probability function.
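The definition of an ur-chance function can be illustrated with a toy model. In this sketch, which is purely an assumption for illustration, the worlds are the four possible outcome sequences of two coin tosses, `ur` is a hypothetical ur-chance function over them, and the (hypothetical) helper `chance_at` obtains the chances at a time by conditionalizing `ur` on the history of tosses up to that time.

```python
from fractions import Fraction

# Hypothetical ur-chance function over the worlds (sequences of two tosses).
ur = {
    ("H", "H"): Fraction(1, 2),
    ("H", "T"): Fraction(1, 4),
    ("T", "H"): Fraction(1, 8),
    ("T", "T"): Fraction(1, 8),
}


def chance_at(ur, history):
    """Chances at a time: the ur-chance function conditionalized on the history so far."""
    consistent = {w: p for w, p in ur.items() if w[: len(history)] == tuple(history)}
    total = sum(consistent.values())
    return {w: consistent.get(w, 0) / total for w in ur}


print(chance_at(ur, []))     # before any toss: the ur-chances themselves
print(chance_at(ur, ["H"]))  # after heads on the first toss
```

After a first toss of heads, the chance of the world `("H", "H")` is $\frac{1/2}{3/4} = 2/3$, and worlds inconsistent with the history get chance 0.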

Then the Principal Principle says the following (Lewis, 1980):

Lewis' Principal Principle (LPP)  At the beginning of her epistemic life, an agent ought to have a credence function $b_0$ such that, for all propositions $X$ and ur-chance functions $ch$,
$b_0(X | C_{ch}) = ch(X)$
(Note that it follows from this version that it ought to be that $b_0(X | C_{ch} \wedge E) = ch(X | E)$ for any proposition $E$ with $ch(E) > 0$.  This removes the need for the usual admissibility clause.)
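Here is a minimal check of LPP and of the consequence noted in the parenthesis above, assuming a hypothetical model with two chance hypotheses about a single coin toss. A world is a pair (hypothesis, outcome); each hypothetical ur-chance function is concentrated on the worlds where it is the ur-chance function; and `b0` is one candidate credence function built as a weighted mixture of the two. The helpers `prob` and `cond` and all numbers are illustrative.

```python
from fractions import Fraction
from itertools import chain, combinations

# Illustrative sketch: worlds, chances and weights are hypothetical.
worlds = [(1, "H"), (1, "T"), (2, "H"), (2, "T")]
chances = {
    1: {(1, "H"): Fraction(3, 10), (1, "T"): Fraction(7, 10)},
    2: {(2, "H"): Fraction(4, 5), (2, "T"): Fraction(1, 5)},
}
# Candidate initial credence function: weight 1/3 on hypothesis 1, 2/3 on hypothesis 2.
weights = {1: Fraction(1, 3), 2: Fraction(2, 3)}
b0 = {w: sum(weights[i] * chances[i].get(w, 0) for i in chances) for w in worlds}


def prob(f, X):
    """Probability that a function over worlds assigns to the proposition X (a set of worlds)."""
    return sum(f.get(w, 0) for w in X)


def cond(f, X, Y):
    """Conditional probability f(X | Y); assumes f(Y) > 0."""
    return prob(f, X & Y) / prob(f, Y)


# Every proposition is a set of worlds: 2**4 = 16 of them.
propositions = [set(s) for s in chain.from_iterable(combinations(worlds, k) for k in range(5))]
C = {i: {w for w in worlds if w[0] == i} for i in chances}  # the proposition C_ch

# LPP: b0(X | C_ch) = ch(X) for every proposition X.
lpp = all(cond(b0, X, C[i]) == prob(chances[i], X) for i in chances for X in propositions)

# Consequence: b0(X | C_ch & E) = ch(X | E) whenever ch(E) > 0.
admissible = all(
    cond(b0, X, C[i] & E) == cond(chances[i], X, E)
    for i in chances for X in propositions for E in propositions
    if prob(chances[i], E) > 0
)
print(lpp, admissible)  # True True
```

Exact `Fraction` arithmetic is used so that the equalities can be tested with `==` rather than a floating-point tolerance.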

It will be useful to state another norm, which follows from Lewis' Principal Principle, but which is not equivalent to it.  It is due to Jenann Ismael (Ismael, 2008):

Ismael's Principal Principle (IPP)  At the beginning of her epistemic life, an agent ought to have a credence function $b_0$ such that, for all propositions $X$,
$b_0(X) = \sum_{ch \in \mathcal{C}} b_0(C_{ch})ch(X)$
According to IPP, an agent's initial credence in a proposition ought to be her expectation of its ur-chance.
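In the simplest case, IPP tells an agent how to fix a credence from her credences in the chance hypotheses. A sketch, assuming a hypothetical agent who splits her credence 1/3 : 2/3 between two hypotheses on which the chance of heads is 3/10 and 4/5 respectively; the helper `ipp_credence` is illustrative.

```python
from fractions import Fraction

# Hypothetical credences b0(C_ch) in the chance hypotheses, and each hypothesis's ch(heads).
credence_in_hypothesis = {"ch1": Fraction(1, 3), "ch2": Fraction(2, 3)}
chance_of_heads = {"ch1": Fraction(3, 10), "ch2": Fraction(4, 5)}


def ipp_credence(cred_hyp, chance_of_x):
    """Credence in X required by IPP: the sum over ch of b0(C_ch) * ch(X)."""
    return sum(cred_hyp[ch] * chance_of_x[ch] for ch in cred_hyp)


print(ipp_credence(credence_in_hypothesis, chance_of_heads))  # 19/30
```

That is, $\frac{1}{3} \cdot \frac{3}{10} + \frac{2}{3} \cdot \frac{4}{5} = \frac{19}{30}$: her credence in heads is her expectation of its ur-chance.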

How might we argue for these norms?  As noted above, our plan in this section is to adapt the accuracy-based argument for Probabilism by changing premise (1a):  that is, we will change what counts as the ideal credence function at a world.  According to (1a), the ideal credence function at $w$ is the omniscient credence function at $w$, i.e., $v_w$.  Call this Joyce's thesis.  According to (1a'), our new premise, the ideal credence function at $w$ is the ur-chance function at $w$, i.e., $ch_w$.  Call this Hájek's thesis, since it is defended by Alan Hájek (Hájek, ms), who claims that

Chance : Credences :: Truth : Full belief

That is, just as full belief aims at the truth, credence aims at the objective chances.

How does this help?  Well, if the ideal credence function at $w$ is $ch_w$, then the cognitive value of credence function $c$ at $w$ is its proximity to $ch_w$.  That is, it is $-D(ch_w, c)$.  But recall the following lemmas about Bregman divergences:

Lemma 1  Suppose $\mathcal{X} \subseteq \mathbb{R}^n$ is closed and convex.  Then, if $x \in \mathbb{R}^n$ and $x \not \in \mathcal{X}$, there is $x^* \in \mathcal{X}$ such that $D(y, x^*) < D(y, x)$ for all $y \in \mathcal{X}$.

Lemma 2  Suppose $\mathcal{X} \subseteq \mathbb{R}^n$, and let $\mathcal{X}^+$ be the convex hull of $\mathcal{X}$.  Then, if $x, y \in \mathcal{X}^+$ and $x \neq y$, there is $z \in \mathcal{X}$ such that $D(z, x) < D(z, y)$.
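To see why Lemma 2 holds, it may help to write out the identity behind it. This sketch assumes the standard form of a Bregman divergence generated by a strictly convex, differentiable function $\varphi$. If $x$ lies in the convex hull $\mathcal{X}^+$ of $\mathcal{X}$, then $x = \sum_i \lambda_i z_i$ for finitely many $z_i \in \mathcal{X}$, with $\lambda_i \geq 0$ and $\sum_i \lambda_i = 1$, and for any $y$:

```latex
\begin{align*}
D(z, x) &= \varphi(z) - \varphi(x) - \nabla\varphi(x)\cdot(z - x) \\
\sum_i \lambda_i D(z_i, y) - \sum_i \lambda_i D(z_i, x)
  &= \Big[\textstyle\sum_i \lambda_i \varphi(z_i) - \varphi(y) - \nabla\varphi(y)\cdot(x - y)\Big]
   - \Big[\textstyle\sum_i \lambda_i \varphi(z_i) - \varphi(x)\Big] \\
  &= \varphi(x) - \varphi(y) - \nabla\varphi(y)\cdot(x - y) = D(x, y).
\end{align*}
```

Since $D(x, y) > 0$ when $x \neq y$, at least one $z_i$ with $\lambda_i > 0$ must satisfy $D(z_i, x) < D(z_i, y)$, which is what Lemma 2 asserts.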

Now let $\mathcal{C}^+$ be the convex hull of $\mathcal{C}$.  That is, $\mathcal{C}^+$ contains all the convex combinations (weighted averages) of possible ur-chance functions; since $\mathcal{C}$ is finite, $\mathcal{C}^+$ is closed as well as convex.  Then the preceding lemmas entail that, if a credence function $c$ lies outside $\mathcal{C}^+$, there is a credence function $c'$ inside $\mathcal{C}^+$ that is closer to each $ch_w$ than $c$ is (when distance is measured by the Bregman divergence $D$).  And moreover, if $c$ lies inside $\mathcal{C}^+$, there is no $c'$ that is closer to each $ch_w$ than $c$ is.  Thus, if cognitive value is measured by $-D(ch_w, \cdot)$, the non-dominated credence functions are precisely those that lie in $\mathcal{C}^+$.  But it turns out that, on a natural assumption about ur-chance functions, $\mathcal{C}^+$ is precisely the set of credence functions that satisfy Ismael's version of the Principal Principle.  The natural assumption is this:  each ur-chance function expects itself to give the ur-chances; that is, the probability that an ur-chance function assigns to a proposition is always equal to what that ur-chance function expects the ur-chance of that proposition to be, so that $ch(X) = \sum_{ch' \in \mathcal{C}} ch(C_{ch'})ch'(X)$ for every $ch \in \mathcal{C}$ and every proposition $X$.  (This is proved in (Pettigrew, 2012).)  Thus, we have the following argument for IPP (Pettigrew, 2012):
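Here is a numerical sketch of this step, under several assumptions: $D$ is the squared Euclidean distance; there are just two hypothetical ur-chance functions about a coin toss, neither self-undermining; and, as a simplification, credence functions are represented only by their values on the four worlds. With two chance functions, $\mathcal{C}^+$ is a line segment, so the point of $\mathcal{C}^+$ nearest to $c$ has a closed form; the helper `project_onto_segment` is illustrative.

```python
# Illustrative sketch: chance functions, credences and helper names are hypothetical.


def sq_euclid(x, y):
    """Squared Euclidean distance, one particular Bregman divergence."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def project_onto_segment(x, a, b):
    """Euclidean projection of x onto the segment {t*a + (1-t)*b : 0 <= t <= 1}."""
    d = [ai - bi for ai, bi in zip(a, b)]
    t = sum((xi - bi) * di for xi, bi, di in zip(x, b, d)) / sum(di * di for di in d)
    t = min(1.0, max(0.0, t))  # clamp so the result stays inside the convex hull
    return t, [bi + t * di for bi, di in zip(b, d)]


# Worlds: (C_ch1 & Heads, C_ch1 & Tails, C_ch2 & Heads, C_ch2 & Tails).
ch1 = [0.3, 0.7, 0.0, 0.0]
ch2 = [0.0, 0.0, 0.8, 0.2]
c = [0.2, 0.2, 0.4, 0.4]  # a credence function outside the convex hull C+

t, c_proj = project_onto_segment(c, ch1, ch2)

# Lemma 1: the projection is closer than c to every possible ur-chance function.
for ch in (ch1, ch2):
    print(sq_euclid(ch, c_proj) < sq_euclid(ch, c))  # True

# The projection is the mixture t*ch1 + (1-t)*ch2, so its credence in C_ch1 is t,
# and it matches the IPP recipe sum_ch c(C_ch) * ch(X) on every world.
cred_C1 = c_proj[0] + c_proj[1]
cred_C2 = c_proj[2] + c_proj[3]
ipp = [cred_C1 * a + cred_C2 * b for a, b in zip(ch1, ch2)]
print(all(abs(u - v) < 1e-12 for u, v in zip(c_proj, ipp)))  # True
```

For other Bregman divergences the nearest point in $\mathcal{C}^+$ generally has no closed form and would have to be found numerically.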
1. The cognitive value of a credence function is given by its proximity to the ideal credence function.  (a') (Hájek's thesis) The ideal credence function at world $w$ is $ch_w$. (b) Distance is measured by a Bregman divergence $D$.  Thus, the cognitive value of a credence function $c$ at a world $w$ is $-D(ch_w, c)$.
2. Dominance
3. Lemmas 1 and 2
4. Therefore, Ismael's Principal Principle

Thus, according to this argument, what is wrong with an agent whose credence in a proposition differs from her expectation of its chance is that there is another credence function that is closer to each of the possible ur-chance functions than hers is.  This is our first accuracy-based argument for the Principal Principle.

How does it relate to Lewis' Principal Principle?  It turns out that, under certain assumptions, Lewis' Principal Principle does follow from Ismael's.  To state these assumptions, we need a little terminology, which we borrow from Lewis.  We say an ur-chance function $ch$ is self-undermining if $ch(C_{ch}) < 1$.  Thus, a self-undermining chance function is less than certain that it gives the ur-chances.  Then we have the following:

Proposition 1  Suppose no $ch_w$ is self-undermining:  that is, $ch(C_{ch}) = 1$, for all $ch$ in $\mathcal{C}$.  Then $c$ satisfies Ismael's Principal Principle iff $c$ satisfies Lewis' Principal Principle.

Proof. This is proved in (Pettigrew, 2012).

Thus, in the absence of self-undermining chance functions, Lewis' and Ismael's versions of the Principal Principle are equivalent, and the accuracy-based argument supports both.  In the presence of self-undermining chance functions, this argument supports only Ismael's version.  That's as it should be, since Lewis pointed out the serious problems that his version of the Principal Principle faces in the presence of self-undermining ur-chance functions.
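One direction of Proposition 1, and the reason Lewis' version breaks down for self-undermining chances, can be seen in a few lines. This sketch assumes that the propositions $C_{ch}$ for $ch \in \mathcal{C}$ partition the worlds, that $b_0$ is a probability function, and that $b_0(C_{ch}) > 0$. If no ur-chance function is self-undermining, then $ch'(C_{ch}) = 0$ whenever $ch' \neq ch$ (since $ch'(C_{ch'}) = 1$ and $C_{ch}$, $C_{ch'}$ are incompatible), so IPP applied to $X \wedge C_{ch}$ gives:

```latex
\begin{align*}
b_0(X \wedge C_{ch}) &= \sum_{ch' \in \mathcal{C}} b_0(C_{ch'})\, ch'(X \wedge C_{ch})
  = b_0(C_{ch})\, ch(X \wedge C_{ch}) = b_0(C_{ch})\, ch(X), \\
\text{hence}\quad b_0(X \mid C_{ch}) &= \frac{b_0(X \wedge C_{ch})}{b_0(C_{ch})} = ch(X).
\end{align*}
```

By contrast, if $ch(C_{ch}) < 1$, then taking $X = C_{ch}$ in Lewis' Principal Principle would require $1 = b_0(C_{ch} \mid C_{ch}) = ch(C_{ch}) < 1$, so no credence function with $b_0(C_{ch}) > 0$ satisfies it.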

#### Objective expectations and the Principal Principle

In the previous section, we adapted the accuracy-based argument for Probabilism by replacing Joyce's thesis (1a) with Hájek's thesis (1a'):  that is, we claimed that the goal of credence is to match the objective chances, rather than the omniscient credences.  In this section, we return to the claim that matching the omniscient credences is the goal of credence.  Instead, we replace premise (2), the Dominance norm, with (2'), an alternative norm of decision theory.  Here is the alternative norm:

Chance Dominance  Suppose $\mathcal{O}$ is a set of options, $\mathcal{W}$ is a set of possible worlds, and $U$ is a measure of the value of the options in $\mathcal{O}$ at the worlds in $\mathcal{W}$.  Suppose $o, o' \in \mathcal{O}$.  Then we say that
• $o$ chance dominates $o'$ relative to $U$ if, for all ur-chance functions $ch$ in $\mathcal{C}$, $\sum_{w \in \mathcal{W}} ch(w) U(o', w) < \sum_{w \in \mathcal{W}} ch(w) U(o, w)$
Now suppose $o, o' \in \mathcal{O}$ and
1. $o$ chance dominates $o'$ relative to $U$;
2. There is no $o''$ in $\mathcal{O}$ that chance dominates $o$ relative to $U$.
Then $o'$ is irrational.

Thus, Chance Dominance rules out an option as irrational if there is another option that every ur-chance function expects to be better, and that option is not itself chance dominated by any option.
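Chance Dominance is a decision rule, and it can be sketched directly. The worlds, options, utilities and ur-chance functions below are hypothetical, ur-chance functions are represented as dictionaries from worlds to probabilities, and the helper names are illustrative.

```python
# Illustrative sketch: all option names, worlds, utilities and chances are hypothetical.
worlds = ["w1", "w2"]
options = ["a", "b", "c"]
utility_table = {"a": {"w1": 10, "w2": 0}, "b": {"w1": 4, "w2": 4}, "c": {"w1": 0, "w2": 10}}
chances = [{"w1": 0.9, "w2": 0.1}, {"w1": 0.6, "w2": 0.4}]  # the possible ur-chance functions


def U(option, w):
    """Value of an option at a world."""
    return utility_table[option][w]


def expected_utility(ch, option):
    """sum_w ch(w) * U(option, w): the chance-expected value of an option."""
    return sum(ch[w] * U(option, w) for w in worlds)


def chance_dominates(o, o2):
    """o chance dominates o2 if every possible ur-chance function expects o to be strictly better."""
    return all(expected_utility(ch, o) > expected_utility(ch, o2) for ch in chances)


def ruled_out(o2):
    """Chance Dominance: o2 is irrational if some o chance dominates it
    and no option chance dominates that o."""
    return any(
        chance_dominates(o, o2) and not any(chance_dominates(o3, o) for o3 in options)
        for o in options
    )


for o in options:
    print(o, ruled_out(o))  # a False, b True, c True
```

Option `a` has chance-expected values 9 and 6, beating `b` (4 and 4) and `c` (1 and 4) under both chance functions, and nothing beats `a`; so `b` and `c` are ruled out.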

We will now argue for Ismael's Principal Principle by showing that, if a credence function $c$ violates it, then there is another $c'$ that satisfies it such that every ur-chance function expects $c'$ to be more accurate than it expects $c$ to be.  Together with premise (1) and Chance Dominance, this gives an alternative accuracy-based argument for Ismael's Principal Principle.  When there are no self-undermining ur-chance functions, it thereby gives an argument for Lewis' Principal Principle as well.  Here is the key mathematical result:

Lemma 3  Suppose $D$ is a Bregman divergence.  Suppose $p$, $c$, $c'$ are credence functions.  And suppose $p$ is a probability function.  Then
$D(p, c) < D(p, c')$
iff
$\sum_{w \in \mathcal{W}} p(w) D(v_w, c) < \sum_{w \in \mathcal{W}} p(w) D(v_w, c')$
Proof. This is proved in the Appendix to (Pettigrew, 2013).

That is, the further a credence function lies from $p$, the more inaccurate $p$ expects it to be.  Now, from the previous section, we know that, if $c$ lies outside $\mathcal{C}^+$, there is $c'$ in $\mathcal{C}^+$ such that $c'$ is closer to each $ch$ in $\mathcal{C}$ than $c$ is.  But then, by Lemma 3, each $ch$ in $\mathcal{C}$ expects $c'$ to be more accurate than it expects $c$ to be.  That is, $c'$ chance dominates $c$ with respect to accuracy.  Moreover, no credence function in $\mathcal{C}^+$ is chance dominated:  if $c'$ lies in $\mathcal{C}^+$ and $c'' \neq c'$, then Lemmas 1 and 2 together give some $ch$ in $\mathcal{C}$ with $D(ch, c') < D(ch, c'')$, and so, by Lemma 3, that $ch$ expects $c'$ to be more accurate than $c''$.  So the second clause of Chance Dominance is satisfied as well.  This gives us the mathematical result we need to bridge the gap between our measure of cognitive value (namely, $-D(v_w, \cdot)$) and our decision-theoretic norm (namely, Chance Dominance) on the one hand, and our epistemic norm (namely, Ismael's Principal Principle) on the other.  Thus, we have the following argument for IPP (Pettigrew, 2013):
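A numerical sketch of this step, under stated assumptions: $D$ is the squared Euclidean distance, there are two hypothetical ur-chance functions about a coin toss (neither self-undermining), and credence functions are represented only by their values on the four worlds. It first checks Lemma 3 in the form of an identity (expected inaccuracy equals a term that does not depend on $c$, plus $D(p, c)$), and then checks that the nearest point of $\mathcal{C}^+$ to a credence function outside it is expected to be more accurate by every ur-chance function. The helper names are illustrative.

```python
import math

# Illustrative sketch: chance functions, credences and helper names are hypothetical.


def brier(x, y):
    """Squared Euclidean distance: the Bregman divergence generated by phi(t) = t**2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def expected_inaccuracy(p, c, D=brier):
    """sum_w p(w) * D(v_w, c), where v_w is the omniscient credence function at world w."""
    n = len(p)
    return sum(p[w] * D([1.0 if i == w else 0.0 for i in range(n)], c) for w in range(n))


# Worlds: (C_ch1 & Heads, C_ch1 & Tails, C_ch2 & Heads, C_ch2 & Tails).
ch1 = [0.3, 0.7, 0.0, 0.0]
ch2 = [0.0, 0.0, 0.8, 0.2]
c = [0.2, 0.2, 0.4, 0.4]  # violates IPP: lies outside the convex hull of ch1 and ch2

# Lemma 3 as an identity: expected inaccuracy = (term independent of c) + D(p, c).
for ch in (ch1, ch2):
    assert math.isclose(expected_inaccuracy(ch, c), expected_inaccuracy(ch, ch) + brier(ch, c))

# Euclidean projection of c onto the segment between ch1 and ch2, i.e. onto C+.
d = [a - b for a, b in zip(ch1, ch2)]
t = sum((x - b) * di for x, b, di in zip(c, ch2, d)) / sum(di * di for di in d)
t = min(1.0, max(0.0, t))
c_star = [b + t * di for b, di in zip(ch2, d)]

# Every ur-chance function expects c_star to be more accurate than c,
# i.e. c_star chance dominates c with respect to accuracy.
print(all(expected_inaccuracy(ch, c_star) < expected_inaccuracy(ch, c) for ch in (ch1, ch2)))  # True
```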
1. The cognitive value of a credence function is given by its proximity to the ideal credence function.  (a) (Joyce's thesis) The ideal credence function at world $w$ is $v_w$. (b) Distance is measured by a Bregman divergence $D$.  Thus, the cognitive value of a credence function $c$ at a world $w$ is $-D(v_w, c)$.
2. Chance Dominance
3. Lemma 3
4. Therefore, Ismael's Principal Principle.

In the next post, I will consider analogous arguments for the Reflection Principle.