Thursday, 26 July 2018

Dutch Strategy Theorems for Conditionalization and Superconditionalization

I've just signed a contract with Cambridge University Press to write a book on the Dutch Book Argument for their Elements in Decision Theory and Philosophy series. So over the next few months, I'm going to be posting some bits and pieces as I get properly immersed in the literature.

-----

Many Bayesians formulate the update norm of Bayesian epistemology as follows:

Bayesian Conditionalization  If
(i) your credence function at $t$ is $c : \mathcal{F} \rightarrow [0, 1]$,
(ii) your credence function at a later time $t'$ is $c' : \mathcal{F} \rightarrow [0, 1]$,
(iii) $E$ is the strongest evidence you acquire between $t$ and $t'$,
(iv) $E$ is in $\mathcal{F}$,
then rationality requires that, if $c(E) > 0$, then for all $X$ in $\mathcal{F}$, $$c'(X) = c(X|E) = \frac{c(XE)}{c(E)}$$

I don't. One reason you might fail to conditionalize between $t$ and $t'$ is that you re-evaluate the options between those times. You might disavow the prior that you had at the earlier time, perhaps decide it was too biased in one way or another, or not biased enough; perhaps you come to think that it doesn't give enough consideration to the explanatory power one hypothesis would have were it true, or gives too much consideration to the adhocness of another hypothesis; and so on. Now, it isn't irrational to change your mind. So surely it can't be irrational to fail to conditionalize as a result of changing your mind in this way. On this, I agree with van Fraassen.

Instead, I prefer to formulate the update norm as follows -- I borrow the name from Kenny Easwaran:

Plan Conditionalization If
(i) your credence function at $t$ is $c: \mathcal{F} \rightarrow [0, 1]$,
(ii) between $t$ and $t'$ you will receive evidence from the partition $\{E_1, \ldots, E_n\}$,
(iii) each $E_i$ is in $\mathcal{F}$,
(iv) at $t$, your updating plan is $c'$, so that $c'_i : \mathcal{F} \rightarrow [0, 1]$ is the credence function you will adopt if $E_i$,
then rationality requires that, if $c(E_i) > 0$, then for all $X$ in $\mathcal{F}$, $$c'_i(X) = c(X | E_i)$$

I want to do two things in this post. First, I'll offer what I think is a new proof of the Dutch Strategy or Diachronic Dutch Book Theorem that justifies Plan Conditionalization (I haven't come across it elsewhere, though Ray Briggs and I used the trick at the heart of it for our accuracy dominance theorem in this paper). Second, I'll explore how that might help us justify other norms of updating that concern situations in which you don't come to learn any proposition with certainty. We will see that we can use the proof I give to justify the following standard constraint on updating rules: Suppose the evidence I receive between $t$ and $t'$ is not captured by any of the propositions to which I assign a credence -- that is, there is no proposition $e$ to which I assign a credence that is true at all and only the worlds at which I receive the evidence I actually receive between $t$ and $t'$. As a result, there is no proposition $e$ that I learn with certainty as a result of receiving that evidence. Nonetheless, I should update my credence function from $c$ to $c'$ in such a way that it is possible to extend my earlier credence function $c$ to a credence function $c^*$ so that: (i) $c^*$ does assign a credence to $e$, and (ii) my later credence $c'(X)$ in a proposition $X$ is the credence that this extended credence function $c^*$ assigns to $X$ conditional on me receiving evidence $e$ -- that is, $c'(X) = c^*(X | e)$. That is, I should update as if I had assigned a credence to $e$ at the earlier time and then updated by conditionalizing on it.

Here's the Dutch Strategy or Diachronic Dutch Book Theorem for Plan Conditionalization:

Definition (Conditionalizing pair) Suppose $c$ is a credence function and $c'$ is an updating rule defined on $\{E_1, \ldots, E_n\}$. We say that $(c, c')$ is a conditionalizing pair if, whenever $c(E_i) > 0$, then for all $X$, $c'_i(X) = c(X | E_i)$.
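Here's a minimal sketch of this definition in Python. It makes assumptions that aren't in the definition itself: credences are given world by world (so $\mathcal{F}$ is in effect the full algebra over a finite set of worlds), and the function names, the three-world prior, and both plans are hypothetical examples of my own.

```python
from fractions import Fraction as Fr

def conditional(c, event):
    """Return c(. | event), world by world; c maps world -> credence."""
    total = sum(c[w] for w in event)
    if total == 0:
        raise ValueError("cannot condition on an event with credence 0")
    return {w: (c[w] / total if w in event else Fr(0)) for w in c}

def is_conditionalizing_pair(c, plan, partition):
    """plan[i] is the credence function adopted on learning partition[i]."""
    for event, planned in zip(partition, plan):
        # Cells with prior credence 0 impose no constraint.
        if sum(c[w] for w in event) > 0 and planned != conditional(c, event):
            return False
    return True

# Hypothetical example: three worlds, partition {w1, w2} | {w3}.
c = {"w1": Fr(1, 4), "w2": Fr(1, 4), "w3": Fr(1, 2)}
partition = [{"w1", "w2"}, {"w3"}]
good_plan = [
    {"w1": Fr(1, 2), "w2": Fr(1, 2), "w3": Fr(0)},
    {"w1": Fr(0), "w2": Fr(0), "w3": Fr(1)},
]
bad_plan = [
    {"w1": Fr(7, 10), "w2": Fr(3, 10), "w3": Fr(0)},
    {"w1": Fr(0), "w2": Fr(0), "w3": Fr(1)},
]

print(is_conditionalizing_pair(c, good_plan, partition))
print(is_conditionalizing_pair(c, bad_plan, partition))
```

Exact fractions are used so that the equality test in the definition isn't disturbed by floating-point rounding.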

Dutch Strategy Theorem Suppose $(c, c')$ is not a conditionalizing pair. Then
(i) there are two acts $A$ and $B$ such that $c$ prefers $A$ to $B$, and
(ii) for each $E_i$, there are two acts $A'_i$ and $B'_i$ such that $c'_i$ prefers $A'_i$ to $B'_i$,
and, for each $E_i$, $A + A'_i$ has lower utility than $B + B'_i$ at all worlds at which $E_i$ is true.

We'll now give the proof of this.

First, we describe a way of representing pairs $(c, c')$. Both $c$ and each $c'_i$ are defined on the same set $\mathcal{F} = \{X_1, \ldots, X_m\}$. So we can represent $c$ by the vector $(c(X_1), \ldots, c(X_m))$ in $[0, 1]^m$, and we can represent each $c'_i$ by the vector $(c'_i(X_1), \ldots, c'_i(X_m))$ in $[0, 1]^m$. And we can represent $(c, c')$ by concatenating all of these representations to give:
$$(c, c') = c \frown c'_1 \frown c'_2 \frown \ldots \frown c'_n$$
which is a vector in $[0, 1]^{m(n+1)}$.
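To make the representation concrete, here's a small sketch in Python. The function name `pair_vector` and the numbers are hypothetical; the example takes $m = 3$ propositions and $n = 2$ partition cells.

```python
def pair_vector(c, plan):
    """Concatenate c with each planned posterior c'_1, ..., c'_n."""
    vec = list(c)
    for posterior in plan:
        vec.extend(posterior)
    return vec

c = [0.25, 0.25, 0.5]                        # credences in X_1, X_2, X_3
plan = [[0.7, 0.3, 0.0], [0.0, 0.0, 1.0]]    # c'_1 and c'_2

v = pair_vector(c, plan)
print(len(v))  # m(n+1) = 3 * 3 entries
```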

Second, we use this representation to give an alternative characterization of conditionalizing pairs. First, three pieces of notation:
• Let $W$ be the set of all possible worlds.
• For any $w$ in $W$, abuse notation and write $w$ also for the credence function on $\mathcal{F}$ such that $w(X) = 1$ if $X$ is true at $w$, and $w(X) = 0$ if $X$ is false at $w$.
• For any $w$ in $W$, let $$(c, c')_w = w \frown c'_1 \frown \ldots \frown c'_{i-1} \frown w \frown c'_{i+1} \frown \ldots \frown c'_n$$ where $E_i$ is the element of the partition that is true at $w$.
Lemma 1 If $(c, c')$ is not a conditionalizing pair, then $(c, c')$ is not in the convex hull of $\{(c, c')_w : w \in W\}$, which we write $\{(c, c')_w : w \in W\}^+$.

Proof of Lemma 1. We prove the contrapositive. If $(c, c')$ is in $\{(c, c')_w : w \in W\}^+$, then there are $\lambda_w \geq 0$ such that, for all $X$ in $\mathcal{F}$ and each $E_i$,

(1) $\sum_{w \in W} \lambda_w = 1$,
(2) $c(X) = \sum_{w \in W} \lambda_w w(X)$,
(3) $c'_i(X) = \sum_{w \in E_i} \lambda_w w(X) + \sum_{w \not \in E_i} \lambda_w c'_i(X)$.

By (2), we have $\lambda_w = c(w)$. So by (3), we have $$c'_i(X) = c(XE_i) + (1-c(E_i))c'_i(X)$$ Rearranging gives $c(E_i)c'_i(X) = c(XE_i)$. So, if $c(E_i) > 0$, then $c'_i(X) = c(X | E_i)$, and $(c, c')$ is a conditionalizing pair.
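As a sanity check on equations (2) and (3), here's a sketch in Python that mixes the vectors $(c, c')_w$ with weights $\lambda_w = c(w)$ and compares the result with $(c, c')$. It assumes three worlds indexed 0, 1, 2 and takes $\mathcal{F}$ to be the three world-propositions; the helper names and the numbers are hypothetical.

```python
from fractions import Fraction as Fr

M = 3  # number of worlds, and of world-propositions in F

def world_vector(w):
    """The omniscient credence function for world w."""
    return [Fr(1) if j == w else Fr(0) for j in range(M)]

def pair_vector(c, plan):
    """(c, c'): c followed by each planned posterior."""
    vec = list(c)
    for posterior in plan:
        vec.extend(posterior)
    return vec

def pair_at_world(w, plan, partition):
    """(c, c')_w: w in the first block and in the block of the cell containing w."""
    vec = world_vector(w)
    for cell, posterior in zip(partition, plan):
        vec.extend(world_vector(w) if w in cell else posterior)
    return vec

def mixture(c, plan, partition):
    """Sum over w of c(w) * (c, c')_w, using lambda_w = c(w) as (2) requires."""
    vectors = [pair_at_world(w, plan, partition) for w in range(M)]
    return [sum(c[w] * vectors[w][k] for w in range(M)) for k in range(len(vectors[0]))]

c = [Fr(1, 4), Fr(1, 4), Fr(1, 2)]
partition = [{0, 1}, {2}]
good_plan = [[Fr(1, 2), Fr(1, 2), Fr(0)], [Fr(0), Fr(0), Fr(1)]]
bad_plan = [[Fr(7, 10), Fr(3, 10), Fr(0)], [Fr(0), Fr(0), Fr(1)]]

good_match = mixture(c, good_plan, partition) == pair_vector(c, good_plan)
bad_match = mixture(c, bad_plan, partition) == pair_vector(c, bad_plan)
print(good_match, bad_match)
```

Since (2) forces $\lambda_w = c(w)$, the failed match for the second plan means that pair lies outside the convex hull, as the lemma says it should.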

Third, we use this alternative characterization of conditionalizing pairs to specify the acts in question. Suppose $(c, c')$ is not a conditionalizing pair. Then $(c, c')$ is outside $\{(c, c')_w : w \in W\}^+$. Now, let $(p, p')$ be the orthogonal projection of $(c, c')$ into $\{(c, c')_w : w \in W\}^+$. Then let $(S, S') = (c, c') - (p, p')$. That is, $S = c - p$ and $S'_i = c'_i - p'_i$. Now pick $w$ in $W$. Then the angle between $(S, S')$ and $(c, c')_w - (c, c')$ is obtuse and thus
$$(S, S') \cdot ((c, c')_w - (c, c')) = -\varepsilon_w < 0$$

Thus, define the acts $A$, $B$, $A'_i$ and $B'_i$ as follows:
• The utility of $A$ at $w$ is $S \cdot (w - c) + \frac{1}{3}\varepsilon_w$;
• The utility of $B$ at $w$ is 0;
• The utility of $A'_i$ at $w$ is $S'_i \cdot (w - c'_i) + \frac{1}{3}\varepsilon_w$;
• The utility of $B'_i$ at $w$ is 0.
Then the expected utility of $A$ by the lights of $c$ is $\sum_w c(w)\frac{1}{3}\varepsilon_w > 0$, while the expected utility of $B$ is 0, so $c$ prefers $A$ to $B$. And the expected utility of $A'_i$ by the lights of $c'_i$ is $\sum_w c'_i(w)\frac{1}{3}\varepsilon_w > 0$, while the expected utility of $B'_i$ is 0, so $c'_i$ prefers $A'_i$ to $B'_i$. But the utility of $A + A'_i$ at $w$ is
$$S \cdot (w - c) + S'_i \cdot (w - c'_i) + \frac{2}{3}\varepsilon_w = (S, S') \cdot ((c, c')_w - (c, c')) + \frac{2}{3}\varepsilon_w = - \frac{1}{3}\varepsilon_w < 0$$
where $E_i$ is true at $w$, while the utility of $B + B'_i$ at $w$ is 0.

This completes our proof. $\Box$
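The construction in the proof can be run numerically. What follows is only a sketch under stated assumptions: three worlds indexed 0, 1, 2, with $\mathcal{F}$ the three world-propositions; a hypothetical non-conditionalizing pair; and, in place of an exact orthogonal projection, an approximate one computed by Frank-Wolfe iteration with exact line search (my choice for keeping the code dependency-free, not part of the proof). All function names are my own.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def project_onto_hull(point, vertices, iterations=10000):
    """Approximate Euclidean projection onto conv(vertices) via Frank-Wolfe."""
    x = list(vertices[0])
    for _ in range(iterations):
        grad = sub(x, point)
        v = min(vertices, key=lambda u: dot(grad, u))  # best vertex for this gradient
        d = sub(v, x)
        dd = dot(d, d)
        if dd == 0:
            break
        step = max(0.0, min(1.0, -dot(grad, d) / dd))  # exact line search, clipped
        if step == 0.0:
            break
        x = [a + step * b for a, b in zip(x, d)]
    return x

M = 3
c = [0.25, 0.25, 0.5]
partition = [{0, 1}, {2}]
plan = [[0.7, 0.3, 0.0], [0.0, 0.0, 1.0]]  # c'_1 is not c(. | E_1)

def indicator(w):
    return [1.0 if j == w else 0.0 for j in range(M)]

def cell_of(w):
    return next(i for i, cell in enumerate(partition) if w in cell)

def pair_at_world(w):
    vec = indicator(w)
    for i, posterior in enumerate(plan):
        vec = vec + (indicator(w) if i == cell_of(w) else posterior)
    return vec

pair = c + plan[0] + plan[1]
vertices = [pair_at_world(w) for w in range(M)]
S = sub(pair, project_onto_hull(pair, vertices))             # (S, S')
eps = [-dot(S, sub(vertices[w], pair)) for w in range(M)]    # epsilon_w
S_c = S[:M]
S_plan = [S[M * (i + 1): M * (i + 2)] for i in range(len(plan))]

def utility_A(w):
    return dot(S_c, sub(indicator(w), c)) + eps[w] / 3

def utility_A_prime(i, w):
    return dot(S_plan[i], sub(indicator(w), plan[i])) + eps[w] / 3

exp_A = sum(c[w] * utility_A(w) for w in range(M))
exp_A_prime = [sum(plan[i][w] * utility_A_prime(i, w) for w in range(M))
               for i in range(len(plan))]
sure_loss = [utility_A(w) + utility_A_prime(cell_of(w), w) for w in range(M)]
print(exp_A, exp_A_prime, sure_loss)
```

If the approximation has converged, `exp_A` and every entry of `exp_A_prime` should come out positive, while every entry of `sure_loss` should come out negative: each act looks good when it is offered, yet the pair loses at every world.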

You might be forgiven for wondering why we are bothering to give an alternative proof for a theorem that is already well known. David Lewis proved the Dutch Strategy Theorem in a handout for a seminar at Princeton in 1972; Paul Teller then reproduced it (with full permission and acknowledgment) in a paper in 1973; and Lewis finally published his handout in 1999 in his collected works. Why offer a new proof?

It turns out that this style of proof is actually a little more powerful. To see why, it's worth comparing it to an alternative proof of the Dutch Book Theorem for Probabilism, which I described in this post (it's not original to me, though I'm afraid I can't remember where I first saw it!). In the standard Dutch Book Theorem for Probabilism, we work through each of the axioms of the probability calculus, and say how you would Dutch Book an agent who violates it. The axioms are: Normalization, which says that $c(\top) = 1$ and $c(\bot) = 0$; and Additivity, which says that $c(A \vee B) = c(A) + c(B) - c(AB)$. But consider an agent with credences only in the propositions $\top$, $A$, and $A\ \&\ B$.  Her credences are: $c(\top) = 1$, $c(A) = 0.4$, $c(A\ \&\ B) = 0.7$. Then there is no axiom of the probability calculus that she violates. And thus the standard proof of the Dutch Book Theorem is no help in identifying any Dutch Book against her. Yet she is Dutch Bookable. And she violates a more expansive formulation of Probabilism that says, not only are you irrational if your credence function is not a probability function, but also if your credence function cannot be extended to a probability function. So the standard proof of the Dutch Book Theorem can't establish this more expansive version. But the alternative proof I mentioned above can.
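To see that this agent really is Dutch Bookable, here's a small sketch in Python. It assumes the usual betting interpretation, on which a credence of $p$ in a proposition makes $p$ a fair price for a bet that pays 1 if the proposition is true and 0 otherwise; the particular pair of bets is my own illustration.

```python
c = {"A": 0.4, "A&B": 0.7}

# The four kinds of world, given as (A true?, B true?).
worlds = [(True, True), (True, False), (False, True), (False, False)]

def net_payoff(a, b):
    """Agent buys a unit bet on A&B at 0.7 and sells a unit bet on A at 0.4."""
    buy_ab = (1.0 if (a and b) else 0.0) - c["A&B"]
    sell_a = c["A"] - (1.0 if a else 0.0)
    return buy_ab + sell_a

payoffs = {w: net_payoff(*w) for w in worlds}
for w, value in payoffs.items():
    print(w, value)
```

Each bet is fair by the agent's own lights, yet the net payoff is negative in every kind of world, because she is more confident in $A\ \&\ B$ than in $A$.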

Now, something similar is true of the alternative proof of the Dutch Strategy Theorem that I offered above (I happened upon this while discussing Superconditionalizing with Jason Konek, who uses similar techniques in his argument for J-Kon, the alternative to Jeffrey's Probability Kinematics that he proposes in his paper, 'The Art of Learning', which was runner-up for last year's Sanders Prize in Epistemology). Lewis' proof of that theorem runs as follows. First, if you violate Plan Conditionalization, there must be $E_i$ and $X$ such that $c(E_i) > 0$ and $c'_i(X) \neq c(X|E_i)$. Then you place bets on $XE_i$ and $\overline{E_i}$ at the earlier time $t$, and a bet on $X$ at $t'$. Together, these bets lose you money in any world at which $E_i$ is true. Now, it might seem that you must have the required credences to make those bets just in virtue of violating Plan Conditionalization. But imagine the following is true of you: between $t$ and $t'$, you'll obtain evidence from the partition $\{E_1, \ldots, E_n\}$. And, at $t'$, you'll update on this evidence using the rule $c'$. That is, if $E_i$, then you'll adopt the new credence function $c'_i$ at time $t'$. Now, you don't assign credences to the propositions in $\{E_1, \ldots, E_n\}$. Perhaps this is because you don't have the conceptual resources to formulate these propositions. So while you will update using the rule $c'$, this is not a rule you consciously or explicitly adopt, since to state it would require you to use the propositions in $\{E_1, \ldots, E_n\}$. So it's more like you have a disposition to update in this way. Now, how might we state Plan Conditionalization for such an agent? We can't demand that $c'_i(X) = c(X|E_i)$, since $c(X | E_i)$ is not defined. Rather, we demand that there is some extension $c^*$ of $c$ to a set of propositions that does include each $E_i$ such that $c'_i(X) = c^*(X | E_i)$. Thus, we have:

Plan Superconditionalization If
(i) your credence function at $t$ is $c : \mathcal{F} \rightarrow [0, 1]$,
(ii) between $t$ and $t'$ you will receive evidence from the partition $\{E_1, \ldots, E_n\}$,
(iii) at $t$, your updating plan is $c'$, so that $c'_i : \mathcal{F} \rightarrow [0, 1]$ is the credence function you plan to adopt if $E_i$,
then rationality requires that there is some extension $c^*$ of $c$ for which, if $c^*(E_i) > 0$, then for all $X$, $$c'_i(X) = c^*(X | E_i)$$

And it turns out that we can adapt the proof above for this purpose. Say that $(c, c')$ is a superconditionalizing pair if there is an extension $c^*$ of $c$ such that, if $c^*(E_i) > 0$, then for all $X$, $c'_i(X) = c^*(X | E_i)$. Then we can prove that if $(c, c')$ is not a superconditionalizing pair, then $(c, c')$ is not in $\{(c, c')_w : w \in W\}^+$. Here's the proof from above adapted to our case: If $(c, c')$ is in $\{(c, c')_w : w \in W\}^+$, then there are $\lambda_w \geq 0$ such that

(1) $\sum_{w \in W} \lambda_w = 1$,
(2) $c(X) = \sum_{w \in W} \lambda_w w(X)$
(3) $c'_i(X) = \sum_{w \in E_i} \lambda_w w(X) + \sum_{w \not \in E_i} \lambda_w c'_i(X)$.

Define the following extension $c^*$ of $c$: $c^*(w) = \lambda_w$. Then, by (3), we have $$c'_i(X) = c^*(XE_i) + (1-c^*(E_i))c'_i(X)$$ So, if $c^*(E_i) > 0$, then $c'_i(X) = c^*(X | E_i)$, as required. $\Box$

Now, this is a reasonably powerful version of conditionalization. For instance, as Skyrms showed here, if we make one or two further assumptions on the extension of $c$ to $c^*$, we can derive Richard Jeffrey's Probability Kinematics from Plan Superconditionalization. That is, if the evidence $E_i$ will lead you to set your new credences across the partition $\{B_1, \ldots, B_k\}$ to $q_1, \ldots, q_k$, respectively, so that $c'_i(B_j) = q_j$, then your new credence $c'_i(X)$ must be $\sum^k_{j=1} c(X | B_j)q_j$, as Probability Kinematics demands. Thus, Plan Superconditionalization places a powerful constraint on updating rules for situations in which the proposition stating your evidence is not one to which you assign a credence. Other cases of this sort include the Judy Benjamin problem and the many cases in which MaxEnt is applied.
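For concreteness, here's a sketch of the Probability Kinematics formula in Python. It assumes credences are given world by world and that every cell $B_j$ has positive prior credence; the function name, the prior, and the new cell credences $q_j$ are hypothetical.

```python
from fractions import Fraction as Fr

def jeffrey_update(c, cells, q):
    """New credence in each world w in B_j is c(w | B_j) * q_j."""
    new = {}
    for w in c:
        j = next(k for k, cell in enumerate(cells) if w in cell)
        prior_cell = sum(c[v] for v in cells[j])  # c(B_j), assumed > 0
        new[w] = c[w] / prior_cell * q[j]
    return new

c = {"w1": Fr(1, 4), "w2": Fr(1, 4), "w3": Fr(1, 2)}
cells = [{"w1", "w2"}, {"w3"}]   # the partition B_1, B_2
q = [Fr(4, 5), Fr(1, 5)]         # new credences in B_1, B_2

posterior = jeffrey_update(c, cells, q)
print(posterior)
```

Summing the new world credences over any proposition $X$ gives $\sum_j c(X | B_j)q_j$, and the new credence in each $B_j$ is $q_j$, as the rule requires.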