Sunday, 12 July 2020

Hurwicz's Criterion of Realism and decision-making under massive uncertainty


[UPDATE: After posting this, Johan Gustafsson got in touch and it seems he and I have happened upon similar points via slightly different routes. His paper is here. He takes his axioms from Binmore's Rational Decisions, who took them from Milnor's 'Games against Nature'. Hurwicz and Arrow also cite Milnor, but Hurwicz's original characterisation appeared before Milnor's paper, and he cites Chernoff's Cowles Commission Discussion Paper: Statistics No. 326A as the source of his axioms.]

In 1951, Leonid Hurwicz, a Polish-American economist who would go on to share the Nobel prize for his work on mechanism design, published a series of short notes as part of the Cowles Commission Discussion Paper series, where he introduced a new decision rule for choice in the face of massive uncertainty. The situations that interested him were those in which your evidence is so sparse that it does not allow you to assign probabilities to the different possible states of the world. These situations, he thought, fall outside the remit of Savage's expected utility theory.

The rule he proposed is called Hurwicz's Criterion of Realism or just the Hurwicz Criterion. He introduced it in the form in which it is usually stated in February 1951 in the Cowles Commission Discussion Paper: Statistics No. 356 -- the title was 'A Class of Criteria for Decision-Making under Ignorance'. The Hurwicz Criterion says that you should choose an option that maximises what I'll call its Hurwicz score, which is a particular weighted average of its best-case utility and its worst-case utility. A little more formally: We follow Hurwicz and let an option be a function $a$ from a set $W$ of possible states of the world to the real numbers $\mathbb{R}$. Now, you begin by setting the weight $0 \leq \alpha \leq 1$ you wish to assign to the best-case utility of an option, and then you assign the remaining weight $1-\alpha$ to its worst-case. Then the Hurwicz score of option $a$ is just $$H^\alpha(a) := \alpha \max_{w \in W} a(w) + (1-\alpha) \min_{w \in W} a(w)$$
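To see the rule in action, here is a minimal sketch in Python; the option names and utilities are invented for illustration:

```python
def hurwicz_score(option, alpha):
    """Hurwicz score: alpha * best-case utility + (1 - alpha) * worst-case utility.

    `option` maps world labels to utilities, i.e. an option a : W -> R
    restricted to finitely many worlds.
    """
    utilities = option.values()
    return alpha * max(utilities) + (1 - alpha) * min(utilities)

# Two hypothetical options defined on three worlds:
a1 = {"w1": 0, "w2": 50, "w3": 100}   # risky: great best case, awful worst case
a2 = {"w1": 40, "w2": 45, "w3": 50}   # safe: narrow range of outcomes

for alpha in (0.0, 0.5, 1.0):
    print(alpha, hurwicz_score(a1, alpha), hurwicz_score(a2, alpha))
# alpha = 0 (pure pessimism) favours a2; alpha = 1 (pure optimism) favours a1.
```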

However, reading his other notes in the Cowles series that surround this brief three-page note, it's clear that Hurwicz's chief interest was not so much in this particular form of decision rule, but rather in any such rule that determines the optimal choices solely by looking at their best- and worst-case scenarios. The Hurwicz Criterion is one such rule, but there are others. You might, for instance, weight the best- and worst-cases not by fixed constant coefficients, but by coefficients that change with the minimum and maximum values, or change with the difference between them or with their ratio. One of the most interesting contributions of these surrounding papers is a characterization of rules that depend only on best- and worst-case utilities. Hurwicz gave a rather inelegant initial version of that characterization in Cowles Commission Discussion Paper: Statistics No. 370, published at the end of 1951 -- the title was 'Optimality Criteria for Decision-Making under Ignorance'. Kenneth Arrow then seems to have helped clean it up, and they published the new version together in the Appendix of Studies in Resource Allocation, the volume they edited and to which they contributed most of the chapters, often with co-authors. The version with Arrow is still reasonably involved, but the idea is quite straightforward, and it is remarkable how strong a restriction Hurwicz obtains from seemingly weak and plausible axioms. This really seems to me a case where axioms that seem quite innocuous on their own can combine in interesting ways to make trouble. So I thought it might be interesting to give a simplified version that has all the central ideas.

Here's the framework:

Possibilities and possible worlds. Let $\Omega$ be the set of possibilities. A possible world is a set of possibilities--that is, a subset of $\Omega$. And a set $W$ of possible worlds is a partition of $\Omega$. That is, $W$ presents the possibilities in $\Omega$ at a certain level of grain. So if $\Omega = \{\omega_1, \omega_2, \omega_3\}$, then $\{\{\omega_1\}, \{\omega_2\}, \{\omega_3\}\}$ is the most fine-grained set of possible worlds, but there are coarser-grained sets as well, such as $\{\{\omega_1, \omega_2\}, \{\omega_3\}\}$ or $\{\{\omega_1\}, \{\omega_2, \omega_3\}\}$. (This is not quite how Hurwicz understands the relationship between different sets of possible states of the world -- he talks of deleting worlds rather than clumping them together, but I think this formalization better captures his idea.)

Options. For any set $W$ of possible worlds, an option defined on $W$ is simply a function from $W$ into the real numbers $\mathbb{R}$. So an option $a : W \rightarrow \mathbb{R}$ takes each world $w$ in $W$ and assigns a utility $a(w)$ to it. (Hurwicz refers to von Neumann and Morgenstern to motivate the assumption that utilities can be measured by real numbers.)

Preferences. For any set $W$ of possible worlds, there is a preference relation $\preceq_W$ over the options defined on $W$. (Hurwicz states his result in terms of optimal choices rather than preferences. But I think it's a bit easier to see what's going on if we state it in terms of preferences. There's then a further question as to which options are optimal given a particular preference ordering, but we needn't address that here.)

Hurwicz's goal was to lay down conditions on these preference relations such that the following would hold:

Hurwicz's Rule Suppose $a$ and $a'$ are options defined on $W$. Then

(H1) If
  • $\min_w a(w) = \min_w a'(w)$
  • $\max_w a(w) = \max_w a'(w)$
then $a \sim_W a'$. That is, you should be indifferent between any two options with the same maximum and minimum.

(H2) If
  • $\min_w a(w) < \min_w a'(w)$
  • $\max_w a(w) < \max_w a'(w)$
then $a \prec_W a'$. That is, you should prefer one option to another if the worst case of the first is better than the worst case of the second and the best case of the first is better than the best case of the second.

Here are the four conditions or axioms:

(A1) Structure $\preceq_W$ is reflexive and transitive.

(A2) Weak Dominance
  1. If $a(w) \leq a'(w)$ for all $w$ in $W$, then $a \preceq_W a'$.
  2. If $a(w) < a'(w)$ for all $w$ in $W$, then $a \prec_W a'$.
This is a reasonably weak version of a standard norm on preferences.

(A3) Permutation Invariance For any set of worlds $W$ and any options $a, a'$ defined on $W$, if $\pi : W \cong W$ is a permutation of the worlds in $W$ and if $a'(w) = a(\pi(w))$ for all $w$ in $W$, then $a \sim_W a'$.

This just says that it doesn't matter to you which worlds receive which utilities -- all that matters are the utilities received.

(A4) Coarse-Graining Invariance Suppose $W = \{\ldots, w_1, w_2, \ldots\}$ is a set of possible worlds and suppose $a, a'$ are options on $W$ with $a(w_1) = a(w_2)$ and $a'(w_1) = a'(w_2)$. Then let $W' = \{\ldots, w_1 \cup w_2, \ldots\}$, so that $W'$ has the same worlds as $W$ except that, instead of $w_1$ and $w_2$, it has their union. And define options $b$ and $b'$ on $W'$ as follows: $b(w_1 \cup w_2) = a(w_1) = a(w_2)$ and $b'(w_1 \cup w_2) = a'(w_1) = a'(w_2)$, and $b(w) = a(w)$ and $b'(w) = a'(w)$ for all other worlds. Then $a \sim_W a'$ iff $b \sim_{W'} b'$.

This says that if two options don't distinguish between two worlds, it shouldn't matter to you whether they are defined on a fine- or coarse-grained space of possible worlds.

Then we have the following theorem:

Theorem (Hurwicz) (A1) + (A2) + (A3) + (A4) $\Rightarrow$ (H1) + (H2).

Here's the proof. Assume (A1) + (A2) + (A3) + (A4). First, we'll show that (H1) follows. We'll sketch the proof only for the case in which $W = \{w_1, w_2, w_3\}$, since that gives all the crucial moves. So denote an option on $W$ by a triple $(a(w_1), a(w_2), a(w_3))$. Now, suppose that $a$ and $a'$ are options defined on $W$ with the same minimum, $m$, and maximum, $M$. Let $n$ be the middle value of $a$ and $n'$ the middle value of $a'$.

Now, first note that
$$(m, m, M) \sim_W (m, M, M)$$ After all, $(m, m, M) \sim_W (M, m, m)$ by Permutation Invariance. And, by Coarse-Graining Invariance, $(m, M, M) \sim_W (M, m, m)$ iff $(m, M) \sim_{W'} (M, m)$, where $W' = \{w_1, w_2 \cup w_3\}$. And, by Permutation Invariance and the reflexivity of $\sim_{W'}$, $(m, M) \sim_{W'} (M, m)$. So $(m, M, M) \sim_W (M, m, m) \sim_W (m, m, M)$, as required. And now we have, by previous results, Permutation Invariance, and Weak Dominance:
$$a \sim_W (m, n, M) \preceq_W (m, M, M) \sim_W (m, m, M) \preceq_W (m, n', M) \sim_W a'$$
and
$$a' \sim_W (m, n', M) \preceq_W (m, M, M) \sim_W (m, m, M) \preceq_W (m, n, M) \sim_W a$$
And so, by transitivity, $a \sim_W a'$. That gives (H1).

For (H2), suppose $a$ has worst case $m$, middle case $n$, and best case $M$, while $a'$ has worst case $m'$, middle case $n'$, and best case $M'$. And suppose $m < m'$ and $M < M'$. Then, since $m < m' \leq n'$, every coordinate of $(m, m, M)$ is strictly below the corresponding coordinate of $(m', n', M')$, and so Weak Dominance gives the strict step below: $$a \sim_W (m, n, M) \preceq_W (m, M, M) \sim_W (m, m, M) \prec_W (m', n', M') \sim_W a'$$ as required. $\Box$

In a follow-up blog post, I'd like to explore Hurwicz's conditions (A1-4) in more detail. I'm a fan of his approach, not least because I want to use something like his decision rule within the framework of accuracy-first epistemology to understand how we select our first credences -- our ur-priors or superbaby credences (see here). But I now think Hurwicz's focus on only the worst-case and best-case scenarios is too restrictive. So I have to grapple with the theorem I've just presented. That's what I hope to do in the next post. But here's a quick observation. (A1-4), while plausible at first sight, sail very close to inconsistency. For instance, (A1), (A3), and (A4) are inconsistent when combined with a slight strengthening of (A2). Suppose we add the following to (A2) to give (A2$^\star$):

3. If $a(w) \leq a'(w)$ for all $w$ in $W$ and $a(w) < a'(w)$ for some $w$ in $W$, then $a \prec_W a'$.

Then we know from above that $(m, m, M) \sim_W (m, M, M)$, while, whenever $m < M$, (A2$^\star$) entails that $(m, m, M) \prec_W (m, M, M)$, which gives a contradiction.

Monday, 6 July 2020

Update on updating -- or: a fall from favour


Life comes at you fast. Last week, I wrote a blogpost extolling the virtues of the following scoring rule, which I called the enhanced log rule: $$\mathfrak{l}^\star_1(x) = -\log x + x \ \ \ \ \ \mbox{and}\ \ \ \ \ \ \ \mathfrak{l}^\star_0(x) = x$$I noted that it is strictly proper and therefore furnishes an accuracy dominance argument for Probabilism. And I showed that, if we restrict attention to credence functions defined over partitions, rather than full algebras, it is the unique strictly proper scoring rule that delivers Conditionalization when you ask for the posterior that minimizes expected inaccuracy with respect to the prior and under the constraint that the posterior credence in the evidence must be 1. But then Catrin Campbell-Moore asked the natural question: what happens when you focus attention instead on full algebras rather than partitions? And looking into this revealed that things don't look so rosy for the enhanced log score. Indeed, if we focus just on the algebra built over three possible worlds, we see that every strictly proper scoring rule delivers the same updating rule, and it is not Conditionalization.

Let's see this in more detail. First, let $\mathcal{W} = \{w_1, w_2, w_3\}$ be our set of possible worlds. And let $\mathcal{F}$ be the algebra over $\mathcal{W}$. That is, $\mathcal{F}$ contains the singletons $\{w_1\}$, $\{w_2\}$, $\{w_3\}$, the pairs $\{w_1, w_2\}$, $\{w_1, w_3\}$, and $\{w_2, w_3\}$ and the tautology $\{w_1, w_2, w_3\}$. Now suppose that your prior credence function is $(p_1, p_2, p_3 = 1-p_1-p_2)$. And suppose that you learn evidence $E = \{w_1, w_2\}$. Then we want to find the posterior, among those that assign credence 1 to $E$, that minimizes expected inaccuracy. Such a posterior will have the form $(x, 1-x, 0)$. Now let $\mathfrak{s}$ be the strictly proper scoring rule by which you measure inaccuracy. Then you wish to minimize:
\begin{eqnarray*}
&& p_1[\mathfrak{s}_1(x) + \mathfrak{s}_0(1-x)  + \mathfrak{s}_0(0) + \mathfrak{s}_1(x+(1-x)) + \mathfrak{s}_1(x+0) + \mathfrak{s}_0((1-x)+0)] + \\
&& p_2[\mathfrak{s}_0(x) + \mathfrak{s}_1(1-x)  + \mathfrak{s}_0(0) + \mathfrak{s}_1(x+(1-x)) + \mathfrak{s}_0(x+0) + \mathfrak{s}_1((1-x)+0)] +\\
&& p_3[\mathfrak{s}_0(x) + \mathfrak{s}_0(1-x)  + \mathfrak{s}_1(0) + \mathfrak{s}_0(x+(1-x)) + \mathfrak{s}_1(x+0) + \mathfrak{s}_1((1-x) +0))]
\end{eqnarray*}
Now, ignore the constant terms, since they do not affect the minima; replace $p_3$ with $1-p_1-p_2$; and group terms together. Then we get:
\begin{eqnarray*}
&& \mathfrak{s}_1(x)(1+p_1 - p_2) + \mathfrak{s}_1(1-x)(1-p_1 + p_2) + \\
&& \mathfrak{s}_0(x)(1-p_1 + p_2) + \mathfrak{s}_0(1-x)(1+p_1 - p_2)
\end{eqnarray*}
Now, divide through by 2, which again doesn't affect the minimization, and note that $$\frac{1+p_i-p_j}{2} = p_i + \frac{1-p_i-p_j}{2}$$ Then we have
\begin{eqnarray*}
&& (p_1 + \frac{1-p_1-p_2}{2})\mathfrak{s}_1(x) + (p_2 + \frac{1-p_1-p_2}{2})\mathfrak{s}_0(x) + \\
&& (p_2 + \frac{1-p_1-p_2}{2})\mathfrak{s}_1(1-x) + (p_1 + \frac{1-p_1-p_2}{2})\mathfrak{s}_0(1-x)
\end{eqnarray*}
Now, $\mathfrak{s}$ is strictly proper. So the first pair of terms is minimized, as a function of $x$, at $x = p_1 + \frac{1-p_1-p_2}{2}$, and the second pair at $1-x = p_2 + \frac{1-p_1-p_2}{2}$. And these two demands are consistent, since $p_2 + \frac{1 -p_1 -p_2}{2} = 1 - (p_1 + \frac{1-p_1-p_2}{2})$. So the posterior that minimizes expected inaccuracy from the point of view of the prior and that assigns credence 1 to $E$ is $(x, 1-x, 0)$ where:$$x = p_1 + \frac{1-p_1-p_2}{2}\ \ \ \ \mbox{and}\ \ \ \ 1-x = p_2 + \frac{1-p_1-p_2}{2}$$And this is very much not Conditionalization. It turns out, then, that no strictly proper scoring rule gives Conditionalization on full algebras in this manner.
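Here's a quick numerical illustration, using the Brier score to stand in for the arbitrary strictly proper rule; the prior values and the grid search are my own choices for illustration:

```python
import numpy as np

# The non-trivial propositions of the algebra over {w1, w2, w3}, as tuples of
# member-worlds (the empty set only adds a constant term, so we omit it).
props = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]

def brier(x, truth):  # quadratic score of credence x in a proposition
    return (1 - x) ** 2 if truth else x ** 2

def inaccuracy(cred, world):
    return sum(brier(sum(cred[i] for i in S), world in S) for S in props)

p1, p2 = 0.5, 0.2                       # arbitrary prior; p3 = 0.3
prior = [p1, p2, 1 - p1 - p2]

xs = np.linspace(0, 1, 100001)
exp_inacc = [sum(prior[w] * inaccuracy((x, 1 - x, 0), w) for w in range(3))
             for x in xs]
print(xs[np.argmin(exp_inacc)])         # ≈ 0.65
print(p1 + (1 - p1 - p2) / 2)           # 0.65: the formula above
print(p1 / (p1 + p2))                   # ≈ 0.714: what Conditionalization says
```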

Friday, 3 July 2020

Updating by minimizing expected inaccuracy -- or: my new favourite scoring rule


One of the central questions of Bayesian epistemology concerns how you should update your credences in response to new evidence you obtain. The proposal I want to discuss here belongs to an approach that consists of two steps. First, we specify the constraints that your evidence places on your posterior credences. Second, we specify a means by which to survey the credence functions that satisfy those constraints and pick one to adopt as your posterior.

For instance, in the first step, we might say that when we learn a proposition $E$, we must become certain of it, and so it imposes the following constraint on our posterior credence function $Q$: $Q(E) = 1$. Or we might consider the sort of situation Richard Jeffrey discussed, where there is a partition $E_1, \ldots, E_m$ and credences $q_1, \ldots, q_m$ with $q_1 + \ldots + q_m = 1$ such that your evidence imposes the constraint: $Q(E_i) = q_i$, for $i = 1, \ldots, m$. Or the situation van Fraassen discussed, where your evidence constrains your posterior conditional credences, so that there is a credence $q$ and propositions $A$ and $B$ such that your evidence imposes the constraint: $Q(A|B) = q$.

In the second step of the approach, on the other hand, we might follow objective Bayesians like Jon Williamson, Alena Vencovská, and Jeff Paris and say that, from among those credence functions that respect your evidence, you should pick the one that, on a natural measure of informational content, contains minimal information, and which thus goes beyond your evidence as little as possible (Paris & Vencovská 1990, Williamson 2010). Or we might follow what I call the method of minimal mutilation proposed by Persi Diaconis and Sandy Zabell and pick the credence function among those that respect the evidence that is closest to your prior according to some measure of divergence between probability functions (Diaconis & Zabell 1982). Or, you might proceed as Hannes Leitgeb and I suggested and pick the credence function that minimizes expected inaccuracy from the point of view of your prior, while satisfying the constraints the evidence imposes (Leitgeb & Pettigrew 2010). In this post, I'd like to fix a problem with the latter proposal.

We'll focus on the simplest case: you learn $E$ and this requires you to adopt a posterior $Q$ such that $Q(E) = 1$. This is also the case in which the norm governing it is least controversial. The largely undisputed norm in this case says that you should conditionalize your prior on your evidence, so that, if $P$ is your prior and $P(E) > 0$, then your posterior should be $Q(-) = P(-|E)$. That is, providing you assigned a positive credence to $E$ before you learned it, your credence in the proposition $X$ after learning $E$ should be your prior credence in $X$ conditional on $E$.

In order to make the maths as simple as possible, let's assume you assign credences to a finite set of worlds $\{w_1, \ldots, w_n\}$, which forms a partition of logical space. Given a credence function $P$, we write $p_i$ for $P(w_i)$, and we'll sometimes represent $P$ by the vector $(p_1, \ldots, p_n)$. Let's suppose further that your measure of the inaccuracy of a credence function is $\mathfrak{I}$, which is generated additively from a scoring rule $\mathfrak{s}$. That is,
  • $\mathfrak{s}_1(x)$ measures the inaccuracy of credence $x$ in a truth;
  • $\mathfrak{s}_0(x)$ measures the inaccuracy of credence $x$ in a falsehood;
  • $\mathfrak{I}(P, w_i) = \mathfrak{s}_0(p_1) + \ldots + \mathfrak{s}_0(p_{i-1}) + \mathfrak{s}_1(p_i) + \mathfrak{s}_0(p_{i+1}) + \ldots + \mathfrak{s}_0(p_n)$.
Hannes and I then proposed that, if $P$ is your prior, you should adopt as your posterior the credence function $Q$ such that
  1. $Q(E) = 1$;
  2. for any other credence function $Q^\star$ for which $Q^\star(E) = 1$, the expected inaccuracy of $Q$ by the lights of $P$ is less than the expected inaccuracy of $Q^\star$ by the lights of $P$.
Throughout, we'll denote the expected inaccuracy of $Q$ by the lights of $P$ when inaccuracy is measured by $\mathfrak{I}$ as $\mathrm{Exp}_\mathfrak{I}(Q | P)$. Thus,
$$ \mathrm{Exp}_\mathfrak{I}(Q | P) = \sum^n_{i=1} p_i \mathfrak{I}(Q, w_i)$$
At this point, however, a problem arises. There are two inaccuracy measures that tend to be used in statistics and accuracy-first epistemology. The first is the Brier inaccuracy measure $\mathfrak{B}$, which is generated by the quadratic scoring rule $\mathfrak{q}$:
$$\mathfrak{q}_0(x) = x^2\ \ \ \mbox{and}\ \ \ \ \mathfrak{q}_1(x) = (1-x)^2$$
So
$$\mathfrak{B}(P, w_i) = 1-2p_i + \sum^n_{j=1} p_j^2$$
The second is the local log inaccuracy measure $\mathfrak{L}$, which is generated by what I'll call here the basic log score $\mathfrak{l}$:
$$\mathfrak{l}_0(x) = 0\ \ \ \ \mbox{and}\ \ \ \ \mathfrak{l}_1(x) = -\log x$$
So
$$\mathfrak{L}(P, w_i) = -\log p_i$$
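In code, these two scoring rules and the additive inaccuracy they generate might look like this -- a sketch; the helper names are mine, and I assume all credences are positive so that the log is defined:

```python
import math

def quadratic(x, truth):        # the Brier scoring rule q
    return (1 - x) ** 2 if truth else x ** 2

def basic_log(x, truth):        # the basic log score l
    return -math.log(x) if truth else 0.0

def inaccuracy(score, P, i):
    """Additive inaccuracy of P = (p_1, ..., p_n) at world w_i (0-indexed)."""
    return sum(score(p, j == i) for j, p in enumerate(P))

P = (0.2, 0.5, 0.3)
print(inaccuracy(quadratic, P, 1))   # 1 - 2*0.5 + (0.04 + 0.25 + 0.09) = 0.38
print(inaccuracy(basic_log, P, 1))   # -log 0.5 ≈ 0.693
```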
The problem is that both have undesirable features for this purpose: the Brier inaccuracy measure does not deliver Conditionalization when you take the approach Hannes and I described; the local log inaccuracy measure does give Conditionalization, but while it is strictly proper in a weak sense, the basic log score that generates it is not; and relatedly, but more importantly, the local log inaccuracy measure does not furnish an accuracy dominance argument for Probabilism. Let's work through this in more detail.

According to the standard Bayesian norm of Conditionalization, if $P$ is your prior and $P(E) > 0$, then your posterior after learning $E$ (and nothing more) should be $Q(-) = P(-|E)$. That is, when I remove all credence from the worlds at which my evidence is false, in order to respect my new evidence, I should redistribute it to the worlds at which my evidence is true in proportion to my prior credence in those worlds.

Now suppose that I update instead by picking the posterior $Q$ for which $Q(E) = 1$ and that minimizes expected inaccuracy as measured by the Brier inaccuracy measure. Then, at least in most cases, when I remove all credence from the worlds at which my evidence is false, in order to respect my new evidence, I redistribute it equally to the worlds at which my evidence is true---not in proportion to my prior credence in those worlds, but equally to each, regardless of my prior attitude.

Here's a quick illustration in the case in which you distribute your credences over three worlds, $w_1$, $w_2$, $w_3$ and the proposition you learn is $E = \{w_1, w_2\}$. Then we want to find a posterior $Q = (x, 1-x, 0)$ with minimal expected Brier inaccuracy from the point of view of the prior $P = (p_1, p_2, p_3)$. Then:
\begin{eqnarray*}
& & \mathrm{Exp}_\mathfrak{B}((x, 1-x, 0) | (p_1, p_2, p_3))\\
& = & p_1[(1-x)^2 + (1-x)^2 + 0^2] + p_2[x^2 + x^2 + 0^2] + p_3[x^2 + (1-x)^2 + 1]
\end{eqnarray*}
Differentiating this with respect to $x$ gives $$-4p_1 + 4x - 2p_3$$ which equals 0 iff $$x = p_1 + \frac{p_3}{2}$$ Thus, providing $p_1 + \frac{p_3}{2}, p_2 + \frac{p_3}{2} \leq 1$, then the posterior that minimizes expected Brier inaccuracy while respecting the evidence is $$Q = \left (p_1 + \frac{p_3}{2}, p_2 + \frac{p_3}{2}, 0 \right )$$ And this is typically not what Conditionalization demands.
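A quick grid-search check of this, with an arbitrary prior of my choosing:

```python
import numpy as np

p1, p2, p3 = 0.1, 0.6, 0.3   # arbitrary prior

def exp_brier(x):            # the expectation displayed above
    return (p1 * 2 * (1 - x) ** 2
            + p2 * 2 * x ** 2
            + p3 * (x ** 2 + (1 - x) ** 2 + 1))

xs = np.linspace(0, 1, 100001)
print(xs[np.argmin([exp_brier(x) for x in xs])])  # ≈ 0.25 = p1 + p3/2
print(p1 / (p1 + p2))                             # ≈ 0.143: Conditionalization
```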

Now turn to the local log measure, $\mathfrak{L}$. Here, things are actually a little complicated by the fact that $-\log 0 = \infty$. After all, $$\mathrm{Exp}_\mathfrak{L}((x, 1-x, 0)|(p_1, p_2, p_3)) = -p_1\log x - p_2 \log (1-x) - p_3 \log 0$$ and this is $\infty$ regardless of the value of $x$. So every value of $x$ minimizes, and indeed maximizes, this expectation. As a result, we have to look at the situation in which the evidence imposes the constraint $Q(E) = 1-\varepsilon$ for $\varepsilon > 0$, and ask what happens as we let $\varepsilon$ approach 0. Then
$$\mathrm{Exp}_\mathfrak{L}((x, 1-\varepsilon-x, \varepsilon)|(p_1, p_2, p_3)) = -p_1\log x - p_2 \log (1-\varepsilon-x) - p_3 \log \varepsilon$$
Differentiating this with respect to $x$ gives
$$-\frac{p_1}{x} + \frac{p_2}{1-\varepsilon - x}$$
which equals 0 iff
$$x = (1-\varepsilon) \frac{p_1}{p_1 + p_2}$$
And this approaches Conditionalization as $\varepsilon$ approaches 0. So, in this sense, as Ben Levinstein pointed out, the local log inaccuracy measure gives Conditionalization, and indeed Jeffrey Conditionalization or Probability Kinematics as well (Levinstein 2012). So far, so good.
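As a quick numerical check of that limiting behaviour (the prior and the grid search are again arbitrary choices of mine):

```python
import numpy as np

p1, p2, p3 = 0.1, 0.6, 0.3   # arbitrary prior

for eps in (0.1, 0.01, 0.001):
    xs = np.linspace(1e-6, 1 - eps - 1e-6, 200001)
    exp_log = -(p1 * np.log(xs) + p2 * np.log(1 - eps - xs) + p3 * np.log(eps))
    print(eps, xs[np.argmin(exp_log)])
# The minimisers approach p1 / (p1 + p2) = 0.142857... as eps -> 0.
```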

However, throughout this post, and in the two derivations above---the first concerning the Brier inaccuracy measure and the second concerning the local log inaccuracy measure---we assumed that all credence functions must be probability functions. That is, we assumed Probabilism, the other central tenet of Bayesianism alongside Conditionalization. Now, if we measure inaccuracy using the Brier measure, we can justify that, for then we have the accuracy dominance argument, which originated mathematically with Bruno de Finetti, and was given its accuracy-theoretic philosophical spin by Jim Joyce (de Finetti 1974, Joyce 1998). That is, if your prior or your posterior isn't a probability function, then there is an alternative that is and that is guaranteed to be more Brier-accurate. However, the local log inaccuracy measure doesn't furnish us with any such argument. One very easy way to see this is to note that the non-probabilistic credence function $(1, 1, \ldots, 1)$ over $\{w_1, \ldots, w_n\}$ dominates all other credence functions according to the local log measure. After all, $\mathfrak{L}((1, 1, \ldots, 1), w_i) = -\log 1 = 0$, for $i = 1, \ldots, n$, while $\mathfrak{L}(P, w_i) > 0$ for any $P$ with $p_i < 1$ for some $i = 1, \ldots, n$.

Another related issue is that the scoring rule $\mathfrak{l}$ that generates $\mathfrak{L}$ is not strictly proper. A scoring rule $\mathfrak{s}$ is said to be strictly proper if every credence expects itself to be the best. That is, for any $0 \leq p \leq 1$, $p\mathfrak{s}_1(x) + (1-p) \mathfrak{s}_0(x)$ is minimized, as a function of $x$, at $x = p$. But $-p\log x + (1-p)0 = -p\log x$ is always minimized, as a function of $x$, at $x = 1$, where $-p\log x = 0$. Similarly, an inaccuracy measure $\mathfrak{I}$ is strictly proper if, for any probabilistic credence function $P$, $\mathrm{Exp}_\mathfrak{I}(Q | P) = \sum^n_{i=1} p_i \mathfrak{I}(Q, w_i)$ is minimized, as a function of $Q$, at $Q = P$. Now, in this sense, $\mathfrak{L}$ is not strictly proper, since $\mathrm{Exp}_\mathfrak{L}(Q | P) = \sum^n_{i=1} p_i \mathfrak{L}(Q, w_i)$ is minimized, as a function of $Q$, at $Q = (1, 1, \ldots, 1)$, as noted above. Nonetheless, if we restrict our attention to probabilistic $Q$, $\mathrm{Exp}_\mathfrak{L}(Q | P)$ is minimized at $Q = P$. In sum: $\mathfrak{L}$ is only a reasonable inaccuracy measure to use if you already have an independent motivation for Probabilism. But accuracy-first epistemology does not have that luxury. One of the central roles of an inaccuracy measure in that framework is to furnish an accuracy dominance argument for Probabilism.

So, we ask: is there a scoring rule $\mathfrak{s}$ and resulting inaccuracy measure $\mathfrak{I}$ such that:
  1. $\mathfrak{s}$ is a strictly proper scoring rule;
  2. $\mathfrak{I}$ is a strictly proper inaccuracy measure; 
  3. $\mathfrak{I}$ furnishes an accuracy dominance argument for Probabilism;
  4. If $P(E) > 0$, then $\mathrm{Exp}_\mathfrak{I}(Q | P)$ is minimized, as a function of $Q$ among credence functions for which $Q(E) = 1$, at $Q(-) = P(-|E)$.
Straightforwardly, (1) entails (2). And, by a result due to Predd et al., (1) also entails (3) (Predd et al. 2009). So we seek $\mathfrak{s}$ with (1) and (4). Theorem 1 below shows that essentially only one such $\mathfrak{s}$ and $\mathfrak{I}$ exist, and they are what I will call the enhanced log score $\mathfrak{l}^\star$ and the enhanced log inaccuracy measure $\mathfrak{L}^\star$:
$$\mathfrak{l}^\star_0(x) = x\ \ \ \ \mathrm{and}\ \ \ \ \mathfrak{l}^\star_1(x) = -\log x + x$$

[Figure: the enhanced log score $\mathfrak{l}^\star$, with $\mathfrak{l}^\star_0$ in yellow and $\mathfrak{l}^\star_1$ in blue.]

Before we state and prove the theorem, there are some features of this scoring rule and its resulting inaccuracy measure that are worth noting. Juergen Landes has identified this scoring rule for a different purpose (Proposition 9.1, Landes 2015).


Proposition 1 $\mathfrak{l}^\star$ is strictly proper.

Proof. Suppose $0 \leq p \leq 1$. Then
$$\frac{d}{dx}\left[ p\mathfrak{l}^\star_1(x) + (1-p)\mathfrak{l}^\star_0(x)\right] = \frac{d}{dx}\left[ p(-\log x + x) + (1-p)x \right] = -\frac{p}{x} + 1 = 0$$ iff $p = x$. And since the second derivative, $\frac{p}{x^2}$, is positive, this is a minimum. $\Box$
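A quick numerical corroboration of this by grid search (my own check):

```python
import numpy as np

xs = np.linspace(1e-6, 1, 100000)

def exp_lstar(p):   # p * l*_1(x) + (1 - p) * l*_0(x)
    return p * (-np.log(xs) + xs) + (1 - p) * xs

for p in (0.1, 0.5, 0.9):
    print(p, xs[np.argmin(exp_lstar(p))])   # minimiser ≈ p in each case
```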

Proposition 2 If $P$ is non-probabilistic with $p_i > 0$ for all $i$, then $P^\star = \left (\frac{p_1}{\sum_k p_k}, \ldots, \frac{p_n}{\sum_k p_k} \right )$ accuracy dominates $P = (p_1, \ldots, p_n)$.

Proof. $$\mathfrak{L}^\star(P^\star, w_i) = -\log\left ( \frac{p_i}{\sum_k p_k} \right ) + 1 = -\log p_i + \log\sum_k p_k + 1$$ and $$\mathfrak{L}^\star(P, w_i) = -\log p_i + \sum_k p_k$$ But $\log x + 1 \leq x$, for all $x> 0$, with equality iff $x = 1$. So, if $P$ is non-probabilistic, then $\sum_k p_k \neq 1$ and  $$\mathfrak{L}^\star(P^\star, w_i) < \mathfrak{L}^\star(P, w_i)$$ for $i = 1, \ldots, n$. $\Box$
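Numerically, for a hypothetical non-probabilistic credence function:

```python
import numpy as np

def lstar_inaccuracy(P, i):          # enhanced log inaccuracy at world w_i
    return -np.log(P[i]) + P.sum()   # = -log p_i + sum_k p_k, as in the proof

P = np.array([0.3, 0.3, 0.3])        # non-probabilistic: credences sum to 0.9
P_star = P / P.sum()                 # its normalisation

for i in range(3):
    print(lstar_inaccuracy(P_star, i), lstar_inaccuracy(P, i))
# P_star scores ≈ 2.0986 at every world, P scores ≈ 2.1040: strict dominance.
```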

Proposition 3 If $P$ is probabilistic, $\mathfrak{L}^\star(P, w_i) = 1 + \mathfrak{L}(P, w_i)$.

Proof.
\begin{eqnarray*}
\mathfrak{L}^\star(P, w_i) & = & p_1 + \ldots + p_{i-1} + (-\log p_i + p_i ) + p_{i+1} + \ldots + p_n \\
& = & -\log p_i + 1 \\
& = & 1 + \mathfrak{L}(P, w_i)
\end{eqnarray*}
 $\Box$

Corollary 1 If $P$, $Q$ are probabilistic, then
$$\mathrm{Exp}_{\mathfrak{L}^\star}(Q | P) = 1 + \mathrm{Exp}_\mathfrak{L}(Q | P)$$

Proof.  By Proposition 3. $\Box$

Corollary 2 Suppose $E_1, \ldots, E_m$ is a partition and $0 \leq q_1, \ldots, q_m \leq 1$ with $\sum^m_{i=1} q_i = 1$. Then, among $Q$ for which $Q(E_i) = q_i$ for $i = 1, \ldots, m$, $\mathrm{Exp}_{\mathfrak{L}^\star}(Q |P)$ is minimized at the Jeffrey Conditionalization posterior $Q(-) = \sum^m_{i=1} q_iP(-|E_i)$.

Proof.  This follows from Corollary 1 and Theorem 5.1 from (Diaconis & Zabell 1982). $\Box$

Having seen $\mathfrak{l}^\star$ and $\mathfrak{L}^\star$ in action, let's see that they are unique in having this combination of features.

Theorem 1 Suppose $\mathfrak{s}$ is a strictly proper scoring rule and $\mathfrak{I}$ is the inaccuracy measure it generates. And suppose that, for any $\{w_1, \ldots, w_n\}$, any $E \subseteq \{w_1, \ldots, w_n\}$, and any probabilistic credence function $P$ with $P(E) > 0$, the probabilistic credence function $Q$ that minimizes the expected inaccuracy with respect to $P$, among those satisfying the constraint $Q(E) = 1$ and with inaccuracy measured by $\mathfrak{I}$, is $Q(-) = P(-|E)$. Then the scoring rule is
$$\mathfrak{s}_1(x) = -\log x + x\ \ \ \ \mbox{and}\ \ \ \ \mathfrak{s}_0(x) = x$$ or any positive affine transformation of this.

Proof. First, we appeal to the following lemma (Proposition 2, Predd, et al. 2009):

Lemma 1

(i) Suppose $\mathfrak{s}$ is a continuous strictly proper scoring rule. Then define$$\varphi_\mathfrak{s}(x) = -x\mathfrak{s}_1(x) - (1-x)\mathfrak{s}_0(x)$$Then $\varphi_\mathfrak{s}$ is differentiable on $(0, 1)$ and convex on $[0, 1]$ and $$\mathrm{Exp}_\mathfrak{I}(Q | P) - \mathrm{Exp}_\mathfrak{I}(P | P) = \sum^n_{i=1} \varphi_\mathfrak{s}(p_i) - \varphi_\mathfrak{s}(q_i) -  \varphi_\mathfrak{s}^\prime (q_i)(p_i - q_i)$$ (ii) Suppose $\varphi$ is differentiable on $(0, 1)$ and convex on $[0, 1]$. Then let
  • $\mathfrak{s}^\varphi_1(x) = - \varphi(x) - \varphi'(x)(1-x)$ 
  • $\mathfrak{s}^\varphi_0(x) = - \varphi(x) - \varphi'(x)(0-x)$
Then $\mathfrak{s}^\varphi$ is a strictly proper scoring rule.

Moreover, $\mathfrak{s}^{\varphi_\mathfrak{s}} = \mathfrak{s}$.

Now, let's focus on $\{w_1, w_2, w_3, w_4\}$ and let $E = \{w_1, w_2, w_3\}$. Let $p_1 = a$, $p_2 = b$, $p_3 = c$. Then we wish to minimize
$$\mathrm{Exp}_\mathfrak{I}((x, y, 1-x-y, 0) | (a, b, c, 1-a-b-c))$$
Now, by Lemma 1,
\begin{eqnarray*}
&& \mathrm{Exp}_\mathfrak{I}((x, y, 1-x-y, 0) | (a, b, c, 1-a-b-c)) \\
& = & \varphi(a) - \varphi(x) - \varphi'(x)(a-x)\\
& + & \varphi(b) - \varphi(y) - \varphi'(y)(b-y) \\ 
& + & \varphi(c) - \varphi(1-x-y) - \varphi'(1-x-y)(c - (1-x-y)) \\
& + & \mathrm{Exp}_\mathfrak{I}((a, b, c, 1-a-b-c) | (a, b, c, 1-a-b-c))
\end{eqnarray*}
Thus:
\begin{eqnarray*}
&& \frac{\partial}{\partial x} \mathrm{Exp}_\mathfrak{I}((x, y, 1-x-y, 0) | (a, b, c, 1-a-b-c))\\
& = & \varphi''(x)(x-a) - ((1-x-y) - c) \varphi''(1-x-y)
\end{eqnarray*}
and
\begin{eqnarray*}
&& \frac{\partial}{\partial y} \mathrm{Exp}_\mathfrak{I}((x, y, 1-x-y, 0) | (a, b, c, 1-a-b-c))\\
& = & \varphi''(y)(y-b) - ((1-x-y) - c) \varphi''(1-x-y)
\end{eqnarray*}
which are both 0 iff$$\varphi''(x)(x-a) = \varphi''(y)(y-b) = ((1-x-y) - c) \varphi''(1-x-y)$$ Now, suppose this is true for $x = \frac{a}{a+b+c}$ and $y = \frac{b}{a+b+c}$. Then, since $x - a = \frac{a(1-(a+b+c))}{a+b+c}$, $y - b = \frac{b(1-(a+b+c))}{a+b+c}$, and $(1-x-y) - c = \frac{c(1-(a+b+c))}{a+b+c}$, it follows that, for all $0 \leq a, b, c \leq 1$ with $a + b + c \leq 1$, $$a\varphi'' \left ( \frac{a}{a+b+c} \right ) = b\varphi'' \left ( \frac{b}{a+b+c} \right ) $$
We now wish to show that $\varphi''(x) = \frac{k}{x}$ for all $0 < x \leq 1$. If we manage that, then it follows that $\varphi'(x) = k\log x + m$ and $\varphi(x) = kx\log x + (m-k)x$. And we know from Lemma 1:
\begin{eqnarray*}
& & \mathfrak{s}_0(x) \\
& = & - \varphi(x) - \varphi'(x)(0-x) \\
& = & - [kx\log x + (m-k)x] - [k\log x + m](0-x) \\
& = & kx
\end{eqnarray*}
and
\begin{eqnarray*}
&& \mathfrak{s}_1(x) \\
& = & - \varphi(x) - \varphi'(x)(1-x) \\
& = & - [kx\log x + (m-k)x] - [k\log x + m](1-x) \\
& = & -k\log x + kx - m
\end{eqnarray*}
Now, first, let $f(x) = \varphi''\left (\frac{1}{x} \right )$ for $x \geq 1$. It will suffice to prove that $f(x) = kx$ for some constant $k$. For then $\varphi''(x) = f \left ( \frac{1}{x} \right ) = \frac{k}{x}$ for $0 < x \leq 1$, as required. And to prove that $f(x) = kx$, we need only show that $f'(x)$ is a constant function: for if $f(x) = kx + d$, then the identity below gives $k(a+b+c) + da = k(a+b+c) + db$ for all suitable $a$ and $b$, and so $d = 0$. We know that, for all $0 \leq a, b, c \leq 1$ with $a + b + c \leq 1$, we have
$$a f \left ( \frac{a + b + c}{a} \right ) = bf \left ( \frac{a + b + c}{b} \right )$$
So, differentiating both sides with respect to $c$,$$
\frac{d}{dc} a f \left ( \frac{a + b + c}{a} \right ) = \frac{d}{dc} bf \left ( \frac{a + b + c}{b} \right )
$$So, for all $0 \leq a, b, c \leq 1$ with $a + b + c \leq 1$,
$$
f'\left (\frac{a+b+c}{a} \right ) = f'\left (\frac{a + b + c}{b} \right )
$$We now show that, for all $x > 1$, $f'(x) = f'(2)$, which will suffice to show that it is constant. First, we consider $2 \leq x$. Then let
$$a = \frac{1}{x}\ \ \ \ \ b = \frac{1}{2}\ \ \ \ \ c = \frac{1}{2}-\frac{1}{x}$$
Then
$$f'(x) = f'\left (\frac{a + b + c}{a} \right ) = f'\left (\frac{a + b + c}{b} \right ) = f'(2)$$
Second, consider $1 < x \leq 2$. Then pick $2 \leq y$ such that $\frac{1}{x} + \frac{1}{y} \leq 1$. Then let
$$a = \frac{1}{x}\ \ \ \ \ b = \frac{1}{y}\ \ \ \ \ c = 1 - \frac{1}{x} - \frac{1}{y}$$
Then
$$f'(x) = f'\left (\frac{a + b + c}{a} \right ) = f'\left (\frac{a + b + c}{b} \right ) = f'(y) = f'(2)$$
as required. $\Box$
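As a sanity check on the final steps, sympy recovers the scoring rule from $\varphi(x) = kx\log x + (m-k)x$ via Lemma 1(ii); output is shown up to sympy's ordering of terms:

```python
import sympy as sp

x, k, m = sp.symbols('x k m', positive=True)
phi = k * x * sp.log(x) + (m - k) * x
dphi = sp.diff(phi, x)                 # = k*log(x) + m

s0 = sp.simplify(-phi - dphi * (0 - x))
s1 = sp.simplify(-phi - dphi * (1 - x))
print(s0)   # k*x
print(s1)   # k*x - k*log(x) - m
```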

Friday, 6 December 2019

Deterministic updating and the symmetry argument for Conditionalization

According to the Bayesian, when I learn a proposition to which I assign a positive credence, I should update my credences so that my new unconditional credence in a proposition is my old conditional credence in that proposition conditional on the proposition I learned. Thus, if $c$ is my credence function before I learn $E$, and $c'$ is my credence function afterwards, and $c(E) > 0$, then it ought to be the case that $$c'(-) = c(-|E) := \frac{c(-\ \&\ E)}{c(E)}$$ There are many arguments for this Bayesian norm of updating. Some pay attention to the pragmatic costs of updating any other way (Brown 1976; Lewis 1999); some pay attention to the epistemic costs, which are spelled out in terms of the accuracy of the credences that result from the updating plans (Greaves & Wallace 2006; Briggs & Pettigrew 2018); others show that updating as the Bayesian requires, and only updating in that way, preserves as much as possible about the prior credences while still respecting the new evidence (Diaconis & Zabell 1982; Dietrich, List, and Bradley 2016). And then there are the symmetry arguments that are our focus here (Hughes & van Fraassen 1985; van Fraassen 1987; Grove & Halpern 1998).

In a recent paper, I argued that the pragmatic and epistemic arguments for Bayesian updating are based on an unwarranted assumption, which I called Deterministic Updating. An updating plan says how you'll update in response to a specific piece of evidence. Such a plan is deterministic if there's a single credence function that it says you'll adopt in response to that evidence, rather than a range of different credence functions that you might adopt in response. Deterministic Updating says that your updating plan for a particular piece of evidence should be deterministic. That is, if $E$ is a proposition you might learn, your plan for responding to receiving $E$ as evidence should take the form:
  • If I learn $E$, I'll adopt $c'$ 
rather than the form:
  • If I learn $E$, I might adopt $c'$, I might adopt $c^+$, and I might adopt $c^*$.
Here, I want to show that the symmetry arguments make the same assumption.

Let's start by laying out the symmetry argument. Suppose $W$ is a set of possible worlds, and $F$ is an algebra over $W$. Then an updating plan on $M = (W, F)$ is a function $U^M$ that takes a credence function $P$ defined on $F$ and a proposition $E$ in $F$ and returns the set of credence functions that the updating plan endorses as responses to learning $E$ for those with credence function $P$. Then we impose three conditions on a family of updating plans $U$.

Deterministic Updating This says that an updating plan should endorse at most one credence function as a response to learning a given piece of evidence. That is, for any $M = (W, F)$ and $E$ in $F$, $U^M$ endorses at most one credence function as a response to learning $E$. That is, $|U^M(P, E)| \leq 1$ for all $P$ on $F$ and $E$ in $F$.

Certainty This says that any credence function that an updating plan endorses as a response to learning $E$ must be certain of $E$. That is, for any $M = (W, F)$, $P$ on $F$ and $E$ in $F$, if $P'$ is in $U^M(P, E)$, then $P'(E) = 1$.

Symmetry This condition requires a bit more work to spell out. Very roughly, it says that the way that an updating plan would have you update should not be sensitive to the way the possibilities are represented. More precisely: Let $M = (W, F)$ and $M' = (W', F')$. Suppose $f : W \rightarrow W'$ is a surjective function. That is, for each $w'$ in $W'$, there is $w$ in $W$ such that $f(w) = w'$. And suppose for each $X$ in $F'$, $f^{-1}(X) = \{w \in W | f(w) \in X\}$ is in $F$. Then the worlds in $W'$ are coarse-grained versions of the worlds in $W$, and the propositions in $F'$ are coarse-grained versions of those in $F$. Now, given a credence function $P$ on $F$, let $f(P)$ be the credence function over $F'$ such that $f(P)(X) = P(f^{-1}(X))$. Then the credence functions that result from updating $f(P)$ by $E'$ in $F'$ using $U^{M'}$ are the image under $f$ of the credence functions that result from updating $P$ on $f^{-1}(E')$ using $U^M$. That is, $U^{M'}(f(P), E') = f(U^M(P, f^{-1}(E')))$.

Now, van Fraassen proves the following theorem, though he doesn't phrase it like this because he assumes Deterministic Updating in his definition of an updating rule:

Theorem (van Fraassen) If $U$ satisfies Deterministic Updating, Certainty, and Symmetry, then $U$ is the conditionalization updating plan. That is, if $M = (W, F)$, $P$ is defined on $F$ and $E$ is in $F$ with $P(E) > 0$, then $U^M(P, E)$ contains only one credence function $P'$ and $P'(-) = P(-|E)$.

The problem is that, while Certainty is entirely uncontroversial and Symmetry is very plausible, there is no particularly good reason to assume Deterministic Updating. But the argument cannot go through without it. To see this, consider the following updating rule:
  • If $0 < P(E) < 1$, then $V^M(P, E) = \{v_w | w \in W\ \&\ w \in E\}$, where $v_w$ is the credence function on $F$ such that $v_w(X) = 1$ if $w$ is in $X$, and $v_w(X) = 0$ if $w$ is not in $X$ ($v_w$ is sometimes called the valuation function for $w$, or the omniscience credence function at $w$).
  • If $P(E) = 1$, then $V^M(P, E) = \{P\}$.
That is, if $P$ assigns credence strictly between 0 and 1 to $E$, then $V^M$ returns the set of valuation functions for the worlds in $W$ at which $E$ is true. Otherwise, it keeps $P$ unchanged.

It is easy to see that $V$ satisfies Certainty, since $v_w(E) = 1$ for each $w$ in $E$. To see that $V$ satisfies Symmetry, the crucial fact is that $f(v_w) = v_{f(w)}$. First, take a credence function in $V^{M'}(f(P), E')$: that is, $v_{w'}$ for some $w'$ in $E'$. Since $f$ is surjective, there is some $w$ in $f^{-1}(w') \subseteq f^{-1}(E')$, and so $v_w$ is in $V^M(P, f^{-1}(E'))$. And $f(v_w) = v_{f(w)} = v_{w'}$, so $v_{w'}$ is in $f(V^M(P, f^{-1}(E')))$. Next, take a credence function in $f(V^M(P, f^{-1}(E')))$: that is, $f(v_w)$ for some $w$ in $f^{-1}(E')$. Then $f(v_w) = v_{f(w)}$ and $f(w)$ is in $E'$, and thus $f(v_w)$ is in $V^{M'}(f(P), E')$, as required.

So $V$ satisfies Certainty and Symmetry, but it is not the Bayesian updating rule.
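Here is a small sketch of these checks in Python; the representation of credence functions as dicts from worlds to credences in their singletons, and the particular coarse-graining, are my own choices for illustration:

```python
def valuation(w, worlds):
    """The omniscient credence function v_w."""
    return {v: 1.0 if v == w else 0.0 for v in worlds}

def V(P, E, worlds):
    """The non-deterministic updating rule V defined above."""
    PE = sum(P[w] for w in E)
    if 0 < PE < 1:
        return [valuation(w, worlds) for w in E]
    return [P]

# A coarse-graining f : {w1, w2, w3} -> {u1, u2} with f(w1) = f(w2) = u1.
W, W2 = ["w1", "w2", "w3"], ["u1", "u2"]
f = {"w1": "u1", "w2": "u1", "w3": "u2"}

def push(P):   # f(P)(u) = P(f^{-1}(u))
    return {u: sum(p for w, p in P.items() if f[w] == u) for u in W2}

P = {"w1": 0.2, "w2": 0.3, "w3": 0.5}
E2 = {"u1"}                            # a proposition over the coarse worlds
E1 = {w for w in W if f[w] in E2}      # f^{-1}(E2) = {w1, w2}

lhs = V(push(P), E2, W2)               # update the pushed-forward prior
rhs = [push(Q) for Q in V(P, E1, W)]   # push forward the updated posteriors

as_set = lambda Qs: {tuple(sorted(Q.items())) for Q in Qs}
print(all(sum(Q[u] for u in E2) == 1.0 for Q in lhs))   # Certainty: True
print(as_set(lhs) == as_set(rhs))                       # Symmetry instance: True
```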

Now, perhaps there is some further desirable condition that $V$ fails to meet? Perhaps. And it's difficult to prove a negative existential claim. But one thing we can do is to note that $V$ satisfies all the conditions on updating plans on sets of probabilities that Grove & Halpern explore as they try to extend van Fraassen's argument from the case of precise credences to the case of imprecise credences. All, that is, except Deterministic Updating, which they also impose. Here they are:

Order Invariance This says that updating first on $E$ and then on $E \cap F$ should result in the same posteriors as updating first on $F$ and then on $E \cap F$. This holds because, either way, you end up with $$V^M(P, E \cap F) = \{v_w : w \in W\ \&\ w \in E \cap F\}$$

Stationarity This says that updating on $E$ should have no effect if you are already certain of $E$. That is, if $P(E) = 1$, then $U^M(P, E) = \{P\}$. The second clause of our definition of $V$ ensures this.

Non-Triviality This says that there's some prior that is less than certain of the evidence such that updating it on the evidence leads to some posteriors that the updating plan endorses. That is, for some $M = (W, F)$, some $P$ on $F$, and some $E$ in $F$ with $P(E) < 1$, $U^M(P, E) \neq \emptyset$. Indeed, $V$ will satisfy this for any $P$ and any $E \neq \emptyset$.

So, in sum, it seems that van Fraassen's symmetry argument for Bayesian updating shares the same flaw as the pragmatic and epistemic arguments, namely, they rely on Deterministic Updating, and yet that assumption is unwarranted.

References

  1. Briggs, R. A., & Pettigrew, R. (2018). An accuracy-dominance argument for conditionalization. Noûs.  https://doi.org/10.1111/nous.12258
  2. Brown, P. M. (1976). Conditionalization and expected utility. Philosophy of Science, 43(3), 415–419.
  3. Diaconis, P., & Zabell, S. L. (1982). Updating subjective probability. Journal of the American Statistical Association, 77(380), 822–830.
  4. Dietrich, F., List, C., & Bradley, R. (2016). Belief revision generalized: A joint characterization of Bayes’s and Jeffrey’s rules. Journal of Economic Theory, 162, 352–371.
  5. Greaves, H., & Wallace, D. (2006). Justifying conditionalization: Conditionalization maximizes expected epistemic utility. Mind, 115(459), 607–632.
  6. Grove, A. J., & Halpern, J. Y. (1998). Updating sets of probabilities. In Proceedings of the 14th conference on uncertainty in AI (pp. 173–182). San Francisco, CA: Morgan Kaufman.
  7. Lewis, D. (1999). Why conditionalize? Papers in metaphysics and epistemology (pp. 403–407). Cambridge: Cambridge University Press.





Thursday, 27 June 2019

CFP (Formal Philosophy, Gdansk)


The International Conference for Philosophy of Science and Formal Methods in Philosophy (CoPS-FaM-19) of the Polish Association for Logic and Philosophy of Science will take place on December 4-6, 2019 at the University of Gdansk (in cooperation with the University of Warsaw). Extended abstract submission: August 31, 2019.

*Keynote speakers*
Hitoshi Omori (Ruhr-Universität Bochum)
Oystein Linnebo (University of Oslo)
Miriam Schoenfield (MIT)
Stanislav Speransky (St. Petersburg State University)
Katya Tentori (University of Trento)

Full submission details available at:
http://lopsegdansk.blogspot.com/p/cops-fam-19-cfp.html


*Programme Committee*
Patrick Blackburn (University of Roskilde)
Cezary Cieśliński (University of Warsaw)
Matteo Colombo (Tilburg University)
Juliusz Doboszewski (Harvard University)
David Fernandez Duque (Ghent University)
Benjamin Eva (University of Konstanz)
Benedict Eastaugh (LMU Munich)
Federico Faroldi (Ghent University)
Michał Tomasz Godziszewski (University of Warsaw)
Valentin Goranko (Stockholm University)
Rafał Gruszczyński (Nicolaus Copernicus University)
Alexandre Guay (University of Louvain)
Zalan Gyenis (Jagiellonian University)
Ronnie Hermens (Utrecht University)
Leon Horsten (University of Bristol)
Johannes Korbmacher (Utrecht University)
Louwe B. Kuijer (University of Liverpool)
Juergen Landes (LMU Munich)
Marianna Antonnutti Marfori (LMU Munich)
Frederik Van De Putte (Ghent University)
Jan-Willem Romeijn (University of Groningen)
Sonja Smets (University of Amsterdam)
Anthia Solaki (University of Amsterdam)
Jan Sprenger (University of Turin)
Stanislav Speransky (St. Petersburg State University)
Tom F. Sterkenburg (LMU Munich)
Johannes Stern (University of Bristol)
Allard Tamminga (University of Groningen)
Mariusz Urbański (Adam Mickiewicz University)
Erik Weber (Ghent University)
Leszek Wroński (Jagiellonian University)

*Local Organizing Committee:*
Rafal Urbaniak
Patryk Dziurosz-Serafinowicz
Pavel Janda
Pawel Pawlowski
Paula Quinon
Weronika Majek
Przemek Przepiórka
Małgorzata Stefaniak

Friday, 17 May 2019

What is conditionalization and why should we do it?

The three central tenets of traditional Bayesian epistemology are these:

Precision Your doxastic state at a given time is represented by a credence function, $c$, which takes each proposition $X$ about which you have an opinion and returns a single numerical value, $c(X)$, that measures the strength of your belief in $X$. By convention, we let $0$ represent your minimal credence and we let $1$ represent your maximal credence.

Probabilism Your credence function should be a probability function. That is, you should assign minimal credence (i.e. 0) to necessarily false propositions, maximal credence (i.e. 1) to necessarily true propositions, and your credence in the disjunction of two propositions whose conjunction is necessarily false should be the sum of your credences in the disjuncts.

Conditionalization You should update your credences by conditionalizing on your total evidence.

Note: Precision sets out the way in which doxastic states will be represented; Probabilism and Conditionalization are norms that are stated using that representation.

Here, we will assume Precision and Probabilism and focus on Conditionalization. In particular, we are interested in what exactly the norm says; and, more specifically, which versions of the norm are supported by the standard arguments in its favour. That is, we are interested in what versions of the norm we can justify using the existing arguments. We will consider three versions of the norm; and we will consider four arguments in its favour. For each combination, we'll ask whether the argument can support the norm. In each case, we'll notice that the standard formulation relies on a particular assumption, which we call Deterministic Updating and which we formulate precisely below. We'll ask whether the argument really does rely on this assumption, or whether it can be amended to support the norm without that assumption. Let's meet the interpretations and the arguments informally now; then we'll be ready to dive into the details.

Here are the three interpretations of Conditionalization. According to the first, Actual Conditionalization, Conditionalization governs your actual updating behaviour.

Actual Conditionalization (AC)

If
  • $c$ is your credence function at $t$ (we'll often refer to this as your prior);
  • the total evidence you receive between $t$ and $t'$ comes in the form of a proposition $E$ learned with certainty;
  • $c(E) > 0$;
  • $c'$ is your credence function at the later time $t'$ (we'll often refer to this as your posterior);
then it should be the case that $c'(-) = c(-|E) = \frac{c(-\ \&\ E)}{c(E)}$.
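In code, the update that AC mandates might be sketched like this, with a credence function represented as a dict from worlds to credences (a representation of my own choosing):

```python
def conditionalize(c, E):
    """Return c(- | E), assuming c(E) > 0; E is a set of worlds."""
    cE = sum(p for w, p in c.items() if w in E)
    assert cE > 0, "conditionalization is undefined when c(E) = 0"
    return {w: (p / cE if w in E else 0.0) for w, p in c.items()}

c = {"w1": 0.2, "w2": 0.3, "w3": 0.5}
print(conditionalize(c, {"w1", "w2"}))   # {'w1': 0.4, 'w2': 0.6, 'w3': 0.0}
```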
According to the second, Plan Conditionalization, Conditionalization governs the updating behaviour you would endorse in all possible evidential situations you might face:

Plan Conditionalization (PC)

If
  • $c$ is your credence function at $t$;
  • the total evidence you receive between $t$ and $t'$ will come in the form of a proposition learned with certainty, and that proposition will come from the partition $\mathcal{E} = \{E_1, \ldots, E_n\}$;
  • $R$ is the plan you endorse for how to update in response to each possible piece of total evidence,
then it should be the case that, if you were to receive evidence $E_i$ and if $c(E_i) > 0$, then $R$ would exhort you to adopt credence function $c_i(-) = c(-|E_i) = \frac{c(-\ \&\ E_i)}{c(E_i)}$.

According to the third, Dispositional Conditionalization, Conditionalization governs the updating behaviour you are disposed to exhibit.
 
Dispositional Conditionalization (DC)

If
  • $c$ is your credence function at $t$;
  • the total evidence you receive between $t$ and $t'$ will come in the form of a proposition learned with certainty, and that proposition will come from the partition $\mathcal{E} = \{E_1, \ldots, E_n\}$;
  • $R$ is the plan you are disposed to follow in response to each possible piece of total evidence,
then it should be the case that, if you were to receive evidence $E_i$ and if $c(E_i) > 0$, then $R$ would exhort you to adopt credence function $c_i(-) = c(-|E_i) = \frac{c(-\ \&\ E_i)}{c(E_i)}$.

Next, let's meet the four arguments. Since it will take some work to formulate them precisely, I will give only an informal gloss here. There will be plenty of time to see them in high-definition in what follows.

Diachronic Dutch Book or Dutch Strategy Argument (DSA) This purports to show that, if you violate conditionalization, there is a pair of decisions you might face, one before and one after you receive your evidence, such that your prior and posterior credences lead you to choose options when faced with those decisions that are guaranteed to be worse by your own lights than some alternative options (Lewis 1999).

Expected Pragmatic Utility Argument (EPUA) This purports to show that, if you will face a decision after learning your evidence, then your prior credences will expect your updated posterior credences to do the best job of making that decision if they are obtained by conditionalizing on your priors (Brown 1976).

Expected Epistemic Utility Argument (EEUA) This purports to show that your prior credences will expect your posterior credences to be best epistemically speaking if they are obtained by conditionalizing on your priors (Greaves & Wallace 2006).

Epistemic Utility Dominance Argument (EUDA) This purports to show that, if you violate conditionalization, then there will be alternative priors and posteriors  that are guaranteed to be better epistemically speaking, when considered together, than your priors and posteriors (Briggs & Pettigrew 2018).

The framework


In the following sections, we will consider each of the arguments listed above. As we will see, these arguments are concerned directly with updating plans or dispositions, rather than actual updating behaviour. That is, the items that they consider don't just specify how you in fact update in response to the particular piece of evidence you actually receive. Rather, they assume that your evidence between the earlier and later time will come in the form of a proposition learned with certainty (Certain Evidence); they assume the possible propositions that you might learn with certainty by the later time form a partition (Evidential Partition); and they assume that each of the propositions you might learn with certainty is one about which you had a prior opinion (Evidential Availability); and then they specify, for each of the possible pieces of evidence in your evidential partition, how you might update if you were to receive it.

Some philosophers, like David Lewis (1999), assume that all three assumptions---Certain Evidence, Evidential Partition, Evidential Availability---hold in all learning situations. Others deny one or more. So Richard Jeffrey (1992) denies Certain Evidence and Evidential Availability; Jason Konek (2019) denies Evidential Availability but not Certain Evidence; Bas van Fraassen (1999), Miriam Schoenfield (2017), and Jonathan Weisberg (2007) deny Evidential Partition. But all agree, I think, that there are certain important situations when all three assumptions are true; there are certain situations where there is a set of propositions that forms a partition and about each member of which you have a prior opinion, and the possible evidence you might receive at the later time comes in the form of one of these propositions learned with certainty. Examples might include: when you are about to discover the outcome of a scientific experiment, perhaps by taking a reading from a measuring device with unambiguous outputs; when you've asked an expert a yes/no question; when you step on the digital scales in your bathroom or check your bank balance or count the number of spots on the back of the ladybird that just landed on your hand. So, if you disagree with Lewis, simply restrict your attention to these cases in what follows.

As we will see, we can piggyback on conclusions about plans and dispositions to produce arguments about actual behaviour in certain situations. But in the first instance, we will take the arguments to address plans and dispositions defined on evidential partitions primarily, and actual behaviour only secondarily. Thus, to state these arguments, we need a clear way to represent updating plans or dispositions. We will talk neutrally here of an updating rule. If you think conditionalization governs your updating dispositions, then you take it to govern the updating rule that matches those dispositions; if you think it governs your updating intentions, then you take it to govern the updating rule you intend to follow.

We'll introduce a slew of terminology here. You needn't take it all in at the moment, but it's worth keeping it all in one place for ease of reference.

Agenda  We will assume that your prior and posterior credence functions are defined on the same set of propositions $\mathcal{F}$, and we'll assume that $\mathcal{F}$ is finite and $\mathcal{F}$ is an algebra. We say that $\mathcal{F}$ is your agenda.

Possible worlds  Given an agenda $\mathcal{F}$, the set of possible worlds relative to $\mathcal{F}$ is the set of classically consistent assignments of truth values to the propositions in $\mathcal{F}$. We'll abuse notation throughout and write $w$ for (i) a truth value assignment to the propositions in $\mathcal{F}$, (ii) the proposition in $\mathcal{F}$ that is true at that truth value assignment and only at that truth value assignment, and (iii) what we might call the omniscient credence function relative to that truth value assignment, which is the credence function that assigns maximal credence (i.e. 1) to all propositions that are true on it and minimal credence (i.e. 0) to all propositions that are false on it.

Updating rules An updating rule has two components:
  • a set of propositions, $\mathcal{E} = \{E_1, \ldots, E_n\}$. This contains the propositions that you might learn with certainty at the later time $t'$; each $E_i$ is in $\mathcal{F}$, so $\mathcal{E} \subseteq \mathcal{F}$; $\mathcal{E}$ forms a partition;
  • a set of sets of credence functions, $\mathcal{C} = \{C_1, \ldots, C_n\}$. For each $E_i$, $C_i$ is the set of possible ways that the rule allows you to respond to evidence $E_i$; that is, it is the set of possible posteriors that the rule permits when you learn $E_i$; each $c'$ in $C_i$ in $\mathcal{C}$ is defined on $\mathcal{F}$.

Deterministic updating rule We say that an updating rule $R = (\mathcal{E}, \mathcal{C})$ is deterministic if each $C_i$ is a singleton set $\{c_i\}$. That is, for each piece of evidence there is exactly one possible response to it that the rule allows.

Stochastic updating rule  A stochastic updating rule is an updating rule $R = (\mathcal{E}, \mathcal{C})$ equipped with a probability function $P$. $P$ records, for each $E_i$ in $\mathcal{E}$ and $c'$ in $C_i$, how likely it is that you will adopt $c'$ in response to learning $E_i$. We write this $P(R^i_{c'} | E_i)$, where $R^i_{c'}$ is the proposition that says that you adopt posterior $c'$ in response to evidence $E_i$.
  • We assume $P(R^i_{c'} | E_i) > 0$ for all $c'$ in $C_i$. If the probability that you will adopt $c'$ in response to $E_i$ is zero, then $c'$ does not count as a response to $E_i$ that the rule allows.
  • Note that every deterministic updating rule is a stochastic updating rule for which $P(R^i_{c'} | E_i) = 1$ for each $c'$ in $C_i$. If $R = (\mathcal{E}, \mathcal{C})$ is deterministic, then, for each $E_i$, $C_i = \{c_i\}$. So let $P(R^i_{c_i} | E_i) = 1$.

Conditionalizing updating rule An updating rule $R = (\mathcal{E}, \mathcal{C})$ is a conditionalizing rule for a prior $c$ if, whenever $c(E_i) > 0$, $C_i = \{c_i\}$ and $c_i(-) = c(-|E_i)$.

Conditionalizing pairs  A pair $\langle c, R \rangle$ of a prior and an updating rule is a conditionalizing pair if $R$ is a conditionalizing rule for $c$.

Pseudo-conditionalizing updating rule Suppose $R = (\mathcal{E}, \mathcal{C})$ is an updating rule. Then let $\mathcal{F}^*$ be the smallest algebra that contains all of $\mathcal{F}$ and also $R^i_{c'}$ for each $E_i$ in $\mathcal{E}$ and $c'$ in $C_i$. (As above $R^i_{c'}$ is the proposition that says that you adopt posterior $c'$ in response to evidence $E_i$.) Then an updating rule $R$ is a pseudo-conditionalizing rule for a prior $c$ if it is possible to extend $c$, a credence function defined on $\mathcal{F}$, to $c^*$, a credence function defined on $\mathcal{F}^*$, such that, for each $E_i$ in $\mathcal{E}$ and $c'$ in $C_i$, $c'(-) = c^*(-|R^i_{c'})$. That is, each posterior is the result of conditionalizing the extended prior $c^*$ on the evidence to which it is a response and the fact that it was your response to this evidence.

Pseudo-conditionalizing pair A pair $\langle c, R \rangle$ of a prior and an updating rule is a pseudo-conditionalizing pair if $R$ is a pseudo-conditionalizing rule for $c$.

Let's illustrate these definitions using an example. Condi is a meteorologist. There is a hurricane in the Gulf of Mexico. She knows that it will make landfall soon in one of the following four towns: Pensacola, FL; Panama City, FL; Mobile, AL; or Biloxi, MS. She calls a friend and asks whether it has hit yet. It has. Then she asks whether it has hit in Florida. At this point, the evidence she will receive when her friend answers is either $F$---which says that it made landfall in Florida, that is, in Pensacola or Panama City---or $\overline{F}$---which says it hit elsewhere, that is, in Mobile or Biloxi. Her prior is $c$, with $c(F) = 0.8$ and $c(\overline{F}) = 0.2$.

Her evidential partition is $\mathcal{E} = \{F, \overline{F}\}$. Among the posteriors she might adopt are $c^\circ_F$ and $c^+_F$, each of which assigns credence 1 to $F$, and $c^\circ_{\overline{F}}$ and $c^+_{\overline{F}}$, each of which assigns credence 1 to $\overline{F}$; importantly, none of the four is the result of conditionalizing $c$ on the corresponding evidence. From these ingredients we can build a variety of updating rules---deterministic and non-deterministic, conditionalizing and not. The one that will matter below is the non-deterministic rule $R_3$.
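It will help to have concrete numbers. Here is one assignment, in Python, consistent with everything we'll say below; the specific values are made up and merely illustrative.

```python
# Illustrative credences for Condi's example; the numbers are my own choices,
# constrained only so that everything asserted below holds.
worlds = ["Pensacola", "PanamaCity", "Mobile", "Biloxi"]
F = {"Pensacola", "PanamaCity"}     # landfall in Florida
notF = {"Mobile", "Biloxi"}         # landfall elsewhere

c = {"Pensacola": 0.5, "PanamaCity": 0.3, "Mobile": 0.1, "Biloxi": 0.1}  # prior

def conditionalize(c, E):
    pE = sum(p for w, p in c.items() if w in E)
    return {w: (p / pE if w in E else 0.0) for w, p in c.items()}

# Two possible posteriors in response to F, and two in response to notF:
cF_circ  = {"Pensacola": 0.75, "PanamaCity": 0.25, "Mobile": 0.0, "Biloxi": 0.0}
cF_plus  = {"Pensacola": 0.50, "PanamaCity": 0.50, "Mobile": 0.0, "Biloxi": 0.0}
cnF_circ = {"Pensacola": 0.0, "PanamaCity": 0.0, "Mobile": 0.6, "Biloxi": 0.4}
cnF_plus = {"Pensacola": 0.0, "PanamaCity": 0.0, "Mobile": 0.4, "Biloxi": 0.6}

# None of the four is what conditionalization demands:
print(conditionalize(c, F))      # Pensacola: 0.625, PanamaCity: 0.375, rest 0
print(conditionalize(c, notF))   # Mobile: 0.5, Biloxi: 0.5, rest 0
```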
As we will see below, for each of our four arguments for conditionalization---DSA, EPUA, EEUA, and EUDA---the standard formulation of the argument assumes a norm that we will call Deterministic Updating:

Deterministic Updating (DU) Your updating rule should be deterministic.

As we will see, this is crucial for the success of these arguments. In what follows, I will present each argument in its standard formulation, which assumes Deterministic Updating. Then I will explore what happens when we remove that assumption.

The Dutch Strategy Argument (DSA)


The DSA and EPUA both evaluate updating rules by their pragmatic consequences. That is, they look to the choices that your prior and/or your possible posteriors lead you to make, and they conclude that those choices are optimal only if your updating rule is a conditionalizing rule for your prior.

DSA with Deterministic Updating


Let's look at the DSA first. In what follows, we'll take a decision problem to be a set of options that are available to an agent: e.g. accept a particular bet or refuse it; buy a particular lottery ticket or don't; take an umbrella when you go outside, take a raincoat, or take neither; and so on. The idea behind the DSA is this. One of the roles of credences is to help us make choices when faced with decision problems. They play that role badly if they lead us to make one series of choices when another series is guaranteed to serve our ends better. The DSA turns on the claim that, unless we update in line with Conditionalization, our credences will lead us to make such a series of choices when faced with a particular series of decision problems.

Here, we restrict attention to a particular class of decision problems you might face. They are the decision problems in which, for each available option, its outcome at a given possible world consists in your receiving a certain amount of a particular quantity, such as money or chocolate or pure pleasure, and your utility is linear in that quantity---that is, obtaining some amount of that quantity increases your utility by the same amount regardless of how much of it you already have. The quantity is typically taken to be money, and we'll continue to talk that way in what follows. But it's really a placeholder for any quantity with this property. We restrict attention to such decision problems because, in the argument, we need to combine the outcome of one decision, made at the earlier time, with the outcome of another, made at the later time. So we need to ensure that the utility of a combination of outcomes is the sum of the utilities of the individual outcomes.

Now, as we do throughout, we assume that the prior $c$ and the possible posteriors $c_1, \ldots, c_n$ permitted by a deterministic updating rule $R$ are all probability functions. And we will assume further that, when your credences are probabilistic, and you face a decision problem, then you should choose from the available options one of those that maximises expected utility relative to your credences.
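Since this maximise-expected-utility assumption does all the work in what follows, here is a minimal sketch of it in Python; the representation of options as world-to-payoff maps is mine, purely for illustration.

```python
# A sketch of choice by maximising expected utility: an option is a map
# from worlds to payoffs, and a decision problem is a list of options.
def expected_utility(credence: dict, option: dict) -> float:
    return sum(credence[w] * option[w] for w in credence)

def best_options(credence: dict, problem: list) -> list:
    """All options that maximise expected utility. When there are ties, a
    selection function (introduced below) would pick exactly one of them."""
    best = max(expected_utility(credence, o) for o in problem)
    return [o for o in problem if expected_utility(credence, o) == best]
```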

With this in hand, let's define two closely related features of a pair $\langle c, R \rangle$ that are undesirable from a pragmatic point of view, and might be thought to render that pair irrational. First:

Strong Dutch Strategies  $\langle c, R \rangle$ is vulnerable to a strong Dutch strategy if there are two decision problems, $\mathbf{d}$, $\mathbf{d}'$ such that
  1. $c$ requires you to choose option $A$ from the possible options available in $\mathbf{d}$;
  2. for each $E_i$ and each $c'$ in $C_i$, $c'$ requires you to choose $B$ from $\mathbf{d}'$;
  3. there are alternative options, $X$ in $\mathbf{d}$ and $Y$ in $\mathbf{d}'$, such that, at every possible world, you'll receive more utility from choosing $X$ and $Y$ than you receive from choosing $A$ and $B$. In the language of decision theory, $X + Y$ strongly dominates $A + B$.

Weak Dutch Strategies  $\langle c, R \rangle$ is vulnerable to a weak Dutch strategy if there are decision problems $\mathbf{d}$ and, for each $c'$ in $C_i$ in $\mathcal{C}$, $\mathbf{d}'_{c'}$, such that
  1. $c$ requires you to choose $A$ from $\mathbf{d}$;
  2. for each $E_i$ and each $c'$ in $C_i$, $c'$ requires you to choose $B^i_{c'}$ from $\mathbf{d}'_{c'}$;
  3. there are alternative options, $X$ in $\mathbf{d}$ and, for $E_i$ and $c'$ in $C_i$, $Y^i_{c'}$ in $\mathbf{d}'_{c'}$, such that (a) for each $E_i$, each world in $E_i$, and each $c'$ in $C_i$, you'll receive at least as much utility at that world from choosing $X$ and $Y^i_{c'}$ as you'll receive from choosing $A$ and $B^i_{c'}$, and (b) for some $E_i$, some world in $E_i$, and some $c'$ in $C_i$, you'll receive strictly more utility at that world from $X$ and $Y^i_{c'}$ than you'll receive from $A$ and $B^i_{c'}$.

Then the Dutch Strategy Argument is based on the following mathematical fact (de Finetti 1974):

Theorem 1 Suppose $R$ is a deterministic updating rule. Then:
  1. if $R$ is not a conditionalizing rule for $c$, then $\langle c, R \rangle$ is vulnerable to a strong Dutch strategy;
  2. if $R$ is a conditionalizing rule for $c$, then $\langle c, R \rangle$ is not vulnerable even to a weak Dutch strategy.
That is, if your updating rule is not a conditionalizing rule for your prior, then your credences will lead you to choose a strongly dominated pair of options when faced with a particular pair of decision problems; and if it is a conditionalizing rule, that can't happen.
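To see the first half of the theorem in action, here is a small numerical witness in Python. The numbers are made up: three worlds, a deterministic rule whose posterior on $E_1$ is not the conditionalized prior, and a pair of decision problems on which the required choices come out strongly dominated.

```python
# A numerical witness for part 1 of Theorem 1; all numbers are made up.
# Three worlds (w1, w2, w3); the evidence partition is E1 = {w1, w2},
# E2 = {w3}; options and credences are vectors over (w1, w2, w3).
c0 = [0.25, 0.25, 0.50]    # prior
c1 = [0.80, 0.20, 0.00]    # R's posterior on E1 -- NOT c0(.|E1) = [0.5, 0.5, 0]
c2 = [0.00, 0.00, 1.00]    # R's posterior on E2

def eu(credence, option):
    return sum(p * u for p, u in zip(credence, option))

# Decision problem d, faced at the earlier time:
A = [0.00, 0.00, 0.00]
X = [1.05, -1.95, 0.15]
assert eu(c0, A) > eu(c0, X)      # the prior requires choosing A over X

# Decision problem d', faced at the later time:
B = [1.00, -2.00, 0.10]
Y = [0.00, 0.00, 0.00]
assert eu(c1, B) > eu(c1, Y)      # the posterior on E1 requires B over Y
assert eu(c2, B) > eu(c2, Y)      # so does the posterior on E2

# Yet X + Y strongly dominates A + B: strictly better at every world.
assert all(x + y > a + b for x, y, a, b in zip(X, Y, A, B))
```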

Now that we have seen how the argument works, let's see whether it supports the three versions of conditionalization that we met above: Actual (AC), Plan (PC), and Dispositional (DC) Conditionalization. Since they speak directly of rules, let's begin with PC and DC.

The DSA shows that, if you endorse a deterministic rule that isn't a conditionalizing rule for your prior, then there is a pair of decision problems, one that you'll face at the earlier time and the other at the later time, such that your credences at the earlier time and your planned credences at the later time will require you to choose a dominated pair of options. And it seems reasonable to say that it is irrational to endorse a plan when following through on it would render you vulnerable to a Dutch Strategy. So, for those who endorse deterministic rules, the DSA plausibly supports Plan Conditionalization.

The same is true of Dispositional Conditionalization. Just as it is irrational to plan to update in a way that would render you vulnerable to a Dutch Strategy if you were to stick to the plan, it is surely irrational to be disposed to update in a way that renders you vulnerable in this way. So, for those whose updating dispositions are deterministic, the DSA plausibly supports Dispositional Conditionalization.

Finally, AC. There are various ways to move from either PC or DC to AC, but each of them requires some extra assumptions. For instance:

(I) I might assume: (i) between an earlier and a later time, there is always a partition such that you know that the strongest piece of evidence you might receive between those times is a proposition from that partition, learned with certainty; (ii) if you know you'll receive evidence from some partition, you are rationally required to plan how you will update on each possible piece of evidence before you receive it; and (iii) if you plan how to respond to evidence before you receive it, you are rationally required to follow through on that plan once you have received it. Together with PC + DU, these give AC.

(II) I might assume: (i) you have updating dispositions. So, if you actually update other than by conditionalization, that must be the manifestation of a disposition other than a conditionalizing one. Together with DC + DU, this gives AC.

(III) I might assume: (i) that you are rationally required to update in any way that can be represented as the result of following a plan that you were rationally permitted to endorse, or as the result of dispositions that you were rationally permitted to have, even if you did not in fact endorse any plan prior to receiving the evidence and had no updating dispositions. Again, together with PC + DU or DC + DU, this gives AC.

Notice that, in each case, it was essential to invoke Deterministic Updating (DU). As we will see below, this causes problems for AC.

DSA without Deterministic Updating


We have now seen how the DSA proceeds if we assume Deterministic Updating. But what if we don't? Consider, for instance, the non-deterministic rule $R_3$ built from Condi's possible posteriors above:
$$R_3 = (\mathcal{E} = \{F, \overline{F}\}, \mathcal{C} = \{\{c^\circ_F, c^+_F\}, \{c^\circ_{\overline{F}}, c^+_{\overline{F}}\}\})$$
That is, if Condi learns $F$, rule $R_3$ allows her to update to $c^\circ_F$ or to $c^+_F$. And if she learns $\overline{F}$, it allows her to update to $c^\circ_{\overline{F}}$ or to $c^+_{\overline{F}}$. Notice that $R_3$ violates conditionalization thoroughly: it is not deterministic; and, more than merely failing to mandate the posteriors that conditionalization demands, it does not even permit them. Can we adapt the DSA to show that $R_3$ is irrational? No. We cannot use Dutch Strategies to show that $R_3$ is irrational, because it isn't vulnerable to them.

To see this, we first note that, while $R_3$ is not deterministic and not a conditionalizing rule, it is a pseudo-conditionalizing rule.  And to see that, it helps to state the following representation theorem for pseudo-conditionalizing rules.

Lemma 1 $R$ is a pseudo-conditionalizing rule for $c$ iff
  1. for all $E_i$ in $\mathcal{E}$ and $c'$ in $C_i$, $c'(E_i) = 1$, and
  2. $c$ is in the convex hull of the possible posteriors that $R$ permits.
But note that each of Condi's four possible posteriors assigns credence 1 to the evidence to which it responds, and$$c(-) = 0.4c^\circ_F(-) + 0.4c^+_F(-) + 0.1c^\circ_{\overline{F}}(-) + 0.1 c^+_{\overline{F}}(-)$$
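Continuing the Python sketch of Condi's example (with its merely illustrative credences), we can verify both conditions of Lemma 1 numerically:

```python
# Checking Lemma 1's two conditions for R_3, using the illustrative
# credences from Condi's example above.
weights    = [0.4, 0.4, 0.1, 0.1]
posteriors = [cF_circ, cF_plus, cnF_circ, cnF_plus]
evidence   = [F, F, notF, notF]    # the evidence each posterior answers to

# Condition 1: each permitted posterior is certain of its evidence.
for post, E in zip(posteriors, evidence):
    assert sum(post[w] for w in E) == 1.0

# Condition 2: the prior lies in the convex hull of the permitted posteriors.
for w in worlds:
    mixture = sum(wt * post[w] for wt, post in zip(weights, posteriors))
    assert abs(c[w] - mixture) < 1e-9
```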
So $R_3$ is pseudo-conditionalizing. What's more:


Theorem 2
  • If $R$ is not a pseudo-conditionalizing rule for $c$, then $\langle c, R \rangle$ is vulnerable at least to a weak Dutch Strategy, and possibly also a strong Dutch Strategy.
  • If $R$ is a pseudo-conditionalizing rule for $c$, then $\langle c, R \rangle$ is not vulnerable to a weak Dutch Strategy.
Thus, $\langle c, R_3 \rangle$ is not vulnerable even to a weak Dutch Strategy. The DSA, then, cannot say what is irrational about Condi if she begins with prior $c$ and either endorses $R_3$ or is disposed to update in line with it. Thus, the DSA cannot justify Deterministic Updating. And without DU, it cannot support PC or DC either: after all, $R_3$ violates each of those, but it is not vulnerable even to a weak Dutch Strategy. Moreover, each of the three arguments for AC breaks down, because each depends on PC or DC. The problem is that, if Condi updates from $c$ to $c^\circ_F$ upon learning $F$, she violates AC; but there is a non-deterministic updating rule---namely, $R_3$---that allows $c^\circ_F$ as a response to learning $F$, and, for all the DSA tells us, she might have rationally endorsed $R_3$ before learning $F$, or she might rationally have been disposed to follow it. Indeed, the only restriction that the DSA can place on your actual updating behaviour is that you should become certain of the evidence that you learned. After all:

Theorem 3 Suppose $c$ is your prior, $E_i$ is the evidence you receive, and $c'$ is your posterior. Then there is an updating rule $R$ such that:
  1. $c'$ is in $C_i$, and
  2. $R$ is a pseudo-conditionalizing rule for $c$
iff $c'(E_i) = 1$.

Thus, at the end of this section, we can conclude that, whatever is irrational about planning to update using non-deterministic but pseudo-conditionalizing updating rules, it cannot be that following through on those plans leaves you vulnerable to a Dutch Strategy, for it does not. And similarly, whatever is irrational about being disposed to update in those ways, it cannot be that those dispositions will equip you with credences that lead you to choose dominated options, for they do not. With PC and DC thus blocked, our route to AC is therefore also blocked.

The Expected Pragmatic Utility Argument (EPUA)


Let's look at the EPUA next. Again, we will consider how our credences guide our actions when we face decision problems. In this case, there is no need to restrict attention to monetary decision problems. And we will consider only a single decision problem, which we face at the later time, after we've received the evidence, so we won't have to combine the outcomes of multiple options as we did in the DSA. The idea is this. Suppose you will face a decision after you receive whatever evidence you receive at the later time. And suppose you will use your updated credence function to make that choice---indeed, you'll choose from the available options by maximising expected utility from the point of view of your new credences. Which updating rule does your prior expect to guide that choice best?

EPUA with Deterministic Updating


Suppose you'll face decision problem $\mathbf{d}$ after you've updated. And suppose further that you'll use a deterministic updating rule $R$. Then, if $w$ is a possible world and $E_i$ is the element of the evidential partition $\mathcal{E}$ that is true at $w$, the idea is that we take the pragmatic utility of $R$ relative to $\mathbf{d}$ at $w$ to be the utility at $w$ of whatever option from $\mathbf{d}$ we should choose if our posterior credence function were $c_i$, as $R$ requires it to be at $w$. But of course, for many decision problems, this isn't well defined because there is no unique option in $\mathbf{d}$ that maximises expected utility by the lights of $c_i$; rather there are sometimes many such options, and they might have different utilities at $w$. Thus, we need not only $c_i$ but also a selection function, which picks a single option from  any set of options. If $f$ is such a selection function, then let $A^{\mathbf{d}}_{c_i, f}$ be the option that $f$ selects from the set of options in $\mathbf{d}$ that maximise expected utility by the lights of $c_i$. And let
$$u_{\mathbf{d},f}(R, w) = u(A^{\mathbf{d}}_{c_i, f}, w).$$
Then the EPUA turns on the following mathematical fact (Brown 1976):

Theorem 4 Suppose $R$ and $R^\star$ are both deterministic updating rules. Then:
  • If $R$ and $R^\star$ are both conditionalizing rules for $c$, and $f$, $g$ are selection functions, then for all decision problems $\mathbf{d}$ $$\sum_{w \in W} c(w) u_{\mathbf{d}, f}(R, w) = \sum_{w \in W} c(w) u_{\mathbf{d}, g}(R^\star, w)$$
  • If $R$ is a conditionalizing rule for $c$, and $R^\star$ is not, and $f$, $g$ are selection functions, then for all decision problems $\mathbf{d}$, $$\sum_{w \in W} c(w) u_{\mathbf{d}, f}(R, w) \geq \sum_{w \in W} c(w) u_{\mathbf{d}, g}(R^\star, w)$$with strict inequality for some decision problems $\mathbf{d}$.
That is, a deterministic updating rule maximises expected pragmatic utility by the lights of your prior just in case it is a conditionalizing rule for your prior.
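Here is a small numerical illustration of this in Python, again with made-up numbers: a conditionalizing rule and a non-conditionalizing one are scored against a particular decision problem on which the inequality is strict, and the selection function simply picks the first maximiser.

```python
# Scoring deterministic rules by expected pragmatic utility; made-up numbers.
# Three worlds; the evidence partition is E1 = {w1, w2}, E2 = {w3}.
worlds3 = ["w1", "w2", "w3"]
prior = {"w1": 0.25, "w2": 0.25, "w3": 0.5}
E1, E2 = frozenset({"w1", "w2"}), frozenset({"w3"})

post_R  = {E1: {"w1": 0.5, "w2": 0.5, "w3": 0.0},    # = c(.|E1): conditionalizing
           E2: {"w1": 0.0, "w2": 0.0, "w3": 1.0}}    # = c(.|E2)
post_Rs = {E1: {"w1": 0.8, "w2": 0.2, "w3": 0.0},    # not c(.|E1)
           E2: {"w1": 0.0, "w2": 0.0, "w3": 1.0}}

d = [{"w1": 0.9, "w2": -1.0, "w3": 0.0},             # a bet on w1
     {"w1": -1.0, "w2": 0.9, "w3": 0.0},             # a bet on w2
     {"w1": 0.0, "w2": 0.0, "w3": 0.0}]              # refuse both bets

def eu(cr, o):
    return sum(cr[w] * o[w] for w in cr)

def choose(cr, problem):
    # maximise expected utility; the selection function picks the first maximiser
    best = max(eu(cr, o) for o in problem)
    return next(o for o in problem if eu(cr, o) == best)

def pragmatic_utility(posteriors, problem, w):
    E = next(E for E in posteriors if w in E)        # the evidence learned at w
    return choose(posteriors[E], problem)[w]

for posts, name in [(post_R, "conditionalizing"), (post_Rs, "non-conditionalizing")]:
    val = sum(prior[w] * pragmatic_utility(posts, d, w) for w in worlds3)
    print(name, val)    # conditionalizing: 0.0; non-conditionalizing: approx -0.025
```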

As in the case of the DSA above, then, if we assume Deterministic Updating (DU), we can establish PC and DC, and on the back of those AC as well. After all, it is surely irrational to plan to update in one way when you expect another way to guide your actions better in the future; and it is surely irrational to be disposed to update in one way when you expect another to guide you better. And as before there are the same three arguments for AC on the back of PC and DC.

EPUA without Deterministic Updating


How does EPUA fare when we widen our view to include non-deterministic updating rules as well? An initial problem is that it is no longer clear how to define the pragmatic utility of such an updating rule relative to a decision problem at a possible world. Above, we said that, relative to a decision problem $\mathbf{d}$ and a selection function $f$, the pragmatic utility of rule $R$ at world $w$ is the utility of the option that you would choose when faced with $\mathbf{d}$ using the credence function that $R$ mandates at $w$ and $f$: that is, if $E_i$ is true at $w$, then
$$u_{\mathbf{d}, f}(R, w) = u(A^{\mathbf{d}}_{c_i, f}, w).$$
But, if $R$ is not deterministic, there might be no single credence function that it mandates at $w$. If $E_i$ is the piece of evidence you'll learn at $w$ and $R$ permits more than one credence function in response to $E_i$, then there might be a range of different options in $\mathbf{d}$, each of which maximises expected utility relative to a different credence function $c'$ in $C_i$. So what are we to do?

Our response to this problem depends on whether we wish to argue for Plan or Dispositional Conditionalization (PC or DC). Suppose, first, that we are interested in DC. That is, we are interested in a norm that governs the updating rule that records how you are disposed to update when you receive certain evidence. Then it seems reasonable to assume that the updating rule that records your dispositions is stochastic. That is, for each possible piece of evidence $E_i$ and each possible response $c'$ in $C_i$ to that evidence that you might adopt in response to receiving that evidence, there is some objective chance that you will respond to $E_i$ by adopting $c'$. As we explained above, we'll write this $P(R^i_{c'} | E_i)$, where $R^i_{c'}$ is the proposition that you receive $E_i$ and respond by adopting $c'$. Then, if $E_i$ is true at $w$,  we might take the pragmatic utility of $R$ relative to $\mathbf{d}$ and $f$ at $w$ to be the expectation of the utility of the options that each permitted response to $E_i$ (and selection function $f$) would lead us to choose:
$$u_{\mathbf{d}, f}(R, w) = \sum_{c' \in C_i} P(R^i_{c'} | E_i) u(A^{\mathbf{d}}_{c', f}, w)$$
With this in hand, we have the following result:

Theorem 5 Suppose $R$ and $R^\star$ are both updating rules. Then:
  • If $R$ and $R^\star$ are both conditionalizing rules for $c$, and $f$, $g$ are selection functions, then for all decision problems $\mathbf{d}$, $$\sum_{w \in W} c(w) u_{\mathbf{d}, f}(R, w) = \sum_{w \in W} c(w) u_{\mathbf{d}, g}(R^\star, w)$$
  • If $R$ is a conditionalizing rule for $c$, and $R^\star$ is a stochastic but not conditionalizing rule, and $f$, $g$ are selection functions, then for all decision problems $\mathbf{d}$,$$\sum_{w \in W} c(w) u_{\mathbf{d}, f}(R, w) \geq \sum_{w \in W} c(w) u_{\mathbf{d}, g}(R^\star, w)$$with strict inequality for some decision problems $\mathbf{d}$.
This shows the first difference between the DSA and EPUA. The latter, but not the former, provides a route to establishing Dispositional Conditionalization (DC). If we assume that your dispositions are governed by a chance function, and we use that chance function to calculate expectations, then we can show that your prior will expect your posteriors to do worse as a guide to action unless you are disposed to update by conditionalizing on the evidence you receive.

Next, suppose we are interested in Plan Conditionalization (PC). In this case, we might try to appeal again to Theorem 5. To do that, we must assume that, while there are non-deterministic updating rules that we might endorse, they are all at least stochastic updating rules; that is, they all come equipped with a probability function that determines how likely it is that you will adopt a particular permitted response to the evidence. That is, we might say that the updating rules we might endorse are either deterministic or non-deterministic-but-stochastic. In the language of game theory, the updating strategies between which we choose are either pure or mixed. And then Theorem 5 shows that we should adopt a deterministic-and-conditionalizing rule, rather than any deterministic-but-non-conditionalizing or non-deterministic-but-stochastic rule. The problem with this proposal is that it seems just as arbitrary to restrict to deterministic and non-deterministic-but-stochastic rules as it was to restrict to deterministic rules in the first place. Why should we not be able to endorse a non-deterministic, non-stochastic rule---that is, a rule that permits two or more posteriors in response to at least one possible piece of evidence $E_i$ in $\mathcal{E}$, but endorses no chance mechanism by which we'll choose between them? But if we permit these rules, how are we to define their pragmatic utility relative to a decision problem at a possible world?

Here's one suggestion. Suppose $E_i$ is the proposition in $\mathcal{E}$ that is true at world $w$. And suppose $\mathbf{d}$ is a decision problem and $f$ is a selection function. Then we might take the pragmatic utility of $R$ relative to $\mathbf{d}$ and $f$ at $w$ to be the average utility of the options that each permissible response to $E_i$ (together with $f$) would choose when faced with $\mathbf{d}$. That is,$$u_{\mathbf{d}, f}(R, w) = \frac{1}{|C_i|} \sum_{c' \in C_i}  u(A^{\mathbf{d}}_{c', f}, w)$$where $|C_i|$ is the size of $C_i$, that is, the number of possible responses to $E_i$ that $R$ permits. If that's the case, then we have the following:

Theorem 6 Suppose $R$ and $R^\star$ are updating rules, $R$ is a conditionalizing rule for $c$, $R^\star$ is not deterministic, not stochastic, and not a conditionalizing rule for $c$, and $f$, $g$ are selection functions. Then, for all decision problems $\mathbf{d}$,
$$\sum_{w \in W} c(w) u_{\mathbf{d}, f}(R, w) \geq \sum_{w \in W} c(w) u_{\mathbf{d}, g}(R^\star, w)$$with strict inequality for some decision problems $\mathbf{d}$.

Put together with Theorems 4 and 5, this shows that our prior expects us to do better by endorsing a conditionalizing rule than by endorsing any other sort of rule, whether that is a deterministic and non-conditionalizing rule, a non-deterministic but stochastic rule, or a non-deterministic and non-stochastic rule.
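Continuing the sketch above, we can score a non-deterministic rule in both ways---weighting by the chances when it is stochastic, and averaging when it is not---and check that, on the same decision problem, both come out below the conditionalizing rule:

```python
# Scoring a non-deterministic rule that permits two posteriors after E1.
# If the rule is stochastic we weight by the chances; if not, we average.
# (With uniform chances, as here, the two definitions happen to coincide.)
C1 = [{"w1": 0.8, "w2": 0.2, "w3": 0.0},
      {"w1": 0.2, "w2": 0.8, "w3": 0.0}]
chances = [0.5, 0.5]                        # P(R^1_{c'} | E1) for each posterior

def pragmatic_utility_nondet(problem, w, weights):
    if w in E1:
        return sum(p * choose(c_, problem)[w] for p, c_ in zip(weights, C1))
    return choose(post_R[E2], problem)[w]

for wts, name in [(chances, "stochastic"), ([0.5, 0.5], "non-stochastic average")]:
    val = sum(prior[w] * pragmatic_utility_nondet(d, w, wts) for w in worlds3)
    print(name, val)                        # both approx -0.025, below the 0.0 above
```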

So, again, we see a difference between DSA and EPUA. Just as the latter, but not the former, provides a route to establishing DC without assuming Deterministic Updating, so the latter but not the former provides a route to establishing PC without DU. And from both of those, we have the usual three routes to AC. This means that EPUA explains what might be irrational about endorsing a non-deterministic updating rule, or having dispositions that match one. If you do, there's some alternative updating rule that your prior expects to do better as a guide to future action.

The Expected Epistemic Utility Argument (EEUA)


The previous two arguments criticized non-conditionalizing updating rules from the standpoint of pragmatic utility. The EEUA and EUDA both criticize such rules from the standpoint of epistemic utility. The idea is this: just as credences play a pragmatic role in guiding our actions, so they play other roles as well---they represent the world; they respond to evidence; they might be more or less coherent. These roles are purely epistemic. And so, just as we defined the pragmatic utility of a credence function at a world when faced with a decision problem, so we can also define the epistemic utility of a credence function at a world---it is a measure of how valuable it is to have that credence function from a purely epistemic point of view.

EEUA with Deterministic Updating


We will not give an explicit definition of the epistemic utility of a credence function at a world. Rather, we'll simply state two properties that we'll take measures of such epistemic utility to have. These are widely assumed in the literature on epistemic utility theory and accuracy-first epistemology, and I'll defer to the arguments in favour of them that are outlined there (Joyce 2009, Pettigrew 2016, Horowitz 2019).

A local epistemic utility function is a function $s$ that takes a single credence and a truth value---either true (1) or false (0)---and returns the epistemic value of having that credence in a proposition with that truth value. Thus, $s(1, p)$ is the epistemic value of having credence $p$ in a truth, while $s(0, p)$ is the epistemic value of having credence $p$ in a falsehood. A global epistemic utility function is a function $EU$ that takes an entire credence function defined on $\mathcal{F}$ and a possible world and returns the epistemic value of having that credence function when the propositions in $\mathcal{F}$ have the truth values they have in that world.

Strict Propriety  A local epistemic utility function $s$ is strictly proper if each credence expects itself, and only itself, to have the greatest epistemic utility. That is, for all $0 \leq p \leq 1$,$$ps(1, x) + (1-p) s(0, x)$$is maximised, as a function of $x$, uniquely at $x = p$.

Additivity  A global epistemic utility function is additive if, for each proposition $X$ in $\mathcal{F}$, there is a local epistemic utility function $s_X$ such that the epistemic utility of a credence function $c$ at a possible world is the sum of the epistemic utilities at that world of the credences it assigns. If $w$ is a possible world and we write $w(X)$ for the truth value (0 or 1) of proposition $X$ at $w$, this says:$$EU(c, w) = \sum_{X \in \mathcal{F}} s_X(w(X), c(X))$$
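One standard example of a strictly proper local epistemic utility function is the quadratic score, $s(i, x) = 1 - (i - x)^2$, generated by the Brier score. Here is a quick grid check of its strict propriety in Python:

```python
# Grid check that the quadratic score is strictly proper: each credence p
# uniquely expects itself to have the greatest epistemic utility.
def s(truth_value: int, x: float) -> float:
    return 1 - (truth_value - x) ** 2

def expected_s(p: float, x: float) -> float:
    return p * s(1, x) + (1 - p) * s(0, x)

for p in [0.1, 0.3, 0.5, 0.9]:
    xs = [i / 1000 for i in range(1001)]
    best_x = max(xs, key=lambda x: expected_s(p, x))
    assert best_x == p    # the unique maximiser is p itself
```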

We then define the epistemic utility of a deterministic updating rule $R$ in the same way we defined its pragmatic utility above: if $E_i$ is true at $w$, and $C_i = \{c_i\}$, then
$$EU(R, w) = EU(c_i, w)$$Then the standard formulation of the EEUA turns on the following theorem (Greaves & Wallace 2006):

Theorem 7 Suppose $R$ and $R^\star$ are deterministic updating rules. Then:
  • If $R$ and $R^\star$ are both conditionalizing rules for $c$, then$$\sum_{w \in W} c(w) EU(R, w) = \sum_{w \in W} c(w) EU(R^\star, w)$$
  • If $R$ is a conditionalizing rule for $c$ and $R^\star$ is not, then$$\sum_{w \in W} c(w) EU(R, w) > \sum_{w \in W} c(w) EU(R^\star, w)$$
That is, a deterministic updating rule maximises expected epistemic utility by the lights of your prior just in case it is a conditionalizing rule for your prior.
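Reusing the worlds, prior, and rules from the pragmatic sketch above, and taking epistemic utility to be the quadratic score summed over just the three world-propositions (a simplification of a full algebra, but additive and strictly proper there), we can check the inequality numerically:

```python
# Reusing the quadratic score s from above as the local utility for each
# world-proposition, and the worlds/prior/rules from the pragmatic sketch.
def brier_EU(credence: dict, w: str) -> float:
    return sum(s(1 if v == w else 0, credence[v]) for v in credence)

def expected_epistemic_utility(posteriors) -> float:
    total = 0.0
    for w in worlds3:
        E = next(E for E in posteriors if w in E)    # the evidence learned at w
        total += prior[w] * brier_EU(posteriors[E], w)
    return total

print(expected_epistemic_utility(post_R))    # conditionalizing: 2.75
print(expected_epistemic_utility(post_Rs))   # non-conditionalizing: approx 2.66
```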
So, as for DSA and EPUA, if we assume Deterministic Updating, we obtain an argument for PC and DC, and indirectly one for AC too.

EEUA without Deterministic Updating


If we don't assume Deterministic Updating, the situation here is very similar to the one we encountered above when we considered the EPUA. Suppose $R$ is a non-deterministic but stochastic updating rule. Then, as above, we let its epistemic utility at a world be the expectation of the epistemic utility that the various possible posteriors permitted by $R$ take at that world. That is, if $E_i$ is the proposition in $\mathcal{E}$ that is true at $w$, then$$EU(R, w) = \sum_{c' \in C_i} P(R^i_{c'} | E_i) EU(c', w)$$Then we have a result similar to Theorem 5:

Theorem 8 Suppose $R$ and $R^\star$ are updating rules. Then if $R$ is a conditionalizing rule for $c$, and $R^\star$ is stochastic but not a conditionalizing rule for $c$, then
$$\sum_{w \in W} c(w) EU(R, w) > \sum_{w \in W} c(w) EU(R^\star, w)$$

Next, suppose $R$ is a non-deterministic but also non-stochastic rule. Then we let its epistemic utility at a world be the average epistemic utility that the various possible posteriors permitted by $R$ take at that world. That is, if $E_i$ is the proposition in $\mathcal{E}$ that is true at $w$, then
$$EU(R, w) = \frac{1}{|C_i|}\sum_{c' \in C_i} EU(c', w)$$And again we have a similar result to Theorem 6:

Theorem 9 Suppose $R$ and $R^\star$ are updating rules, $R$ is a conditionalizing rule for $c$, and $R^\star$ is not deterministic, not stochastic, and not a conditionalizing rule for $c$. Then:
$$\sum_{w \in W} c(w) EU(R, w) > \sum_{w \in W} c(w) EU(R^\star, w)$$

So the situation is the same as for EPUA. Whether we assess a rule by looking at how well the posteriors it produces guide our future actions, or how good they are from a purely epistemic point of view, our prior will expect a conditionalizing rule for itself to be better than any non-conditionalizing rule. And thus we obtain PC and DC, and indirectly AC as well.

The Epistemic Utility Dominance Argument (EUDA)


Finally, we turn to the EUDA. In the EPUA and EEUA, we assess the pragmatic or epistemic utility of the updating rule from the viewpoint of the prior. In the DSA, we assess the prior and updating rule together, and from no particular point of view; but, unlike the EPUA and EEUA, we do not assign utilities, either pragmatic or epistemic, to the prior and the rule. In the EUDA, as in the DSA and unlike the EPUA and EEUA, we assess the prior and updating rule together, and again from no particular point of view; but, unlike the DSA and like the EPUA and EEUA, we assign utilities to them---in particular, epistemic utilities---and assess them with reference to those.

EUDA with Deterministic Updating


Suppose $R$ is a deterministic updating rule. Then, as before, if $E_i$ is true at $w$, let the epistemic utility of $R$ be the epistemic utility of the credence function $c_i$ that it mandates at $w$: that is,$$EU(R, w) = EU(c_i, w).$$
But this time also let the epistemic utility of the pair $\langle c, R \rangle$ consisting of the prior and the updating rule be the sum of the epistemic utility of the prior and the epistemic utility of the updating rule: that is,$$EU(\langle c, R \rangle, w) = EU(c, w) + EU(R, w) = EU(c, w) + EU(c_i, w)$$
Then the EUDA turns on the following mathematical fact (Briggs & Pettigrew 2018):

Theorem 10  Suppose $EU$ is an additive, strictly proper epistemic utility function. And suppose $R$ and $R^\star$ are deterministic updating rules. Then:
  • if $\langle c, R \rangle$ is non-conditionalizing, there is $\langle c^\star, R^\star \rangle$ such that, for all $w$, $$EU(\langle c, R \rangle, w) < EU(\langle c^\star, R^\star \rangle, w)$$
  • if $\langle c, R \rangle$ is conditionalizing, there is no $\langle c^\star, R^\star \rangle$ such that, for all $w$, $$EU(\langle c, R \rangle, w) < EU(\langle c^\star, R^\star \rangle, w)$$
That is, if $R$ is not a conditionalizing rule for $c$, then together they are $EU$-dominated; if it is a conditionalizing rule, they are not. Thus, like EPUA and EEUA and unlike DSA, if we assume Deterministic Updating, EUDA gives PC, DC, and indirectly AC.

EUDA without Deterministic Updating


Now suppose we permit non-deterministic updating rules as well as deterministic ones. In this case, there are two approaches we might take. On the one hand, we might define the epistemic utility of non-deterministic rules, both stochastic and non-stochastic, just as we did for EEUA. That is, we might take the epistemic utility of a stochastic rule at a world to be the expectation of the epistemic utility of the various posteriors that it permits in response to the evidence that you obtain at that world; and the epistemic utility of a non-stochastic rule at a world is the average of those epistemic utilities. This gives us the following result:

Theorem 11  Suppose $EU$ is an additive, strictly proper epistemic utility function. Then, if $\langle c, R \rangle$ is not a conditionalizing pair, there is an alternative pair $\langle c^\star, R^\star \rangle$ such that, for all $w$, $$EU(\langle c, R \rangle, w) < EU(\langle c^\star, R^\star \rangle, w)$$And this therefore supports an argument for PC and DC and indirectly AC as well.

On the other hand, we might consider more fine-grained possible worlds, which specify not only the truth value of all the propositions in $\mathcal{F}$, but also which posterior I adopt. We can then ask: given a particular pair $\langle c, R \rangle$, is there an alternative pair $\langle c^\star, R^\star \rangle$ that has greater epistemic utility at every fine-grained world by the lights of $EU$? If we judge updating rules by this standard, we get a rather different answer. If $E_i$ is the element of $\mathcal{E}$ that is true at $w$, and $c'$ is in $C_i$ and $c^{\star \prime}$ is in $C^\star_i$, then we write $w\ \&\ R^i_{c'}\ \&\ R^{\star i}_{c^{\star \prime}}$ for the more fine-grained possible world we obtain from $w$ by adding that $R$ updates to $c'$ and $R^\star$ updates to $c^{\star\prime}$ upon receipt of $E_i$. And let
  • $EU(\langle c, R \rangle, w\ \&\ R^i_{c'}\ \&\ R^{\star i}_{c^{\star \prime}} ) = EU(c, w) + EU(c', w)$
  • $EU(\langle c^\star, R^\star \rangle, w\ \&\ R^i_{c'}\ \&\ R^{\star i}_{c^{\star \prime}} ) = EU(c^\star, w) + EU(c^{\star\prime}, w)$
Then:
Theorem 12  Suppose $EU$ is an additive, strictly proper epistemic utility function. Then:
  • If $\langle c, R \rangle$ is a pseudo-conditionalizing pair, there is no alternative pair $\langle c^\star, R^\star\rangle$ such that, for all $E_i$ in $\mathcal{E}$, $w$ in $E_i$, $c'$ in $C_i$ and $c^{\star\prime}$ in $C^\star_i$, $$EU(\langle c, R \rangle, w\ \&\ R^i_{c'}\ \&\ R^{\star i}_{c^{\star \prime}} ) < EU(\langle c^\star, R^\star \rangle, w\ \&\ R^i_{c'}\ \&\ R^{\star i}_{c^{\star \prime}})$$
  • There are pairs $\langle c, R \rangle$ that are non-conditionalizing and non-pseudo-conditionalizing for which there is no alternative pair $\langle c^\star, R^\star\rangle$ such that, for all $E_i$ in $\mathcal{E}$, $w$ in $E_i$, $c'$ in $C_i$ and $c^{\star\prime}$ in $C^\star_i$, $$EU(\langle c, R \rangle, w\ \&\ R^i_{c'}\ \&\ R^{\star i}_{c^{\star \prime}} ) < EU(\langle c^\star, R^\star \rangle, w\ \&\ R^i_{c'}\ \&\ R^{\star i}_{c^{\star \prime}})$$
Interpreted in this way, then, and without the assumption of Deterministic Updating, EUDA is the weakest of all the arguments. Where DSA at least establishes that your updating rule should be pseudo-conditionalizing for your prior, even if it does not establish that it should be conditionalizing, EUDA does not establish even that.

Conclusion


One upshot of this investigation is that, so long as we assume Deterministic Updating (DU), all four arguments support the same conclusions, namely, Plan and Dispositional Conditionalization, and also Actual Conditionalization. But once we drop DU, that agreement vanishes.

Without DU, the DSA shows only that, if we plan to update using a particular rule, it should be a pseudo-conditionalizing rule for our prior; and similarly for our dispositions. As a result, it cannot support AC. Indeed, it can support only the weakest restrictions on our actual updating behaviour, since nearly any such behaviour can be seen as an implementation of a pseudo-conditionalizing rule.

The EPUA and EEUA are much more hopeful. Let's consider our updating dispositions first. It seems natural to assume that, even if these are not deterministic, they are at least governed by some objective chances. If so, this gives a natural definition of the pragmatic and epistemic utilities of my updating dispositions at a world---they are expectations of the pragmatic and epistemic utilities of the posteriors, calculated using the objective chances. And, relative to that, we can in fact establish DU---we no longer need to assume it. With that in hand, we regain DC and two of the routes to AC.

Next, let's consider the updating plans we endorse. It also seems natural to assume that those plans, if not deterministic, might not be stochastic either. And, if that's the case, we can take their pragmatic or epistemic utility at a world to be the average pragmatic or epistemic utility of the different possible credence functions they endorse as responses to the evidence you gain at that world. And, relative to that, we can again establish DU. And with it PC and two of the routes to AC.

Finally, EUDA is a mixed bag. Understanding the epistemic and pragmatic utility of an updating rule as we have just described gives us DU and with it PC, DC, and AC. But if we take a fine-grained approach, we cannot even establish that your updating rule should be a pseudo-conditionalizing rule for your prior.

Proofs

For proofs of the theorems in this post, please see the paper version here.