For a PDF of this post, see here.

One of the central arguments in accuracy-first epistemology -- the one that gets the project off the ground, I think -- is the accuracy-dominance argument for Probabilism. This started life in a more pragmatic guise in de Finetti's proof that, if your credences are not probabilistic, there are alternatives that would lose less than yours would if they were penalised using the Brier score, which levies a price of $(1-x)^2$ on every credence $x$ in a truth and $x^2$ on every credence $x$ in a falsehood. This was then adapted to an accuracy-based argument by Roger Rosencrantz, where he interpreted the Brier score as a measure of inaccuracy, not a penalty score. Interpreted thus, de Finetti's result says that any non-probabilistic credences are accuracy-dominated by some probabilistic credences. Jim Joyce then noted that this argument only establishes Probabilism if you have a further argument that inaccuracy should be measured by the Brier score. He thought there was no particular reason to think that's right, so he greatly generalized de Finetti's result to show that, relative to a much wider range of inaccuracy measures, all non-probabilistic credences are accuracy dominated. One problem with this, which Al Hájek pointed out, was that he didn't give a converse argument -- that is, he didn't show that, for each of his inaccuracy measures, each probabilistic credence function is not accuracy dominated. Joel Predd and his Princeton collaborators then addressed this concern and proved a very general result, namely, that for any additive, continuous, and strictly proper inaccuracy measure, any non-probabilistic credences are accuracy-dominated, while no probabilistic credences are.

That brings us to this blogpost. Additivity is a controversial claim. It says that the inaccuracy of a credence function is the (possibly weighted) sum of the inaccuracies of the credences it assigns. So the question arises: can we do without additivity? In this post, I'll give a quick proof of the accuracy-dominance argument that doesn't assume anything about the inaccuracy measures other than that they are continuous and strictly proper. Anyone familiar with the Predd, et al. paper will see that the proof strategy draws very heavily on theirs. But it bypasses out the construction of the Bregman divergence that corresponds to the strictly proper inaccuracy measure. For that, you'll have to wait for Jason Konek's forthcoming work...

Suppose:

- $\mathcal{F}$ is a set of propositions;
- $\mathcal{W} = \{w_1, \ldots, w_n\}$ be the set of possible worlds relative to $\mathcal{F}$;
- $\mathcal{C}$ be the set of credence functions on $\mathcal{F}$;
- $\mathcal{P}$ be the set of probability functions on $\mathcal{F}$. So, by de Finetti's theorem, $\mathcal{P} = \{v_w : w \in \mathcal{W}\}^+$. If $p$ is in $\mathcal{P}$, we write $p_i$ for $p(w_i)$.

**Theorem**Suppose $\mathfrak{I}$ is a strictly proper inaccuracy measure on the credence functions in $\mathcal{F}$. Then if $c$ is not in $\mathcal{P}$, there is $c^\star$ in $\mathcal{P}$ such that, for all $w_i$ in $\mathcal{W}$,

$$

\mathfrak{I}(c^\star, w_i) < \mathfrak{I}(c, w_i)

$$

*Proof*. We begin by defining a divergence $\mathfrak{D} : \mathcal{P} \times \mathcal{C} \rightarrow [0, \infty]$ that takes a probability function $p$ and a credence function $c$ and measures the divergence from the former to the latter:

$$

\mathfrak{D}(p, c) = \sum_i p_i \mathfrak{I}(c, w_i) - \sum_i p_i \mathfrak{I}(p, w_i)

$$

Three quick points about $\mathfrak{D}$.

(1) $\mathfrak{D}$ is a divergence. Since $\mathfrak{I}$ is strictly proper, $\mathfrak{D}(p, c) \geq 0$ with equality iff $c = p$.

(2) $\mathfrak{D}(v_{w_i}, c) = \mathfrak{I}(c, w_i)$, for all $w_i$ in $\mathcal{W}$.

(3) $\mathfrak{D}$ is strictly convex in its first argument. Suppose $p$ and $q$ are in $\mathcal{P}$, and suppose $0 < \lambda < 1$. Then let $r = \lambda p + \lambda q$. Then, since $\sum_i p_i\mathfrak{I}(c, w_i)$ is uniquely minimized, as a function of $c$, at $c = p$, and $\sum_i q_i\mathfrak{I}(c, w_i)$ is uniquely minimized, as a function of $c$, at $c = q$, we have$$\begin{eqnarray*}

\sum_i p_i \mathfrak{I}(c, w_i) & < & \sum_i p_i \mathfrak{I}(r, w_i) \\

\sum_i q_i \mathfrak{I}(c, w_i) & < & \sum_i q_i \mathfrak{I}(r, w_i)

\end{eqnarray*}$$Thus

$\lambda [-\sum_i p_i \mathfrak{I}(p, w_i)] + (1-\lambda) [-\sum_i q_i \mathfrak{I}(q, w_i)] >$

$ \lambda [-\sum_i p_i \mathfrak{I}(r, w_i)] + (1-\lambda) [-\sum_i q_i \mathfrak{I}(r, w_i)] = $

$-\sum_i r_i \mathfrak{I}(r, w_i)$

Now, adding

$\lambda \sum_i p_i \mathfrak{I}(c, w_i) + (1-\lambda)\sum_i q_i\mathfrak{I}(c, w_i) =$

$\sum_i (\lambda p_i + (1-\lambda)q_i) \mathfrak{I}(c, w_i) = \sum_i r_i \mathfrak{I}(c, w_i)$

to both sides gives

$\lambda [\sum_i p_i \mathfrak{I}(c, w_i)-\sum_i p_i \mathfrak{I}(p, w_i)]+ $

$(1-\lambda) [\sum_i q_i\mathfrak{I}(c, w_i)-\sum_i q_i \mathfrak{I}(q, w_i)] > $

$\sum_i r_i \mathfrak{I}(c, w_i)-\sum_i r_i \mathfrak{I}(r, w_i)$

That is,$$\lambda \mathfrak{D}(p, c) + (1-\lambda) \mathfrak{D}(q, c) > \mathfrak{D}(\lambda p + (1-\lambda)q, c)$$as required.

Now, suppose $c$ is not in $\mathcal{P}$. Then, since $\mathcal{P}$ is a closed convex set, there is a unique $c^\star$ in $\mathcal{P}$ that minimizes $\mathfrak{D}(x, c)$ as a function of $x$. Now, suppose $p$ is in $\mathcal{P}$. We wish to show that$$\mathfrak{D}(p, c) \geq \mathfrak{D}(p, c^\star) + \mathfrak{D}(c^\star, c)$$We can see that this holds iff$$\sum_i (p_i - c^\star_i) (\mathfrak{I}(c, w_i) - \mathfrak{I}(c^\star, w_i)) \geq 0$$After all,

$$\begin{eqnarray*}

& & \mathfrak{D}(p, c) - \mathfrak{D}(p, c^\star) - \mathfrak{D}(c^\star, c) \\

& = & [\sum_i p_i \mathfrak{I}(c, w_i) - \sum_i p_i \mathfrak{I}(p, w_i)] - \\

&& [\sum_i p_i \mathfrak{I}(c^\star, w_i) - \sum_i p_i \mathfrak{I}(p, w_i)] - \\

&& [\sum_i c^\star_i \mathfrak{I}(c, w_i) - \sum_i c^\star_i \mathfrak{I}(c^\star, w_i)] \\

& = & \sum_i (p_i - c^\star_i)(\mathfrak{I}(c, w_i) - \mathfrak{I}(c^\star, w_i))

\end{eqnarray*}$$

Now we prove this inequality. We begin by observing that, since $p$, $c^\star$ are in $\mathcal{P}$, since $\mathcal{P}$ is convex, and since $\mathfrak{D}(x, c)$ is minimized uniquely at $x = c^\star$, if $0 < \varepsilon < 1$, then$$\frac{1}{\varepsilon}[\mathfrak{D}(\varepsilon p + (1-\varepsilon) c^\star, c) - \mathfrak{D}(c^\star, c)] > 0$$Expanding that, we get

$\frac{1}{\varepsilon}[\sum_i (\varepsilon p_i + (1- \varepsilon) c^\star_i)\mathfrak{I}(c, w_i) -$

$\sum_i (\varepsilon p_i + (1-\varepsilon)c^\star_i)\mathfrak{I}(\varepsilon p + (1-\varepsilon) c^\star, w_i) - $

$\sum_i c^\star_i\mathfrak{I}(c, w_i) + \sum_i c^\star_i \mathfrak{I}(c^\star, i)] > 0$\medskip

So

$\frac{1}{\varepsilon}[\sum_i ( c^\star_i + \varepsilon(p_i - c^\star_i))\mathfrak{I}(c, w_i) -$

$\sum_i ( c^\star_i + \varepsilon(p_i-c^\star_i))\mathfrak{I}(\varepsilon p + (1-\varepsilon) c^\star, w_i) - $

$\sum_i c^\star_i\mathfrak{I}(c, w_i) + \sum_i c^\star_i \mathfrak{I}(c^\star, w_i)] > 0 $\medskip

So\medskip

$\sum_i (p_i - c^\star_i)(\mathfrak{I}(c, w_i) - \mathfrak{I}(\varepsilon p+ (1-\varepsilon) c^\star), w_i) +$

$ \frac{1}{\varepsilon}[\sum_i c^\star_i \mathfrak{I}(c^\star, w_i) - \sum_ic^\star_i \mathfrak{I}(\varepsilon p + (1-\varepsilon) c^\star, w_i)] > 0$\medskip

Now, since $\mathfrak{I}$ is strictly proper,

$$\frac{1}{\varepsilon}[\sum_i c^\star_i \mathfrak{I}(c^\star, w_i) - \sum_ic^\star_i \mathfrak{I}(\varepsilon p + (1-\varepsilon) c^\star, w_i)] < 0$$

So, for all $\varepsilon > 0$,$$\sum_i (p_i - c^\star_i)(\mathfrak{I}(c, w_i) - \mathfrak{I}(\varepsilon p+ (1-\varepsilon) c^\star, w_i) > 0$$

So, since $\mathfrak{I}$ is continuous$$\sum_i (p_i - c^\star_i)(\mathfrak{I}(c, w_i) - \mathfrak{I}(c^\star, w_i)) \geq 0$$which is what we wanted to show. So, by above,$$\mathfrak{D}(p,c) \geq \mathfrak{D}(p, c^\star) + \mathfrak{D}(c^\star, c) $$In particular, since each $w_i$ is in $\mathcal{P}$,$$\mathfrak{D}(v_{w_i}, c) \geq \mathfrak{D}(v_{w_i}, c^\star) + \mathfrak{D}(c^\star, c)$$But, since $c^\star$ is in $\mathcal{P}$ and $c$ is not, and since $\mathfrak{D}$ is a divergence, $\mathfrak{D}(c^\star, c) > 0$. So$$\mathfrak{I}(c, w_i) = \mathfrak{D}(v_{w_i}, c) > \mathfrak{D}(v_{w_i}, c^\star) = \mathfrak{I}(c^\star, w_i)$$as required. $\Box$

$\lambda [\sum_i p_i \mathfrak{I}(c, w_i)-\sum_i p_i \mathfrak{I}(p, w_i)]+ $

$(1-\lambda) [\sum_i q_i\mathfrak{I}(c, w_i)-\sum_i q_i \mathfrak{I}(q, w_i)] > $

$\sum_i r_i \mathfrak{I}(c, w_i)-\sum_i r_i \mathfrak{I}(r, w_i)$

That is,$$\lambda \mathfrak{D}(p, c) + (1-\lambda) \mathfrak{D}(q, c) > \mathfrak{D}(\lambda p + (1-\lambda)q, c)$$as required.

Now, suppose $c$ is not in $\mathcal{P}$. Then, since $\mathcal{P}$ is a closed convex set, there is a unique $c^\star$ in $\mathcal{P}$ that minimizes $\mathfrak{D}(x, c)$ as a function of $x$. Now, suppose $p$ is in $\mathcal{P}$. We wish to show that$$\mathfrak{D}(p, c) \geq \mathfrak{D}(p, c^\star) + \mathfrak{D}(c^\star, c)$$We can see that this holds iff$$\sum_i (p_i - c^\star_i) (\mathfrak{I}(c, w_i) - \mathfrak{I}(c^\star, w_i)) \geq 0$$After all,

$$\begin{eqnarray*}

& & \mathfrak{D}(p, c) - \mathfrak{D}(p, c^\star) - \mathfrak{D}(c^\star, c) \\

& = & [\sum_i p_i \mathfrak{I}(c, w_i) - \sum_i p_i \mathfrak{I}(p, w_i)] - \\

&& [\sum_i p_i \mathfrak{I}(c^\star, w_i) - \sum_i p_i \mathfrak{I}(p, w_i)] - \\

&& [\sum_i c^\star_i \mathfrak{I}(c, w_i) - \sum_i c^\star_i \mathfrak{I}(c^\star, w_i)] \\

& = & \sum_i (p_i - c^\star_i)(\mathfrak{I}(c, w_i) - \mathfrak{I}(c^\star, w_i))

\end{eqnarray*}$$

Now we prove this inequality. We begin by observing that, since $p$, $c^\star$ are in $\mathcal{P}$, since $\mathcal{P}$ is convex, and since $\mathfrak{D}(x, c)$ is minimized uniquely at $x = c^\star$, if $0 < \varepsilon < 1$, then$$\frac{1}{\varepsilon}[\mathfrak{D}(\varepsilon p + (1-\varepsilon) c^\star, c) - \mathfrak{D}(c^\star, c)] > 0$$Expanding that, we get

$\frac{1}{\varepsilon}[\sum_i (\varepsilon p_i + (1- \varepsilon) c^\star_i)\mathfrak{I}(c, w_i) -$

$\sum_i (\varepsilon p_i + (1-\varepsilon)c^\star_i)\mathfrak{I}(\varepsilon p + (1-\varepsilon) c^\star, w_i) - $

$\sum_i c^\star_i\mathfrak{I}(c, w_i) + \sum_i c^\star_i \mathfrak{I}(c^\star, i)] > 0$\medskip

So

$\frac{1}{\varepsilon}[\sum_i ( c^\star_i + \varepsilon(p_i - c^\star_i))\mathfrak{I}(c, w_i) -$

$\sum_i ( c^\star_i + \varepsilon(p_i-c^\star_i))\mathfrak{I}(\varepsilon p + (1-\varepsilon) c^\star, w_i) - $

$\sum_i c^\star_i\mathfrak{I}(c, w_i) + \sum_i c^\star_i \mathfrak{I}(c^\star, w_i)] > 0 $\medskip

So\medskip

$\sum_i (p_i - c^\star_i)(\mathfrak{I}(c, w_i) - \mathfrak{I}(\varepsilon p+ (1-\varepsilon) c^\star), w_i) +$

$ \frac{1}{\varepsilon}[\sum_i c^\star_i \mathfrak{I}(c^\star, w_i) - \sum_ic^\star_i \mathfrak{I}(\varepsilon p + (1-\varepsilon) c^\star, w_i)] > 0$\medskip

Now, since $\mathfrak{I}$ is strictly proper,

$$\frac{1}{\varepsilon}[\sum_i c^\star_i \mathfrak{I}(c^\star, w_i) - \sum_ic^\star_i \mathfrak{I}(\varepsilon p + (1-\varepsilon) c^\star, w_i)] < 0$$

So, for all $\varepsilon > 0$,$$\sum_i (p_i - c^\star_i)(\mathfrak{I}(c, w_i) - \mathfrak{I}(\varepsilon p+ (1-\varepsilon) c^\star, w_i) > 0$$

So, since $\mathfrak{I}$ is continuous$$\sum_i (p_i - c^\star_i)(\mathfrak{I}(c, w_i) - \mathfrak{I}(c^\star, w_i)) \geq 0$$which is what we wanted to show. So, by above,$$\mathfrak{D}(p,c) \geq \mathfrak{D}(p, c^\star) + \mathfrak{D}(c^\star, c) $$In particular, since each $w_i$ is in $\mathcal{P}$,$$\mathfrak{D}(v_{w_i}, c) \geq \mathfrak{D}(v_{w_i}, c^\star) + \mathfrak{D}(c^\star, c)$$But, since $c^\star$ is in $\mathcal{P}$ and $c$ is not, and since $\mathfrak{D}$ is a divergence, $\mathfrak{D}(c^\star, c) > 0$. So$$\mathfrak{I}(c, w_i) = \mathfrak{D}(v_{w_i}, c) > \mathfrak{D}(v_{w_i}, c^\star) = \mathfrak{I}(c^\star, w_i)$$as required. $\Box$