The Accuracy Dominance Argument for Probabilism without the Additivity Assumption

For a PDF of this post, see here.

One of the central arguments in accuracy-first epistemology -- the one that gets the project off the ground, I think -- is the accuracy-dominance argument for Probabilism. This started life in a more pragmatic guise in de Finetti's proof that, if your credences are not probabilistic, there are alternatives that would lose less than yours if they were penalised using the Brier score, which levies a price of $(1-x)^2$ on every credence $x$ in a truth and $x^2$ on every credence $x$ in a falsehood. This was then adapted into an accuracy-based argument by Roger Rosenkrantz, who interpreted the Brier score as a measure of inaccuracy, not a penalty. Interpreted thus, de Finetti's result says that any non-probabilistic credences are accuracy-dominated by some probabilistic credences. Jim Joyce then noted that this argument only establishes Probabilism if you have a further argument that inaccuracy should be measured by the Brier score. He thought there was no particular reason to think that's right, so he greatly generalized de Finetti's result to show that, relative to a much wider range of inaccuracy measures, all non-probabilistic credences are accuracy-dominated. One problem with this, which Al Hájek pointed out, was that Joyce gave no converse argument -- that is, he didn't show that, for each of his inaccuracy measures, no probabilistic credence function is accuracy-dominated. Joel Predd and his Princeton collaborators then addressed this concern and proved a very general result: for any additive, continuous, and strictly proper inaccuracy measure, every non-probabilistic credence function is accuracy-dominated, while no probabilistic credence function is.
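
To see de Finetti's result in miniature, here is a quick numerical sketch in Python (the particular credence functions are illustrative choices of mine, not anything in de Finetti): the non-probabilistic credences $c(X) = c(\neg X) = 0.6$ are Brier-dominated by the probabilistic $c^\star(X) = c^\star(\neg X) = 0.5$.

```python
# A minimal sanity check of de Finetti's result, using the Brier score.
# The credence functions c and c_star are illustrative choices, not part
# of the original argument.

def brier(credences, truths):
    # Brier inaccuracy: (1 - x)^2 for credence x in a truth, x^2 in a falsehood
    return sum((t - x) ** 2 for x, t in zip(credences, truths))

c      = [0.6, 0.6]   # credences in X and in not-X; sum to 1.2, so not probabilistic
c_star = [0.5, 0.5]   # a probabilistic alternative

# Two worlds: one where X is true (so not-X is false), one where X is false.
for world in ([1, 0], [0, 1]):
    print(world, brier(c, world), brier(c_star, world))
# At both worlds c_star scores 0.5 while c scores 0.52: c is accuracy-dominated.
```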

That brings us to this blogpost. Additivity is a controversial claim. It says that the inaccuracy of a credence function is the (possibly weighted) sum of the inaccuracies of the individual credences it assigns. So the question arises: can we do without additivity? In this post, I'll give a quick proof of the accuracy-dominance argument that assumes nothing about the inaccuracy measures other than that they are continuous and strictly proper. Anyone familiar with the Predd et al. paper will see that the proof strategy draws very heavily on theirs. But it bypasses the construction of the Bregman divergence that corresponds to the strictly proper inaccuracy measure. For that, you'll have to wait for Jason Konek's forthcoming work...

Suppose:
  • $\mathcal{F}$ is a set of propositions;
  • $\mathcal{W} = \{w_1, \ldots, w_n\}$ is the set of possible worlds relative to $\mathcal{F}$;
  • for each $w$ in $\mathcal{W}$, $v_w$ is the omniscient credence function at $w$, which assigns credence 1 to each proposition in $\mathcal{F}$ that is true at $w$ and credence 0 to each that is false at $w$;
  • $\mathcal{C}$ is the set of credence functions on $\mathcal{F}$;
  • $\mathcal{P}$ is the set of probability functions on $\mathcal{F}$. So, by de Finetti's theorem, $\mathcal{P} = \{v_w : w \in \mathcal{W}\}^+$, the convex hull of the omniscient credence functions. If $p$ is in $\mathcal{P}$, we write $p_i$ for $p(w_i)$.
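
For concreteness, here is one way to render this setup numerically -- a sketch only, which assumes the propositions are just the worlds themselves, so that credence functions are vectors:

```python
import numpy as np

# A toy rendering of the setup with three worlds; propositions are taken to be
# just the worlds, so a credence function is a vector of three credences.
n = 3
v = np.eye(n)            # row i is v_{w_i}: credence 1 in w_i, 0 in the other worlds

# By de Finetti's theorem, a probability function is a convex combination of the v_w:
weights = np.array([0.2, 0.5, 0.3])   # illustrative nonnegative weights summing to 1
p = weights @ v                        # here this is just the weight vector itself
print(p, p.sum())                      # in the simplex: nonnegative, sums to 1
```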
Theorem. Suppose $\mathfrak{I}$ is a continuous and strictly proper inaccuracy measure on the credence functions on $\mathcal{F}$ -- that is, $\mathfrak{I}(c, w)$ is continuous in $c$, and, for each $p$ in $\mathcal{P}$, the expected inaccuracy $\sum_i p_i \mathfrak{I}(c, w_i)$ is uniquely minimized, as a function of $c$, at $c = p$. Then if $c$ is not in $\mathcal{P}$, there is $c^\star$ in $\mathcal{P}$ such that, for all $w_i$ in $\mathcal{W}$,
$$
\mathfrak{I}(c^\star, w_i) < \mathfrak{I}(c, w_i)
$$

Proof. We begin by defining a divergence $\mathfrak{D} : \mathcal{P} \times \mathcal{C} \rightarrow [0, \infty)$ that takes a probability function $p$ and a credence function $c$ and measures the divergence from the former to the latter:
$$
\mathfrak{D}(p, c) = \sum_i p_i  \mathfrak{I}(c, w_i) - \sum_i p_i \mathfrak{I}(p, w_i)
$$
Since $\mathfrak{I}$ is continuous, and so in particular real-valued, both sums are finite.
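
In code, $\mathfrak{D}(p, c)$ is the expected inaccuracy of $c$ by the lights of $p$, minus the expected inaccuracy of $p$ by its own lights. A minimal sketch (the interface, with $\mathfrak{I}$ passed in as a function `I(x, i)` returning $\mathfrak{I}(x, w_i)$, is my own convention):

```python
def divergence(I, p, c):
    # D(p, c) = sum_i p_i * I(c, w_i)  -  sum_i p_i * I(p, w_i),
    # where I(x, i) returns the inaccuracy of credence function x at world w_i,
    # p is a probability function, and c is any credence function.
    n = len(p)
    return (sum(p[i] * I(c, i) for i in range(n))
            - sum(p[i] * I(p, i) for i in range(n)))
```

With the Brier score over the world-propositions as $\mathfrak{I}$, a little algebra shows that $\mathfrak{D}(p, c) = \sum_i (p_i - c_i)^2$, the squared Euclidean distance -- a useful special case to keep in mind in what follows.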
Three quick points about $\mathfrak{D}$.

(1) $\mathfrak{D}$ is a divergence. Since $\mathfrak{I}$ is strictly proper, $\mathfrak{D}(p, c) \geq 0$ with equality iff $c = p$.

(2) $\mathfrak{D}(v_{w_i}, c) = \mathfrak{I}(c, w_i) - \mathfrak{I}(v_{w_i}, w_i)$, for all $w_i$ in $\mathcal{W}$. This is immediate from the definition of $\mathfrak{D}$, since $v_{w_i}$ assigns credence 1 to $w_i$ and credence 0 to every other world.

(3) $\mathfrak{D}$ is strictly convex in its first argument.  Suppose $p$ and $q$ are distinct elements of $\mathcal{P}$, suppose $0 < \lambda < 1$, and let $r = \lambda p + (1-\lambda) q$. Since $p \neq q$, we have $r \neq p$ and $r \neq q$. Then, since $\sum_i p_i\mathfrak{I}(x, w_i)$ is uniquely minimized, as a function of $x$, at $x = p$, and $\sum_i q_i\mathfrak{I}(x, w_i)$ is uniquely minimized, as a function of $x$, at $x = q$, we have$$\begin{eqnarray*}
\sum_i p_i \mathfrak{I}(p, w_i) & < & \sum_i p_i \mathfrak{I}(r, w_i) \\
\sum_i q_i \mathfrak{I}(q, w_i) & < & \sum_i q_i \mathfrak{I}(r, w_i)
\end{eqnarray*}$$Multiplying the first by $\lambda$, the second by $1-\lambda$, summing, and negating gives
$$\lambda \Big[-\sum_i p_i \mathfrak{I}(p, w_i)\Big] + (1-\lambda) \Big[-\sum_i q_i \mathfrak{I}(q, w_i)\Big] > \lambda \Big[-\sum_i p_i \mathfrak{I}(r, w_i)\Big] + (1-\lambda) \Big[-\sum_i q_i \mathfrak{I}(r, w_i)\Big] = -\sum_i r_i \mathfrak{I}(r, w_i)$$

Now, adding
$$\lambda \sum_i p_i \mathfrak{I}(c, w_i) + (1-\lambda)\sum_i q_i\mathfrak{I}(c, w_i) = \sum_i (\lambda p_i + (1-\lambda)q_i) \mathfrak{I}(c, w_i) = \sum_i r_i \mathfrak{I}(c, w_i)$$
to both sides gives

$$\lambda \Big[\sum_i p_i \mathfrak{I}(c, w_i)-\sum_i p_i \mathfrak{I}(p, w_i)\Big] + (1-\lambda) \Big[\sum_i q_i\mathfrak{I}(c, w_i)-\sum_i q_i \mathfrak{I}(q, w_i)\Big] > \sum_i r_i \mathfrak{I}(c, w_i)-\sum_i r_i \mathfrak{I}(r, w_i)$$

That is,$$\lambda \mathfrak{D}(p, c) + (1-\lambda) \mathfrak{D}(q, c) > \mathfrak{D}(\lambda p + (1-\lambda)q, c)$$as required.
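
Here is a quick numerical spot-check of (3), with the Brier score standing in for $\mathfrak{I}$ and the vectors $p$, $q$, $c$ and the weight $\lambda$ chosen arbitrarily for illustration:

```python
import numpy as np

def brier_I(x, i):
    # Brier inaccuracy of credence vector x at world w_i (propositions = worlds)
    truth = np.zeros(len(x))
    truth[i] = 1.0
    return float(np.sum((truth - x) ** 2))

def divergence(I, p, c):
    # D(p, c) as defined above
    return sum(p[i] * (I(c, i) - I(p, i)) for i in range(len(p)))

p = np.array([0.8, 0.1, 0.1])   # two distinct probability functions...
q = np.array([0.2, 0.3, 0.5])
c = np.array([0.6, 0.6, 0.6])   # ...and a fixed credence function (not probabilistic)
lam = 0.4
r = lam * p + (1 - lam) * q

lhs = lam * divergence(brier_I, p, c) + (1 - lam) * divergence(brier_I, q, c)
rhs = divergence(brier_I, r, c)
print(lhs, ">", rhs)            # 0.372 > 0.2376, as strict convexity requires
```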

Now, suppose $c$ is not in $\mathcal{P}$. Since $\mathcal{P}$ is a compact convex set and $\mathfrak{D}(x, c)$ is continuous in $x$ (because $\mathfrak{I}$ is continuous), $\mathfrak{D}(x, c)$ attains a minimum on $\mathcal{P}$; and since $\mathfrak{D}(x, c)$ is strictly convex in $x$, the minimizer is unique. Call it $c^\star$. Now, suppose $p$ is in $\mathcal{P}$. We wish to show that$$\mathfrak{D}(p, c) \geq \mathfrak{D}(p, c^\star) + \mathfrak{D}(c^\star, c)$$This holds iff$$\sum_i (p_i - c^\star_i) (\mathfrak{I}(c, w_i) - \mathfrak{I}(c^\star, w_i)) \geq 0$$After all,
$$\begin{eqnarray*}
& & \mathfrak{D}(p, c) - \mathfrak{D}(p, c^\star) - \mathfrak{D}(c^\star, c) \\
& = & [\sum_i p_i \mathfrak{I}(c, w_i) - \sum_i p_i \mathfrak{I}(p, w_i)] - \\
&& [\sum_i p_i \mathfrak{I}(c^\star, w_i) - \sum_i p_i \mathfrak{I}(p, w_i)] - \\
&& [\sum_i c^\star_i \mathfrak{I}(c, w_i) - \sum_i c^\star_i \mathfrak{I}(c^\star, w_i)] \\
& = & \sum_i (p_i - c^\star_i)(\mathfrak{I}(c, w_i) - \mathfrak{I}(c^\star, w_i))
\end{eqnarray*}$$
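
That identity is easy to sanity-check numerically -- again with the Brier score standing in for $\mathfrak{I}$ and arbitrary illustrative vectors:

```python
import numpy as np

def brier_I(x, i):
    truth = np.zeros(len(x))
    truth[i] = 1.0
    return float(np.sum((truth - x) ** 2))

def divergence(I, p, c):
    return sum(p[i] * (I(c, i) - I(p, i)) for i in range(len(p)))

# The identity holds for any p, c_star in P and any credence function c;
# these particular vectors are arbitrary illustrative choices.
p      = np.array([0.7, 0.2, 0.1])
c_star = np.array([0.4, 0.4, 0.2])
c      = np.array([0.9, 0.8, 0.1])

lhs = (divergence(brier_I, p, c)
       - divergence(brier_I, p, c_star)
       - divergence(brier_I, c_star, c))
rhs = sum((p[i] - c_star[i]) * (brier_I(c, i) - brier_I(c_star, i)) for i in range(3))
print(lhs, rhs)   # equal, up to floating-point rounding
```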
Now we prove this inequality. If $p = c^\star$, it holds trivially; so suppose $p \neq c^\star$. We begin by observing that, since $p$ and $c^\star$ are in $\mathcal{P}$, since $\mathcal{P}$ is convex, and since $\mathfrak{D}(x, c)$ is minimized uniquely at $x = c^\star$, if $0 < \varepsilon < 1$, then$$\frac{1}{\varepsilon}[\mathfrak{D}(\varepsilon p + (1-\varepsilon) c^\star, c) - \mathfrak{D}(c^\star, c)] > 0$$Expanding that, we get

$$\frac{1}{\varepsilon}\Big[\sum_i (\varepsilon p_i + (1- \varepsilon) c^\star_i)\mathfrak{I}(c, w_i) - \sum_i (\varepsilon p_i + (1-\varepsilon)c^\star_i)\mathfrak{I}(\varepsilon p + (1-\varepsilon) c^\star, w_i) - \sum_i c^\star_i\mathfrak{I}(c, w_i) + \sum_i c^\star_i \mathfrak{I}(c^\star, w_i)\Big] > 0$$

So
$$\frac{1}{\varepsilon}\Big[\sum_i (c^\star_i + \varepsilon(p_i - c^\star_i))\mathfrak{I}(c, w_i) - \sum_i (c^\star_i + \varepsilon(p_i-c^\star_i))\mathfrak{I}(\varepsilon p + (1-\varepsilon) c^\star, w_i) - \sum_i c^\star_i\mathfrak{I}(c, w_i) + \sum_i c^\star_i \mathfrak{I}(c^\star, w_i)\Big] > 0$$

So
$$\sum_i (p_i - c^\star_i)\big(\mathfrak{I}(c, w_i) - \mathfrak{I}(\varepsilon p + (1-\varepsilon) c^\star, w_i)\big) + \frac{1}{\varepsilon}\Big[\sum_i c^\star_i \mathfrak{I}(c^\star, w_i) - \sum_i c^\star_i \mathfrak{I}(\varepsilon p + (1-\varepsilon) c^\star, w_i)\Big] > 0$$

Now, since $\mathfrak{I}$ is strictly proper and $\varepsilon p + (1-\varepsilon) c^\star \neq c^\star$,
$$\frac{1}{\varepsilon}\Big[\sum_i c^\star_i \mathfrak{I}(c^\star, w_i) - \sum_i c^\star_i \mathfrak{I}(\varepsilon p + (1-\varepsilon) c^\star, w_i)\Big] < 0$$
So, for all $0 < \varepsilon < 1$,$$\sum_i (p_i - c^\star_i)\big(\mathfrak{I}(c, w_i) - \mathfrak{I}(\varepsilon p+ (1-\varepsilon) c^\star, w_i)\big) > 0$$
So, letting $\varepsilon \rightarrow 0$ and appealing to the continuity of $\mathfrak{I}$,$$\sum_i (p_i - c^\star_i)(\mathfrak{I}(c, w_i) - \mathfrak{I}(c^\star, w_i)) \geq 0$$which is what we wanted to show. So, by the above,$$\mathfrak{D}(p,c) \geq \mathfrak{D}(p, c^\star) + \mathfrak{D}(c^\star, c) $$In particular, since each $v_{w_i}$ is in $\mathcal{P}$,$$\mathfrak{D}(v_{w_i}, c) \geq \mathfrak{D}(v_{w_i}, c^\star) + \mathfrak{D}(c^\star, c)$$But, since $c^\star$ is in $\mathcal{P}$ and $c$ is not, and since $\mathfrak{D}$ is a divergence, $\mathfrak{D}(c^\star, c) > 0$. So, by (2),$$\mathfrak{I}(c, w_i) - \mathfrak{I}(v_{w_i}, w_i) = \mathfrak{D}(v_{w_i}, c) > \mathfrak{D}(v_{w_i}, c^\star) = \mathfrak{I}(c^\star, w_i) - \mathfrak{I}(v_{w_i}, w_i)$$and so $\mathfrak{I}(c, w_i) > \mathfrak{I}(c^\star, w_i)$, as required. $\Box$
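
To close, here is a numerical illustration of the whole construction -- a sketch that uses the Brier score for $\mathfrak{I}$ and a brute-force grid search in place of the compactness argument, neither of which is part of the proof itself: starting from a non-probabilistic $c$, we locate the $c^\star$ in $\mathcal{P}$ that minimizes $\mathfrak{D}(x, c)$ and confirm that it accuracy-dominates $c$.

```python
import numpy as np

def brier_I(x, i):
    truth = np.zeros(len(x))
    truth[i] = 1.0
    return float(np.sum((truth - x) ** 2))

def divergence(I, p, c):
    return sum(p[i] * (I(c, i) - I(p, i)) for i in range(len(p)))

c = np.array([0.6, 0.6, 0.6])   # an illustrative non-probabilistic credence function

# Brute-force search over a grid on the probability simplex for the minimizer
# of D(x, c); the proof guarantees that it exists and is unique.
best, c_star = float("inf"), None
grid = np.linspace(0, 1, 201)
for a in grid:
    for b in grid:
        if a + b <= 1:
            x = np.array([a, b, 1 - a - b])
            d = divergence(brier_I, x, c)
            if d < best:
                best, c_star = d, x

print(c_star)                   # ~[1/3, 1/3, 1/3]: the Brier projection of c onto P
for i in range(3):
    # dominance at every world: c_star is strictly more accurate than c
    print(brier_I(c, i), ">", brier_I(c_star, i))
```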




Comments

  1. What do you mean when you say that the inaccuracy measure is continuous, and how do you know this to be true? Is it an additional assumption, or does it follow from strict propriety somehow?

    Replies
    1. Sorry, you're right -- I should have included this explicitly. The result covers all continuous strictly proper inaccuracy measures.

  2. In the proof of the Theorem, in part (3), where you claim that D is strictly convex in its first argument, I don't see how the first pair of offset strict inequalities could hold. In the argument, c is supposed to be fixed and arbitrary. So for all you've said, we could have c=r there, and the inequalities would not be strict.

  3. I think the proof in (3) that $D$ is strictly convex in its first argument might be incorrect, and I don't see why this result should hold in general. In the part of the proof where you say "Now adding...to both sides gives...," the quantities you're adding could be infinite, in which case the strict inequality will not be preserved. In general, in order for $D$ to be strictly convex in its first argument, there can be no $c$ such that $J(c,w) = \infty$ for every $w$ (otherwise $D$ is a constant function ($= \infty$) in its first argument, and therefore not strictly convex). But I don't see that continuity and strict propriety rule out the possibility that there is such a $c$.

  4. Also, even if $D$ were strictly convex in its first argument, that wouldn't imply (as you claim it does in your paper) that it attains a minimum on a closed convex set. For example, the real function on $[0,1]$ defined by $f(x) = (1-x)^2$ if $x \in [0,1)$ and $f(1) = 1$ is strictly convex, but it does not attain a minimum, due to the discontinuity at the boundary. In order to show that $D$ attains a minimum, you need to appeal to some kind of continuity property for $D$.

