### How inaccurate is your total doxastic state?

I've written a lot on this blog about ways in which we might measure the inaccuracy of an agent when she has precise numerical credences in propositions. I've tried to describe the various ways in which philosophers have tried to use such measures to help argue for different principles of rationality that govern these credences. For instance, Jim Joyce has argued that credences should satisfy the axioms of the probability calculus because any non-probabilistic credences are accuracy-dominated by probabilistic credences: that is, if $c$ is a non-probabilistic credence function, there is a probabilistic credence function $c^*$ such that $c^*$ is guaranteed to be more accurate than $c$.

Of course much of the epistemological literature is concerned with agents who have quite different sorts of doxastic attitudes. It is concerned with agents who have not credences, which we might think of as partial beliefs, but rather agents who have full or all-or-nothing or categorical beliefs. One might wonder whether we can also describe ways of measuring the inaccuracy of these doxastic attitudes. It turns out that we can. The principles of rationality that follow have been investigated by (amongst others) Hempel, Maher, Easwaran, and Fitelson. I'll describe some of the inaccuracy measures below.

This raises a question. Suppose you think that credences and full beliefs are both genuine doxastic attitudes, neither of which can be reduced to the other. Then it is natural to think that the inaccuracy of one's total doxastic state is the sum of the inaccuracy of the credal part and the inaccuracy of the full belief part. Now suppose that you think that, while neither sort of attitude can be reduced to the other, there is a tight connection between them for rational believers. Indeed, you accept a normative version of the Lockean thesis: that is, you say that an agent should have a belief in $p$ iff her credence in $p$ is at least $t$ (for some threshold $0.5 < t \leq 1$) and she should have a disbelief in $p$ iff her credence in $p$ is at most $1-t$. Then it turns out that something rather unfortunate happens. Joyce's accuracy dominance argument for probabilism described above fails. It now turns out that there are non-probabilistic credence functions with the following properties: while they are accuracy-dominated, the rational total doxastic state that they generate via the normative Lockean thesis -- that is, the total doxastic state that includes those credences together with the full beliefs or disbeliefs that the normative Lockean thesis demands -- is not accuracy-dominated by any other total doxastic state that satisfies the normative Lockean thesis.

Let's see how this happens. We need three ingredients:

#### Inaccuracy for credences

The inaccuracy of a credence $x$ in proposition $X$ at world $w$ is given by the quadratic scoring rule:

$$

i(x, w) = \left \{ \begin{array}{ll}

(1-x)^2 & \mbox{if $X$ is true at $w$} \\

x^2 & \mbox{if $X$ is false at $w$}

\end{array}

\right.

$$

Suppose $c = \{c_1, \ldots, c_n\}$ is a set of credences on a set of propositions $\mathbf{F} = \{X_1, \ldots, X_n\}$. The inaccuracy of the whole credence function is given as follows:

$$

I(c, w) = \sum_k i(c_k, w)

$$
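As a quick illustration, the quadratic rule and its sum can be sketched in Python. The function names `brier_local` and `brier_total`, the encoding of a world as a tuple of truth values, and the sample credences are all assumptions made for this sketch rather than part of the definitions above.

```python
from fractions import Fraction


def brier_local(x, is_true):
    """Quadratic inaccuracy i(x, w) of credence x in a proposition
    that is true (is_true=True) or false (is_true=False) at world w."""
    return (1 - x) ** 2 if is_true else x ** 2


def brier_total(credences, world):
    """I(c, w): sum of the local inaccuracies. `world` is a tuple of
    truth values, one per proposition, in the same order as `credences`."""
    return sum(brier_local(x, v) for x, v in zip(credences, world))


# Hypothetical credences in X and not-X, kept exact with Fraction.
c = (Fraction(7, 10), Fraction(3, 10))
print(brier_total(c, (True, False)))   # world where X is true
print(brier_total(c, (False, True)))   # world where X is false
```

Exact fractions are used instead of floats so that later comparisons between inaccuracy scores are not disturbed by rounding.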

#### Inaccuracy for beliefs

Suppose $\mathbf{B} = \{b_1, \ldots, b_n\}$ is a set of beliefs, disbeliefs, and suspensions of judgment on a set of propositions $\mathbf{F} = \{X_1, \ldots, X_n\}$. Thus, each $b_k$ is either a belief in $X_k$ (denoted $B(X_k)$), a disbelief in $X_k$ (denoted $D(X_k)$), or a suspension of judgment in $X_k$ (denoted $S(X_k)$). Then the inaccuracy of an attitude $b$ towards proposition $X$ at world $w$ is given as follows: there is a reward $R$ for a true belief or a false disbelief; there is a penalty $W$ for a false belief or a true disbelief; and suspensions receive neither penalty nor reward regardless of the truth of the proposition in question. We assume $R, W > 0$. Since we are interested in measuring inaccuracy rather than accuracy, the reward then makes a negative contribution to inaccuracy and the penalty makes a positive contribution. Thus:

$$

i(B(X), w) = \left \{\begin{array}{ll}

-R & \mbox{if $X$ is true at $w$} \\

W & \mbox{if $X$ is false at $w$}

\end{array}

\right.

$$

$$

i(S(X), w) = \left \{\begin{array}{ll}

0 & \mbox{if $X$ is true at $w$} \\

0 & \mbox{if $X$ is false at $w$}

\end{array}

\right.

$$

$$

i(D(X), w) = \left \{ \begin{array}{ll}

W & \mbox{if $X$ is true at $w$} \\

-R & \mbox{if $X$ is false at $w$}

\end{array}

\right.

$$

This then generates an inaccuracy measure on a set of beliefs $\mathbf{B}$ as follows:

$$

I(\mathbf{B}, w) = \sum_k i(b_k, w)

$$

Hempel noticed that, if $R = W$ and $p$ is a probability function, then: $B(X)$ uniquely maximises expected utility (that is, uniquely minimises expected inaccuracy) by the lights of $p$ iff $p(X) > 0.5$; $D(X)$ uniquely maximises expected utility by the lights of $p$ iff $p(X) < 0.5$; $S(X)$ maximises expected utility iff $p(X) = 0.5$, but in that situation, $B(X)$ and $D(X)$ do too. Easwaran has investigated what happens if $R \neq W$.
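Hempel's observation can be checked numerically. The following is a minimal sketch: the names `expected_inaccuracy` and `best_attitudes`, the one-letter labels for the attitudes, and the particular value chosen for $R = W$ are assumptions of the sketch.

```python
from fractions import Fraction


def expected_inaccuracy(attitude, p, R, W):
    """Expected inaccuracy of a belief ("B"), disbelief ("D") or
    suspension ("S") in X, by the lights of the probability p = p(X)."""
    if attitude == "B":
        return p * (-R) + (1 - p) * W
    if attitude == "D":
        return p * W + (1 - p) * (-R)
    return Fraction(0)  # suspension scores 0 at every world


def best_attitudes(p, R, W):
    """Attitudes that minimise expected inaccuracy, ties included."""
    scores = {a: expected_inaccuracy(a, p, R, W) for a in "BSD"}
    lowest = min(scores.values())
    return {a for a, s in scores.items() if s == lowest}


R = W = Fraction(1, 2)  # assumed common value for reward and penalty
print(best_attitudes(Fraction(7, 10), R, W))  # p(X) > 0.5
print(best_attitudes(Fraction(3, 10), R, W))  # p(X) < 0.5
print(best_attitudes(Fraction(1, 2), R, W))   # p(X) = 0.5
```

Minimising expected inaccuracy here is the same as maximising expected utility, since the inaccuracy scores are just the utilities with the sign flipped.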

#### Lockean thesis

For some $0.5 < t \leq 1$:

- A rational agent has a belief in $X$ iff $c(X) \geq t$;
- A rational agent has a disbelief in $X$ iff $c(X) \leq 1-t$;
- A rational agent suspends judgment in $X$ iff $1-t < c(X) < t$.

#### Inaccuracy for total doxastic state

We can now put these three ingredients together to give an inaccuracy measure for a total doxastic state that satisfies the normative Lockean thesis. We state the measure as a measure of the inaccuracy of a credence $x$ in proposition $X$ at world $w$, since any total doxastic state that satisfies the normative Lockean thesis is completely determined by the credal part.

$$

i_t(x, w) = \left \{ \begin{array}{ll}

(1-x)^2 - R & \mbox{if } t \leq x \leq 1\mbox{ and } X \mbox{ is true} \\

(1-x)^2 & \mbox{if } 1- t < x < t\mbox{ and } X \mbox{ is true} \\

(1-x)^2 + W & \mbox{if } 0 \leq x \leq 1-t\mbox{ and } X \mbox{ is true} \\

x^2 + W & \mbox{if } t \leq x \leq 1\mbox{ and } X \mbox{ is false} \\

x^2 & \mbox{if } 1- t < x < t \mbox{ and } X \mbox{ is false}\\

x^2 - R & \mbox{if } 0 \leq x \leq 1-t \mbox{ and } X \mbox{ is false}\\

\end{array}

\right.

$$

Finally, we give the total inaccuracy of such a doxastic state:

$$

I_t(c, w) = \sum_k i_t(c_k, w)

$$
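The combined measure can be sketched in Python as follows. The function names `lockean_local` and `lockean_total` and the parameter values in the usage lines are assumptions chosen for illustration. The disbelief clause uses the disbelief threshold $1-t$ from the Lockean thesis stated above.

```python
from fractions import Fraction


def lockean_local(x, is_true, t, R, W):
    """i_t(x, w): quadratic inaccuracy of credence x plus the score of
    the attitude that the Lockean thesis with threshold t attaches to x."""
    brier = (1 - x) ** 2 if is_true else x ** 2
    if x >= t:            # belief: reward if true, penalty if false
        return brier - R if is_true else brier + W
    if x <= 1 - t:        # disbelief: penalty if true, reward if false
        return brier + W if is_true else brier - R
    return brier          # suspension adds nothing


def lockean_total(credences, world, t, R, W):
    """I_t(c, w): sum of i_t over all propositions."""
    return sum(lockean_local(x, v, t, R, W)
               for x, v in zip(credences, world))


# Hypothetical parameters, chosen only to show the jump at the threshold.
t, R, W = Fraction(3, 4), Fraction(1, 4), Fraction(3, 4)
just_below = lockean_local(Fraction(74, 100), True, t, R, W)
at_threshold = lockean_local(Fraction(3, 4), True, t, R, W)
print(just_below, at_threshold)
```

With these assumed values, a credence just below the threshold in a true proposition scores only its quadratic inaccuracy, while a credence at the threshold also collects the reward $R$, so the score drops abruptly.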

Three things are interesting about this inaccuracy measure. First, unlike the inaccuracy measures we usually deal with, it's discontinuous. The inaccuracy of $x$ in $X$ is discontinuous at $t$ and at $1-t$. If $X$ is true, this is because, as $x$ crosses the Lockean threshold $t$, it gives rise to a true belief, whose reward contributes negatively to the inaccuracy; and as it crosses the other Lockean threshold $1-t$, it gives rise to a true disbelief, whose penalty contributes positively to the inaccuracy.

Second, the measure is proper, provided the threshold is matched to the reward and penalty so that $t = W/(R+W)$. That is, each probabilistic set of credences expects itself to be amongst the least inaccurate.
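A short derivation shows where properness comes from. It is a sketch which assumes that $p$ is the agent's probability for $X$ and that the quadratic part of $i_t$ is already taken care of by the propriety of the quadratic scoring rule. By the lights of $p$, the expected scores of the three attitudes are:

$$
\begin{array}{ll}
-pR + (1-p)W & \mbox{for } B(X) \\
0 & \mbox{for } S(X) \\
pW - (1-p)R & \mbox{for } D(X)
\end{array}
$$

So belief does at least as well as suspension iff $p \geq W/(R+W)$, and disbelief does at least as well as suspension iff $p \leq R/(R+W) = 1 - W/(R+W)$. The attitudes that the Lockean thesis attaches to $p$ are therefore the ones $p$ itself expects to do best exactly when $t = W/(R+W)$. Given $t > 0.5$, this requires $W > R$.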

Third, as mentioned above, there are non-probabilistic credence functions that are not accuracy-dominated when inaccuracy is measured by $I_t$. Consider the following example.

- $\mathbf{F} = \{X, \neg X\}$. That is, our agent has credences only in two propositions.
- $c(X) = 0.6$ and $c(\neg X) = 0.5$.
- $R = 0.4$, $W = 0.6$. That is, the penalty for a false belief or true disbelief is fifty percent higher than the reward for a true belief.
- $t = 0.6$. That is, a rational agent has a belief in $X$ iff her credence is at least 0.6; and she has a disbelief in $X$ iff her credence is at most 0.4. It's worth noting that, for probabilistic agents who specify $R$ and $W$ as we just have, satisfying the Lockean thesis with $t = 0.6$ will always minimize expected inaccuracy.
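The scores of this example can be worked out directly. The sketch below computes $I$ and $I_t$ for $c$ at both worlds. The helper names and the `results` dictionary are assumptions of the sketch, and the disbelief clause uses the threshold $1-t = 0.4$ from the Lockean thesis.

```python
from fractions import Fraction as F

t, R, W = F(6, 10), F(4, 10), F(6, 10)
c = (F(6, 10), F(5, 10))            # credences in X and not-X
worlds = {"X true": (True, False), "X false": (False, True)}

# The threshold coincides with W / (R + W), the probability at which
# believing starts to beat suspending in expectation.
assert t == W / (R + W)


def brier(x, is_true):
    """Quadratic inaccuracy of a single credence."""
    return (1 - x) ** 2 if is_true else x ** 2


def attitude_score(x, is_true):
    """Score of the attitude the Lockean thesis attaches to credence x."""
    if x >= t:
        return -R if is_true else W
    if x <= 1 - t:
        return W if is_true else -R
    return F(0)


results = {}
for name, w in worlds.items():
    I = sum(brier(x, v) for x, v in zip(c, w))
    I_t = I + sum(attitude_score(x, v) for x, v in zip(c, w))
    results[name] = (I, I_t)
    print(name, float(I), float(I_t))
# prints:
# X true 0.41 0.01
# X false 0.61 1.21
```

The credence of 0.6 in $X$ generates a belief in $X$, while the credence of 0.5 in $\neg X$ generates a suspension, so only the belief in $X$ moves $I_t$ away from $I$.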

The following figure helps us to see why.

Here, we plot the possible credence functions on $\mathbf{F} = \{X, \neg X\}$ on the unit square. The dotted lines represent the Lockean thresholds: a belief threshold for $X$ and a disbelief threshold for $X$; and similarly for $\neg X$. The undotted diagonal line includes all the probabilistically coherent credence functions; that is, those for which the credence in $X$ and the credence in $\neg X$ sum to 1. $c$ is the credence function described above. It is probabilistically incoherent. The lower right-hand arc includes all the possible credence functions that are exactly as inaccurate as $c$ when $X$ is true and inaccuracy is measured by $I$. The upper left-hand arc includes all the possible credence functions that are exactly as inaccurate as $c$ when $\neg X$ is true and inaccuracy is measured by $I$.

Note that, in line with Joyce's accuracy-domination argument for probabilism, $c$ is $I$-dominated. It is $I$-dominated by all of the credence functions that lie between the two arcs. Some of these -- namely, those that also lie on the diagonal line -- are not themselves $I$-dominated. This seems to rule out $c$ as irrational. But of course, when we are considering not only the inaccuracy of $c$ but also the inaccuracy of the beliefs and disbeliefs to which $c$ gives rise in line with the Lockean thesis, our measure of inaccuracy is $I_t$, not $I$. Notice that none of the credence functions that $I$-dominate $c$ also $I_t$-dominates it. The reason is that every such credence function assigns $X$ a credence less than 0.6. Thus, none of them gives rise to a full belief in $X$. As a result, the decrease in $I$ that is obtained by moving to one of these does not exceed $R$, which is the accuracy 'boost' obtained by having the true belief in $X$ to which $c$ gives rise. By checking cases, we can see further that no other credence function $I_t$-dominates $c$.
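The case-checking can be backed up with a brute-force search. The following sketch looks for dominators of $c$ on a grid of credence pairs. The grid resolution of 0.01 and the helper names are assumptions of the sketch, and a grid search is only a sanity check, not a proof. Dominance is taken in the weak sense: no more inaccurate at any world and strictly less inaccurate at some world.

```python
from fractions import Fraction as F
from itertools import product

t, R, W = F(6, 10), F(4, 10), F(6, 10)
WORLDS = [(True, False), (False, True)]   # X true; X false
GRID = [F(k, 100) for k in range(101)]    # assumed resolution: 0.01


def I(c, w):
    """Quadratic inaccuracy of the credences alone."""
    return sum((1 - x) ** 2 if v else x ** 2 for x, v in zip(c, w))


def I_t(c, w):
    """Quadratic inaccuracy plus the scores of the Lockean attitudes."""
    total = I(c, w)
    for x, v in zip(c, w):
        if x >= t:                 # belief
            total += -R if v else W
        elif x <= 1 - t:           # disbelief
            total += W if v else -R
    return total


def dominates(a, b, measure):
    """a is no more inaccurate than b at every world, and strictly
    less inaccurate at some world."""
    diffs = [measure(b, w) - measure(a, w) for w in WORLDS]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


c = (F(6, 10), F(5, 10))
I_dominators = [g for g in product(GRID, GRID) if dominates(g, c, I)]
It_dominators = [g for g in product(GRID, GRID) if dominates(g, c, I_t)]
print(len(I_dominators) > 0, len(It_dominators))  # prints: True 0
```

Exact fractions matter here: the pair with credence 0.5 in $X$ and 0.4 in $\neg X$ ties with $c$ under $I_t$ at both worlds, and floating-point rounding could misreport that tie as dominance.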

Is this a problem? That depends on whether one takes credences and beliefs to be two separate, but related doxastic states. If one does, and if one accepts further that the Lockean thesis describes the way in which they are related, then $I_t$ seems the natural way to measure the total doxastic state that arises when both are present. But then one loses the accuracy-domination argument for probabilism. However, one might avoid this conclusion if one were to say that, really, there are only credence functions; and that beliefs, to the extent they exist at all, are reducible to credences. That is, if one were to take the Lockean thesis to be a reductionist claim rather than a normative claim, it would seem natural to measure the inaccuracy of a credence function using $I$ instead of $I_t$. While one would still say that, as a credence in $X$ moves across the Lockean threshold for belief, it gives rise to a new belief, it would no longer seem right to think that this discontinuous change in doxastic state should give rise to a discontinuous change in inaccuracy; for the new belief is not really a genuinely new doxastic state; it is rather a way of classifying the credal state.

Great post, Richard!

For my stability account of belief, I have two accuracy arguments which work like this (both of which I discuss in the monograph that I am writing):

(i) In the first one, inaccuracy is--as usual--distance from the truth. But I use different inaccuracy measures for degrees of belief and for all-or-nothing belief (where belief is analyzed, say, on an ordinal scale, such as in belief revision theory or nonmonotonic reasoning, though the approach also works for belief on a strictly categorical scale); for degrees of belief one may use the Brier score, while for belief I use a class of inaccuracy measures that are generalizations of Hempel's or of your i-functions from above and which also include, e.g., Branden's (from his recent work on inaccuracy for orders of propositions) as a special case. Then I make an additional assumption, and that is: all-or-nothing belief on an ordinal scale is given by a total pre-order < on worlds (rather than propositions); a total pre-order on propositions can be determined from that order on worlds, as this is done in belief revision theory or nonmonotonic reasoning, but the primary object is the order on worlds. (I leave out any defense of that additional assumption here.) Finally, I formulate an accuracy norm simultaneously for the degree of belief function P and for belief as given by the ordering <: the pair (P, <) ought to be such that P minimizes expected inaccuracy relative to P, and < minimizes expected inaccuracy relative to P, where the respective inaccuracy measures in the two cases are as sketched above. One can then prove a theorem to the effect that if (P, <) satisfies the norm, then P is a probability measure (that's just the standard arguments from the literature repeated), and < has the kind of stability property (relative to P) that I want to argue for in my theory.

(ii) In the second approach, which I won't explain here in any detail, I also do something like the above; however, this time I determine the inaccuracy of < not with respect to truth but instead with respect to P: taking a subjective probability measure P (and the order on propositions that it induces) as given, I formulate a norm to the effect that < ought to approximate P to the best possible extent (the justification being that (P, <) should be, as it were, in "maximal harmony" or coherence with each other). I make precise what this means, and then I prove again that the < that minimize inaccuracy relative to P are precisely those that have the stability property (relative to P) that I aim to defend.

[Continued in part 2.]

[Part 2:]

The framework is open to different interpretations, but my intended interpretation is that neither P nor < (that is, belief) ought to be eliminated, and neither ought to be reduced to the other either; rather, each of the two has a life of its own. However, in order for the agent who has such a degree of belief function P and a belief ordering < simultaneously to be rational overall, P and < must satisfy a certain bridge principle, and that is just the stability account that I am advocating. (i) above aims to justify that account by considerations on accuracy with respect to truth on both sides, that is, for both P and <; (ii) above aims to justify that account by considerations on accuracy with respect to truth for P (that's just the standard arguments again), and accuracy with respect to P for belief. In that second approach, belief might still be said to aim at truth, but only indirectly: belief aims at P, and P aims at truth.

In my intended interpretation, I don't regard either of P or belief to be prior to the other conceptually, or epistemologically, or ontologically, though I would want to say, of course, that P occupies a more complex and fine-grained scale of measurement than belief does, which is also why there will always be some kind of asymmetry between the two of them in terms of "information content" (and this shows up in the theory at various places).

Finally, about the Lockean thesis: if degrees of belief and belief satisfy either of the norms formulated above, then one can prove there is always a threshold, such that the corresponding instance of the Lockean thesis for that very threshold must hold as well. So the stability account entails an instance of the Lockean thesis (but only with a special threshold). The difference to your way of proceeding is then that the Lockean thesis is but a corollary to accuracy considerations in my approach, while in your approach the Lockean thesis is presupposed already in the accuracy considerations themselves.

Doesn't setting the level at 0.6 imply the assumption that there are not degrees of validity to the positive end of credences or beliefs?

Am I confused on this one?