## Thursday, 16 July 2020

### Taking risks and picking priors

For a PDF of this post, see here.

In my last couple of posts (here and here), I've been discussing decision rules for situations in which probabilities over the possible states of the world are not available for some reason. Perhaps your evidence is too sparse, and points in no particular direction, or too complex, and points in too many. So subjective probabilities are not available. And perhaps you simply don't know the objective probabilities. In those situations, you can't appeal to standard subjective expected utility theory of the sort described by Ramsey, Savage, Jeffrey, etc., nor to objective expected utility theory of the sort described by von Neumann & Morgenstern. What, then, are you to do? As I've discussed in the previous posts, this was in fact a hot topic in the early days of decision theory. Abraham Wald discussed the Maximin approach, Leonid Hurwicz expanded that to give his Criterion, Franco Modigliani mentioned Maximax approaches, and Leonard Savage discussed Minimax Regret. Perhaps the culmination of this research programme was John Milnor's elegant 'Games against Nature' in 1951 (revised in 1954), where he provided simple axiomatic characterisations of each of these decision rules. In the first post on this, I pointed out a problem with Hurwicz's approach (independently noted in this draft paper by Johan Gustafsson); in the second, I expanded that approach to avoid the problem.

Perhaps unsurprisingly, my interest in these decision rules stems from the possibility of applying them in accuracy-first epistemology. On that approach in formal epistemology, we determine the rationality of credences by considering the extent to which they promote what we take to be the sole epistemic good for credences, namely, accuracy. And we make this precise by applying decision theory to the question of which credence functions are rational. Thus: first, in place of the array of acts or options that are the usual focus of standard applications of decision theory, we substitute the different possible credence functions you might adopt; second, in place of the utility function that measures the value of acts or options at different possible states of the world, we substitute mathematical functions that measure the accuracy of a possible credence function at a possible state of the world. Applying a decision rule then identifes the rationally permissible credence functions. For instance, in the classic paper by Jim Joyce in 1998, he applies a dominance rule to show that only probabilistic credence functions are rationally permissible.

Now, one of the central questions in the epistemology of credences is the problem of the priors. Before I receive any evidence at all, which credences should I have? What should my ur-priors be, if you like? What should a superbaby's credences be, as David Lewis would put it? Now, people have reasonable concerns about the very idea of a superbaby -- this cognitive agent who has no evidence whatsoever but is nonetheless equipped with the conceptual resources to formulate a rich algebra of propositions. Evidence and conceptual resources grow together, after all. However, the problem of the priors arises even when you do have lots of evidence about other topics, but take yourself to have none that bears on a new topic in which you have yet to set your credences. And indeed it arises even if you do have credences that bear on the new topic, but you don't think they are justified, or they're inadmissible for the purpose to which you wish to use the credences -- for instance, if you are a scientist producing the priors for the Bayesian model you will use in your paper. So I think we needn't see the superbaby as a problematic idealization, but as a way of representing a situation in which we in fact quite often find ourselves.

So, what credences should a superbaby choose? The key point is that they have no recourse to any subjective or objective probabilities. So they can't appeal to expected accuracy to set their credences. Thus, we might naturally look to Wald, Hurwicz, Milnor, etc. to see what they might use instead. In this paper from 2016, I explored what Maximin requires, and in this follow-up later that year, I explored what the Hurwicz Criterion mandates. In this post, I'd like to explore what the Generalized Hurwicz Criterion stated in the previous blogpost requires of your credences. Let's remind ourselves of this decision rule.

Generalized Hurwicz Criterion (GHC) Suppose the set of possible states of the world is $W = \{w_1, \ldots, w_n\}$. Pick $0 \leq \alpha_1, \ldots, \alpha_n \leq 1$ with $\alpha_1 + \ldots + \alpha_n = 1$, and denote this sequence of weights $A$. If $a$ is an option defined on $W$ and$$a(w_{i_1}) \geq a(w_{i_2}) \geq \ldots \geq a(w_{i_n})$$then let$$H^A(a) = \alpha_1a(w_{i_1}) + \ldots + \alpha_na(w_{i_n})$$Pick an option that maximises $H^A$.

So $\alpha_1$ weights the best-case utility, $\alpha_2$ the second best, and so on down to $\alpha_n$, which weights the worst-case utility. We then sum these weighted utilities to give the generalised Hurwicz score and we choose in order to maximise this.

Now, suppose that our options are credence functions, and the utility of a credence function is given by an accuracy measure $\mathfrak{I}$. In fact, for our purposes here, we'll consider only credence functions defined over three worlds $w_1, w_2, w_3$. Things get complicated pretty fast here, and there will be plenty of interest in this simple case. As usual in accuracy-first epistemology, we'll say that the accuracy of your credence function is determined as follows:

(1) $\mathfrak{s}(1, x)$ is your measure of accuracy for credence $x$ in a truth,

(2) $\mathfrak{s}(0, x)$ is your measure of accuracy for credence $x$ in a falsehood,

(3) $\mathfrak{s}$ is strictly proper, so that, for all $0 \leq p \leq 1$, $p\mathfrak{s}(1, x) + (1-p)\mathfrak{s}(0, x)$ is maximised, as a function of $x$, at $x = p$,

(4) if $c$ is a credence function defined on $w_1, w_2, w_3$, and we write $c_i$ for $c(w_i)$, then

• $\mathfrak{I}(c, w_1) = \mathfrak{s}(1, c_1) + \mathfrak{s}(0, c_2) + \mathfrak{s}(0, c_3)$
• $\mathfrak{I}(c, w_2) = \mathfrak{s}(0, c_1) + \mathfrak{s}(1, c_2) + \mathfrak{s}(0, c_3)$
• $\mathfrak{I}(c, w_3) = \mathfrak{s}(0, c_1) + \mathfrak{s}(0, c_2) + \mathfrak{s}(1, c_3)$
So the accuracy of your credence function at a world is the sum of the accuracy at that world of the credences that it assigns.

Now, given a credence function on $w_1, w_2, w_3$, we represent it by the triple $(c_1, c_2, c_3)$. And let's set our generalized Hurwicz weights to be $\alpha_1, \alpha_2, \alpha_3$. The first thing to note is that, if $(c_1, c_2, c_3)$ minimizes $H^A_\mathfrak{I}$, then so does any permutation of it---that is,$$(c_1, c_2, c_3),\ \ (c_1, c_3, c_2),\ \ (c_2, c_1, c_3),\ \ (c_2, c_3, c_1),\ \ (c_3, c_1, c_2),\ \ (c_3, c_2, c_1)$$all minimize $H^A_\mathfrak{I}$ if any of one of them does. The reason is that the generalized Hurwicz score for the three-world case depends on the best, middle, and worst inaccuracies for a credence function, and those are exactly the same for those six credence functions, even though the best, middle, and worst inaccuracies occur at different worlds for each. This means that, in order to find the minimizers, we only need to seek those for which $c_1 \geq c_2 \geq c_3$, since all others will be permutations of those. Let $\mathfrak{X} = \{(c_1, c_2, c_3)\, |\, c_1 \geq c_2 \geq c_3\}$. Since the accuracy measure $\mathfrak{I}$ is strictly proper and therefore truth-directed, for each $c$ in $\mathfrak{X}$,$$\mathfrak{I}(c, w_1) \geq \mathfrak{I}(c, w_2) \geq \mathfrak{I}(c, w_3)$$And so$$H^A_\mathfrak{I}(c) = \alpha_1 \mathfrak{I}(c, w_1) + \alpha_2\mathfrak{I}(c, w_2) + \alpha_3 \mathfrak{I}(c, w_3)$$
That means that $H^A_\mathfrak{I}(c)$ is the expected inaccuracy of $c$ by the lights of the credence function $(\alpha_1, \alpha_2, \alpha_3)$ generated by the Hurwicz weights. This allows us to calculate each case. As Catrin Campbell-Moore helped me to see, it turns out that the minimizer does not depend on which strictly proper scoring rule you use---each gives the same. In the first column of the table below, I list the different possible orderings of the three Hurwicz weights. In two cases, specifying that order is not sufficient to determine the minimizer. To do that, you also have to know the absolute values of some of the weights. Where necessary, I include those in the second column. In the third column, I specify the member of $\mathfrak{X}$ that minimizes $H^A_\mathfrak{I}$ relative to those weights.$$\begin{array}{c|c|ccc} \mbox{Ordering of} & \mbox{Further properties} & c_1 & c_2 & c_3\\ \mbox{the weights} & \mbox{of the weights} & && \\ &&&\\ \hline &&&\\ \alpha_1 \leq \alpha_2 \leq \alpha_3 & - & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ &&&\\ \alpha_1 \leq \alpha_3 \leq \alpha_2 & \alpha_1 + \alpha_2 \geq \frac{2}{3} & \frac{\alpha_1 + \alpha_2}{2} & \frac{\alpha_1 + \alpha_2}{2} & \alpha_3 \\ &&&\\ \alpha_1 \leq \alpha_3 \leq \alpha_2 & \alpha_1 + \alpha_2 \leq \frac{2}{3} & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ &&&\\ \alpha_2 \leq \alpha_1 \leq \alpha_3 & \alpha_1 \leq \frac{1}{3}& \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ &&&\\ \alpha_2 \leq \alpha_1 \leq \alpha_3 & \alpha_1 > \frac{1}{3} & \alpha_1 & \frac{\alpha_2 + \alpha_3}{2} & \frac{\alpha_2 + \alpha_3}{2} \\ &&&\\ \alpha_2 \leq \alpha_3 \leq \alpha_1 &- & \alpha_1 & \frac{\alpha_2 + \alpha_3}{2} & \frac{\alpha_2 + \alpha_3}{2} \\ &&&\\ \alpha_3 \leq \alpha_1 \leq \alpha_2 &- & \frac{\alpha_1 + \alpha_2}{2} & \frac{\alpha_1 + \alpha_2}{2} & \alpha_3 \\ &&&\\ \alpha_3 \leq \alpha_2 \leq \alpha_1 &- & \alpha_1 & \alpha_2 & \alpha_3 \end{array}$$
In the following diagram, we plot the different possible Hurwicz weights in a barycentric plot, so that the bottom left corner of the triangle is $(1, 0, 0)$, the bottom-right is $(0, 1, 0)$ and the top is $(0, 0, 1)$. We then divide this into four regions. If your weights $A = (\alpha_1, \alpha_2, \alpha_3)$ lie in a given region, then the triple I've placed in that region gives the credence function that minimizes the Generalized Hurwicz Score $H^A$ for those weights. Note, the bottom left triangle is $\mathfrak{X}$. Essentially, to find which member of $\mathfrak{X}$ a given weighting demands, you plot that weighting in this diagram and then find the closest member of $\mathfrak{X}$. You can use Euclidean closeness for this purpose, since all strictly proper accuracy measures will give that same result.

First, for any probabilistic credence function, there are Hurwicz weights for which that credence function is a minimizer. If $c = (c_1, c_2, c_3)$ and $c_{i_1} \geq c_{i_2} \geq c_{i_3}$, then the weights are $c_{i_1}, c_{i_2}, c_{i_3}$.

Second, in my earlier paper, I noted that Maximin demands the uniform distribution. So you might see the uniform distribution as the maximally risk-averse option. And indeed you can read something like that into William James' remark that the person who always suspends judgment has an irrational aversion to being a dupe. But actually it turns out that quite a wide range of attitudes to risk--in this case represented by generalised Hurwicz weights in the upper trapezoid region--demand the uniform distribution.

Third, in my first blogpost about all this, I criticized Hurwicz's original version of his criterion for being incompatible with a natural dominance norm, which says that if one option is always at least as good as another and sometimes better, then it should be strictly preferred. Suppose we apply this strengthening to Weak Dominance in our characterization of the Generalized Hurwicz Criterion in the previous blogpost. Then we obtain a slightly more demanding version of the Generalized Hurwicz Criterion that requires that each of your Hurwicz weightings is strictly positive. And if we do that, then it's no longer true that any probabilistic credence function can be rationalised in this way. Only regular ones can be. So that plausible strengthening of Weak Dominance leads us to an argument for Regularity, the principle that says that you should not assign extremal credences to propositions that are not tautologies or contradictions.