Wednesday, 6 January 2021

Life on the edge: a response to Schultheis' challenge to epistemic permissivism about credences

In their 2018 paper, 'Living on the Edge', Ginger Schultheis issues a powerful challenge to epistemic permissivism about credences, the view that there are bodies of evidence in response to which there are a number of different credence functions it would be rational to adopt. The heart of the argument is the claim that a certain sort of situation is impossible. Schultheis thinks that all motivations for permissivism must render situations of this sort possible. Therefore, permissivism must be false, or at least these motivations for it must be wrong.

Here's the situation, where we write $R_E$ for the set of credence functions that it is rational to have when your total evidence is $E$. 

  • Our agent's total evidence is $E$.
  • There is $c$ in $R_E$ that our agent knows is a rational response to $E$.
  • There is $c'$ in $R_E$ that our agent does not know is a rational response to $E$.

Schultheis claims that the permissivist must take this to be possible, whereas in fact it is impossible. Here are a couple of specific examples that the permissivist will typically take to be possible.

Example 1: we might have a situation in which the credences it is rational to assign to a proposition $X$ in response to evidence $E$ form the interval $[0.4, 0.7]$. But we might not be sure of the exact extent of the interval. For all we know, it might be $[0.41, 0.7]$ or $[0.39, 0.71]$, or it might be $[0.4, 0.7]$. So we are sure that $0.5$ is a rational credence in $X$, but we're not sure whether $0.4$ is. In this case, $c(X) = 0.5$ and $c'(X) = 0.4$.

Example 2: you know that Probabilism is a rational requirement on credence functions, and you know that satisfying the Principle of Indifference is rationally permitted, but you don't know whether or not it is also rationally required. In this case, $c$ is the uniform distribution required by the Principle of Indifference, but $c'$ is any other probability function.

Schultheis then appeals to a principle called Weak Rationality Dominance. We say that one credence function $c$ rationally dominates another $c'$ if $c$ is rational in all worlds in which $c'$ is rational, and also rational in some worlds in which $c'$ is not rational. Weak Rationality Dominance says that it is irrational to adopt a rationally dominated credence function. The important consequence of this for Schultheis' argument is that, if you know that $c$ is rational, but you don't know whether $c'$ is, then $c'$ is irrational. As a result, in our examples above, $c'$ is not rational, contrary to what the permissivist claims, because it is rationally dominated by $c$. So permissivism must be false.

If Weak Rationality Dominance is correct, then, it follows that the permissivist must say that, for any body of evidence $E$ and set $R_E$ of rational responses, the agent with evidence $E$ must either know of each credence function in $R_E$ that it is in $R_E$, or know of no credence function in $R_E$ that it is in $R_E$. If they know of some credence functions in $R_E$ that they are in $R_E$, but do not know this of others in $R_E$, then they clash with Weak Rationality Dominance. But, whatever your reason for being a permissivist, it seems very likely that it will entail situations in which some credence functions are rational responses to your evidence and you know that they are, while other credence functions are in fact rational responses but you are unsure whether they are. This is Schultheis' challenge.

I'd like to explore a response to Schultheis' argument that takes issue with Weak Rationality Dominance (WRD). I'll spell out the objection in general to begin with, and then see how it plays out for a specific motivation for permissivism, namely, the Jamesian motivation I sketched in this previous blogpost.

One worry about WRD is that it seems to entail a deference principle of exactly the sort that I objected to in this blogpost. According to such deference principles, for certain agents in certain situations, if they learn of a credence function that it is rational, they should adopt it. For instance, Ben Levinstein claims that, if you are certain that you are irrational, and you learn that $c$ is rational, then you should adopt $c$ -- or at least you should have the conditional credences that would lead you to do this if you were to apply conditionalization. We might slightly strengthen Levinstein's version of the deference principle as follows: if you are unsure whether you are rational or not, and you learn that $c$ is rational, then you should adopt $c$. WRD entails this deference principle. After all, suppose you have credence function $c'$, and you are unsure whether or not it is rational. And suppose you learn that $c$ is rational (and don't thereby learn that $c'$ is as well). Then, according to Schultheis' principle, you are irrational if you stick with $c'$.

In the previous blogpost, I objected to Levinstein's deference principle, and others like it, because it relies on the assumption that all rational credence functions are better than all irrational credence functions. I think that's false. I think there are certain sorts of flaw that render you irrational, and lacking those flaws renders you rational. But lacking those flaws doesn't ensure that you're going to be better than someone who has those flaws. Consider, for instance, the extreme subjective Bayesian who justifies their position using an accuracy dominance argument of the sort pioneered by Jim Joyce. That is, they say that accuracy is the sole epistemic good for credence functions. And they say that non-probabilistic credence functions are irrational because, for any such credence function, there are probabilistic ones that accuracy dominate it; and all probabilistic credence functions are rational because, for any such credence function, there is no probabilistic one that accuracy dominates it. Now, suppose I have credence $0.91$ in $X$ and $0.1$ in $\overline{X}$. And suppose I am either sure that this is irrational, or I'm uncertain whether it is. I then learn that assigning credence $0.1$ to $X$ and $0.9$ to $\overline{X}$ is rational. What should I do? It isn't at all obvious to me that I should move from my credence function to the one I've learned is rational. After all, even from my slightly incoherent standpoint, it's possible to see that the rational one is going to be a lot less accurate than mine if $X$ is true, and I'm very confident that it is.
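To make these numbers concrete, here is a small Python sketch. It assumes the Brier score as the measure of inaccuracy (the post itself only fixes a measure for a later example), and the dominating credence function $(0.905, 0.095)$ is my own illustrative choice, obtained by removing the excess $0.01$ equally from both credences. It checks that my incoherent credences are accuracy dominated by a probabilistic function, and yet are far more accurate than the rational ones if $X$ is true.

```python
# Sketch, assuming the Brier score measures inaccuracy (lower is better).
def brier_inaccuracy(c_x, c_not_x, x_true):
    """Brier inaccuracy of credences c_x in X and c_not_x in not-X at a world."""
    truth_x, truth_not_x = (1.0, 0.0) if x_true else (0.0, 1.0)
    return (truth_x - c_x) ** 2 + (truth_not_x - c_not_x) ** 2


def accuracy_dominates(a, b):
    """True if credences a are strictly less inaccurate than b at both worlds."""
    return all(brier_inaccuracy(*a, w) < brier_inaccuracy(*b, w) for w in (True, False))


mine = (0.91, 0.10)         # incoherent: the credences sum to 1.01
projected = (0.905, 0.095)  # illustrative dominator: excess 0.01 removed equally
rational = (0.10, 0.90)     # the credences I learn are rational

print(accuracy_dominates(projected, mine))  # True: mine is dominated
print(accuracy_dominates(rational, mine))   # False: the rational one does not dominate mine
# If X is true, mine is far more accurate than the rational credences.
print(brier_inaccuracy(*mine, True), brier_inaccuracy(*rational, True))
```

The last line prints roughly $0.0181$ and $1.62$: being dominated does not make my credences worse than every rational alternative.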

So I think that the rational deference principle is wrong, and therefore any version of WRD that entails it is also wrong. But perhaps there is a more restricted version of WRD that is right, and one that is nonetheless capable of sinking permissivism. Consider, for instance, a restricted version of WRD that applies only to agents who have no credence function: that is, it applies to your initial choice of a credence function; it does not apply when you have a credence function and you are deciding whether to adopt a new one. This makes a difference. The problem with a version that applies when you already have a credence function $c'$ is that, even if $c'$ is irrational, it might nonetheless be better than the rational credence function $c$ in some situation, and it might be that $c'$ assigns a lot of credence to that situation. So it's hard to see how to motivate the move from $c'$ to $c$. However, in a situation in which you have no credence function, and you are unsure whether $c'$ is rational (even though it is) and you're certain that $c$ is rational (and indeed it is), WRD's demand that you should not pick $c'$ seems more reasonable. You occupy no point of view such that $c'$ is less of a departure from that point of view than $c$ is. You know only that $c$ certainly lacks the flaws, whereas $c'$ might have them. Better, then, to go for $c$, is it not? And if it is, this is enough to defeat permissivism.

I think it's not quite that simple. I noted above that Levinstein's deference principle relies on the assumption that all rational credence functions are better than all irrational credence functions. Schultheis' WRD seems to rely on something even stronger, namely, the assumption that all rational credence functions are equally good in all situations. For suppose they are not. You might then be unsure whether $c'$ is rational (though it is) and sure that $c$ is rational (and it is), but nonetheless rationally opt for $c'$ because you know that $c'$ has some good feature that you know $c$ lacks and you're willing to take the risk of having an irrational credence function in order to open the possibility of having that good feature.

Here's an example. You are unsure whether it is rational to assign $0.7$ to $X$ and $0.3$ to $\overline{X}$. It turns out that it is, but you don't know that. On the other hand, you do know that it is rational to assign 0.5 to each proposition. But the first assignment and the second are not equally good in all situations. The second has the same accuracy whether $X$ is true or false; the first, in contrast, is better than the second if $X$ is true and worse than the second if $X$ is false. The second does not open up the possibility of high accuracy that the first does; though, to compensate, it also precludes the possibility of low accuracy, which the first doesn't. Surveying the situation, you decide to take the risk. You'll adopt the first, even though you aren't sure whether or not it is rational. And you'll do this because you want the possibility of being rational and having that higher accuracy. This seems a rational thing to do. So, it seems to me, WRD is false.

Although I think this objection to WRD works, I think it's helpful to see how it might play out for a particular motivation for permissivism. Here's the motivation. Some credence functions offer the promise of great accuracy: for instance, assigning 0.9 to $X$ and 0.1 to $\overline{X}$ will be very accurate if $X$ is true. However, those that do so also open the possibility of great inaccuracy: if $X$ is false, the credence function just considered is very inaccurate. Other credence functions neither offer great accuracy nor risk great inaccuracy. For instance, assigning 0.5 to both $X$ and $\overline{X}$ guarantees the same inaccuracy whether or not $X$ is true. You might say that you are more risk-averse the lower the maximum possible inaccuracy you are willing to risk. Thus, the options that are rational for you are those undominated options whose maximum inaccuracy is at most whatever threshold you set. Now, suppose you use the Brier score to measure your inaccuracy, so that the inaccuracy of the credence function $c(X) = p$ and $c(\overline{X}) = 1-p$ is $2(1-p)^2$ if $X$ is true and $2p^2$ if $X$ is false. And suppose you are willing to tolerate a maximum possible inaccuracy of $0.5$, which also gives you a minimum inaccuracy of $0.5$. In that case, only $c(X) = 0.5 = c(\overline{X})$ will be rational from the point of view of your risk attitudes, since $2(1-0.5)^2 = 0.5 = 2(0.5^2)$. On the other hand, suppose you are willing to tolerate a maximum inaccuracy of $0.98$, which also gives you a minimum inaccuracy of $0.18$. In that case, any credence function $c$ with $0.3 \leq c(X) \leq 0.7$ and $c(\overline{X}) = 1-c(X)$ is rational from the point of view of your risk attitudes.
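The thresholds in this example can be computed directly. Here is a minimal Python sketch, assuming, as in the text, the Brier score and credence functions of the form $c(X) = p$, $c(\overline{X}) = 1-p$; the function names are my own. Since the maximum inaccuracy is $2\max(p, 1-p)^2$, the credences with maximum inaccuracy at most $t$ form the interval $[1 - \sqrt{t/2}, \sqrt{t/2}]$, for $0.5 \leq t \leq 2$.

```python
import math


def inaccuracy(p, x_true):
    """Brier inaccuracy of c(X) = p, c(not-X) = 1 - p."""
    return 2 * (1 - p) ** 2 if x_true else 2 * p ** 2


def max_inaccuracy(p):
    """Worst-case inaccuracy: the risk you run by adopting credence p in X."""
    return max(inaccuracy(p, True), inaccuracy(p, False))


def min_inaccuracy(p):
    """Best-case inaccuracy: the accuracy that credence p in X makes possible."""
    return min(inaccuracy(p, True), inaccuracy(p, False))


def rational_interval(t):
    """Credences p in X with max_inaccuracy(p) <= t, or None if there are none.

    2 * max(p, 1 - p)**2 <= t  iff  1 - sqrt(t / 2) <= p <= sqrt(t / 2).
    """
    if t < 0.5:  # even p = 0.5 has maximum inaccuracy 0.5
        return None
    upper = min(1.0, math.sqrt(t / 2))
    return (1 - upper, upper)


print(rational_interval(0.5))   # (0.5, 0.5)
print(rational_interval(0.98))  # approximately (0.3, 0.7)
print(min_inaccuracy(0.7))      # approximately 0.18
```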

Now, suppose that you are in the sort of situation that Schultheis imagines. You are uncertain of the extent of the set $R_E$ of rational responses to your evidence $E$. On the account we're considering, this must be because you are uncertain of your own attitudes to epistemic risk. Let's say that the threshold of maximum inaccuracy that you're willing to tolerate is $0.98$, but you aren't certain of that --- you think it might be anything between $0.72$ and $1.28$. So you're sure that it's rational to assign anything between 0.4 and 0.6 to $X$, but unsure whether it's rational to assign $0.7$ to $X$ --- if your threshold turns out to be less than 0.98, then assigning $0.7$ to $X$ would be irrational, because it risks inaccuracy of $0.98$. In this situation, is it rational to assign $0.7$ to $X$? I think it is. Among the credence functions that you know for sure are rational, the ones that give you the lowest possible inaccuracy are the one that assigns 0.4 to $X$ and the one that assigns 0.6 to $X$. They have maximum inaccuracy of 0.72, and they open up the possibility of an inaccuracy of 0.32, which is lower than the lowest possible inaccuracy opened up by any others that you know to be rational. On the other hand, assigning 0.7 to $X$ opens up the possibility of an inaccuracy of 0.18, which is considerably lower. As a result, it doesn't seem irrational to assign 0.7 to $X$, even though you don't know whether it is rational from the point of view of your attitudes to risk, and you do know that assigning 0.6 is rational. 
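The same calculation extends to the uncertain agent. In the Python sketch below, which again assumes the Brier score, the credences you know to be rational are taken to be those that are rational under every threshold you consider possible. I discretize the range from $0.72$ to $1.28$ in steps of $0.01$ purely for illustration. The best-case inaccuracy among the known-rational credences comes out at $0.32$, while credence $0.7$ makes $0.18$ possible.

```python
import math


def min_inaccuracy(p):
    """Best-case Brier inaccuracy of c(X) = p, c(not-X) = 1 - p."""
    return min(2 * (1 - p) ** 2, 2 * p ** 2)


def rational_interval(t):
    """Credences in X with maximum Brier inaccuracy at most t (assumes t >= 0.5)."""
    upper = min(1.0, math.sqrt(t / 2))
    return (1 - upper, upper)


# Thresholds you take to be possible: 0.72, 0.73, ..., 1.28 (illustrative grid).
possible_thresholds = [0.72 + 0.01 * k for k in range(57)]

# Known to be rational = rational under every possible threshold.
known_lo = max(rational_interval(t)[0] for t in possible_thresholds)
known_hi = min(rational_interval(t)[1] for t in possible_thresholds)

# min_inaccuracy falls as p moves away from 0.5, so the best case among the
# known-rational credences is attained at an endpoint of [known_lo, known_hi].
best_known = min(min_inaccuracy(known_lo), min_inaccuracy(known_hi))

print(known_lo, known_hi)               # approximately 0.4 0.6
print(best_known, min_inaccuracy(0.7))  # approximately 0.32 0.18
```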

There is another possible response to Schultheis' challenge for those who like this sort of motivation for permissivism. You might simply say that, if your attitudes to risk are such that you will tolerate a maximum inaccuracy of at most $t$, then regardless of whether you know this fact, and indeed regardless of your level of uncertainty about it, the rational credence functions are precisely those that have maximum inaccuracy of at most $t$. This sort of approach is familiar from expected utility theory. Suppose I have credences in $X$ and in $\overline{X}$. And suppose I face two options whose utilities are determined by whether $X$ is true or false. Then, regardless of what I believe about my credences in $X$ and $\overline{X}$, I should choose whichever option maximises expected utility from the point of view of my actual credences. The point is this: if what it is rational for you to believe or to do is determined by some feature of you, whether it's your credences or your attitudes to risk, being uncertain about those features doesn't change what it is rational for you to do. This introduces a certain sort of externalism to our notion of rationality. There are features of ourselves (our credences or our attitudes to risk) that determine what it is rational for us to believe or do, but which are nonetheless not luminous to us. But I think this is inevitable. Of course, we might move up a level and create a version of expected utility theory that appeals not to our first-order credences but to our credences concerning those first-order credences: perhaps you use the higher-order credences to define a higher-order expected value for the first-order expected utilities, and you maximize that. But this simply pushes the problem back a step. For your higher-order credences are no more luminous than your first-order ones. And to stop the regress, you must fix some level at which the credences simply determine the expectation that rationality requires you to maximize, and any uncertainty concerning those credences does not affect rationality. The same goes in this case. So, given this particular motivation for permissivism, which appeals to your attitudes to epistemic risk, it seems that there is another reason why WRD is false. If $c$ is in $R_E$, then it is rational for you, regardless of your epistemic attitude to its rationality.

Monday, 4 January 2021

Using a generalized Hurwicz criterion to pick your priors

Over the summer, I got interested in the problem of the priors again. Which credence functions is it rational to adopt at the beginning of your epistemic life? Which credence functions is it rational to have before you gather any evidence? Which credence functions provide rationally permissible responses to the empty body of evidence? As is my wont, I sought to answer this in the framework of epistemic utility theory. That is, I took the rational credence functions to be those declared rational when the appropriate norm of decision theory is applied to the decision problem in which the available acts are all the possible credence functions, and where the epistemic utility of a credence function is measured by a strictly proper measure. I considered a number of possible decision rules that might govern us in this evidence-free situation: Maximin, the Principle of Indifference, and the Hurwicz criterion. And I concluded in favour of a generalized version of the Hurwicz criterion, which I axiomatised. I also described which credence functions that decision rule would render rational in the case in which there are just three possible worlds between which we divide our credences. In this post, I'd like to generalize the results from that treatment to the case in which there is any finite number of possible worlds.

Here's the decision rule (where $a(w_i)$ is the utility of $a$ at world $w_i$).

Generalized Hurwicz Criterion  Given an option $a$ and a sequence of weights $0 \leq \lambda_1, \ldots, \lambda_n \leq 1$ with $\sum^n_{i=1} \lambda_i = 1$, which we denote $\Lambda$, define the generalized Hurwicz score of $a$ relative to $\Lambda$ as follows: if $$a(w_{i_1}) \geq a(w_{i_2}) \geq \ldots \geq a(w_{i_n})$$ then $$H^\Lambda(a) := \lambda_1a(w_{i_1}) + \ldots + \lambda_na(w_{i_n})$$That is, $H^\Lambda(a)$ is the weighted average of all the possible utilities that $a$ receives, where $\lambda_1$ weights the highest utility, $\lambda_2$ weights the second highest, and so on.

The Generalized Hurwicz Criterion says that you should order options by their generalized Hurwicz score relative to a sequence $\Lambda$ of weightings of your choice. Thus, given $\Lambda$,$$a \preceq^\Lambda_{ghc} a' \Leftrightarrow H^\Lambda(a) \leq H^\Lambda(a')$$And the corresponding decision rule says that you should pick your Hurwicz weights $\Lambda$ and then, having done that, it is irrational to choose $a$ if there is $a'$ such that $a \prec^\Lambda_{ghc} a'$.
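As a concrete illustration, here is a minimal Python sketch of the generalized Hurwicz score and the ordering it induces; the function names are my own. Setting $\Lambda = (1, 0, \ldots, 0)$ recovers Maximax, $\Lambda = (0, \ldots, 0, 1)$ recovers Maximin, and $\Lambda = (\alpha, 0, \ldots, 0, 1-\alpha)$ recovers the original Hurwicz criterion with optimism parameter $\alpha$.

```python
def hurwicz_score(utilities, weights):
    """Generalized Hurwicz score H^Lambda(a).

    utilities: a(w_1), ..., a(w_n), the utility of option a at each world.
    weights:   lambda_1, ..., lambda_n, non-negative and summing to 1; lambda_1
               weights the highest utility, lambda_2 the second highest, etc.
    """
    if len(utilities) != len(weights):
        raise ValueError("need exactly one weight per world")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ranked = sorted(utilities, reverse=True)  # highest utility first
    return sum(w * u for w, u in zip(weights, ranked))


def weakly_below(a, b, weights):
    """a is weakly below b in the GHC ordering: H(a) <= H(b)."""
    return hurwicz_score(a, weights) <= hurwicz_score(b, weights)


option = [3, 10, -2]
print(hurwicz_score(option, [1, 0, 0]))        # 10 (Maximax)
print(hurwicz_score(option, [0, 0, 1]))        # -2 (Maximin)
print(hurwicz_score(option, [0.5, 0.3, 0.2]))  # approximately 5.5
```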

Now, let $\mathfrak{U}$ be an additive strictly proper epistemic utility measure. That is, it is generated by a strictly proper scoring rule. A strictly proper scoring rule is a function $\mathfrak{s} : \{0, 1\} \times [0, 1] \rightarrow [-\infty, 0]$ such that, for any $0 \leq p \leq 1$, $p\mathfrak{s}(1, x) + (1-p)\mathfrak{s}(0, x)$ is maximized, as a function of $x$, uniquely at $x = p$. And an epistemic utility measure is generated by $\mathfrak{s}$ if, for any credence function $C$ and world $w_i$,$$\mathfrak{U}(C, w_i) = \sum^n_{j=1} \mathfrak{s}(w^j_i, c_j)$$where

  • $c_j = C(w_j)$, and
  • $w^j_i = 1$ if $j=i$ and $w^j_i = 0$ if $j \neq i$

In what follows, we write the sequence $(c_1, \ldots, c_n)$ to represent the credence function $C$.
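For a concrete instance, here is a small Python sketch using the Brier score, $\mathfrak{s}(1, x) = -(1-x)^2$ and $\mathfrak{s}(0, x) = -x^2$, a standard example of a strictly proper scoring rule. The results below are stated for any additive strictly proper measure, so Brier is only an illustrative choice. Worlds are indexed from $0$ rather than $1$, and the final lines are a rough grid check of strict propriety, not a proof.

```python
def brier_s(truth_value, x):
    """Brier scoring rule: s(1, x) = -(1 - x)**2 and s(0, x) = -x**2."""
    return -((truth_value - x) ** 2)


def epistemic_utility(credences, i, s=brier_s):
    """U(C, w_i) = sum over j of s(w_i^j, c_j), with w_i^j = 1 iff j == i.

    credences is the sequence (c_1, ..., c_n); worlds are indexed from 0 here.
    """
    return sum(s(1 if j == i else 0, c_j) for j, c_j in enumerate(credences))


def expected_score(p, x, s=brier_s):
    """p * s(1, x) + (1 - p) * s(0, x): expected score of credence x by the lights of p."""
    return p * s(1, x) + (1 - p) * s(0, x)


# Strict propriety, checked on a grid: the expected score for p = 0.3 peaks at x = 0.3.
grid = [k / 100 for k in range(101)]
best_x = max(grid, key=lambda x: expected_score(0.3, x))
print(best_x)                                 # 0.3
print(epistemic_utility((0.5, 0.3, 0.2), 0))  # approximately -0.38
```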

Also, given a sequence $A = (a_1, \ldots, a_n)$ of numbers, let$$\mathrm{av}(A) := \frac{a_1 + \ldots + a_n}{n}$$That is, $\mathrm{av}(A)$ is the average of the numbers in $A$. And given $1 \leq k \leq n$, let $A|_k = (a_1, \ldots, a_k)$. That is, $A|_k$ is the truncation of the sequence $A$ that omits all terms after $a_k$. Then we say that $A$ does not exceed its average if, for each $1 \leq k \leq n$,$$\mathrm{av}(A) \geq \mathrm{av}(A|_k)$$That is, at no point in the sequence does the average of the numbers up to that point exceed the average of all the numbers in the sequence.
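The condition is easy to test mechanically. Here is a small Python sketch of $\mathrm{av}$ and of the "does not exceed its average" check; the function names are my own, and exact fractions are used so that the comparisons are not disturbed by rounding. Note that any non-decreasing sequence passes the check, while a strictly decreasing one fails it.

```python
from fractions import Fraction


def av(seq):
    """Average of a non-empty sequence of numbers."""
    return sum(seq) / len(seq)


def does_not_exceed_its_average(seq):
    """True iff av(seq) >= av(seq|_k) for every truncation seq|_k, 1 <= k <= len(seq)."""
    total = av(seq)
    return all(av(seq[:k]) <= total for k in range(1, len(seq) + 1))


print(does_not_exceed_its_average([Fraction(1, 10), Fraction(3, 10)]))  # True
print(does_not_exceed_its_average([Fraction(3, 10), Fraction(1, 10)]))  # False
print(does_not_exceed_its_average([Fraction(1, 5), Fraction(1, 5)]))    # True
```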

Theorem 1 Suppose $\Lambda = (\lambda_1, \ldots, \lambda_n)$ is a sequence of generalized Hurwicz weights. Then there is a sequence of subsequences $\Lambda_1, \ldots, \Lambda_m$ of $\Lambda$ such that

  1. $\Lambda = \Lambda_1 \frown \ldots \frown \Lambda_m$
  2. $\mathrm{av}(\Lambda_1) \geq \ldots \geq \mathrm{av} (\Lambda_m)$
  3. each $\Lambda_i$ does not exceed its average

Then, the credence function$$(\underbrace{\mathrm{av}(\Lambda_1), \ldots, \mathrm{av}(\Lambda_1)}_{\text{length of $\Lambda_1$}}, \underbrace{\mathrm{av}(\Lambda_2), \ldots, \mathrm{av}(\Lambda_2)}_{\text{length of $\Lambda_2$}}, \ldots, \underbrace{\mathrm{av}(\Lambda_m), \ldots, \mathrm{av}(\Lambda_m)}_{\text{length of $\Lambda_m$}})$$maximizes $H^\Lambda(\mathfrak{U}(-))$ among credence functions $C = (c_1, \ldots, c_n)$ for which $c_1 \geq \ldots \geq c_n$.

This is enough to give us all of the credence functions that maximise $H^\Lambda(\mathfrak{U}(-))$: they are the credence function mentioned together with any permutation of it --- that is, any credence function obtained from that one by switching around the credences assigned to the worlds.
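The subsequences in Theorem 1 can be computed by merging blocks from left to right, in the way the inductive step of the proof below does. Here is a minimal Python sketch of that construction; the function names are my own, and exact fractions keep the comparisons of averages exact. As far as I can tell, this merging procedure is the same as the pool-adjacent-violators algorithm from isotonic regression, run so as to produce a non-increasing sequence.

```python
from fractions import Fraction


def av(block):
    """Average of a non-empty block of weights."""
    return sum(block) / len(block)


def hurwicz_blocks(weights):
    """Split Hurwicz weights into consecutive blocks Lambda_1, ..., Lambda_m.

    Each new weight starts its own block; while the previous block has a
    smaller average than the block being built, the two are merged.
    """
    blocks = []
    for w in weights:
        current = [w]
        while blocks and av(blocks[-1]) < av(current):
            current = blocks.pop() + current
        blocks.append(current)
    return blocks


def ghc_optimal_credences(weights):
    """The credence function of Theorem 1: each weight replaced by its block's average.

    Its credences are in non-increasing order; any permutation is also optimal.
    """
    return [av(block) for block in hurwicz_blocks(weights) for _ in block]


# Three worlds with classic Hurwicz weights (alpha, 0, 1 - alpha).
print(ghc_optimal_credences([Fraction(3, 5), Fraction(0), Fraction(2, 5)]))
# [Fraction(3, 5), Fraction(1, 5), Fraction(1, 5)]
print(ghc_optimal_credences([Fraction(3, 10), Fraction(0), Fraction(7, 10)]))
# [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
```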

Proof of Theorem 1. Suppose $\mathfrak{U}$ is a measure of epistemic value that is generated by the strictly proper scoring rule $\mathfrak{s}$. And suppose that $\Lambda$ is the following sequence of generalized Hurwicz weights $0 \leq \lambda_1, \ldots, \lambda_n \leq 1$ with $\sum^n_{i=1} \lambda_i = 1$.

First, due to a theorem that originates in Savage and is stated and proved fully by Predd, et al., if $C$ is not a probability function---that is, if $c_1 + \ldots + c_n \neq 1$---then there is a probability function $P$ such that $\mathfrak{U}(P, w_i) > \mathfrak{U}(C, w_i)$ for all worlds $w_i$. Thus, since GHC satisfies Strong Dominance, whatever maximizes $H^\Lambda(\mathfrak{U}(-))$ will be a probability function.

Now, since $\mathfrak{U}$ is generated by a strictly proper scoring rule, it is also truth-directed. That is, if $c_i > c_j$, then $\mathfrak{U}(C, w_i) > \mathfrak{U}(C, w_j)$. Thus, if $c_1 \geq c_2 \geq \ldots \geq c_n$, then$$H^\Lambda(\mathfrak{U}(C)) = \lambda_1\mathfrak{U}(C, w_1) + \ldots + \lambda_n\mathfrak{U}(C, w_n)$$This is what we seek to maximize. But notice that this is just the expectation of $\mathfrak{U}(C)$ from the point of view of the probability distribution $\Lambda = (\lambda_1, \ldots, \lambda_n)$.

Now, Savage also showed that, if $\mathfrak{s}$ is strictly proper and continuous, then there is a differentiable and strictly convex function $\varphi$ such that, if $P, Q$ are probabilistic credence functions, then
\begin{eqnarray*}
\mathfrak{D}_\mathfrak{s}(P, Q) & = & \sum^n_{i=1} \varphi(p_i) - \sum^n_{i=1} \varphi(q_i) - \sum^n_{i=1} \varphi'(q_i)(p_i - q_i) \\
& = & \sum^n_{i=1} p_i\mathfrak{U}(P, w_i) - \sum^n_{i=1} p_i\mathfrak{U}(Q, w_i)
\end{eqnarray*}
So $C$ maximizes $H^\Lambda(\mathfrak{U}(-))$ among credence functions $C$ with $c_1 \geq \ldots \geq c_n$ iff $C$ minimizes $\mathfrak{D}_\mathfrak{s}(\Lambda, -)$ among credence functions $C$ with $c_1 \geq \ldots \geq c_n$. We now use the KKT conditions to calculate which credence functions minimize $\mathfrak{D}_\mathfrak{s}(\Lambda, -)$ among credence functions $C$ with $c_1 \geq \ldots \geq c_n$.
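For the Brier score, one can take $\varphi(x) = x^2$, in which case $\mathfrak{D}_\mathfrak{s}$ is just the squared Euclidean distance $\sum_i (p_i - q_i)^2$. The following Python sketch, which assumes the Brier score purely for illustration, checks numerically that the two expressions for $\mathfrak{D}_\mathfrak{s}(P, Q)$ above agree in that case. Worlds are indexed from $0$.

```python
def brier_utility(credences, i):
    """Additive Brier epistemic utility U(C, w_i), worlds indexed from 0."""
    return -sum(((1 if j == i else 0) - c) ** 2 for j, c in enumerate(credences))


def expected_utility(p, q):
    """Expected epistemic utility of q by the lights of the probability function p."""
    return sum(p_i * brier_utility(q, i) for i, p_i in enumerate(p))


def bregman(p, q, phi=lambda x: x * x, dphi=lambda x: 2 * x):
    """sum_i phi(p_i) - phi(q_i) - phi'(q_i)(p_i - q_i); phi(x) = x^2 by default."""
    return sum(phi(a) - phi(b) - dphi(b) * (a - b) for a, b in zip(p, q))


P = (0.5, 0.3, 0.2)
Q = (0.2, 0.2, 0.6)
print(bregman(P, Q))                                    # approximately 0.26
print(expected_utility(P, P) - expected_utility(P, Q))  # approximately 0.26
```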

Thus, if we write $x_n$ for $1 - x_1 - \ldots - x_{n-1}$, then
\begin{multline*}
f(x_1, \ldots, x_{n-1}) = \mathfrak{D}_\mathfrak{s}((\lambda_1, \ldots, \lambda_n), (x_1, \ldots, x_n)) = \\
\sum^n_{i=1} \varphi(\lambda_i) - \sum^n_{i=1} \varphi(x_i) - \sum^n_{i=1} \varphi'(x_i)(\lambda_i - x_i)
\end{multline*}
So
\begin{multline*}
\nabla f = \langle \varphi''(x_1) (x_1 - \lambda_1) - \varphi''(x_n)(x_n - \lambda_n), \\
\varphi''(x_2) (x_2 - \lambda_2) - \varphi''(x_n)(x_n - \lambda_n), \ldots, \\
\varphi''(x_{n-1}) (x_{n-1} - \lambda_{n-1}) - \varphi''(x_n)(x_n - \lambda_n) \rangle
\end{multline*}

Let $$\begin{array}{rcccl}
g_1(x_1, \ldots, x_{n-1}) & = & x_2 - x_1&  \leq & 0\\
g_2(x_1, \ldots, x_{n-1}) & = & x_3 - x_2&  \leq & 0\\
\vdots & \vdots & \vdots & \vdots & \vdots \\
g_{n-2}(x_1, \ldots, x_{n-1}) & = & x_{n-1} - x_{n-2}&  \leq & 0 \\
g_{n-1}(x_1, \ldots, x_{n-1}) & = & 1 - x_1 - \ldots - x_{n-2} - 2x_{n-1} & \leq & 0
\end{array}$$So,
\begin{eqnarray*}
\nabla g_1 & = & \langle -1, 1, 0, \ldots, 0 \rangle \\
\nabla g_2 & = & \langle 0, -1, 1, 0, \ldots, 0 \rangle \\
\vdots & \vdots & \vdots \\
\nabla g_{n-2} & = & \langle 0, \ldots, 0, -1, 1 \rangle \\
\nabla g_{n-1} & = & \langle -1, -1, -1, \ldots, -1,  -2 \rangle \\
\end{eqnarray*}
So the KKT theorem says that $x_1, \ldots, x_n$ is a minimizer iff there are $0 \leq \mu_1, \ldots, \mu_{n-1}$ such that$$\nabla f(x_1, \ldots, x_{n-1}) + \sum^{n-1}_{i=1} \mu_i \nabla g_i(x_1, \ldots, x_{n-1}) = 0$$That is, iff there are $0 \leq \mu_1, \ldots, \mu_{n-1}$ such that
\begin{eqnarray*}
\varphi''(x_1) (x_1 - \lambda_1) - \varphi''(x_n)(x_n - \lambda_n) - \mu_1 - \mu_{n-1} & = & 0 \\
\varphi''(x_2) (x_2 - \lambda_2) - \varphi''(x_n)(x_n - \lambda_n) + \mu_1 - \mu_2 - \mu_{n-1} & = & 0 \\
\vdots & \vdots & \vdots \\
\varphi''(x_{n-2}) (x_{n-2} - \lambda_{n-2}) - \varphi''(x_n)(x_n - \lambda_n) + \mu_{n-3} - \mu_{n-2} - \mu_{n-1}& = & 0 \\
\varphi''(x_{n-1}) (x_{n-1} - \lambda_{n-1}) - \varphi''(x_n)(x_n - \lambda_n)+\mu_{n-2} - 2\mu_{n-1} & = & 0
\end{eqnarray*}
By summing these identities, we get:
\begin{eqnarray*}
\mu_{n-1} &  = & \frac{1}{n} \sum^{n-1}_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \frac{n-1}{n} \varphi''(x_n)(x_n - \lambda_n) \\
&= & \frac{1}{n} \sum^n_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \varphi''(x_n)(x_n - \lambda_n) \\
& = & \sum^{n-1}_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \frac{n-1}{n}\sum^n_{i=1} \varphi''(x_i)(x_i - \lambda_i)
\end{eqnarray*}
So, for $1 \leq k \leq n-2$,
\begin{eqnarray*}
\mu_k & = & \sum^k_{i=1} \varphi''(x_i)(x_i - \lambda_i) - k\varphi''(x_n)(x_n - \lambda_n) - \\
&& \hspace{20mm} \frac{k}{n}\sum^{n-1}_{i=1} \varphi''(x_i)(x_i - \lambda_i) + k\frac{n-1}{n} \varphi''(x_n)(x_n - \lambda_n) \\
& = & \sum^k_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \frac{k}{n}\sum^{n-1}_{i=1} \varphi''(x_i)(x_i - \lambda_i) -\frac{k}{n} \varphi''(x_n)(x_n - \lambda_n) \\
&= & \sum^k_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \frac{k}{n}\sum^n_{i=1} \varphi''(x_i)(x_i - \lambda_i)
\end{eqnarray*}
So, for $1 \leq k \leq n-1$,
$$\mu_k = \sum^k_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \frac{k}{n}\sum^n_{i=1} \varphi''(x_i)(x_i - \lambda_i)$$
Now, suppose that there is a sequence of subsequences $\Lambda_1, \ldots, \Lambda_m$ of $\Lambda$ such that

  1. $\Lambda = \Lambda_1 \frown \ldots \frown \Lambda_m$
  2. $\mathrm{av}(\Lambda_1) \geq \ldots \geq \mathrm{av}(\Lambda_m)$
  3. each $\Lambda_i$ does not exceed its average.

And let $$P = (\underbrace{\mathrm{av}(\Lambda_1), \ldots, \mathrm{av}(\Lambda_1)}_{\text{length of $\Lambda_1$}}, \underbrace{\mathrm{av}(\Lambda_2), \ldots, \mathrm{av}(\Lambda_2)}_{\text{length of $\Lambda_2$}}, \ldots, \underbrace{\mathrm{av}(\Lambda_m), \ldots, \mathrm{av}(\Lambda_m)}_{\text{length of $\Lambda_m$}})$$Then we write $i \in \Lambda_j$ if $\lambda_i$ is in the subsequence $\Lambda_j$. So, for $i \in \Lambda_j$, $p_i = \mathrm{av}(\Lambda_j)$. Then$$\frac{k}{n}\sum^n_{i=1} \varphi''(p_i)(p_i - \lambda_i) = \frac{k}{n} \sum^m_{j = 1} \sum_{i \in \Lambda_j} \varphi''(\mathrm{av}(\Lambda_j))(\mathrm{av}(\Lambda_j) - \lambda_i) = 0 $$
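Now, suppose $k$ is in $\Lambda_j$. Since the second term in the expression for $\mu_k$ above is therefore zero, and since, by the same calculation, the sum over each of the complete subsequences $\Lambda_1, \ldots, \Lambda_{j-1}$ is zero, we have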
\begin{multline*}
\mu_k = \sum^k_{i=1} \varphi''(p_i)(p_i - \lambda_i) = \\
\sum_{i \in \Lambda_1} \varphi''(p_i)(p_i - \lambda_i) + \sum_{i \in \Lambda_2} \varphi''(p_i)(p_i - \lambda_i) + \ldots + \\
\sum_{i \in \Lambda_{j-1}} \varphi''(p_i)(p_i - \lambda_i) + \sum_{i \in \Lambda_j|_k} \varphi''(p_i)(p_i - \lambda_i) = \\
\sum_{i \in \Lambda_j|_k} \varphi''(p_i)(p_i - \lambda_i) = \sum_{i \in \Lambda_j|_k} \varphi''(\mathrm{av}(\Lambda_j))(\mathrm{av}(\Lambda_j) - \lambda_i)
\end{multline*}
So, if $|\Lambda|$ is the length of the sequence $\Lambda$,$$\mu_k \geq 0 \Leftrightarrow |\Lambda_j|_k| \cdot \mathrm{av}(\Lambda_j) - \sum_{i \in \Lambda_j|_k} \lambda_i \geq 0 \Leftrightarrow \mathrm{av}(\Lambda_j) \geq \mathrm{av}(\Lambda_j|_k)$$But, by assumption (3), this is true for all $1 \leq k \leq n-1$. So $P$ minimizes $\mathfrak{D}_\mathfrak{s}(\Lambda, -)$ among credence functions $C$ with $c_1 \geq \ldots \geq c_n$, and hence maximizes $H^\Lambda(\mathfrak{U}(-))$ among them, as required.
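As a numerical sanity check of this step, here is a Python sketch that builds $P$ from the subsequences (using the merging construction described in the inductive step below) and evaluates $\mu_1, \ldots, \mu_{n-1}$ with the formula derived above. It assumes the Brier case, where $\varphi(x) = x^2$ and so $\varphi''$ is the constant $2$; the weights are arbitrary examples and the function names are my own.

```python
from fractions import Fraction


def av(block):
    """Average of a non-empty block of weights."""
    return sum(block) / len(block)


def hurwicz_blocks(weights):
    """Consecutive blocks satisfying conditions (1)-(3), built by merging."""
    blocks = []
    for w in weights:
        current = [w]
        while blocks and av(blocks[-1]) < av(current):
            current = blocks.pop() + current
        blocks.append(current)
    return blocks


def kkt_multipliers(weights, phi2=lambda x: 2):
    """mu_1, ..., mu_{n-1} evaluated at P, using
    mu_k = sum_{i<=k} phi''(p_i)(p_i - lambda_i) - (k/n) sum_{i<=n} phi''(p_i)(p_i - lambda_i).

    phi2 is phi''; the default (constant 2) is the Brier case phi(x) = x^2.
    """
    p = [av(block) for block in hurwicz_blocks(weights) for _ in block]
    n = len(weights)
    terms = [phi2(p_i) * (p_i - l_i) for p_i, l_i in zip(p, weights)]
    total = sum(terms)  # zero, since each block sums to zero
    return [sum(terms[:k]) - Fraction(k, n) * total for k in range(1, n)]


weights = [Fraction(1, 20), Fraction(2, 5), Fraction(1, 20), Fraction(1, 2)]
print(kkt_multipliers(weights))  # [Fraction(2, 5), Fraction(1, 10), Fraction(1, 2)]
```

All the multipliers are non-negative, as the KKT conditions require.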

We now show that there is always a sequence of subsequences that satisfies (1), (2), and (3) from above. We proceed by induction.

Base Case  $n = 1$. Then it is clearly true with the subsequence $\Lambda_1 = \Lambda$.

Inductive Step  Suppose it is true for all sequences $\Lambda = (\lambda_1, \ldots, \lambda_n)$ of length $n$. Now consider a sequence $(\lambda_1, \ldots, \lambda_n, \lambda_{n+1})$. Then, by the inductive hypothesis, there is a sequence of sequences $\Lambda_1, \ldots, \Lambda_m$ such that

  1. $\Lambda \frown (\lambda_{n+1}) = \Lambda_1 \frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})$
  2. $\mathrm{av}(\Lambda_1) \geq \ldots \geq \mathrm{av} (\Lambda_m)$
  3. each $\Lambda_i$ does not exceed its average.

Now, first, suppose $\mathrm{av}(\Lambda_m) \geq \lambda_{n+1}$. Then let $\Lambda_{m+1} = (\lambda_{n+1})$ and we're done.

So, second, suppose $\mathrm{av}(\Lambda_m) < \lambda_{n+1}$. Then we find the greatest $k$ such that$$\mathrm{av}(\Lambda_k) \geq \mathrm{av}(\Lambda_{k+1}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1}))$$taking $k = 0$ if there is no such $k$, in which case every subsequence is merged with $(\lambda_{n+1})$. Then we let $\Lambda^*_{k+1} = \Lambda_{k+1}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})$. We can then show that

  1. $(\lambda_1, \ldots, \lambda_n, \lambda_{n+1}) = \Lambda_1 \frown \Lambda_2 \frown \ldots \frown \Lambda_k \frown \Lambda^*_{k+1}$.
  2. Each $\Lambda_1, \ldots, \Lambda_k, \Lambda^*_{k+1}$ does not exceed its average.
  3. $\mathrm{av}(\Lambda_1) \geq \mathrm{av}(\Lambda_2) \geq \ldots \geq \mathrm{av}(\Lambda_k) \geq \mathrm{av}(\Lambda^*_{k+1})$.

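(1) is obvious, and (3) follows from the choice of $k$ together with the inductive hypothesis. So we prove (2). Since $\Lambda_1, \ldots, \Lambda_k$ do not exceed their averages by the inductive hypothesis, we need only show that $\Lambda^*_{k+1}$ does not exceed its average. We write $i_j$ for the index such that each subsequence $\Lambda_j$ starts with $\lambda_{i_j+1}$.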

  • Suppose $i \in \Lambda_{k+1}$. Then, since $\Lambda_{k+1}$ does not exceed its average, $$\mathrm{av}(\Lambda_{k+1}) \geq \mathrm{av}(\Lambda_{k+1}|_i)$$But, since $k$ is the greatest number such that$$\mathrm{av}(\Lambda_k) \geq \mathrm{av}(\Lambda_{k+1}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1}))$$we know that$$\mathrm{av}(\Lambda_{k+2}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})) > \mathrm{av}(\Lambda_{k+1})$$So, since the average of a concatenation is a weighted average of the averages of its parts,$$\mathrm{av}(\Lambda_{k+1}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})) > \mathrm{av}(\Lambda_{k+1})$$So$$\mathrm{av}(\Lambda_{k+1}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})) > \mathrm{av}(\Lambda_{k+1}|_i)$$
  • Suppose $i \in \Lambda_{k+2}$. Then, since $\Lambda_{k+2}$ does not exceed its average, and since $\mathrm{av}(\Lambda_{k+1}) \geq \mathrm{av}(\Lambda_{k+2})$ by the inductive hypothesis,$$\mathrm{av}(\Lambda_{k+1}) \geq \mathrm{av}(\Lambda_{k+2}) \geq \mathrm{av}(\Lambda_{k+2}|_i)$$But also, from above,$$\mathrm{av}(\Lambda_{k+1}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})) > \mathrm{av}(\Lambda_{k+1})$$So$$\mathrm{av}(\Lambda_{k+1}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})) > \mathrm{av}(\Lambda_{k+2}|_i)$$And so, since $\mathrm{av}(\Lambda_{k+1} \frown \Lambda_{k+2}|_i)$ is a weighted average of $\mathrm{av}(\Lambda_{k+1})$ and $\mathrm{av}(\Lambda_{k+2}|_i)$,$$\mathrm{av}(\Lambda_{k+1}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})) > \mathrm{av}(\Lambda_{k+1} \frown \Lambda_{k+2}|_i)$$
  • And so on.

This completes the proof. $\Box$
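Finally, here is a rough numerical check of Theorem 1 as a whole. This Python sketch assumes the Brier score and three worlds. It computes the credence function given by the theorem and compares its generalized Hurwicz score with the best score found by brute force over a grid of probability functions (step $0.01$); restricting to probability functions is justified by the dominance result cited at the start of the proof. It is a spot check for a few weight sequences of my choosing, not a proof.

```python
import itertools


def av(block):
    """Average of a non-empty block of weights."""
    return sum(block) / len(block)


def optimal_credences(weights):
    """Theorem 1's credence function, via the merging construction of the proof."""
    blocks = []
    for w in weights:
        current = [w]
        while blocks and av(blocks[-1]) < av(current):
            current = blocks.pop() + current
        blocks.append(current)
    return [av(block) for block in blocks for _ in block]


def ghc_value(c, weights):
    """H^Lambda(U(C)) with U the additive Brier epistemic utility."""
    utilities = [-sum(((1.0 if j == i else 0.0) - cj) ** 2 for j, cj in enumerate(c))
                 for i in range(len(c))]
    ranked = sorted(utilities, reverse=True)  # highest utility first
    return sum(w * u for w, u in zip(weights, ranked))


def grid_best(weights, steps=100):
    """Best GHC value over probability functions on a three-world grid."""
    best = float("-inf")
    for a, b in itertools.product(range(steps + 1), repeat=2):
        if a + b <= steps:
            c = (a / steps, b / steps, (steps - a - b) / steps)
            best = max(best, ghc_value(c, weights))
    return best


for weights in ([0.6, 0.0, 0.4], [0.5, 0.3, 0.2], [0.3, 0.0, 0.7]):
    theorem_value = ghc_value(optimal_credences(weights), weights)
    # The grid should never beat the theorem's credence function.
    print(weights, theorem_value, grid_best(weights))
```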



Friday, 1 January 2021

How permissive is rationality? Horowitz's value question for moderate permissivism

Rationality is good; irrationality is bad. Most epistemologists would agree with this rather unnuanced take, regardless of their view of what exactly constitutes rationality and its complement. Granted this, a good test of a thesis in epistemology is whether it can explain why these two claims are true. Can it answer the value question: Why is rationality valuable and irrationality not? And indeed Sophie Horowitz gives an extremely illuminating appraisal of different degrees of epistemic permissivism and impermissivism by asking of each what answer it might give. Her conclusion is that the extreme permissivist -- played in her paper by the extreme subjective Bayesian, who thinks that satisfying Probabilism and being certain of your evidence is necessary and sufficient for rationality -- can give a satisfying answer to this question, or, at least, an answer that is satisfying from their own point of view. And the extreme impermissivist -- played here by the objective Bayesian, who thinks that rationality requires something like the maximum entropy distribution relative to your evidence -- can do so too. But, Horowitz argues, the moderate permissivist -- played by the moderate Bayesian, who thinks rationality imposes requirements more stringent than merely Probabilism, but who does not think they're stringent enough to pick out a unique credence function -- cannot. In this post, I'd like to raise some problems for Horowitz's assessment, and try to offer my own answer to the value question on behalf of the moderate Bayesian. (Full disclosure: If I'm honest, I think I lean towards extreme permissivism, but I'd like to show that moderate permissivism can defend itself against Horowitz's objection.)

Let's begin with the accounts that Horowitz gives on behalf of the extreme permissivist and the impermissivist.

The extreme permissivist -- the extreme subjective Bayesian, recall -- can say that only by being rational can you have a credence function that is immodest -- where a credence function is immodest if it uniquely maximizes expected epistemic utility from its own point of view. This is because Horowitz, like others in the epistemic utility theory literature, assumes that epistemic utility is measured by strictly proper measures, so that every probabilistic credence function expects itself to be better than any alternative credence function. From this, we can conclude that, on the extreme permissivist view, rationality is sufficient for immodesty. It's trickier to show that it is also necessary, since it isn't clear what we mean by the expected epistemic utility of a credence function from the point of view of a non-probabilistic credence function -- the usual definitions of expectation make sense only for probabilistic credence functions. Fortunately, however, we don't have to clarify this much. We need only say that, at the very least, if one credence function is epistemically better than another at all possible worlds -- that is, in decision theory parlance, the first dominates the second -- then any credence function, probabilistic or not, will expect the first to be better than the second. We then combine this with the result that, if epistemic utility is measured by a strictly proper measure, then, for each non-probabilistic credence function, there is a probabilistic credence function that dominates it, while for each probabilistic credence function, there is no such dominator (this result traces back to Savage's 1971 paper; Predd et al. give the proof in detail when the measure is additive; I then generalised it to remove the additivity assumption). This then shows that being rational is necessary for being immodest. 
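To make the dominance result concrete, here is a minimal sketch in the two-proposition setting used later in the post, assuming the Brier score defined there. For a non-probabilistic pair $(x, y)$ in $[0,1]^2$, it takes the Euclidean projection onto the line $x + y = 1$, namely $(\frac{1+x-y}{2}, \frac{1-x+y}{2})$, which always stays in $[0,1]^2$, and checks that this projection scores strictly better at both worlds. The function names are illustrative, and projection is just one convenient way to find a dominator for the Brier score; it is not offered as the construction used in the papers cited above.

```python
def brier_utility(c, x_true):
    """Brier epistemic utility of c = (c(X), c(not-X)) at the world where X is true/false."""
    x, y = c
    if x_true:
        return -(1 - x) ** 2 - y ** 2
    return -x ** 2 - (1 - y) ** 2


def project_to_probabilities(c):
    """Euclidean projection of (x, y) onto the line x + y = 1."""
    x, y = c
    return ((1 + x - y) / 2, (1 - x + y) / 2)


def dominates(a, b):
    """True if a has strictly greater Brier utility than b at both worlds."""
    return all(brier_utility(a, w) > brier_utility(b, w) for w in (True, False))


for incoherent in [(0.25, 0.25), (0.75, 0.5)]:
    coherent = project_to_probabilities(incoherent)
    print(incoherent, "->", coherent, dominates(coherent, incoherent))
# (0.25, 0.25) -> (0.5, 0.5) True
# (0.75, 0.5) -> (0.625, 0.375) True
```

The projection works for the Brier score because both omniscient credence functions, $(1, 0)$ and $(0, 1)$, lie on the line $x + y = 1$, so moving onto that line brings you strictly closer to both.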
So, according to Horowitz's answer on behalf of the extreme permissivist, being rational is good and being irrational is bad because being rational is necessary and sufficient for being immodest; and it's good to be immodest and bad to be modest.

On the other hand, the impermissivist can say that, by being rational, you are maximizing expected accuracy from the point of view of the one true rational credence function. That's their answer to the value question, according to Horowitz.

We'll return to the question of whether these answers are satisfying below. But first I want to turn to Horowitz's claim that the moderate Bayesian cannot give a satisfactory answer. I'll argue that, if the two answers just given on behalf of the extreme permissivist and extreme impermissivist are satisfactory, then there is a satisfactory answer that the moderate permissivist can give. Then I'll argue that, in fact, these answers aren't very satisfying. And I'll finish by sketching my preferred answer on behalf of the moderate permissivist. This is inspired by William James' account of epistemic risks in The Will to Believe, which leads me to discuss another Horowitz paper.

Horowitz's strategy is to show that the moderate permissivist cannot find a good epistemic feature of credence functions that belongs to all that they count as rational, but does not belong to any they count as irrational. The extreme permissivist can point to immodesty; the extreme impermissivist can point to maximising expected epistemic utility from the point of view of the sole rational credence function. But, for the moderate, there's nothing. Or so Horowitz argues.

For instance, Horowitz initially considers the suggestion that rational credence functions guarantee you a minimum amount of epistemic utility. As she notes, the problem with this is that either it leads to impermissivism, or it fails to include all and only the credence functions the moderate considers rational. Let's focus on the case in which we have opinions about a proposition and its negation -- the point generalizes to richer sets of propositions. We'll represent the credence functions as pairs $(c(X), c(\overline{X}))$. And let's measure epistemic utility using the Brier score. So, when $X$ is true, the epistemic utility of $(x, y)$ is $-(1-x)^2 - y^2$, and when $X$ is false, it is $-x^2 - (1-y)^2$. These two utilities sum to at most $-1$, with equality only at $(0.5, 0.5)$. So, for $r > -0.5$, there is no credence function that guarantees you at least epistemic utility $r$ -- if you have at least that epistemic utility at one world, you have less than it at the other world. For $r = -0.5$, there is exactly one credence function that guarantees you at least epistemic utility $-0.5$ -- it is the uniform credence function $(0.5, 0.5)$. And for $r < -0.5$, there are both probabilistic and non-probabilistic credence functions that guarantee you at least epistemic utility $r$. So, Horowitz concludes, a certain level of guaranteed epistemic utility can't be what separates the rational from the irrational for the moderate permissivist, since for any level, either no credence function guarantees it, exactly one does, or there are both credence functions the moderate considers rational and credence functions they consider irrational that guarantee it.
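A quick numerical check of the guaranteed-utility claims, as a sketch: it searches a grid of credence pairs over $[0,1]^2$ at step 0.01 (a resolution chosen only for illustration) using the Brier score exactly as defined in the paragraph above, and reports the best worst-case utility and where it is attained.

```python
def brier_utility(c, x_true):
    """Brier epistemic utility of c = (c(X), c(not-X)) at the world where X is true/false."""
    x, y = c
    return -(1 - x) ** 2 - y ** 2 if x_true else -x ** 2 - (1 - y) ** 2


def guaranteed_utility(c):
    """Worst-case Brier utility of c across the two worlds."""
    return min(brier_utility(c, True), brier_utility(c, False))


grid = [i / 100 for i in range(101)]
pairs = [(x, y) for x in grid for y in grid]
best = max(guaranteed_utility(c) for c in pairs)
maximisers = [c for c in pairs if guaranteed_utility(c) == best]
print(best, maximisers)  # -0.5 [(0.5, 0.5)]

# Below that level, coherent and incoherent credence functions alike clear the bar.
print(guaranteed_utility((0.5, 0.5)) >= -0.6, guaranteed_utility((0.5, 0.55)) >= -0.6)  # True True
```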

She identifies a similar problem if we think not about guaranteed accuracy but about expected accuracy. Suppose, as the moderate permissivist urges, that some but not all probability functions are rationally permissible. Then for many rational credence functions, there will be irrational ones that they expect to be better than they expect some rational credence functions to be. Horowitz gives the example of a case in which the rational credence in $X$ is between 0.6 and 0.8 inclusive. Then a credence of 0.8 will expect the irrational credence 0.81 to be better than it expects the rational credence 0.7 to be -- at least according to many strictly proper measures of epistemic utility. So, Horowitz concludes, whatever separates the rational from the irrational, it cannot be considerations of expected epistemic utility.
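Horowitz's example can be checked directly with the Brier score, one of the strictly proper measures she has in mind. In the sketch below, a credence $q$ in $X$ comes with $1 - q$ in $\overline{X}$. With the Brier score, the expected utility of $q$ by the lights of $p$ reduces to $-2[(p-q)^2 + p(1-p)]$, so only the distance from $p$ matters. The function name is illustrative.

```python
def expected_brier_utility(p, q):
    """Expected Brier utility of (q, 1 - q) by the lights of probability p for X."""
    u_true = -2 * (1 - q) ** 2   # X true: inaccuracy on X and on not-X
    u_false = -2 * q ** 2        # X false
    return p * u_true + (1 - p) * u_false


p = 0.8
print(expected_brier_utility(p, 0.81) > expected_brier_utility(p, 0.7))  # True
```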

I'd like to argue that, in fact, Horowitz should be happy with appeals to guaranteed or expected epistemic utility. Let's take guaranteed utility first. All that the moderate permissivist needs to say to answer the value question is that there are two valuable things that you obtain by being rational: immodesty and a guaranteed level of epistemic utility. Immodesty rules out all non-probabilistic credence functions, while the guaranteed level of epistemic utility narrows further -- how narrow depends on how much epistemic utility you wish to guarantee. So, for instance, suppose we say that the rational credence functions are exactly those $(x, 1-x)$ with $0.4 \leq x \leq 0.6$. Then each is immodest. And each has a guaranteed epistemic utility of at least $-(1-0.4)^2 - 0.6^2 = -0.72$. If Horowitz is satisfied with the immodesty answer to the value question when the extreme permissivist gives it, I think she should also be satisfied with it when the moderate permissivist combines it with a requirement not to risk certain low epistemic utilities (in this case, utilities below $-0.72$). And this combination of principles rules in all of the credence functions that the moderate counts as rational and rules out all they count as irrational.
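The arithmetic behind the $-0.72$ threshold can be sketched as follows. For a probabilistic credence function $(x, 1-x)$, the Brier worst case is $-2\max(x, 1-x)^2$. The code scans $x$ in steps of 0.01 (an illustrative resolution) and keeps those whose guaranteed utility is at least $-0.72$, with a small tolerance to absorb floating-point error at the endpoints.

```python
def guaranteed_utility(x):
    """Worst-case Brier utility of the probabilistic credence function (x, 1 - x)."""
    u_true = -(1 - x) ** 2 - (1 - x) ** 2   # X true: inaccuracy on X and on not-X
    u_false = -x ** 2 - x ** 2              # X false
    return min(u_true, u_false)


THRESHOLD = -0.72
TOL = 1e-9  # absorbs rounding error at x = 0.4 and x = 0.6
grid = [i / 100 for i in range(101)]
passing = [x for x in grid if guaranteed_utility(x) >= THRESHOLD - TOL]
print(min(passing), max(passing), len(passing))  # 0.4 0.6 21
```

So, among probabilistic credence functions, the guarantee picks out exactly those with $0.4 \leq x \leq 0.6$, as claimed.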

Next, let's think about expected epistemic utility. Suppose that the set of credence functions that the moderate permissivist counts as rational is a closed convex set. For instance, perhaps the set of rational credence functions is $$R = \{c : \{X, \overline{X}\} \rightarrow [0, 1] : 0.6 \leq c(X) \leq 0.8\ \&\ c(\overline{X}) = 1- c(X)\}$$ Then we can prove the following: if a credence function is not in $R$, then there is $c^*$ in $R$ such that each $p$ in $R$ expects $c^*$ to be better than it expects $c$ to be (for the proof strategy, see Section 3.2 here, but replace the possible chance functions with the rational credence functions). Thus, just as the extreme impermissivist answers the value question by saying that, if you're irrational, there's a credence function the unique rational credence function prefers to yours, while if you're rational, there isn't, the moderate permissivist can say that, if you're irrational, there is a credence function that all the rational credence functions prefer to yours, while if you're rational, there isn't. 
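For the Brier score specifically, one way to find such a $c^*$ is to take the Euclidean projection of $c$ onto $R$. For probabilistic $p$, expected Brier disutility equals $p$'s own expected disutility plus the squared distance from $p$ to the credence function being assessed, and projection onto a closed convex set brings you strictly closer to every member of that set. The sketch below assumes the Brier score and the example $R$ above. It is not the general proof referred to in the text, and the helper names are hypothetical.

```python
def expected_brier_utility(p, c):
    """Expected Brier utility of c = (c(X), c(not-X)) by the lights of probability p for X."""
    x, y = c
    u_true = -(1 - x) ** 2 - y ** 2
    u_false = -x ** 2 - (1 - y) ** 2
    return p * u_true + (1 - p) * u_false


LOW, HIGH = 0.6, 0.8   # R: probabilistic credence functions with 0.6 <= c(X) <= 0.8


def project_onto_R(c):
    """Euclidean projection of c onto R: first onto x + y = 1, then clip to [LOW, HIGH]."""
    x, y = c
    on_line = (1 + x - y) / 2
    clipped = min(max(on_line, LOW), HIGH)
    return (clipped, 1 - clipped)


rational = [i / 100 for i in range(60, 81)]   # sample of members of R, by their value on X
for c in [(0.3, 0.6), (0.9, 0.1), (0.5, 0.5)]:
    c_star = project_onto_R(c)
    # True means every sampled rational p expects c_star to do better than c
    print(c, c_star, all(expected_brier_utility(p, c_star) > expected_brier_utility(p, c) for p in rational))
```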

Of course, you might think that it is still a problem for moderate permissivists that there are rational credence functions that expect some irrational credence functions to be better than some alternative rational ones. But I don't think Horowitz will have this worry. After all, the same problem affects extreme permissivism, and she doesn't take issue with this -- at least, not in the paper we're considering. For any two distinct probabilistic credence functions $p_1$ and $p_2$, there will be some non-probabilistic credence function $p'_1$ that $p_1$ will expect to be better than it expects $p_2$ to be -- $p'_1$ is just a very slight perturbation of $p_1$ that makes it incoherent; a perturbation small enough to ensure it lies closer to $p_1$ than $p_2$ does.
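Here is a concrete instance of the perturbation point, as a sketch assuming the Brier score. The particular values $p_1 = (0.6, 0.4)$, $p_2 = (0.3, 0.7)$ and the perturbed $p'_1 = (0.62, 0.4)$ are arbitrary illustrative choices.

```python
def expected_brier_utility(p, c):
    """Expected Brier utility of c = (c(X), c(not-X)) by the lights of probability p for X."""
    x, y = c
    return p * (-(1 - x) ** 2 - y ** 2) + (1 - p) * (-x ** 2 - (1 - y) ** 2)


p1 = (0.6, 0.4)
p2 = (0.3, 0.7)
p1_perturbed = (0.62, 0.4)   # sums to 1.02, so it is not a probability function

print(expected_brier_utility(p1[0], p1_perturbed) > expected_brier_utility(p1[0], p2))  # True
```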

A different worry about the account of the value of rationality that I have just offered on behalf of the moderate permissivist is that it seems to do no more than push the problem back a step. It says that all irrational credence functions have a flaw that all rational credence functions lack. The flaw is this: there is an alternative preferred by all rational credence functions. But to assume that this is indeed a flaw seems to presuppose that we should care how rational credence functions evaluate themselves and other credence functions. But isn't the reason for caring what they say exactly what we have been asking for? Isn't the person who posed the value question in the first place simply going to respond: OK, but what's so great about all the rational credence functions expecting something else to be better, when the question on the table is exactly why rational credence functions are so good?

This is a powerful objection, but note that it applies equally well to Horowitz's response to the value question on behalf of the impermissivist. There, she claims that what is good about being rational is that you thereby maximise expected accuracy from the point of view of the unique rational credence function. But without an account of what's so good about being rational, I think we equally lack an account of what's so good about maximizing expected accuracy from the point of view of the rational credence functions.

So, in the end, I think Horowitz's answer to the value question on behalf of the impermissivist and my proposed expected epistemic utility answer on behalf of the moderate permissivist are both unsatisfying.

What's more, Horowitz's answer on behalf of the extreme permissivist is also a little unsatisfying. The answer turns on the claim that immodesty is a virtue, together with the fact that precisely those credence functions identified as rational by subjective Bayesianism have that virtue. But is it a virtue? Just as arrogance in a person might seem excusable if they genuinely are very competent, but not if they are incompetent, so immodesty in a credence function only seems virtuous if the credence function itself is good. If the credence function is bad, then evaluating itself as uniquely the best seems just another vice to add to its collection. 

So I think Horowitz's answer to the value question on behalf of the extreme permissivist is a little unsatisfactory. But it lies very close to an answer I find compelling. That answer appeals not to immodesty, but to non-dominance. Having a credence function that is dominated is bad. It leaves free epistemic utility on the table in just the same way that a dominated action in practical decision theory leaves free pragmatic utility on the table. For the extreme permissivist, what is valuable about rationality is that it ensures that you don't suffer from this flaw. 

One noteworthy feature of this answer is the conception of rationality to which it appeals. On this conception, the value of rationality does not derive fundamentally from the possession of a positive feature, but from the lack of a negative feature. Ultimately, the primary notion here is irrationality. A credence function is irrational if it exhibits certain flaws, which are spelled out in terms of its success in the pursuit of epistemic utility. You are rational if you are free of these flaws. Thus, for the extreme permissivist, there is just one such flaw -- being dominated. So the rational credences are simply those that lack that flaw -- and the maths tells us that those are precisely the probabilistic credence functions.

We can retain this conception of rationality, motivate moderate permissivism, and answer the value question for it. In fact, there are at least two ways to do this. We have met something very close to one of these ways when we tried to rehabilitate the moderate permissivist's appeal to guaranteed epistemic utility above. There, we said that what makes rationality good is that it ensures that you are immodest and also ensures a certain guaranteed level of accuracy. But, a few paragraphs back, we argued that immodesty is no virtue. So that answer can't be quite right. But we can replace the appeal to immodesty with an appeal to non-dominance, and then the answer will be more satisfying. Thus, the moderate permissivist who says that the rational credence functions are exactly those $(x, 1-x)$ with $0.4 \leq x \leq 0.6$ can say that being rational is valuable for two reasons: (i) if you're rational, you aren't dominated; and (ii) if you're rational, you are guaranteed to have epistemic utility at least $-0.72$. Moreover, (iii) only if you are rational will (i) and (ii) both hold. This answers the value question by appealing to how well credence functions promote epistemic utility, and it separates out the rational from the irrational precisely.

To explain the second way we might do this, we invoke William James. Famously, in The Will to Believe, James said that we have two goals when we believe: to believe truth, and to avoid error. But these pull in different directions. If we pursue the first by believing something, we open ourselves up to the possibility of error. If we pursue the second by suspending judgment on something, we foreclose the possibility of believing the truth about it. Thus, to govern our epistemic life, we must balance these two goals. James held that how we do this is a subjective matter of personal judgment, and a number of different ways of weighing them are permissible. Thomas Kelly has argued that this can motivate permissivism in the case of full beliefs. Suppose the epistemic utility you assign to getting things right -- that is, believing truths and disbelieving falsehoods -- is $R > 0$. And suppose you assign epistemic utility $-W < 0$ to getting things wrong -- that is, disbelieving truths and believing falsehoods. And suppose you assign $0$ to suspending judgment. And suppose $W > R$. Then, as Kenny Easwaran and Kevin Dorst have independently pointed out, if $r$ is the evidential probability of $X$, believing $X$ maximises expected epistemic utility by the lights of that evidential probability iff $\frac{W}{R + W} \leq r$, while suspending on $X$ maximises expected epistemic utility iff $\frac{R}{W+R} \leq r \leq \frac{W}{R+W}$. If William James is right, different values for $R$ and $W$ are permissible. The more you value believing truths, the greater will be $R$. The more you value avoiding falsehoods, the greater will be $W$ (and the lower will be $-W$). 
Thus, there will be a possible evidential probability $r$ for $X$, as well as permissible values $R$, $R'$ for getting things right and permissible values $W$, $W'$ for getting things wrong such that $$\frac{W}{R+W} < r < \frac{W'}{R'+W'}$$So, for someone with epistemic utilities characterised by $R$, $W$, it is rational to believe $X$, while for someone with $R'$, $W'$, it is rational to suspend judgment on $X$ (suspending is optimal for them because $\frac{R'}{R'+W'} < \frac{1}{2} < r$). Hence, permissivism about full beliefs.
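The Easwaran-Dorst thresholds can be illustrated with a small sketch that computes, for a given evidential probability $r$, which attitudes maximise expected epistemic utility. The values $r = 0.65$, $(R, W) = (1, 1.5)$ and $(R', W') = (1, 2)$ are hypothetical choices, picked so that $\frac{W}{R+W} = 0.6 < r < \frac{2}{3} = \frac{W'}{R'+W'}$.

```python
def best_attitudes(r, R, W):
    """Attitudes to X that maximise expected epistemic utility given evidential probability r."""
    expected = {
        "believe": r * R - (1 - r) * W,      # right if X is true, wrong if X is false
        "disbelieve": (1 - r) * R - r * W,   # right if X is false, wrong if X is true
        "suspend": 0.0,
    }
    best = max(expected.values())
    return {attitude for attitude, value in expected.items() if value == best}


r = 0.65
R, W = 1.0, 1.5                # W / (R + W) = 0.6 < r
R_prime, W_prime = 1.0, 2.0    # W' / (R' + W') = 2/3 > r
print(best_attitudes(r, R, W))               # {'believe'}
print(best_attitudes(r, R_prime, W_prime))   # {'suspend'}
```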

As Horowitz points out, however, the same trick won't work for credences. After all, as we've seen, all legitimate measures of epistemic utility for credences are strictly proper measures. And thus, if $r$ is the evidential probability of $X$, then credence $r$ in $X$ uniquely maximises expected epistemic utility relative to any one of those measures. So, a Jamesian permissivism about measures of epistemic value gives permissivism about doxastic states in the case of full belief, but not in the case of credence.
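By contrast, here is a sketch of why the same trick fails for credences: under a strictly proper measure, the expected-utility maximiser is simply the evidential probability itself. The check below uses two familiar strictly proper measures, the Brier score and the logarithmic score (the latter is my addition for illustration, not one the post discusses), with $r = 0.7$ as an arbitrary example and a grid of credences at step 0.01.

```python
import math


def expected_brier(r, q):
    """Expected Brier utility of (q, 1 - q): -2(1 - q)^2 if X is true, -2q^2 if false."""
    return r * -2 * (1 - q) ** 2 + (1 - r) * -2 * q ** 2


def expected_log(r, q):
    """Expected log utility: ln q if X is true, ln(1 - q) if X is false."""
    return r * math.log(q) + (1 - r) * math.log(1 - q)


r = 0.7
grid = [i / 100 for i in range(1, 100)]   # omit 0 and 1, where the log score is -infinity
print(max(grid, key=lambda q: expected_brier(r, q)))  # 0.7
print(max(grid, key=lambda q: expected_log(r, q)))    # 0.7
```

Whatever the weights, there is no analogue of the $R$/$W$ trade-off that could move the optimum away from $r$.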

Nonetheless, I think we can derive permissivism about credences from James' insight. The key is to encode our attitudes towards James' two great goals for belief not in our epistemic utilities but in the rule we adopt when we use those epistemic utilities to pick our credences. Here's one suggestion, which I pursued at greater length in this paper a few years ago, and that I generalised in some blog posts over the summer -- I won't actually present the generalization here, since it's not required to make the basic point. James recognised that, by giving yourself the opportunity to be right about something, you thereby run the risk of being wrong. In the credal case, by giving yourself the opportunity to be very accurate about something, you thereby run the risk of being very inaccurate. In the full belief case, to avoid that risk completely, you must never commit on anything. It was precisely this terror of being wrong that he lamented in Clifford. By ensuring he could never be wrong, there were true beliefs to which Clifford closed himself off. James believed that the extent to which you are prepared to take these epistemic risks is a passional matter -- that is, a matter of subjective preference. We might formalize it using a decision rule called the Hurwicz criterion. This rule was developed by Leonid Hurwicz for situations in which no probabilities are available to guide our decisions, so it is ideally suited for the situation in which we must pick our prior credences. 

Maximin is the rule that says you should pay attention only to the worst-case scenario and choose a credence function that does best there -- you should maximise your minimum possible utility. Maximax is the rule that says you should pay attention only to the best-case scenario and choose a credence function that does best there -- you should maximise your maximum possible utility. The former is maximally risk averse, the latter maximally risk seeking. As I showed here, if you measure epistemic utility in a standard way, maximin demands that you adopt the uniform credence function -- its worst case is best. And almost however you measure epistemic utility, maximax demands that you pick a possible world and assign maximal credence to all propositions that are true there and minimal credence to all propositions that are false there -- its best case, which obviously occurs at the world you picked, is best, because it is perfect there. 

The Hurwicz criterion is a continuum of decision rules with maximin at one end and maximax at the other. You pick a weighting $0 \leq \lambda \leq 1$ that measures how risk-seeking you are and you define the Hurwicz score of an option $a$, with utility $a(w)$ at world $w$, to be$$H^\lambda(a) = \lambda \max \{a(w) : w \in W\} + (1-\lambda) \min \{a(w) : w \in W\}$$And you pick an option with the highest Hurwicz score.

Let's see how this works out in the simplest case, namely, that in which you have credences only in $X$ and $\overline{X}$. As before, we write credence functions defined on these two propositions as $(c(X), c(\overline{X}))$. Then, if $\lambda \leq \frac{1}{2}$ -- that is, if you give at least as much weight to the worst case as to the best case -- then the uniform distribution $(\frac{1}{2}, \frac{1}{2})$ maximises the Hurwicz score relative to any strictly proper measure. And if $\lambda > \frac{1}{2}$ -- that is, if you are risk seeking and give more weight to the best case than the worst -- then $(\lambda, 1 - \lambda)$ and $(1-\lambda, \lambda)$ both maximise the Hurwicz score.
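The following sketch checks these claims for the Brier score by brute force. It computes $H^\lambda$ as defined above and searches a grid of credence pairs over $[0,1]^2$, coherent or not, at step 0.01 (an illustrative resolution). It also covers the two endpoints: $\lambda = 0$ is maximin and $\lambda = 1$ is maximax. The helper names are hypothetical, and the check covers only the Brier score, not every strictly proper measure.

```python
def brier_utility(c, x_true):
    """Brier epistemic utility of c = (c(X), c(not-X)) at the world where X is true/false."""
    x, y = c
    if x_true:
        return -(1 - x) ** 2 - y ** 2
    return -x ** 2 - (1 - y) ** 2


def hurwicz_score(c, lam):
    """lam * best-case utility + (1 - lam) * worst-case utility."""
    utilities = [brier_utility(c, True), brier_utility(c, False)]
    return lam * max(utilities) + (1 - lam) * min(utilities)


def hurwicz_maximisers(lam, steps=100):
    """Grid points of [0, 1]^2 (coherent or not) with the highest Hurwicz score."""
    grid = [i / steps for i in range(steps + 1)]
    scores = {(x, y): hurwicz_score((x, y), lam) for x in grid for y in grid}
    best = max(scores.values())
    return sorted(c for c, s in scores.items() if abs(s - best) < 1e-12)


for lam in (0.0, 0.3, 0.75, 1.0):
    print(lam, hurwicz_maximisers(lam))
# 0.0 [(0.5, 0.5)]                    (maximin: the uniform credence function)
# 0.3 [(0.5, 0.5)]
# 0.75 [(0.25, 0.75), (0.75, 0.25)]
# 1.0 [(0.0, 1.0), (1.0, 0.0)]        (maximax: certainty about one world)
```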

Now, if any $0 \leq \lambda \leq 1$ is permissible, then so is any credence function $(x, 1-x)$, and we get extreme permissivism. But I think we're inclined to say that there are extreme attitudes to risk that are not rationally permissible, just as there are preferences relating the scratching of one's finger and the destruction of the world that are not rationally permissible. I think we're inclined to think there is some range from $a$ to $b$ with $0 \leq a < b \leq 1$ such that the only rational attitudes to risk are precisely those encoded by the Hurwicz weights that lie between $a$ and $b$. If that's the case, we obtain moderate permissivism.

To be a bit more precise, this gives us both moderate interpersonal and intrapersonal permissivism. It gives us moderate interpersonal permissivism if $\frac{1}{2} < b < 1$ -- that is, if we are permitted to give more than half, but not all, of our weight to the best-case epistemic utility. For then there is $b'$ such that $\max(a, \frac{1}{2}) < b' < b$, and both $(b, 1-b)$ and $(b', 1-b')$ are rationally permissible. But there is also $b''$ with $b < b'' < 1$, and for any such $b''$, $(b'', 1-b'')$ is not rationally permissible. It also gives us moderate intrapersonal permissivism under the same condition. For if $\frac{1}{2} < b$ and $b$ is your Hurwicz weight, then $(b, 1-b)$ and $(1-b, b)$ are distinct credence functions, and both are rationally permissible for you.
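A small sketch of which probabilistic credence functions come out permissible once the Hurwicz weights are restricted to a range $[a, b]$. It relies on the two-proposition result stated earlier (weights at most $\frac{1}{2}$ pick the uniform credence function; a weight $\lambda > \frac{1}{2}$ picks $(\lambda, 1-\lambda)$ and $(1-\lambda, \lambda)$). The bounds $a = 0.3$, $b = 0.7$ and the helper name `permissible` are hypothetical.

```python
def permissible(x, a, b):
    """Hypothetical helper: is (x, 1 - x) picked out by some Hurwicz weight in [a, b]?"""
    if x == 0.5:
        return a <= 0.5         # the uniform credence needs some permissible weight <= 1/2
    lam = max(x, 1 - x)         # the weight that would rationalise x; it exceeds 1/2
    return a <= lam <= b


a, b = 0.3, 0.7   # hypothetical bounds on permissible attitudes to epistemic risk
print([x for x in (0.2, 0.35, 0.5, 0.65, 0.8) if permissible(x, a, b)])  # [0.35, 0.5, 0.65]
```

Several credence functions pass, so the view is permissive. But $0.2$ and $0.8$ fail, so it is only moderately permissive. And $0.35$ and $0.65$ are both rationalised by the single weight $0.65$, which is the intrapersonal case.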

How does this motivation for moderate permissivism fare with respect to the value question? I think it fares as well as the non-dominance-based answer I sketched above for the extreme permissivist. There, I appealed to a single flaw that a credence function might have: it might be dominated by another. Here, I introduced another flaw. It might be rationalised only by Jamesian attitudes to epistemic risk that are too extreme or otherwise beyond the pale. Like being dominated, this is a flaw that relates to the pursuit of epistemic utility. If you exhibit it, you are irrational. And to be rational is to be free of such flaws. The moderate permissivist can thereby answer the value question that Horowitz poses.