Taking risks and picking posteriors

For a PDF of this blog, see here.

When are my credences rational? In Bayesian epistemology, there's a standard approach to this question. We begin by asking what credences would be rational were you to have no evidence at all; then we ask what ways of updating your credences are rational when you receive new evidence; and finally we say that your current credences are rational if they are the result of updating rational priors in a rational way on the basis of your current total evidence. This account can be read in one of two ways: on the doxastic reading, you're rational if, in fact, when you had no evidence you had priors that were rational and if, in fact, when you received evidence you updated in a rational way; on the propositional reading, you're rational if there exists some rational prior and there exists some rational way of updating such that applying the updating rule to the prior based on your current evidence issues in your current credences.

In this previous post, I asked how we might use accuracy-first epistemology and decision-making rules for situations of massive uncertainty to identify the rational priors. I suggested that we should turn to the early days of decision theory, when there was still significant interest in how we might make a decision in situations in which it is not possible to assign probabilities to the different possible states of the world. In particular, I noted Hurwicz's generalization of Wald's Maximin rule, which is now called the Hurwicz Criterion, and I offered a further generalization of my own, which I then applied to the problem of picking priors. Here's my generalization:

Generalized Hurwicz Criterion (GHC) Suppose the set of possible states of the world is $W = \{w_1, \ldots, w_n\}$. Pick $0 \leq \alpha_1, \ldots, \alpha_n \leq 1$ with $\alpha_1 + \ldots + \alpha_n = 1$, and denote this sequence of weights $A$. Suppose $a$ is an option defined on $W$, and write $a(w_i)$ for the utility of $a$ at world $w_i$. Then if$$a(w_{i_1}) \geq a(w_{i_2}) \geq \ldots \geq a(w_{i_n})$$then let$$H^A(a) = \alpha_1a(w_{i_1}) + \ldots + \alpha_na(w_{i_n})$$Pick an option that maximises $H^A$.
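To make the definition concrete, here is a minimal sketch of GHC in Python. The function names and the two example options are hypothetical, chosen only for illustration. Maximin and Hurwicz's original criterion appear as special choices of the weights.

```python
def ghc_value(utilities, weights):
    """Generalized Hurwicz value H^A(a).

    utilities: a(w_1), ..., a(w_n), listed in world order.
    weights: alpha_1, ..., alpha_n, with alpha_1 attached to the best case.
    """
    ranked = sorted(utilities, reverse=True)  # best case first
    return sum(alpha * u for alpha, u in zip(weights, ranked))


def ghc_choices(options, weights):
    """Names of the options that maximise H^A."""
    scores = {name: ghc_value(utils, weights) for name, utils in options.items()}
    top = max(scores.values())
    return [name for name, score in scores.items() if score == top]


# Hypothetical options, given by their utilities at w_1, w_2, w_3.
options = {"safe": [2, 2, 2], "risky": [5, 1, 0]}

maximin_weights = [0, 0, 1]      # Wald: all weight on the worst case
hurwicz_weights = [0.5, 0, 0.5]  # Hurwicz: weight on best and worst only
ghc_weights = [0.5, 0.3, 0.2]    # GHC: weight on every rank

print(ghc_choices(options, maximin_weights))  # ['safe']
print(ghc_choices(options, hurwicz_weights))  # ['risky']
print(ghc_choices(options, ghc_weights))      # ['risky']
```

Note that the weights attach to ranks (best, second-best, ...), not to particular worlds, which is why the utilities are sorted before the weighted sum is taken.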

Thus, whereas Wald's Maximin puts all of the weight onto the worst case, and Hurwicz's Criterion distributes all the weight between the best and worst cases, the generalized Hurwicz Criterion allows you to distribute the weight between best, second-best, and so on down to second-worst, and worst. I said that you should pick your priors by applying GHC with a measure of accuracy for credences. Then I described the norms for priors that it imposes.

In this post, I'm interested in the second component of the Bayesian approach I described above, namely, the rational ways of updating. How does rationality demand we update our prior when new evidence arrives? Again, I'll be asking this within the accuracy-first framework.

As in the previous post, we'll consider the simplest possible case. We'll assume there are just three possible states of the world that you entertain and to which you will assign credences. They are $w_1, w_2, w_3$. If $p$ is a credence function, then we'll write $p_i$ for $p(w_i)$, and we'll denote the whole credence function $(p_1, p_2, p_3)$. At the beginning of your epistemic life you have no evidence, and you must pick your prior.  We'll assume that you do as I proposed in the previous post and set your prior using GHC with some weights that you have picked, $\alpha_1$ for the best-case scenario, $\alpha_2$ for the second-best, and $\alpha_3$ for the worst. Later on, let's suppose, you learn evidence $E = \{w_1, w_2\}$. How should you update? As we will see, the problem is that there are many seemingly plausible approaches to this question, most of which disagree and most of which give implausible answers.

A natural first proposal is to use the same decision rule to select our posterior as we used to select our prior. To illustrate, let's suppose that our Hurwicz weight for the best-case scenario is $\alpha_1 = 0.5$, for the second-best $\alpha_2 = 0.3$, and for the worst $\alpha_3 = 0.2$. Applying GHC with these weights and any additive and continuous strictly proper (acsp) accuracy measure gives the following as the permissible priors:$$\begin{array}{ccc}(0.5, 0.3, 0.2) & (0.5, 0.2, 0.3) & (0.3, 0.5, 0.2) \\ (0.3, 0.2, 0.5) & (0.2, 0.5, 0.3) &(0.2, 0.3, 0.5)\end{array}$$
Let's suppose we pick $(0.5, 0.3, 0.2)$. And now suppose we learn $E = \{w_1, w_2\}$. If we simply apply GHC again, we get the same set of credence functions as permissible posteriors. But none of these even respects the evidence we've obtained -- that is, all of them assign positive credence to world $w_3$, which our evidence has ruled out. So that can't be quite right.
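As a rough check on the list of permissible priors, the following sketch searches a grid of credence functions and keeps those with the highest GHC value. It uses the Brier score, in its accuracy form $2p_i - \sum_j p_j^2$, as one example of an acsp measure. The grid resolution and the helper names are assumptions of the sketch, and exact fractions are used so that ties are detected reliably.

```python
from fractions import Fraction
from itertools import permutations

WEIGHTS = [Fraction(5, 10), Fraction(3, 10), Fraction(2, 10)]
STEPS = 10  # grid resolution (an assumption of this sketch)


def ghc_value(utilities, weights):
    ranked = sorted(utilities, reverse=True)  # best case first
    return sum(alpha * u for alpha, u in zip(weights, ranked))


def brier(p, i):
    """Brier accuracy of credence function p at world w_{i+1}."""
    return 2 * p[i] - sum(x * x for x in p)


def ghc_brier(p):
    return ghc_value([brier(p, i) for i in range(3)], WEIGHTS)


# All credence functions (a/10, b/10, c/10) that sum to 1.
grid = [
    (Fraction(a, STEPS), Fraction(b, STEPS), Fraction(STEPS - a - b, STEPS))
    for a in range(STEPS + 1)
    for b in range(STEPS + 1 - a)
]
best = max(ghc_brier(p) for p in grid)
permissible_priors = {p for p in grid if ghc_brier(p) == best}

for p in sorted(permissible_priors):
    print(tuple(float(x) for x in p))
```

On this grid the maximizers are exactly the six orderings of the weights, and every one of them gives positive credence to $w_3$.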

Perhaps, then, we should first limit the permissible posteriors to those that respect our evidence -- by assigning credence $0$ to world $w_3$ -- and then find the credence function that maximizes GHC among them. It turns out that the success of this move depends on the measure of accuracy that you use. Suppose, for instance, you use the Brier score $\mathfrak{B}$, whose accuracy for a credence function $p = (p_1, p_2, p_3)$ at world $w_i$ is $$\mathfrak{B}(p, w_i) = 2p_i - (p_1^2 + p_2^2 + p_3^2)$$That is, you find the credence function of the form $q = (q_1, 1-q_1, 0)$ that maximizes $H^A_\mathfrak{B}$. It turns out that this is $q = (0.6, 0.4, 0)$ or its mirror image $(0.4, 0.6, 0)$, neither of which is the result of conditioning $(0.5, 0.3, 0.2)$ on $E = \{w_1, w_2\}$ -- that would be $q = (0.625, 0.375, 0)$.
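The Brier calculation can be reproduced with a short search over posteriors of the form $(x, 1-x, 0)$. This is a sketch under stated assumptions: the Brier score is treated as an accuracy measure (higher is better), the grid resolution is arbitrary, and the helper names are hypothetical.

```python
from fractions import Fraction

WEIGHTS = [Fraction(5, 10), Fraction(3, 10), Fraction(2, 10)]
STEPS = 1000  # grid resolution (an assumption of this sketch)


def ghc_value(utilities, weights):
    ranked = sorted(utilities, reverse=True)  # best case first
    return sum(alpha * u for alpha, u in zip(weights, ranked))


def brier(p, i):
    return 2 * p[i] - sum(x * x for x in p)


def ghc_brier(q):
    return ghc_value([brier(q, i) for i in range(3)], WEIGHTS)


# Posteriors that respect E = {w1, w2}: credence 0 in w3.
candidates = [
    (Fraction(k, STEPS), 1 - Fraction(k, STEPS), Fraction(0))
    for k in range(STEPS + 1)
]
best = max(ghc_brier(q) for q in candidates)
maximizers = sorted(q for q in candidates if ghc_brier(q) == best)

prior = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))
mass_E = prior[0] + prior[1]
conditioned = (prior[0] / mass_E, prior[1] / mass_E, Fraction(0))

print([tuple(float(x) for x in q) for q in maximizers])  # [(0.4, 0.6, 0.0), (0.6, 0.4, 0.0)]
print(tuple(float(x) for x in conditioned))              # (0.625, 0.375, 0.0)
```

The search returns $(0.6, 0.4, 0)$ together with its mirror image $(0.4, 0.6, 0)$, which ties with it by symmetry. Neither matches the conditioned prior.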

However, as I explained in another previous post, there is a unique additive and continuous strictly proper accuracy measure that will give conditionalization in this way. I called it the enhanced log score $\mathfrak{L}^\star$, and it is also found in Juergen Landes' paper here (Proposition 9.1) and Schervish, Seidenfeld, and Kadane's paper here (Example 6). Its accuracy for a credence function $p = (p_1, p_2, p_3)$ at world $w_i$ is $$\mathfrak{L}^\star(p, w_i) = \log p_i - (p_1 + p_2 + p_3)$$If we apply GHC with that accuracy measure and with the restriction to credence functions that satisfy the evidence, we get $(0.625, 0.375, 0)$ or $(0.375, 0.625, 0)$, as required. So while GHC doesn't mandate conditioning on your evidence, it does at least permit it. However, while this goes smoothly if we pick $(0.5, 0.3, 0.2)$ as our prior, it does not work so well if we pick $(0.2, 0.3, 0.5)$, which, if you recall, is also permitted by the Hurwicz weights we are using. After all, the two permissible posteriors remain the same, but neither is the result of conditioning that prior on $E$. This proposal, then, is a non-starter.
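Here is one way to reproduce those figures numerically. On the formula as stated, the enhanced log score of any posterior $(x, 1-x, 0)$ at $w_3$ involves $\log 0$, so the sketch below assumes that this worst-case term is the same for every candidate and compares only the part that varies with $x$. That treatment of the $w_3$ term is an assumption of the sketch, as are the helper names and the grid.

```python
import math

ALPHA = (0.5, 0.3, 0.2)
STEPS = 1000  # grid resolution (an assumption of this sketch)


def varying_part(x):
    """GHC contribution from w1 and w2 for the posterior (x, 1 - x, 0).

    The w3 term (weight ALPHA[2] on the worst case) is left out because,
    on the stated formula, it is the same for every candidate.
    """
    acc = sorted([math.log(x) - 1.0, math.log(1.0 - x) - 1.0], reverse=True)
    return ALPHA[0] * acc[0] + ALPHA[1] * acc[1]


def condition_on_E(prior):
    """Condition a prior over (w1, w2, w3) on E = {w1, w2}."""
    mass = prior[0] + prior[1]
    return (prior[0] / mass, prior[1] / mass, 0.0)


# Search x in [0.5, 1); the mirror image 1 - x ties by symmetry.
grid = [k / STEPS for k in range(STEPS // 2, STEPS)]
best_x = max(grid, key=varying_part)

print(best_x)                            # 0.625
print(condition_on_E((0.5, 0.3, 0.2)))   # approximately (0.625, 0.375, 0.0)
print(condition_on_E((0.2, 0.3, 0.5)))   # approximately (0.4, 0.6, 0.0)
```

The last line illustrates the difficulty: conditioning the prior $(0.2, 0.3, 0.5)$ on $E$ gives $(0.4, 0.6, 0)$, which is not among the posteriors the restricted search selects.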

There is, in any case, something strange about the approach just mooted. After all, GHC assigns a weight to the accuracy of a candidate posterior in each of the three worlds, even though in world $w_3$ you wouldn't receive evidence $E$ and would thus not adopt this posterior. Let's suppose that you'd receive evidence $\overline{E} = \{w_3\}$ instead at world $w_3$; and let's suppose you'd adopt the only credence function that respects this evidence, namely, $(0, 0, 1)$. If that's the case, we might try applying GHC not to potential posteriors but to potential rules for picking posteriors. I'll call these posterior rules. In the past, I've called them updating rules, but this is a bit misleading. An updating rule would take as inputs both prior and evidence and give the result of updating the former on the latter. But these rules really just take evidence as an input and say which posterior you'll adopt if you receive it. Thus, for our situation, in which you might learn either $E$ or $\overline{E}$, the posterior rule would have the following form:$$p' = \left \{ \begin{array}{rcl}E & \mapsto & p'_E \\ \overline{E} & \mapsto & p'_{\overline{E}}\end{array}\right.$$for some suitable specification of $p'_E$ and $p'_\overline{E}$. Then the accuracy of a rule $p'$ at a world is just the accuracy of the output of that rule at that world. Thus, in this case:$$\begin{array}{rcl}\mathfrak{I}(p', w_1) & = & \mathfrak{I}(p'_E, w_1) \\ \mathfrak{I}(p', w_2) & = & \mathfrak{I}(p'_E, w_2) \\\mathfrak{I}(p', w_3) & = & \mathfrak{I}(p'_\overline{E}, w_3)\end{array}$$The problem is that this move doesn't help. Part of the reason is that whatever was the best-case scenario for the prior, the best case for the posterior is sure to be world $w_3$, since $p'_\overline{E} = (0, 0, 1)$ is perfectly accurate at that world. Thus, suppose you pick $(0.5, 0.3, 0.2)$ as your prior. 
It turns out that the rules that maximize $H^A_{\mathfrak{L}^\star}$ will give $p'_E = (0.4, 0.6, 0)$ or $p'_E = (0.6, 0.4, 0)$, whereas conditioning your prior on $E$ gives $p'_E = (0.625, 0.375, 0)$.
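The following sketch applies GHC to posterior rules rather than to posteriors, using the enhanced log score as an accuracy measure (higher is better). The rule sends $\overline{E}$ to $(0, 0, 1)$ and $E$ to $(x, 1-x, 0)$, and the code searches for the best $x$. The grid and helper names are assumptions of the sketch.

```python
import math

ALPHA = (0.5, 0.3, 0.2)
STEPS = 1000  # grid resolution (an assumption of this sketch)


def ghc_value(utilities, weights):
    ranked = sorted(utilities, reverse=True)  # best case first
    return sum(alpha * u for alpha, u in zip(weights, ranked))


def enhanced_log(p, i):
    """Enhanced log accuracy of p at world w_{i+1}; requires p[i] > 0."""
    return math.log(p[i]) - sum(p)


def rule_accuracies(x):
    """Accuracy of the rule at w1, w2, w3: the output it gives at each world."""
    post_E = (x, 1.0 - x, 0.0)
    post_not_E = (0.0, 0.0, 1.0)
    return [
        enhanced_log(post_E, 0),
        enhanced_log(post_E, 1),
        enhanced_log(post_not_E, 2),
    ]


def rule_ghc(x):
    return ghc_value(rule_accuracies(x), ALPHA)


# Search x in [0.5, 1); the mirror image 1 - x ties by symmetry.
grid = [k / STEPS for k in range(STEPS // 2, STEPS)]
best_x = max(grid, key=rule_ghc)

print(best_x)  # 0.6
```

Because the rule is perfectly accurate at $w_3$, that world always takes the best-case weight, and the remaining weights $0.3$ and $0.2$ are split between $w_1$ and $w_2$. That is why the search lands on $0.6 = 0.3/(0.3 + 0.2)$ rather than on $0.625$.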

Throughout our discussion so far, we have dismissed various possible approaches because they are not consistent with conditionalization. But why should that be a restriction? Perhaps the approach we are taking will tell us that the Bayesian fixation with conditionalization is misguided. Perhaps. But there are strong arguments for conditionalization within accuracy-first epistemology, so we'd have to see why they go wrong before we start rewriting Bayesian textbooks. I'll consider three such arguments here. The first isn't as strong as it seems; the second isn't obviously available to someone who used GHC to pick priors; the third is promising but it leads us initially down a tempting road into an inhospitable morass.

The first is closely related to a proposal I explored in a previous blogpost. So I'll briefly outline the approach here and refer to the issues raised in that post. The idea is this: Your prior is $(p_1, p_2, p_3)$. You learn $E$. You must now adopt a posterior that respects your new evidence, namely, $(q_1, 1-q_1, 0)$. You should choose the posterior of that form that maximises expected accuracy from the point of view of your prior, that is, you're looking for $(x, 1-x, 0)$ that maximizes$$p_1 \mathfrak{I}((x, 1-x, 0), w_1) + p_2 \mathfrak{I}((x, 1-x, 0), w_2) + p_3 \mathfrak{I}((x, 1-x, 0), w_3)$$This approach is taken in a number of places: at the very least, here, here, and here. Now, it turns out that there is only one additive and continuous strictly proper accuracy measure that is guaranteed always to give conditionalization on this approach. That is, there is only one measure such that, for any prior, the posterior it expects to be best among those that respect the evidence is the one that results from conditioning the prior on the evidence. Indeed, that accuracy measure is one we've already met above, namely, the enhanced log score $\mathfrak{L}^\star$ (see here). However, it turns out that it only works if we assume our credences are defined only over the set of possible states of the world, and not over more coarse-grained propositions (see here). So I think this approach is a non-starter.
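To see how this proposal can come apart from conditionalization, here is a small sketch that maximizes the expected accuracy above using the Brier score rather than the enhanced log score. The choice of prior, the grid resolution, and the helper names are assumptions made for illustration.

```python
from fractions import Fraction

STEPS = 1000  # grid resolution (an assumption of this sketch)


def brier(p, i):
    """Brier accuracy of p at world w_{i+1}."""
    return 2 * p[i] - sum(x * x for x in p)


def expected_accuracy(prior, posterior):
    """Expected accuracy of the posterior by the lights of the prior,
    with the posterior scored at all three worlds, including w3."""
    return sum(prior[i] * brier(posterior, i) for i in range(3))


prior = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))
candidates = [
    (Fraction(k, STEPS), 1 - Fraction(k, STEPS), Fraction(0))
    for k in range(STEPS + 1)
]
best_posterior = max(candidates, key=lambda q: expected_accuracy(prior, q))

mass_E = prior[0] + prior[1]
conditioned = (prior[0] / mass_E, prior[1] / mass_E, Fraction(0))

print(tuple(float(x) for x in best_posterior))  # (0.6, 0.4, 0.0)
print(tuple(float(x) for x in conditioned))     # (0.625, 0.375, 0.0)
```

With the Brier score and this prior, the posterior with the highest expected accuracy is $(0.6, 0.4, 0)$, not the conditioned prior $(0.625, 0.375, 0)$.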

More promising at first sight is the argument by Hilary Greaves and David Wallace from 2006. Here, just as we considered earlier, we look not just at the posterior we will adopt having learned $E$, but also the posterior we would adopt were we to learn $\overline{E}$. Thus, if your prior is $(p_1, p_2, p_3)$, then you are looking for $(x, 1-x, 0)$ that maximizes$$p_1 \mathfrak{I}((x, 1-x, 0), w_1) + p_2 \mathfrak{I}((x, 1-x, 0), w_2) + p_3 \mathfrak{I}((0, 0, 1), w_3)$$And it turns out that this will always be$$x = \frac{p_1}{p_1 + p_2}\ \ 1-x = \frac{p_2}{p_1+p_2}$$providing $\mathfrak{I}$ is strictly proper.
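A quick numerical illustration of the Greaves and Wallace objective, using the Brier score as one example of a strictly proper measure. The only change from scoring a single posterior is that world $w_3$ is scored by the posterior you would adopt there, $(0, 0, 1)$. The grid, the example priors, and the helper names are assumptions of the sketch.

```python
from fractions import Fraction


def brier(p, i):
    """Brier accuracy of p at world w_{i+1}."""
    return 2 * p[i] - sum(v * v for v in p)


def planned_expected_accuracy(prior, x):
    """Expected accuracy of the plan: adopt (x, 1-x, 0) on E, (0, 0, 1) otherwise."""
    post_E = (x, 1 - x, Fraction(0))
    post_not_E = (Fraction(0), Fraction(0), Fraction(1))
    return (
        prior[0] * brier(post_E, 0)
        + prior[1] * brier(post_E, 1)
        + prior[2] * brier(post_not_E, 2)
    )


def best_x(prior, steps=1000):
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    return max(grid, key=lambda x: planned_expected_accuracy(prior, x))


prior_a = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))
prior_b = (Fraction(1, 5), Fraction(3, 10), Fraction(1, 2))

print(best_x(prior_a))  # 5/8
print(best_x(prior_b))  # 2/5
```

In both cases the maximizer equals $p_1 / (p_1 + p_2)$, as the result predicts.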

Does this help us? Does it show that, if we set our priors using GHC, we should then set our posteriors using conditionalization? One worry might be this: What justifies you in choosing your posteriors using one decision rule -- namely, maximise subjective expected utility -- when you picked your priors using a different one -- namely, GHC? But there seems to be a natural answer. As I emphasised above, GHC is specifically designed for situations in which probabilities, either subjective or objective, are not available. It allows us to make decisions in their absence. But of course when it comes to choosing the posterior, we are no longer in such a situation. At that point, we can simply resort to what became more orthodox decision theory, namely, Savage's subjective expected utility theory.

But there's a problem with this. GHC is not a neutral norm for picking priors. When you pick your Hurwicz weights for the best case, the second-best case, and so on down to the second-worst case and the worst case, you reflect an attitude to risk. Give more weight to the worst cases and you're risk averse, choosing options that make those worst cases better; give more weight to the best cases and you're risk seeking; spread the weights equally across all cases and you are risk neutral. But the problem is that subjective expected utility theory is a risk neutral theory. (One way to see this is to note that it is the special case of Lara Buchak's risk-weighted expected utility theory that results from using the neutral risk function $r(x) = x$.) Thus, for those who have picked their prior using a risk-sensitive instance of GHC when they lacked probabilities, the natural decision rule when they have access to probabilities is not going to be straightforward expected utility theory. It's going to be a risk-sensitive rule that can accommodate subjective probabilities. The natural place to look would be Lara Buchak's theory, for instance. And it's straightforward to show that Greaves and Wallace's result does not hold when you use such a rule. (In forthcoming work, Catrin Campbell-Moore and Bernhard Salow have been working on what does follow and how we might change our accuracy measures to fit with such a theory and what follows from an argument like Greaves and Wallace's when you do that.) In sum, I think arguments for conditionalization based on maximizing expected accuracy won't help us here.
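For readers unfamiliar with risk-weighted expected utility, here is a minimal sketch of the rank-dependent form in which Buchak's theory is usually presented: start from the worst outcome and add each successive utility improvement, weighted by the risk function applied to the probability of doing at least that well. The function names, the example gamble, and the convex risk function $r(x) = x^2$ are illustrative assumptions.

```python
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)


def risk_weighted_eu(outcomes, risk):
    """Risk-weighted expected utility.

    outcomes: list of (probability, utility) pairs.
    risk: a function r on [0, 1] with r(0) = 0 and r(1) = 1.
    """
    ranked = sorted(outcomes, key=lambda pu: pu[1])  # worst outcome first
    total = ranked[0][1]
    for k in range(1, len(ranked)):
        prob_at_least = sum(p for p, _ in ranked[k:])
        total += risk(prob_at_least) * (ranked[k][1] - ranked[k - 1][1])
    return total


gamble = [(0.5, 0.0), (0.5, 10.0)]  # hypothetical coin-flip gamble

print(expected_utility(gamble))                    # 5.0
print(risk_weighted_eu(gamble, lambda x: x))       # 5.0
print(risk_weighted_eu(gamble, lambda x: x ** 2))  # 2.5
```

With $r(x) = x$ the rule coincides with expected utility, while a convex risk function discounts the chance of the better outcome and so values the gamble below its expectation.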

Fortunately, however, there is another argument, and it doesn't run into this problem. As we will see, though, it does face other challenges. In Greaves and Wallace's argument, we took the view from the prior that we picked using GHC, and we used it to evaluate our way of picking posteriors. In this argument, due to me and Ray Briggs, we take the view from nowhere, and we use it to evaluate the prior and the posterior rule together. Thus, suppose $p$ is your prior and $p'$ is your posterior rule. Then we evaluate them together by taking their joint accuracy to be the sum of their individual accuracies. Thus,$$\mathfrak{I}((p, p'), w) = \mathfrak{I}(p, w) + \mathfrak{I}(p', w)$$Then we have the following fact, where $p'$ is a conditioning rule for $p$ over some partition $\mathcal{E}$ iff, for all $E$ in $\mathcal{E}$, if $p(E) > 0$, then $p'_E(-) = p(-|E)$:

Theorem Suppose $\mathfrak{I}$ is an additive and continuous strictly proper scoring rule. Then, if $p'$ is not a conditioning rule for $p$ over $\mathcal{E}$, there are $q$ and $q'$ such that$$\mathfrak{I}((p, p'), w) < \mathfrak{I}((q, q'), w)$$for all worlds $w$.

That is, if $p'$ is not a conditioning rule for $p$, then, taken together, they are accuracy-dominated. There is an alternative pair, $q$ and $q'$, that, taken together, are guaranteed to be more accurate than $p$ and $p'$ are, taken together.

Notice that this argument establishes a slightly different norm from the one that the expected accuracy argument secures. The latter is a narrow scope norm: if $p$ is your prior, then your posterior rule should be to condition $p$ on whatever evidence you learn. The former is a wide scope norm: you should not have prior $p$ together with a posterior rule that does not condition $p$ on the evidence you learn. This suggests that, if you're sitting at the beginning of your epistemic life and you're picking priors and posterior rules together, as a package, you should pick them so that the posterior rule involves conditioning the prior on the evidence received. Does it also tell you anything about what to do if you're sitting with your prior already fixed and new evidence comes in? I'm not sure. Here's a reason to think it might. You might think that it's only rational to do at a later time what it was rational to plan to do at an earlier time. If that's right, then we can obtain the narrow scope norm from the wide scope one.

Let's park those questions for the moment. For the approach taken in this argument suggests something else. In the previous post, we asked how to pick your priors, and we hit upon GHC. Now that we have a way of evaluating priors and posterior rules together, perhaps we should just apply GHC to those? Let's see what happens if we do that. As before, assume the best case receives weight $\alpha_1 = 0.5$, the second-best $\alpha_2 = 0.3$, and the worst $\alpha_3 = 0.2$. Then we know that the priors that GHC permits when we consider them on their own, without the posterior rules appended to them, are just$$\begin{array}{ccc}(0.5, 0.3, 0.2) & (0.5, 0.2, 0.3) & (0.3, 0.5, 0.2) \\ (0.3, 0.2, 0.5) & (0.2, 0.5, 0.3) &(0.2, 0.3, 0.5)\end{array}$$Now let's consider what happens when we add in the posterior rules for learning $E = \{w_1, w_2\}$ or $\overline{E} = \{w_3\}$. Then it turns out that the maximizers are the priors$$(0.3, 0.2, 0.5)\ \ (0.2, 0.3, 0.5)$$combined with the corresponding conditionalizing posterior rules. Now, since those two priors are among the ones that GHC permits when applied to the priors alone, this might seem consistent with the original approach. The problem is that these priors are specific to the case in which you'll learn either $E$ or $\overline{E}$. If, on the other hand, you'll learn $F = \{w_1\}$ or $\overline{F} = \{w_2, w_3\}$, the permissible priors are$$(0.5, 0.2, 0.3)\ \ (0.5, 0.3, 0.2)$$And, at the beginning of your epistemic life, you don't know which, if either, is correct.
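The dependence on the partition can be checked numerically. The sketch below scores each prior together with its conditioning rule, using the enhanced log score as the accuracy measure and summing prior and posterior accuracy at each world, and then applies GHC. Three things are assumptions of the sketch: the choice of the enhanced log score, the restriction to conditioning rules, and the grid of candidate priors.

```python
import math

ALPHA = (0.5, 0.3, 0.2)


def ghc_value(utilities, weights):
    ranked = sorted(utilities, reverse=True)  # best case first
    return sum(alpha * u for alpha, u in zip(weights, ranked))


def enhanced_log(p, i):
    """Enhanced log accuracy of p at world w_{i+1}; requires p[i] > 0."""
    return math.log(p[i]) - sum(p)


def joint_ghc(prior, partition):
    """GHC value of a prior paired with its conditioning rule over the partition."""
    accuracies = []
    for cell in partition:
        mass = sum(prior[i] for i in cell)
        posterior = [prior[i] / mass if i in cell else 0.0 for i in range(len(prior))]
        for i in cell:
            accuracies.append(enhanced_log(prior, i) + enhanced_log(posterior, i))
    return ghc_value(accuracies, ALPHA)


def best_priors(partition, steps=20):
    """Grid priors (all credences positive) with the highest joint GHC value."""
    grid = [
        (a / steps, b / steps, (steps - a - b) / steps)
        for a in range(1, steps)
        for b in range(1, steps - a)
    ]
    scores = {p: joint_ghc(p, partition) for p in grid}
    top = max(scores.values())
    return sorted(p for p, s in scores.items() if top - s < 1e-9)


E_PARTITION = [(0, 1), (2,)]  # learn {w1, w2} or {w3}
F_PARTITION = [(0,), (1, 2)]  # learn {w1} or {w2, w3}

print(best_priors(E_PARTITION))  # [(0.2, 0.3, 0.5), (0.3, 0.2, 0.5)]
print(best_priors(F_PARTITION))  # [(0.5, 0.2, 0.3), (0.5, 0.3, 0.2)]
```

The two partitions select different priors, even though the Hurwicz weights are the same in both runs.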

In fact, there's what seems to me a deeper problem. In the previous paragraph we considered a situation in which you might learn either $E$ or $\overline{E}$ or you might learn either $F$ or $\overline{F}$, and you don't know which. But the two options determine different permissible priors. The same thing happens if there are four possible states of the world $\{w_1, w_2, w_3, w_4\}$ and you might learn either $E_1 = \{w_1, w_2\}$ or $E_2 = \{w_3, w_4\}$ or you might learn either $F_1 = \{w_1, w_2\}$ or $F_2 = \{w_3\}$ or $F_3 = \{w_4\}$. Now, suppose you assign the following Hurwicz weights: to the best case, you assign $\alpha_1 = 0.4$, to the second best $\alpha_2 = 0.3$, to the second worst $\alpha_3 = 0.2$ and to the worst $\alpha_4 = 0.1$. Then if you'll learn $E_1 = \{w_1, w_2\}$ or $E_2 = \{w_3, w_4\}$, then the permissible priors are
 $$\begin{array}{cccc}(0.1, 0.4, 0.2, 0.3) & (0.4, 0.1, 0.2, 0.3) & (0.1, 0.4, 0.3, 0.2) & (0.4, 0.1, 0.3, 0.2) \\ (0.2, 0.3, 0.1, 0.4) & (0.2, 0.3, 0.4, 0.1) & (0.3, 0.2, 0.1, 0.4) & (0.3, 0.2, 0.4, 0.1) \end{array}$$But if you'll learn $F_1 = \{w_1, w_2\}$ or $F_2 = \{w_3\}$ or $F_3 = \{w_4\}$, then your permissible priors are
 $$\begin{array}{cccc}(0.1, 0.2, 0.3, 0.4) & (0.1, 0.2, 0.4, 0.3) & (0.2, 0.1, 0.3, 0.4) & (0.2, 0.1, 0.4, 0.3) \end{array}$$That is, there is no overlap between the two. It seems to me that the reason this is such a problem is that it's always been a bit of an oddity that the two accuracy-first arguments for conditionalization seem to depend on this assumption that there is some partition from which your evidence will come. It seems strange that when you learn $E$, in order to determine how to update, you need to know what alternative propositions you might have learned instead. The reason this assumption hasn't proved so problematic so far is that the update rule is in fact not sensitive to the partition. For instance, if you will learn $E_1 = F_1 = \{w_1, w_2\}$, both the Greaves and Wallace argument and the Briggs and Pettigrew argument for conditionalization say that you should update on that in the same way, whether you might have learned $E_2 = \{w_3, w_4\}$ instead or you might have learned $F_2 = \{w_3\}$ or $F_3 = \{w_4\}$ instead. But here the assumption does seem problematic, because the permissible priors are sensitive to what the partition is from which you'll receive your future evidence.
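The four-world case can be checked in the same spirit. The sketch below takes every ordering of the Hurwicz weights as a candidate prior, pairs it with its conditioning rule, and scores the pair by GHC using the enhanced log score. Restricting the candidates to orderings of the weights, the restriction to conditioning rules, and the choice of the enhanced log score are simplifying assumptions of the sketch.

```python
import math
from itertools import permutations

ALPHA = (0.4, 0.3, 0.2, 0.1)


def ghc_value(utilities, weights):
    ranked = sorted(utilities, reverse=True)  # best case first
    return sum(alpha * u for alpha, u in zip(weights, ranked))


def enhanced_log(p, i):
    """Enhanced log accuracy of p at world w_{i+1}; requires p[i] > 0."""
    return math.log(p[i]) - sum(p)


def joint_ghc(prior, partition):
    """GHC value of a prior paired with its conditioning rule over the partition."""
    accuracies = []
    for cell in partition:
        mass = sum(prior[i] for i in cell)
        posterior = [prior[i] / mass if i in cell else 0.0 for i in range(len(prior))]
        for i in cell:
            accuracies.append(enhanced_log(prior, i) + enhanced_log(posterior, i))
    return ghc_value(accuracies, ALPHA)


def best_priors(partition):
    """Orderings of the weights with the highest joint GHC value."""
    scores = {p: joint_ghc(p, partition) for p in permutations(ALPHA)}
    top = max(scores.values())
    return {p for p, s in scores.items() if top - s < 1e-9}


E_PARTITION = [(0, 1), (2, 3)]     # learn {w1, w2} or {w3, w4}
F_PARTITION = [(0, 1), (2,), (3,)]  # learn {w1, w2}, {w3}, or {w4}

e_best = best_priors(E_PARTITION)
f_best = best_priors(F_PARTITION)

print(sorted(e_best))
print(sorted(f_best))
print(e_best.isdisjoint(f_best))  # True
```

Under these assumptions the first partition yields eight priors, each placing $0.1$ and $0.4$ in one cell and $0.2$ and $0.3$ in the other, while the second yields the four priors that place $0.1$ and $0.2$ on $w_1$ and $w_2$. The two sets share no member.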

What to conclude from all this? It seems to me that the correct approach is this: choose priors using GHC; choose posterior rules to go with them using the dominance argument that Ray and I gave--that is, update by conditioning.
