Tuesday, 13 April 2021

What we together risk: three vignettes in search of a theory

For a PDF version of this post, see here.

Many years ago, I was climbing Sgùrr na Banachdich with my friend Alex. It's a mountain in the Black Cuillin, a horseshoe of summits that surround Loch Coruisk at the southern end of the Isle of Skye. It's a Munro---that is, it stands over 3,000 feet above sea level---but only just---it measures 3,166 feet. About halfway through our ascent, the mist rolled in and the rain came down heavily, as it often does near these mountains, which attract their own weather system. At that point, my friend and I faced a choice: to continue our attempt on the summit or begin our descent. Should we continue, there were a number of possible outcomes: we might reach the summit wet and cold but not injured, with the mist and rain gone and in their place sun and views across to Bruach na Frìthe and the distinctive teeth-shaped peaks of Sgùrr nan Gillean; or we might reach the summit without injury, but the mist might remain, obscuring any view at all; or we might get injured on the way and either have to descend early under our own steam or call for help getting off the mountain. On the other hand, should we start our descent now, we would of course have no chance of the summit, but we were sure to make it back unharmed, for the path back is good and less affected by rain.

Alex and I had climbed together a great deal that summer and the summer before. We had talked at length about what we enjoyed in climbing and what we feared. To the extent that such comparisons make sense and can be known, we both knew that we both gained exactly the same pleasure from reaching a summit, the same additional pleasure if the view was clear; we gained the same displeasure from injury, the same horror at the thought of having to call for assistance getting off a mountain. What's more, we both agreed exactly on how likely each possible outcome was: how likely we were to sustain an injury should we persevere; how likely that the mist would clear in the coming few hours; and so on. Nonetheless, I wished to turn back, while Alex wanted to continue.

How could that be? We both agreed how good or bad each of the options was, and both agreed how likely each would be were we to take either of the courses of action available to us. Surely we should therefore have agreed on which course of action would maximise our expected utility, and therefore agreed which would be best to undertake. Yes, we did agree on which course of action would maximise our expected utility. However, no, we did not therefore agree on which was best, for there are theories of rational decision-making that do not demand that you rank options by their expected utility. These are the risk-sensitive decision theories, and they include John Quiggin's rank-dependent decision theory and Lara Buchak's risk-weighted expected utility theory. According to Quiggin's and Buchak's theories, what you consider best is not determined only by your utilities and your probabilities, but also by your attitudes to risk. The more risk-averse will give greater weight to the worst-case scenarios and less to the best-case ones than expected utility demands; the more risk-inclined will give greater weight to the best outcomes and less to the worst than expected utility does; and the risk-neutral person will give exactly the weights prescribed by expected utility theory. So, perhaps I preferred to begin our descent from Sgùrr na Banachdich while Alex preferred to continue upwards because I was risk-averse and he was risk-neutral or risk-seeking, or I was risk-neutral and he was risk-seeking. In any case, he must have been less risk-averse than I was.
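
To make the contrast concrete, here is a minimal sketch of a Buchak-style risk-weighted expected utility calculation in Python. The utilities, probabilities, and risk functions below are invented purely for illustration (none of them come from the vignette), but they show how two agents who share utilities and probabilities can still rank the options differently once a risk function enters the calculation.

    # A toy illustration of risk-weighted expected utility (Buchak-style).
    # All numbers and both risk functions are invented for illustration.

    def reu(outcomes, risk):
        """Risk-weighted expected utility of a list of (probability, utility)
        pairs, given a risk function with risk(0) = 0, risk(1) = 1, increasing."""
        outs = sorted(outcomes, key=lambda pu: pu[1])   # order outcomes worst to best
        probs = [p for p, _ in outs]
        utils = [u for _, u in outs]
        total = utils[0]                                # start from the worst case
        for i in range(1, len(outs)):
            p_at_least = sum(probs[i:])                 # chance of doing at least this well
            total += risk(p_at_least) * (utils[i] - utils[i - 1])
        return total

    # Continuing the climb: sunny summit (10), misty summit (6), injury (-20).
    climb = [(0.30, 10), (0.60, 6), (0.10, -20)]
    descend = [(1.00, 4.0)]                             # a safe, certain return

    risk_neutral = lambda p: p        # recovers ordinary expected utility
    risk_averse = lambda p: p ** 2    # discounts the better outcomes

    for label, r in [("risk-neutral", risk_neutral), ("risk-averse", risk_averse)]:
        print(label, round(reu(climb, r), 2), "vs", round(reu(descend, r), 2))
    # risk-neutral: climb 4.6 vs descend 4.0, so continue
    # risk-averse:  climb 1.42 vs descend 4.0, so turn back

With the risk-neutral weighting the formula simply reproduces expected utility; with the convex risk function the worst case dominates, and the very same utilities and probabilities recommend descending.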

Of course, as it turned out, we sat on a mossy rock in the rain and discussed what to do. We decided to turn back. Luckily, as it happened, for a thunderstorm hit the mountains an hour later at just the time we'd have been returning from the summit. But suppose we weren't able to discuss the decision. Suppose we'd roped ourselves together to avoid getting separated in the mist, and he'd taken the lead, forcing him to make the choice on behalf of both of us. In that case, what should he have done?

As I will do throughout these reflections, let me simply report my own reaction to the case. I think, in that case, Alex should have chosen to descend (and not only because that was my preference---I'd have thought the same had it been he who wished to descend and me who wanted to continue!). Had he chosen to continue---even if all had turned out well and we'd reached the summit unharmed and looked over the Cuillin ridge in the sun---I would still say that he chose wrongly on our behalf. This suggests the following principle (in joint work, Ittay Nissan Rozen and Jonathan Fiat argue for a version of this principle that applies in situations in which the individuals do not assign the same utilities to the outcomes):

Principle 1  Suppose two people assign the same utilities to the possible outcomes, and assign the same probabilities to the outcomes conditional on choosing a particular course of action. And suppose that you are required to choose between those courses of action on their behalf. Then you must choose whatever the more risk-averse of the two would choose.

However, I think the principle is mistaken. A few years after our unsuccessful attempt on Sgùrr na Banachdich, I was living in Bristol and trying to decide whether to take up a postdoctoral fellowship there or a different one based in Paris (a situation that seems an unimaginable luxury and privilege when I look at today's academic job market). Staying in Bristol was the safe bet; moving to Paris was a gamble. I already knew what it would be like to live in Bristol and what the department was like. I knew I'd enjoy it a great deal. I'd visited Paris, but I didn't know what it would be like to live there, and I knew the philosophical scene even less. I knew I'd enjoy living there, but I didn't know how much. I figured I might enjoy it a great deal more than Bristol, but also I might enjoy it somewhat less. The choice was complicated because my partner at the time would move too, if that's what we decided to do. Fortunately, just as Alex and I agreed on how much we valued the different outcomes that faced us on the mountain, so my partner and I agreed on how much we'd value staying in Bristol, how much we'd value living in Paris under the first, optimistic scenario, and how much we'd value living there under the second, more pessimistic scenario. We also agreed how likely the two Parisian scenarios were---we'd heard the same friends describing their experiences of living there, and we'd drawn the same conclusions about how likely we were to value the experience ourselves to different extents. Nonetheless, just as Alex and I had disagreed on whether or not to start our descent despite our shared utilities and probabilities, so my partner and I disagreed on whether or not to move to Paris. Again the more risk-averse of the two, I wanted to stay in Bristol, while he wanted to move to Paris. Again, of course, we sat down to discuss this. But suppose that hadn't been possible. Perhaps my partner had to make the decision for both of us at short notice and I was not available to consult. How should he have chosen?

In this case, I think either choice would have been permissible. My partner might have chosen Paris or he might have chosen Bristol and either of these would have been allowed. But of course this runs contrary to Principle 1.

So what is the crucial difference between the decision on Sgùrr na Banachdich and the decision whether to move cities? In each case, there is an option---beginning our descent or staying in Bristol---that is certain to have a particular level of value; and there is an alternative option---continuing to climb or moving to Paris---that might give less value than the sure thing, but might give more. And, in each case, the more risk-averse person prefers the sure thing to the gamble, while the more risk-inclined prefers the gamble. So why must someone choosing for me and Alex in the first case choose to descend, while someone choosing for me and my partner in the second case may choose either Bristol or Paris?

Here's my attempt at a diagnosis: in the choice of cities, there is no risk of harm, while in the decision on the mountain, there is. In the first case, the gamble opens up a possible outcome in which we're harmed---we are injured, perhaps quite badly. In the second case, the gamble doesn't do that---we countenance the possibility that moving to Paris might not be as enjoyable as remaining in Bristol, but we are certain it won't harm us! This suggests the following principle:

Principle 2  Suppose two people assign the same utilities to the possible outcomes, and assign the same probabilities to the outcomes conditional on choosing a particular course of action. And suppose that you are required to choose between those courses of action on their behalf. Then there are two cases: if one of the available options opens the possibility of a harm, then you must choose whatever the more risk-averse of the two would choose; if neither of the available options opens the possibility of a harm, then you may choose an option if at least one of the two would choose it. 

So risk-averse preferences do not always take precedence, but they do when harms are involved. Why might that be?

A natural answer: to expose someone to the risk of a harm requires their consent. That is, when there is an alternative option that opens no possibility of harm, you are only allowed to choose an option that opens up the possibility of a harm if everyone affected would consent to being subject to that risk. So Alex should only choose to continue our ascent and expose us to the risk of injury if I would consent to that, and of course I wouldn't, since I'd prefer to descend. But my partner is free to choose the move to Paris even though I wouldn't choose that, because it exposes us to no risk of harm.

A couple of things to note: First, in our explanation, reference to risk-aversion, risk-neutrality, and risk-inclination has dropped out. What is important is not who is more averse to risk, but who consents to what. Second, our account will only work if we employ an absolute notion of harm. That is, I must say that there is some threshold and an option harms you if it causes your utility to fall below that threshold. We cannot use a relative notion of harm on which an option harms you if it merely causes your utility to fall. After all, using a relative notion of harm, the move to Paris will harm you should it turn out to be worse than staying in Bristol.

The problem with Principle 2 and the explanation we have just given is that it does not generalise to cases in which more than two people are involved. That is, the following principle seems false:

Principle 3  Suppose each member of a group of people assigns the same utilities to the possible outcomes, and assigns the same probabilities to the outcomes conditional on choosing a particular course of action. And suppose that you are required to choose between those courses of action on their behalf. Then there are two cases: if one of the available options opens the possibility of a harm, then you must choose whatever the most risk-averse of them would choose; if neither of the available options opens the possibility of a harm, then you may choose an option if at least one member of the group would choose it.

A third vignette might help to illustrate this.

I grew up between two power stations. My high school stood in the shadow of the coal-fired plant at Cockenzie, while the school where my mother taught stood in the lee of the nuclear plant at Torness Point. And I was born two years after the Three Mile Island accident and the Chernobyl tragedy happened as I started school. So the risks of nuclear power were somewhat prominent growing up. Now, let's imagine a community of five million people who currently generate their energy from coal-fired plants---a community like Scotland in 1964, just before its first nuclear plant was constructed. This community is deciding whether to build nuclear plants to replace its coal-fired ones. All agree that having a nuclear plant that suffered no accidents would be vastly preferable to having coal plants, and all agree that a nuclear plant that suffered an accident would be vastly worse than the coal plants. And we might imagine that they also all assign the same probability to the prospective nuclear plants suffering an accident---perhaps they all defer to a recent report from the country's atomic energy authority. But, while they agree on the utilities and the probabilities, they don't all have the same attitudes to risk. In the end, 4.5 million people prefer to build the nuclear facilities, while half a million, who are more risk-averse, prefer to retain the coal-fired alternatives. Principle 3 says that, for someone choosing on behalf of this population, the only option they can choose is to retain the coal-fired plants. After all, a nuclear accident is clearly a harm, and there are individuals who would suffer that harm who would not consent to being exposed to the risk. But surely that's wrong. Surely, despite such opposition, it would be acceptable to build the nuclear plants.

So, while Principle 2 might yet be true, Principle 3 is wrong. And I think my attempt to explain the basis of Principle 2 must be wrong as well, for if it were right, it would also support Principle 3. After all, in no other case I can think of in which a lack of consent is sufficient to block an action does that block disappear if there are sufficiently many people in favour of the action.

So what general principles underpin our reactions to these three vignettes? Why do the preferences of the more risk-averse individuals carry more weight when one of the outcomes involves a harm than when none does, but not enough weight to overrule a significantly greater number of more risk-inclined individuals? That's the theory I'm in search of here.

Tuesday, 6 April 2021

Believing is said of groups in many ways

For a PDF version of this post, see here.

In defence of pluralism

Recently, after a couple of hours discussing a problem in the philosophy of mathematics, a colleague mentioned that he wanted to propose a sort of pluralism as a solution. We were debating the foundations of mathematics, and he wanted to consider the claim that there might be no single unique foundation, but rather many different foundations, no one of them better than the others. Before he did so, though, he wanted to preface his suggestion with an apology. Pluralism, he admitted, is unpopular wherever it is proposed as a solution to a longstanding philosophical problem. 

I agree with his sociological observation. Philosophers tend to react badly to pluralist solutions. But why? And is the reaction reasonable? This is pure speculative generalisation based on my limited experience, but I've found that the most common source of resistance is a conviction that there is a particular special role that the concept in question must play; and moreover, in that role, whether or not something falls under the concept determines some important issue concerning it. So, in the philosophy of mathematics, you might think that a proof of a mathematical proposition is legitimate just in case it can be carried out in the system that provides the foundation for mathematics. And, if you allow a plurality of foundations of differing logical strength, the legitimacy of certain proofs becomes indeterminate---relative to some foundations, they're legit; relative to others, they aren't. Similarly, you might think that a person who accidentally poisons another person is innocent of murder if, and only if, they were justified in their belief that the liquid they administered was not poisonous. And, if you allow a plurality of concepts of justification, then whether or not the person is innocent might become indeterminate.

I tend to respond to such concerns in two ways. First, I note that, while the special role that my interlocutor picks out for the concept we're discussing is certainly among the roles that this concept needs to play, it isn't the only one; and it is usually not clear why we should take it to be the most important one. One role for a foundation of mathematics is to test the legitimacy of proofs; but another is to provide a universal language that mathematicians might use, and that might help them discover new mathematical truths (see this paper by Jean-Pierre Marquis for a pluralist approach that takes both of these roles seriously).

Second, I note that we usually determine the important issues in question independently of the concept and then use our determinations to test an account of the concept, not the other way around. So, for instance, we usually begin by determining whether we think a particular proof is legitimate---perhaps by asking what it assumes and whether we have good reason for believing that those assumptions are true---and then see whether a particular foundation measures up by asking whether the proof can be carried out within it. We don't proceed the other way around. And we usually determine whether or not a person is innocent independently of our concept of justification---perhaps just by looking at the evidence they had and their account of the reasoning they undertook---and then see whether a particular account of justification measures up by asking whether the person is innocent according to it. Again, we don't proceed the other way around.

For these two reasons, I tend not to be very moved by arguments against pluralism. Moreover, while it's true that pluralism is often greeted with a roll of the eyes, there are a number of cases in which it has gained wide acceptance. We no longer talk of the probability of an event but distinguish between its chance of occurring, a particular individual's credence in it occurring, and perhaps even its evidential probability relative to a body of evidence. That is, we are pluralists about probability. Similarly, we no longer talk of a particular belief being justified simpliciter, but distinguish between propositional, doxastic, and personal justification. We are, along some dimensions at least, pluralists about justification. We no longer talk of a person having a reason to choose one thing rather than another, but distinguish between their internal and external reasons.

I want to argue that we should extend pluralism to so-called group beliefs or collective beliefs. Britain believes lockdowns are necessary to slow the virus. Scotland believes it would fare well economically as an independent country. The University believes the pension fund has been undervalued and requires no further increase in contributions in the near future to meet its obligations in the further future. In 1916, Russia believed Rasputin was dishonest. In each of these sentences, we seem to ascribe a belief to a group or collective entity. When is it correct to do this? I want to argue that there is no single answer. Rather, as Aristotle said of being, believing is said of groups in many ways---that is, a pluralist account is appropriate.

I've been thinking about this recently because I've been reading Jennifer Lackey's fascinating new book, The Epistemology of Groups (all page numbers in what follows refer to that). In it, Lackey offers an account of group belief, justified group belief, group knowledge, and group assertion. I'll focus here only on the first.

Lackey's treatment of group belief

Three accounts of group belief

Lackey considers two existing accounts of group belief as well as her own proposal. 

The first, due to Margaret Gilbert and with amendments by Raimo Tuomela, is a non-summative account that treats groups as having 'a mind of their own'. Lackey calls it the Joint Acceptance Account (JAA). I'll stick with the simpler Gilbert version, since the points I'll make don't rely on Tuomela's more involved amendment (24):

JAA  A group $G$ believes that $p$ iff it is common knowledge in $G$ that the members of $G$ individually have intentionally and openly expressed their willingness jointly to accept that $p$ with the other members of $G$.

The second, due to Philip Pettit, is a summative account that treats group belief as strongly linked to individual belief. Lackey calls it the Premise-Based Aggregation Account (PBAA) (29). Here's a rough paraphrase:

PBAA  A group $G$ believes that $p$ iff there is some collection of propositions $q_1, \ldots, q_n$ such that (i) it is common knowledge among the operative members of $G$ that $p$ is true iff each $q_i$ is true, (ii) for each operative member of $G$, they believe $p$ iff they believe each $q_i$, and (iii) for each $q_i$, the majority of operative members of $G$ believe $q_i$.

Lackey's own proposal is the Group Agent Account (GAA) (48-9):

GAA  A group $G$ believes that $p$ iff (i) there is a significant percentage of $G$'s operative members who believe that $p$, and (ii) are such that adding together the bases of their beliefs that $p$ yields a belief set that is not substantively incoherent.

Group lies (and bullshit) and judgment fragility: two desiderata for accounts of group belief

To distinguish between these three accounts, Lackey enumerates four desiderata for accounts of group belief that she takes to tell against JAA and PBAA and in favour of GAA. The first three are related to an objection to Gilbert's account of group belief that was developed by K. Brad Wray, A. W. M. Meijers, and Raul Hakli in the 2000s. According to this, JAA makes it too easy for groups to actively, consciously, and intentionally choose what they believe: all they need to do is intentionally and openly express their willingness jointly to accept the proposition in question. Lackey notes two consequences of this: (a) on such an account, it is difficult to give a satisfactory account of group lies (or group bullshit, though I'll focus on group lies); (b) on such an account, whether or not a group believes something at a particular time is sensitive to the group's situation at that time in a way that beliefs should not be.

So Lackey's first desideratum for an account of group belief is that it must be able to accommodate a plausible account of group lies (and the second that it accommodate group bullshit, but as I said I'll leave that for now). Suppose each member of a group strongly believes $p$ on the basis of excellent evidence that they all share, but they also know that the institution will be culpable of a serious crime if it is taken to believe $p$. Then they might jointly agree to accept $\neg p$. And, if they do, Gilbert must say that they do believe $\neg p$. But were they to assert $\neg p$, we would take the group to have lied, which would require that it believes $p$. The point is that, if a group's belief is so thoroughly within its voluntary control, it can manipulate it whenever it likes in order to avoid ever lying in situations in which dishonesty would be subject to censure.

Lackey's third desideratum for an account of group belief is that such belief should not be rendered sensitive in certain ways to the situation in which the group formed it. Suppose that, on the basis of the same shared evidence, a substantial majority of members of a group judge the horse Cisco most likely to win the race, the horse Jasper next most likely, and the horse Whiskey very unlikely to win. But, again on the basis of this same shared body of evidence, the remaining minority of members judge Whiskey most likely to win, Jasper next most likely, and Cisco very unlikely to win. The group would like a consensus before it reports its opinion, but time is short---the race is about to begin, say, and the group has been asked for its opinion before the starting gates open. So, in order to achieve something close to a consensus, it unanimously agrees to accept that Jasper will win, even though he is everyone's second favourite. Yet we might also assume that, had time not been short, the majority would have been able to persuade the minority of Cisco's virtues; and, in that case, they'd unanimously agree to accept that Cisco will win. So, according to Gilbert's account, under time pressure, the group believes Jasper will win, while with world enough and time, they would have believed that Cisco will win. Lackey holds that no account of group belief should make it sensitive to the situation in which it is formed in this way, and thus rejects JAA.

Lackey argues that any account of group belief must satisfy the two desiderata we've just considered. I agree that we need at least one account of group belief that satisfies the first desideratum, but I'm not convinced that all need do this---but I'll leave that for later, when I try to motivate pluralism. For now, I'd like to explain why I'm not convinced that any account needs to satisfy the second desideratum. After all, we know from various empirical studies in social psychology, as well as our experience as thinkers and reasoners and believers, that our ordinary beliefs as individuals are sensitive to the situation in which they're formed in just the sort of way that Lackey wishes to rule out for the beliefs of groups. One of the central theses of Amos Tversky and Daniel Kahneman's work is that we use a different reasoning system when we are forced to make a judgment under time pressure from the one we use when more time is available. So, when my implicit biases are mobilised under time pressure, I might come to believe that a particular job candidate is incompetent, while I might judge them to be competent were I to have more time to assess their track record and override my irrational hasty judgment. And, whenever we are faced with a complex body of evidence that, on the face of it, seems to point in one direction, but which, under closer scrutiny, points in the opposite direction, we will form a different belief if we must do so under time pressure than if we have greater leisure to unpick and balance the different components of the evidence. If individual beliefs can be sensitive to the situation in which they're formed in this way, I see no reason why group beliefs might not also be sensitive in this way.

Before moving on, I'd like to consider whether the PBAA---Pettit's premise-based aggregation account---satisfies Lackey's first desideratum. If it doesn't, it can't be for the same reason that Gilbert's JAA doesn't. After all, according to the PBAA, the group's belief is no more under its voluntary control than the beliefs of its individual members. If, for each $q_i$, a majority believes $q_i$, then the group believes $p$. The only way a group could manipulate its belief is by manipulating the beliefs of its members. But if that sort of manipulation rules out a group belief, Lackey's account is just as vulnerable.

So why does Lackey think that PBAA cannot adequately account for group lies? She considers a case in which the three board members of a tobacco company know that smoking is safe to health iff it doesn't cause lung cancer and it doesn't cause emphysema and it doesn't cause heart disease. The first member believes it doesn't cause lung cancer or heart disease, but believes it does cause emphysema, and so believes it is not safe to health; the second believes it doesn't cause emphysema or heart disease, but it does cause lung cancer, and so believes it is not safe to health; and the third believes it doesn't cause lung cancer or emphysema, but it does cause heart disease, and so believes it is not safe to health. The case is illustrated in Table 1.

Table 1
                  Doesn't cause    Doesn't cause    Doesn't cause     Smoking is safe
                  lung cancer      emphysema        heart disease     to health
First member      believes         disbelieves      believes          disbelieves
Second member     disbelieves      believes         believes          disbelieves
Third member      believes         believes         disbelieves       disbelieves

Then each board member believes it is not safe to health, but PBAA says that the group believes it is, because a majority (first and third) believe it doesn't cause lung cancer, a majority (second and third) believe it doesn't cause emphysema, and a majority (first and second) believe it doesn't cause heart disease. If the company then asserts that it is safe to health, then Lackey claims that it lies, while PBAA says that it believes the proposition it asserts and so does not lie.
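
Since it is easy to lose track of how the premise-based aggregation delivers this verdict, here is a minimal sketch of it in Python, with the three premises of the example labelled $q_1$, $q_2$, $q_3$ (the labels are mine); it simply takes the majority verdict on each premise and conjoins the results.

    # A sketch of premise-based aggregation (PBAA) on the tobacco-board example.
    # p = "smoking is safe to health" is equivalent to q1 & q2 & q3, where
    # q1 = "it doesn't cause lung cancer", q2 = "it doesn't cause emphysema",
    # q3 = "it doesn't cause heart disease".

    members = {
        "first":  {"q1": True,  "q2": False, "q3": True},
        "second": {"q1": False, "q2": True,  "q3": True},
        "third":  {"q1": True,  "q2": True,  "q3": False},
    }

    def believes_p(beliefs):
        return all(beliefs.values())      # a member believes p iff they believe every premise

    def majority(premise):
        yes = sum(1 for beliefs in members.values() if beliefs[premise])
        return yes > len(members) / 2

    member_verdicts = {name: believes_p(b) for name, b in members.items()}
    group_premises = {q: majority(q) for q in ("q1", "q2", "q3")}
    group_believes_p = all(group_premises.values())

    print(member_verdicts)    # {'first': False, 'second': False, 'third': False}
    print(group_premises)     # {'q1': True, 'q2': True, 'q3': True}
    print(group_believes_p)   # True: PBAA attributes belief in p to the group,
                              # even though no individual member believes p.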

I think this case is a bit tricky. I suspect our reaction to it is influenced by our knowledge of how the real-world version played out and the devastating effect it has had. So let us imagine that this group of three is not the board of a tobacco company, but the scientific committee of a public health organisation. The structure of the case will be exactly the same, and the nature of the organisation should not affect whether or not belief is present. Now suppose that, since the stakes are so high, each member would only come to believe of a specific putative risk that it is not present if their credence that it is not present is above 95%. That is, there is some pragmatic encroachment here to the extent that the threshold for belief is determined in part by the stakes involved. And suppose further that the first member of the scientific committee has credence 99% that smoking doesn't cause lung cancer, 99% that it doesn't cause heart disease, and 93% that it doesn't cause emphysema. And let's suppose that, by a tragic bout of bad luck that has bestowed on them very misleading evidence, the evidence available to them supports these credences. Then their credence that smoking is safe to health must be at most 93%---since the probability of a conjunction must be at most the probability of any of the conjuncts---and thus below 95%. So the first member doesn't believe it is safe to health. And suppose the same for the other two members of the committee, but for the other combinations of risks. So the second is 99% sure it doesn't cause emphysema and 99% sure it doesn't cause heart disease, but only 93% sure it doesn't cause lung cancer. And the third is 99% sure it doesn't cause lung cancer and 99% sure it doesn't cause emphysema, but only 93% sure it doesn't cause heart disease. So none of the three believe that smoking is safe to health. The case is illustrated in Table 2. 

Table 2
                  Doesn't cause    Doesn't cause    Doesn't cause     Smoking is safe
                  lung cancer      emphysema        heart disease     to health
First member      99%              93%              99%               at most 93%
Second member     93%              99%              99%               at most 93%
Third member      99%              99%              93%               at most 93%

However, just averaging the group members' credences in each of the three specific risks, we might say that it is 97% sure that smoking doesn't cause lung cancer, 97% sure it doesn't cause emphysema, and 97% sure it doesn't cause heart disease ($\frac{0.99 + 0.99 + 0.93}{3} = 0.97$). And it is then possible that the group assigns a higher than 95% credence to the conjunction of these three. And, if it does, it seems to me, the PBAA may well get things right, and the group does not lie if it says that smoking carries no health risks.

Nonetheless, I think the PBAA cannot be right. In the example I just described, I noted that, just taking a straight average gives, for each specific risk, a credence of 97% that it doesn't exist. And I noted that it's then possible that the group credence that smoking is safe to health is above 95%. But of course, it's also possible that it's below 95%. This would happen, for instance, if the group were to take the three risks to be independent. Then the group credence that smoking is safe to health would be a little over 91%---too low for the group to believe it given the stakes. But PBAA would still say that the group believes that smoking is safe to health. The point is that PBAA is not sufficiently sensitive to the more fine-grained attitudes to the propositions that lie behind the beliefs in those propositions. Simply knowing what each member believes about the three putative risks is not sufficient to determine what the group thinks about them. You also need to look to their credences.
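
Spelling out the arithmetic behind that last claim: if the group treated the three risks as independent, its credence that smoking is safe to health would be the product of its credences in the three premises,$$0.97 \times 0.97 \times 0.97 \approx 0.913 < 0.95,$$which falls below the threshold for belief even though each premise individually clears it comfortably.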

Of course, there are lots of reasons to dislike straight averaging as a means for pooling credences---it can't preserve judgments of independence, for instance---and lots of reasons to dislike the naive application of a threshold or Lockean view of belief that is in the background here---it gives rise to the lottery paradox. But it seems that, for any reasonable method of probabilistic aggregation and any reasonable account of the relationship between belief and credence, there will be cases like this in which the PBAA says the group believes a proposition when it shouldn't. So I agree with Lackey that the PBAA sometimes gets things wrong, but I disagree about exactly when.

Base fragility: a further desideratum

Consider an area of science in which two theories vie for precedence, $T_1$ and $T_2$. Half of the scientists working in this area believe the following:

  • ($A_1$) $T_1$ is simpler than $T_2$,
  • ($B_1$) $T_2$ is more explanatory than $T_1$,
  • ($C_1$) simplicity always trumps explanatory power in theory choice.

These scientists consequently believe $T_1$. The other half of the scientists believe the following: 

  • ($A_2$) $T_2$ is simpler than $T_1$,
  • ($B_2$) $T_1$ is more explanatory than $T_2$,
  • ($C_2$) explanatory power always trumps simplicity in theory choice.

These scientists consequently believe $T_1$. So all scientists believe $T_1$. But they do so for diametrically opposed reasons. Indeed, all of their beliefs about the comparisons between $T_1$ and $T_2$ are in conflict, but because their views about theory choice are also in conflict, they end up believing the same theory. Does the scientific community believe $T_1$? Lackey says no. In order for a group to believe a proposition, the bases of the members' beliefs must not be substantively incoherent. In our example, for half of the members, the basis of their belief in $T_1$ is $A_1\ \&\ B_1\ \&\ C_1$, while for the other half, it's $A_2\ \&\ B_2\ \&\ C_2$. And $A_1$ contradicts $A_2$, $B_1$ contradicts $B_2$, and $C_1$ contradicts $C_2$. The bases are about as incoherent as can be. 

Is Lackey correct to say that the scientific community does not believe $T_1$ in this case? I'm not so sure. For one thing, attributing belief in $T_1$ would help to explain a lot of the group's behaviour. Why does the scientific community fund and pursue research projects that are of interest only if $T_1$ is true? Why does the scientific community endorse and teach from textbooks that give much greater space to expounding and explaining $T_1$? Why do departments in this area hire those with the mathematical expertise required to understand $T_1$ when that expertise is useless for understanding $T_2$? In each case, we might say: because the community believes $T_1$.

Lackey raises two worries about group beliefs based in incoherent bases: (i) they cannot be subject to rational evaluation; (ii) they cannot coherently figure in accounts of collective deliberation. On (ii), it seems to me that the group belief could figure in deliberation. Suppose the community is deliberating about whether to invite a $T_1$-theorist or a $T_2$-theorist to give the keynote address at the major conference in the area. It seems that the group's belief in the superiority of $T_1$ could play a role in the discussions: 'Yes, we want the speaker who will pose the greatest challenge intellectually, but we don't want to hear a string of falsehoods, so let's go with the $T_1$-theorist,' they might reason.

On (i): Lackey asks what we would say if the group were to receive new evidence that $T_1$ has greater simplicity and less explanatory power than we initially thought. For the first half of the group, this would make their belief in $T_1$ more justified; for the second half, it would make their belief less justified. What would it do to the group's belief? Without an account of justification for group belief, it's hard to say. But I don't think the incoherent bases rule out an answer. For instance, we might be reliabilists about group justification. And if we are, then we look at all the times that the members of the group have made judgments about simplicity and explanatory power that have the same pattern as they have this time---that is, half one way, half the other---and we look at the proportion of those times that the group belief---formed by whatever aggregation method we favour---has been true. If it's high, then the belief is justified; if it's not, it's not. And we can do that for the group before and after this new evidence comes in. And by doing that, we can compare the level of justification for the group belief.

Of course, this is not to say that reliabilism is the correct account of justification for group beliefs. But it does suggest that incoherent bases don't create a barrier to such accounts.

Varieties of group belief

One thing that is striking when we consider different proposed accounts of group belief is how large the supervenience base might be; that is, how many different features of a group $G$ might partially determine whether or not it believes a proposition $p$. Here's a list, though I don't pretend that it's exhaustive:

(1) The beliefs of individual members of the group

(1a) Some accounts are concerned only with individual members' beliefs in $p$; others are interested in members' beliefs beyond that. For instance, a simple majoritarian account is interested only in members' beliefs in $p$. But Pettit's PBAA is interested instead in members' beliefs in each proposition from a set $q_1, \ldots, q_n$ whose conjunction is equivalent to $p$. And Lackey's GAA is interested in the members' beliefs in $p$ as well as the members' beliefs that form the bases for their belief in $p$ when they do believe $p$.

(1b) Some accounts are concerned with the individual beliefs of all members of the group, some only with so-called operative members. For instance, some will say that what determines whether a company believes $p$ is only whether or not members of their board believe $p$, while others will say that all employees of the company count.

(2) The credences of individual members of the group

There are distinctions corresponding to (1a) and (1b) here as well.

(3) The outcomes of discussions between the members of the group

(3a) Some will say that only discussions that actually take place make a difference---you might say that, before a discussion takes place, the members of the group each believe $p$, but after they discuss it and retain those beliefs, you can say that the group believes $p$; others will say that hypothetical discussions can also make a difference---if individual members would dramatically change their beliefs were they to discuss the matter, that might mean the group does not believe, even if all members do.

(3b) Some will say that it is not the individual members' beliefs after discussion that is important, but their joint decision to accept $p$ as the group's belief. (Margaret Gilbert's JAA is such an account.)

(4) Belief-forming structures within the group

(4a) Some groups are extremely highly structured, and some of these structures relate to group belief formation. Some accounts of group belief acknowledge this by talking of 'operative members' of groups, and taking their attitudes to have greater weight in determining the group's attitude. For instance, it is common to say that the operative members of a company are its board members; the operative members of a British university might be its senior management team; the operative members of a trade union might be its executive committee. But of course many groups have much more complex structures than these. For instance, many large organisations are concerned with complex problems that break down into smaller problems, each of which requires a different sort of expertise to understand. The World Health Organization (WHO) might be such an example, or the Intergovernmental Panel on Climate Change (IPCC), or Médecins sans Frontières (MSF). In each case, there might be a rigid reporting structure whereby subcommittees report their findings to the main committee, but each subcommittee might form its own subcommittees that report to them; and there might be strict rules about how the findings of a subcommittee must be taken into account by the committee to which it reports before that committee itself reports upwards. In such a structure, the notion of operative members and their beliefs is too crude to capture what's necessary.

(5) The actions of the group 

(5a) Some might say that a group has a belief just in case it acts in a way that is best explained by positing a group belief. Why does the scientific community persist in appointing only $T_1$-theorists and no $T_2$-theorists? Answer: It believes $T_1$. (I think Kenny Easwaran and Reuben Stern take this view in their recent joint work.)

So, in the case of group beliefs, the disagreement between different accounts does not concern only the conditions on an agreed supervenience base; it also concerns the extent of the supervenience base itself. Now, this might soften us up for pluralism, but it is hardly an argument. To give an argument, I'd like to consider a range of possible accounts and, for each, describe a role that group beliefs are typically taken to play and for which this account is best suited.

Group beliefs as summaries

One thing we do when we ascribe beliefs to groups is simply to summarise the views of the group. If I say that, in 1916, Russia believed that Rasputin was dishonest, I simply give a summary of the views of people who belong to the group to which 'Russia' refers in this sentence, namely, Russians alive in 1916. And I say roughly that a substantial majority believed that he was dishonest. 

For this role, a simple majoritarian account (SMA) seems best:

SMA  A group $G$ believes $p$ iff a substantial majority of members of $G$ believes $p$.

There is an interesting semantic point in the background here. Consider the sentence: 'At the beginning of negotiations at Brest-Litovsk in 1917-8, Russia believed Germany's demands would be less harsh than they turned out to be.' We might suppose that, in fact, this belief was not widespread in Russia, but it was almost universal among the Bolshevik government. Then we might nonetheless say that the sentence is true. At first sight, it doesn't seem that SMA can account for this. But it might do if 'Russia' refers to different groups in the two different sentences: to the whole population in 1916 in the first sentence; to the members of the Bolshevik government in the second. 

I'm tempted to think that this happens a lot when we discuss group beliefs. Groups are complex entities, and the name of a group might be used in one sentence to pick out some subset of its structure---just its members, for instance---and in another sentence some other subset of its structure---its members as well as its operative group, for instance---and in another sentence yet some further subset of its structure---its members, its operative group, and the rules by which the operative group abide when they are debating an issue.

Of course, this might look like straightforward synecdoche, but I'm inclined to think it's not, because it isn't clear that there is one default referent of the term 'Russia' such that all other terms are parasitic on that. Rather, there are just many many different group structures that might be picked out by the term, and we have to hope that context determines this with sufficient precision to evaluate the sentence.

Group beliefs as attitudes that play a functional role

An important recent development in our understanding of injustice and oppression has been the recognition of structural forms of racism, sexism, ableism, homophobia, transphobia, and so on. The notion is contested and there are many competing definitions, but to illustrate the point, let me quote from a recent article in the New England Journal of Medicine that considers structural racism in the US healthcare system:

All definitions [of structural racism] make clear that racism is not simply the result of private prejudices held by individuals, but is also produced and reproduced by laws, rules, and practices, sanctioned and even implemented by various levels of government, and embedded in the economic system as well as in cultural and societal norms (Bailey, et al. 2021).

The point is that a group---a university, perhaps, or an entire healthcare system, or a corporation---might act as if it holds racist or sexist beliefs, even though no majority of its members holds those beliefs. A university might pay academics who are women less, promote them less frequently, and so on, even while few individuals within the organisation, and certainly not a majority, believe that women's labour is worth less, and that women are less worthy of promotion. In such a case, we might wish to ascribe those beliefs to the institution as a whole. After all, on certain functionalist accounts of belief, to have a belief simply is to be in a state that has certain causal relationships with other states, including actions. And the state of a group is determined not only by the state of the individuals within it but also by the other structural features of the group, such as its laws, rules and practices. And if the states of the individuals within the group, combined with these laws, rules and practices give rise to the sort of behaviour that we would explain in an individual by positing a belief, it seems reasonable to do so in the group case as well. What's more, doing so helps to explain group behaviour in just the same way that ascribing beliefs to individuals helps to explain their behaviour. (As mentioned above, I take it that Kenny Easwaran and Reuben Stern take something like this view of group belief.)

Group beliefs as ascriptions that have legal standing

In her book, Lackey pays particular attention to cases of group belief that are relevant to corporate culpability and liability. In the 1970s, did the tobacco company Philip Morris believe that their product was hazardous to health, even while they repeatedly denied it? Between 1998 and 2014, did Volkswagen believe that their diesel emissions reports were accurate? In 2003, did the British government believe that Iraq could deploy biological weapons within forty-five minutes of an order to do so? Playing this role well is an important job for an account of group belief. It can have very significant real world consequences: Do those who trusted the assertions of tobacco companies and became ill as a result receive compensation? Do governments have a case against car manufacturers? Should a government stand down?

In fact, I think the consequences are often so large and, perhaps more importantly, so varied that the decision whether or not to put them in train should not depend on the applicability of a single concept with a single precise definition. Consider cases of corporate culpability. There are many ways in which this might be punished. We might fine the company. We might demand that it change certain internal policies or rules. We might demand that it change its corporate structure. We might do many things. Some will be appropriate and effective if the company believes a crucial proposition in one sense; some appropriate if it believes that proposition in some other sense. For instance, a fine does many things, but among them is this: it affects the wealth of the company's shareholders, who will react by putting pressure on the company's board. Thus, it might be appropriate to impose a fine if we think that the company believed the proposition that it denied in its public assertions in the sense that a substantial majority of its board believed it. On the other hand, demanding that the company change certain internal policies or rules would be appropriate if the company believes the proposition that it publicly denied in the sense that it is the outcome of applying its belief-forming rules and policies (such as, for instance, the nested set of subcommittees that I imagined for the WHO or the IPCC or MSF above).

The point is that our purpose in ascribing culpability and liability to a group is essentially pragmatic. We do it in order to determine what sort of punishment we might mete out. This is perhaps in contrast to cases of individual culpability and liability, where we are interested also in the moral status of the individual's action independent of how we respond to it. But, in many cases, such as when a corporation has lied, which punishment is appropriate depends on the sense in which the company believed the negation of the proposition it asserted in its lie, that is, on which of the many ways in which a group can believe is the one in play.

So it seems to me that, even if this role were the only role that our concept of group belief had to play, pluralism would be appropriate. Groups are complex entities and there are consequently many ways in which we can seek to change them in order to avoid the sorts of harms that arise when they behave badly. We need different concepts of group belief in order to identify which is appropriate in a given case.

It's perhaps worth noting that, while Lackey opens her book with cases of corporate culpability, and this is a central motivation for her emphasis on group lying, it isn't clear to me that her group agent account (GAA) can accommodate all cases of corporate lies. Consider the following situation. The board of a tobacco company is composed of eleven people. Each of them believes that tobacco is hazardous to health. However, some believe it for very different reasons from the others. They have all read the same scientific literature on the topic, but six of them remember it correctly and the other five remember it incorrectly. The six who remember it correctly remember that tobacco contains chemical A and remember that when chemical A comes into contact with tissue X in the human body, it causes cancer in that tissue; and they also remember that tobacco does not contain chemical B and they remember that, when chemical B comes into contact with tissue Y in the human body, it does not cause cancer in that tissue. The five who remember the scientific literature incorrectly believe that tobacco contains chemical B and believe that when chemical B comes into contact with tissue Y in the human body, it causes cancer in that tissue; and they also believe that tobacco does not contain chemical A and they believe that, when chemical A comes into contact with tissue X in the human body, it does not cause cancer in that tissue. So, all board members believe that smoking causes cancer. However, the bases of their beliefs form an incoherent set. The two propositions on which the six base their belief directly contradict the two propositions on which the five base theirs. The board then issues a statement saying that tobacco does not cause cancer. The board is surely lying, but according to GAA, they are not, because the bases of their beliefs conflict and so they do not believe that tobacco does cause cancer.

Sunday, 14 March 2021

Permissivism and social choice: a response to Blessenohl

In a recent paper discussing Lara Buchak's risk-weighted expected utility theory, Simon Blessenohl notes that the objection he raises there to Buchak's theory might also tell against permissivism about rational credence. I offer a response to the objection here.

In his objection, Blessenohl suggests that credal permissivism gives rise to an unacceptable tension between the individual preferences of agents and the collective preferences of the groups to which those agents belong. He argues that, whatever brand of permissivism about credences you tolerate, there will be a pair of agents and a pair of options between which they must choose such that both agents will prefer the first to the second, but collectively they will prefer the second to the first. He argues that this consequence tells against permissivism. I respond that this objection relies on an equivocation between two different understandings of collective preferences: on the first, they are an attempt to summarise the collective view of the group; on the second, they are the preferences of a third-party social chooser tasked with making decisions on behalf of the group. I claim that, on the first understanding, Blessenohl's conclusion does not follow; and, on the second, it follows but is not problematic.

It is well known that, if two people have different credences in a given proposition, there is a sense in which the pair of them, taken together, is vulnerable to a sure loss set of bets.* That is, there is a bet that the first will accept and a bet that the second will accept such that, however the world turns out, they'll end up collectively losing money. Suppose, for instance, that Harb is 90% confident that Ladybug will win the horse race that is about to begin, while Jay is only 60% confident. Then Harb's credences should lead him to buy a bet for £80 that will pay out £100 if Ladybug wins and nothing if she loses, while Jay's credences should lead him to sell that same bet for £70 (assuming, as we will throughout, that the utility of £$n$ is $n$). If Ladybug wins, Harb ends up £20 up and Jay ends up £30 down, so they end up £10 down collectively. And if Ladybug loses, Harb ends up £80 down while Jay ends up £70 up, so they end up £10 down as a pair.
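
For those who like to see the bookkeeping, here is a quick check of that sure loss in Python, using exactly the stakes just described (Harb buys the £100 bet on Ladybug for £80; Jay sells the same bet for £70):

    # Verifying the collective sure loss from the bets described above.

    def net_positions(ladybug_wins):
        payout = 100 if ladybug_wins else 0
        harb = payout - 80          # he paid £80 for the bet
        jay = 70 - payout           # he received £70 for selling the same bet
        return harb, jay, harb + jay

    print(net_positions(True))      # (20, -30, -10): Ladybug wins, the pair is £10 down
    print(net_positions(False))     # (-80, 70, -10): Ladybug loses, the pair is £10 down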

So, for individuals with different credences in a proposition, there seems to be a tension between how they would choose as individuals and how they would choose as a group. Suppose they are presented with a choice between two options: on the first, $A$, both of them enter into the bets just described; on the second, $B$, neither of them do. We might represent these two options as follows, where we assume that Harb's utility for receiving £$n$ is $n$, and the same for Jay:$$A = \begin{pmatrix}
20 & -80 \\
-30 & 70
\end{pmatrix}\ \ \
B = \begin{pmatrix}
0 & 0 \\
0 & 0
\end{pmatrix}$$The top left entry is Harb's winnings if Ladybug wins, the top right is Harb's winnings if she loses; the bottom left is Jay's winnings if she wins, and the bottom left is Jay's winnings if she loses. So, given a matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$, each row represents a gamble---that is, an assignment of utilities to each state of the world---and each column represents a utility distribution---that is, an assignment of utilities  to each individual. So $\begin{pmatrix} a & b \end{pmatrix}$ represents the gamble that the option bequeaths to Harb---$a$ if Ladybug wins, $b$ if she loses---while $\begin{pmatrix} c & d \end{pmatrix}$ represents the gamble bequeathed to Jay---$c$ if she wins, $d$ if she loses. And $\begin{pmatrix} a \\ c \end{pmatrix}$ represents the utility distribution if Ladybug wins---$a$ to Harb, $c$ to Jay---while $\begin{pmatrix} b \\ d \end{pmatrix}$ represents the utility distribution if she loses---$b$ to Harb, $d$ to Jay. Summing the entries in the first column gives the group's collective utility if Ladybug wins, and summing the entries in the second column gives their collective utility if she loses.

Now, suppose that Harb cares only for the utility that he will gain, and Jay cares only about his own utility; neither cares at all about the other's welfare. Then each prefers $A$ to $B$. Yet, considered collectively, $B$ results in greater total utility for sure: for each column, the sum of the entries in that column in $B$ (that is, $0$) exceeds the sum in that column in $A$ (that is, $-10$). So there is a tension between what the members of the group unanimously prefer and what the group prefers.

Now, to create this tension, I assumed that the group prefers one option to another if the total utility of the first is sure to exceed the total utility of the second. But this is quite a strong claim. And, as Blessenohl notes, we can create a similar tension by assuming something much weaker.

Suppose again that Harb is 90% confident that Ladybug will win while Jay is only 60% confident that she will. Now consider the following two options:$$A' = \begin{pmatrix}
20 & -80 \\
0 & 0
\end{pmatrix}\ \ \
B' = \begin{pmatrix}
5 & 5 \\
25 & -75
\end{pmatrix}$$In $A'$, Harb pays £$80$ for a £$100$ bet on Ladybug, while in $B'$ he receives £$5$ for sure. Given his credences, he should prefer $A'$ to $B'$, since the expected utility of $A'$ is $10$, while for $B'$ it is $5$. And in $A'$, Jay receives £0 for sure, while in $B'$ he pays £$75$ for a £$100$ bet on Ladybug. Given his credences, he should prefer $A'$ to $B'$, since the expected utility of $A'$ is $0$, while for $B'$ it is $-15$. But again we see that $B'$ will nonetheless end up producing greater total utility for the pair---$30$ vs $20$ if Ladybug wins, and $-70$ vs $-80$ if Ladybug loses. But we can argue in a different way that the group should prefer $B'$ to $A'$. This different way of arguing for this conclusion is the heart of Blessenohl's result.
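
Here is a small Python check of those comparisons, using the credences (90% and 60%) and the payoff matrices $A'$ and $B'$ from above:

    # Rows of each matrix: Harb's utilities, then Jay's; columns: Ladybug wins / loses.
    A_prime = {"win": (20, 0), "lose": (-80, 0)}
    B_prime = {"win": (5, 25), "lose": (5, -75)}

    credence_win = {"Harb": 0.9, "Jay": 0.6}      # credence that Ladybug wins

    def expected_utility(option, person, p_win):
        return p_win * option["win"][person] + (1 - p_win) * option["lose"][person]

    for person, name in enumerate(["Harb", "Jay"]):
        print(name,
              round(expected_utility(A_prime, person, credence_win[name]), 2),   # Harb: 10.0, Jay: 0.0
              round(expected_utility(B_prime, person, credence_win[name]), 2))   # Harb: 5.0, Jay: -15.0

    # Each individual's expected utility favours A', yet B' yields more total
    # utility in every state of the world:
    for state in ("win", "lose"):
        print(state, sum(A_prime[state]), "vs", sum(B_prime[state]))   # 20 vs 30, -80 vs -70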

In what follows, we write $\preceq_H$ for Harb's preference ordering, $\preceq_J$ for Jay's, and $\preceq$ for the group's. First, we assume that, when one option gives a particular utility $a$ to Harb for sure and a particular utility $c$ to Jay for sure, then the group should be indifferent between that and the option that gives $c$ to Harb for sure and $a$ to Jay for sure. That is, the group should be indifferent between an option that gives the utility distribution  $\begin{pmatrix} a \\ c\end{pmatrix}$ for sure and an option that gives $\begin{pmatrix} c \\ a\end{pmatrix}$ for sure.  Blessenohl calls this Constant Anonymity:

Constant Anonymity  For any $a, c$,$$\begin{pmatrix}
a & a \\
c & c
\end{pmatrix} \sim
\begin{pmatrix}
c & c \\
a & a
\end{pmatrix}$$This allows us to derive the following:$$\begin{pmatrix}
20 & 20 \\
0 & 0
\end{pmatrix} \sim
\begin{pmatrix}
0 & 0 \\
20 & 20
\end{pmatrix}\ \ \ \text{and}\ \ \
\begin{pmatrix}
-80 & -80 \\
0 & 0
\end{pmatrix} \sim
\begin{pmatrix}
0 & 0 \\
-80 & -80
\end{pmatrix}$$And now we can introduce our second principle:

Preference Dominance  For any $a, b, c, d, a', b', c', d'$, if$$\begin{pmatrix}
a & a \\
c & c
\end{pmatrix} \preceq
\begin{pmatrix}
a' & a' \\
c' & c'
\end{pmatrix}\ \ \ \text{and}\ \ \
\begin{pmatrix}
b & b \\
d & d
\end{pmatrix} \preceq
\begin{pmatrix}
b' & b' \\
d' & d'
\end{pmatrix}$$then$$\begin{pmatrix}
a & b \\
c & d
\end{pmatrix} \preceq
\begin{pmatrix}
a' & b' \\
c' & d'
\end{pmatrix}$$Preference Dominance says that, if the group prefers obtaining the utility distribution  $\begin{pmatrix} a \\ c\end{pmatrix}$ for sure to obtaining the utility distribution  $\begin{pmatrix} a' \\ c'\end{pmatrix}$ for sure, and prefers obtaining the utility distribution  $\begin{pmatrix} b \\ d\end{pmatrix}$ for sure to obtaining the utility distribution  $\begin{pmatrix} b' \\ d'\end{pmatrix}$ for sure, then they prefer obtaining $\begin{pmatrix} a \\ c\end{pmatrix}$ if Ladybug wins and $\begin{pmatrix} b \\ d\end{pmatrix}$ if she loses to obtaining $\begin{pmatrix} a' \\ c'\end{pmatrix}$ if Ladybug wins and $\begin{pmatrix} b' \\ d'\end{pmatrix}$ if she loses.

Preference Dominance, combined with the indifferences that we derived from Constant Anonymity, gives$$\begin{pmatrix}
20 & -80 \\
0 & 0
\end{pmatrix} \sim
\begin{pmatrix}
0 & 0 \\
20 & -80
\end{pmatrix}$$And then finally we introduce a closely related principle: 

Utility Dominance  For any $a, b, c, d, a', b', c', d'$, if $a < a'$, $b < b'$, $c < c'$, and $d < d'$, then$$\begin{pmatrix}
a & b \\
c & d
\end{pmatrix} \prec
\begin{pmatrix}
a' & b' \\
c' & d'
\end{pmatrix}$$

This simply says that if one option gives more utility than another to each individual at each world, then the group should prefer the first to the second. So$$\begin{pmatrix}
0 & 0 \\
20 & -80
\end{pmatrix} \prec
\begin{pmatrix}
5 & 5 \\
25 & -75
\end{pmatrix}$$Stringing these together, we have$$A' = \begin{pmatrix}
20 & -80 \\
0 & 0
\end{pmatrix} \sim
\begin{pmatrix}
0 & 0 \\
20 & -80
\end{pmatrix} \prec
\begin{pmatrix}
5 & 5 \\
25 & -75
\end{pmatrix} = B'$$And thus, assuming that $\preceq$ is transitive, while Harb and Jay both prefer $A'$ to $B'$, the group prefers $B'$ to $A'$.

More generally, Blessenohl proves an impossibility result. Add to the principles we have already stated the following:

Ex Ante Pareto  If $A \preceq_H B$ and $A \preceq_J B$, then $A \preceq B$.

And also:

Egoism  For any $a, b, c, d, a', b', c', d'$,$$\begin{pmatrix}
a & b \\
c & d
\end{pmatrix} \sim_H \begin{pmatrix}
a & b \\
c' & d'
\end{pmatrix}\ \ \ \text{and}\ \ \
\begin{pmatrix}
a & b \\
c & d
\end{pmatrix} \sim_J \begin{pmatrix}
a' & b' \\
c & d
\end{pmatrix}$$That is, Harb cares only about the utilities he obtains from an option, and Jay cares only about the utilities that he himself obtains. And finally:

Individual Preference Divergence  There are $a, b, c, d$ such that$$\begin{pmatrix}
a & b \\
a & b
\end{pmatrix} \prec_H \begin{pmatrix}
c & d \\
c & d
\end{pmatrix}\ \ \ \text{and}\ \ \
\begin{pmatrix}
a & b \\
a & b
\end{pmatrix} \succ_J \begin{pmatrix}
c & d \\
c & d
\end{pmatrix}$$Then Blessenohl shows that there are no preferences $\preceq_H$, $\preceq_J$, and $\preceq$ that satisfy Individual Preference Divergence, Egoism, Ex Ante Pareto, Constant Anonymity, Preference Dominance, and Utility Dominance.** And yet, he claims, each of these is plausible. He suggests that we should give up Individual Preference Divergence, and with it permissivism and risk-weighted expected utility theory.

Now, the problem that Blessenohl identifies arises because Harb and Jay have different credences in the same proposition. But of course impermissivists agree that two rational individuals can have different credences in the same proposition. So why is this a problem specifically for permissivism? The reason is that, for the impermissivist, if two rational individuals have different credences in the same proposition, they must have different evidence. And for individuals with different evidence, we wouldn't necessarily want the group preference to preserve unanimous agreement between the individuals. Instead, we'd want the group to choose using whichever credences are rational in the light of the joint evidence obtained by pooling the evidence held by each individual in the group. And those might render one option preferable to the other even though each of the individuals, with their less well informed credences, prefers the second option to the first. So Ex Ante Pareto is not plausible when the individuals have different evidence, and impermissivism is therefore safe.

To see this, consider the following example: There are two medical conditions, $X$ and $Y$, that affect racehorses. If a horse has $X$, it's 90% likely to win the race; if it has $Y$, it's 60% likely; if it has both, it's 10% likely to win. Suppose Harb knows that Ladybug has $X$, but has no information about whether she has $Y$; and suppose Jay knows that Ladybug has $Y$, but has no information about whether she has $X$. Then both are rational. And both prefer $A$ to $B$ from above. But we wouldn't expect the group to prefer $A$ to $B$, since the group should choose using the credence it's rational to have if you know both that Ladybug has $X$ and that she has $Y$; that is, the group should pool the individuals' evidence to give the group's evidence, and then choose using the probabilities that are rational relative to that. And, relative to that evidence, $B$ is preferable to $A$.

The permissivist, in contrast, cannot make this move. After all, for them it is possible for two rational individuals to disagree even though they have exactly the same evidence, and therefore the same pooled evidence. Blessenohl considers various ways in which the permissivist or the risk-weighted expected utility theorist might answer his objection, either by denying Ex Ante Pareto or by denying Preference Dominance or Utility Dominance. He considers each response unsuccessful, and I tend to agree with his assessments. However, oddly, he explicitly chooses not to consider the suggestion that we might drop Constant Anonymity. I'd like to suggest that we should consider doing exactly that.

I think Blessenohl's objection relies on an ambiguity in what the group preference ordering $\preceq$ represents. On one understanding, it is no more than an attempt to summarise the collective view of the group; on another, it represents the preferences of a third party brought in to make decisions on behalf of the group---the social chooser, if you will. I will argue that Ex Ante Pareto is plausible on the first understanding, but Constant Anonymity isn't; and Constant Anonymity is plausible on the second understanding, but Ex Ante Pareto isn't.

Let's start with the first understanding of $\preceq$. On this, $\preceq$ represents the group's collective opinions about the options on offer. So just as we might try to summarise the scientific community's view on the future trajectory of Earth's average surface temperature or the mechanisms of transmission for SARS-CoV-2 by looking at the views of individual scientists, so might we try to summarise Harb and Jay's collective view of various options by looking at their individual views. Understood in this way, Constant Anonymity does not look plausible. Its motivation is, of course, straightforward. If $a < b$ and$$\begin{pmatrix}
a & a \\
b & b
\end{pmatrix} \prec
\begin{pmatrix}
b & b \\
a & a
\end{pmatrix}$$then the group's collective view unfairly and without justification favours Harb over Jay. And if$$\begin{pmatrix}
a & a \\
b & b
\end{pmatrix} \succ
\begin{pmatrix}
b & b \\
a & a
\end{pmatrix}$$then it unfairly and without justification favours Jay over Harb. So we should rule out both of these. But this doesn't entail that the group preference should be indifferent between these two options. That is, it doesn't entail that we should have$$\begin{pmatrix}
a & a \\
b & b
\end{pmatrix} \sim
\begin{pmatrix}
b & b \\
a & a
\end{pmatrix}$$After all, when you compare two options $A$ and $B$, there are four possibilities:

  1. $A \preceq B$ and $B \preceq A$---that is, $A \sim B$;
  2. $A \preceq B$ and $B \not \preceq A$---that is, $A \prec B$;
  3. $A \not \preceq B$ and $B \preceq A$---that is, $A \succ B$;
  4. $A \not \preceq B$ and $B \not \preceq A$---that is, $A$ and $B$ are incomparable.

The argument for Constant Anonymity rules out (2) and (3), but it does not rule out (4). What's more, it's easy to see that, if we weaken Constant Anonymity so that it requires (1) or (4) rather than requiring (1), all of the principles are consistent with one another. So introduce Weak Constant Anonymity:

Weak Constant Anonymity  For any $a, c$, then either$$\begin{pmatrix}
a & a \\
c & c
\end{pmatrix} \sim
\begin{pmatrix}
c & c \\
a & a
\end{pmatrix}$$or$$\begin{pmatrix}
a & a \\
c & c
\end{pmatrix}\ \ \text{and}\ \
\begin{pmatrix}
c & c \\
a & a
\end{pmatrix}\ \  \text{are incomparable}$$

Then define the preference ordering $\preceq^*$ as follows:$$A \preceq^* B \Leftrightarrow \left ( A \preceq_H B\ \&\ A \preceq_J B \right )$$Then $\preceq^*$ satisfies Ex Ante Pareto, Weak Constant Anonymity, Preference Dominance, and Utility Dominance. And indeed $\preceq^*$ seems a very plausible candidate for the group preference ordering understood in this first way: where Harb and Jay disagree, it simply has no opinion on the matter; it has opinions only where Harb and Jay agree, and then it shares their shared opinion.
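Here is a minimal sketch, in Python and of my own devising, of how $\preceq^*$ behaves in the running example, assuming Harb and Jay maximise expected utility with credences $0.9$ and $0.6$:

```python
def eu(gamble, p):
    """Expected utility of a gamble (utility if Ladybug wins, utility if she loses)."""
    win, lose = gamble
    return p * win + (1 - p) * lose

def weakly_prefers(option_a, option_b, row, p):
    """Does the individual in `row` (0 = Harb, 1 = Jay) weakly prefer option_b to option_a?"""
    return eu(option_a[row], p) <= eu(option_b[row], p)

def group_weakly_prefers(option_a, option_b):
    """A <=* B iff A <=_H B and A <=_J B."""
    return (weakly_prefers(option_a, option_b, 0, 0.9) and
            weakly_prefers(option_a, option_b, 1, 0.6))

# The constant options from Constant Anonymity, with a = 0 and c = 20:
# neither is weakly group-preferred to the other, so they are incomparable under <=*.
first, second = [[0, 0], [20, 20]], [[20, 20], [0, 0]]
print(group_weakly_prefers(first, second), group_weakly_prefers(second, first))  # False False

# For A' and B' from above, <=* simply follows the unanimous individual view.
A_prime, B_prime = [[20, -80], [0, 0]], [[5, 5], [25, -75]]
print(group_weakly_prefers(B_prime, A_prime), group_weakly_prefers(A_prime, B_prime))  # True False
```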

On the understanding of $\preceq$ as summarising the group's collective view, if  $\begin{pmatrix}
a & a \\
c & c
\end{pmatrix} \sim
\begin{pmatrix}
c & c \\
a & a
\end{pmatrix}$ then the group collectively thinks that this option $\begin{pmatrix}
a & a \\
c & c
\end{pmatrix}$ is exactly as good as this option $\begin{pmatrix}
c & c \\
a & a
\end{pmatrix}$. But the group absolutely does not think that. Indeed, Harb and Jay both explicitly deny it, though for opposing reasons. So, on this first understanding, Constant Anonymity is false.

Let's turn next to the second understanding. On this, $\preceq$ is the preference ordering of the social chooser. Here, the original, stronger version of Constant Anonymity seems more plausible. After all, unlike the group itself, the social chooser should have the sort of positive commitment to equality and fairness that the group definitively does not have. As we noted above, Harb and Jay unanimously reject the egalitarian assessment represented by $\begin{pmatrix}
a & a \\
c & c
\end{pmatrix} \sim
\begin{pmatrix}
c & c \\
a & a
\end{pmatrix}$. They explicitly both think that these two options are not equally good---if $a < c$, then Harb thinks the second is strictly better, while Jay thinks the first is strictly better. So, as we argued above, we take the group view to be that they are incomparable. But the social chooser should not remain so agnostic. She should overrule the unanimous rejection of the indifference relation between them and accept it. But, having thus overruled one unanimous view and taken a different one, it is little surprise that she will reject other unanimous views, such as Harb and Jay's unanimous view that $A'$ is better than $B'$ above. That is, it is little surprise that she should violate Ex Ante Pareto. After all, her preferences are not only informed by a value that Harb and Jay do not endorse; they are informed by a value that Harb and Jay explicitly reject, given our assumption of Egoism. This is the value of fairness, which is embodied in the social chooser's preferences in Constant Anonymity and rejected in Harb's and Jay's preferences by Egoism. If we require of our social chooser that they adhere to this value, we should not expect Ex Ante Pareto to hold.

* See Philippe Mongin's 1995 paper 'Consistent Bayesian Aggregation' for wide-ranging results in this area.

** Here's the trick: if$$\begin{pmatrix}
a & b \\
a & b
\end{pmatrix} \prec_H \begin{pmatrix}
c & d \\
c & d
\end{pmatrix}\ \ \ \text{and}\ \ \
\begin{pmatrix}
a & b \\
a & b
\end{pmatrix} \succ_J \begin{pmatrix}
c & d \\
c & d
\end{pmatrix}$$
Then let$$A' = \begin{pmatrix}
c & d \\
a & b
\end{pmatrix}\ \ \ \text{and}\ \ \
B' = \begin{pmatrix}
a & b \\
c & d
\end{pmatrix}$$Then, by Egoism, $A' \succ_H B'$ and $A' \succ_J B'$, since each individual compares only the gamble that he himself receives; but, by Constant Anonymity together with Preference Dominance applied in both directions, $A' \sim B'$.

Wednesday, 6 January 2021

Life on the edge: a response to Schultheis' challenge to epistemic permissivism about credences

In their 2018 paper, 'Living on the Edge', Ginger Schultheis issues a powerful challenge to epistemic permissivism about credences, the view that there are bodies of evidence in response to which there are a number of different credence functions it would be rational to adopt. The heart of the argument is the claim that a certain sort of situation is impossible. Schultheis thinks that all motivations for permissivism must render situations of this sort possible. Therefore, permissivism must be false, or at least these motivations for it must be wrong.

Here's the situation, where we write $R_E$ for the set of credence functions that it is rational to have when your total evidence is $E$. 

  • Our agent's total evidence is $E$.
  • There is $c$ in $R_E$ that our agent knows is a rational response to $E$.
  • There is $c'$ in $R_E$ that our agent does not know is a rational response to $E$.

Schultheis claims that the permissivist must take this to be possible, whereas in fact it is impossible. Here are a couple of specific examples that the permissivist will typically take to be possible.

Example 1: we might have a situation in which the credences it is rational to assign to a proposition $X$ in response to evidence $E$ form the interval $[0.4, 0.7]$. But we might not be sure of quite the extent of the interval. For all we know, it might be $[0.41, 0.7]$ or $[0.39, 0.71]$. Or it might be $[0.4, 0.7]$. So we are sure that $0.5$ is a rational credence in $X$, but we're not sure whether $0.4$ is a rational credence in $X$. In this case, $c(X) = 0.5$ and $c'(X) = 0.4$.

Example 2: you know that Probabilism is a rational requirement on credence functions, and you know that satisfying the Principle of Indifference is rationally permitted, but you don't know whether or not it is also rationally required. In this case, $c$ is the uniform distribution required by the Principle of Indifference, but $c'$ is any other probability function.

Schultheis then appeals to a principle called Weak Rationality Dominance. We say that one credence function $c$ rationally dominates another $c'$ if $c$ is rational in all worlds in which $c'$ is rational, and also rational in some worlds in which $c'$ is not rational. Weak Rationality Dominance says that it is irrational to adopt a rationally dominated credence function. The important consequence of this for Schultheis' argument is that, if you know that $c$ is rational, but you don't know whether $c'$ is, then $c$ rationally dominates $c'$ relative to your epistemically possible worlds, and so it is irrational for you to adopt $c'$. As a result, in our example above, it is irrational to adopt $c'$, contrary to what the permissivist claims, because it is rationally dominated by $c$. So permissivism must be false.

If Weak Rationality Dominance is correct, then, it follows that the permissivist must say that, for any body of evidence $E$ and set $R_E$ of rational responses, the agent with evidence $E$ either must know of each credence function in $R_E$ that it is in $R_E$, or must not know of any credence function in $R_E$ that it is in $R_E$. If they know of some credence functions in $R_E$ that they are in $R_E$, but do not know of others in $R_E$ that they are in $R_E$, then they clash with Weak Rationality Dominance. But, whatever your reason for being a permissivist, it seems very likely that it will entail situations in which there are some credence functions that are rational responses to your evidence and that you know are such responses, while there are other credence functions that are in fact rational responses but about which you are unsure whether they are. This is Schultheis' challenge.

I'd like to explore a response to Schultheis' argument that takes issue with Weak Rationality Dominance (WRD). I'll spell out the objection in general to begin with, and then see how it plays out for a specific motivation for permissivism, namely, the Jamesian motivation I sketched in this previous blogpost.

One worry about WRD is that it seems to entail a deference principle of exactly the sort that I objected to in this blogpost. According to such deference principles, for certain agents in certain situations, if they learn of a credence function that it is rational, they should adopt it. For instance, Ben Levinstein claims that, if you are certain that you are irrational, and you learn that $c$ is rational, then you should adopt $c$ -- or at least you should have the conditional credences that would lead you to do this if you were to apply conditionalization. We might slightly strengthen Levinstein's version of the deference principle as follows: if you are unsure whether you are rational or not, and you learn that $c$ is rational, then you should adopt $c$. WRD entails this deference principle. After all, suppose you have credence function $c'$, and you are unsure whether or not it is rational. And suppose you learn that $c$ is rational (and don't thereby learn that $c'$ is as well). Then, according to Schultheis' principle, you are irrational if you stick with $c'$.

In the previous blogpost, I objected to Levinstein's deference principle, and others like it, because it relies on the assumption that all rational credence functions are better than all irrational credence functions. I think that's false. I think there are certain sorts of flaw that render you irrational, and lacking those flaws renders you rational. But lacking those flaws doesn't ensure that you're going to be better than someone who has those flaws. Consider, for instance, the extreme subjective Bayesian who justifies their position using an accuracy dominance argument of the sort pioneered by Jim Joyce. That is, they say that accuracy is the sole epistemic good for credence functions. And they say that non-probabilistic credence functions are irrational because, for any such credence function, there are probabilistic ones that accuracy dominate them; and all probabilistic credence functions are rational because, for any such credence function, there is no probabilistic one that accuracy dominates it. Now, suppose I have credence $0.91$ in $X$ and $0.1$ in $\overline{X}$. And suppose I am either sure that this is irrational, or I'm uncertain it is. I then learn that assigning credence $0.1$ to $X$ and $0.9$ to $\overline{X}$ is rational. What should I do? It isn't at all obvious to me that I should move from my credence function to the one I've learned is rational. After all, even from my slightly incoherent standpoint, it's possible to see that the rational one is going to be a lot less accurate than mine if $X$ is true, and I'm very confident that it is. 
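To make this vivid, here is a quick calculation of my own, using the Brier score (as in the examples below), though the point doesn't depend on that choice. If $X$ is true, the inaccuracy of my credences $(0.91, 0.1)$ in $(X, \overline{X})$ and of the rational credences $(0.1, 0.9)$ are, respectively,$$(1 - 0.91)^2 + (0 - 0.1)^2 = 0.0181\ \ \ \text{and}\ \ \ (1 - 0.1)^2 + (0 - 0.9)^2 = 1.62$$So, by my own lights, which give $X$ credence $0.91$, deferring to the credence function I've learned is rational looks like a very bad bet.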

So I think that the rational deference principle is wrong, and therefore any version of WRD that entails it is also wrong. But perhaps there is a more restricted version of WRD that is right. And one that is nonetheless capable of sinking permissivism. Consider, for instance, a restricted version of WRD that applies only to agents who have no credence function --- that is, it applies to your initial choice of a credence function; it does not apply when you have a credence function and you are deciding whether to adopt a new one. This makes a difference. The problem with a version that applies when you already have a credence function $c'$ is that, even if it is irrational, it might nonetheless be better than the rational credence function $c$ in some situation, and it might be that $c'$ assigns a lot of credence to that situation. So it's hard to see how to motivate the move from $c'$ to $c$. However, in a situation in which you have no credence function, and you are unsure whether $c'$ is rational (even though it is) and you're certain that $c$ is rational (and indeed it is), WRD's demand that you should not pick $c'$ seems more reasonable. You occupy no point of view such that $c'$ is less of a departure from that point of view than $c$ is. You know only that $c$ lacks the flaws for sure, whereas $c'$ might have them. Better, then, to go for $c$, is it not? And if it is, this is enough to defeat permissivism.

I think it's not quite that simple. I noted above that Levinstein's deference principle relies on the assumption that all rational credence functions are better than all irrational credence functions. Schultheis' WRD seems to rely on something even stronger, namely, the assumption that all rational credence functions are equally good in all situations. For suppose they are not. You might then be unsure whether $c'$ is rational (though it is) and sure that $c$ is rational (and it is), but nonetheless rationally opt for $c'$ because you know that $c'$ has some good feature that you know $c$ lacks and you're willing to take the risk of having an irrational credence function in order to open the possibility of having that good feature.

Here's an example. You are unsure whether it is rational to assign $0.7$ to $X$ and $0.3$ to $\overline{X}$. It turns out that it is, but you don't know that. On the other hand, you do know that it is rational to assign 0.5 to each proposition. But the first assignment and the second are not equally good in all situations. The second has the same accuracy whether $X$ is true or false; the first, in contrast, is better than the second if $X$ is true and worse than the second if $X$ is false. The second does not open up the possibility of high accuracy that the first does; though, to compensate, it also precludes the possibility of low accuracy, which the first doesn't. Surveying the situation, you think that you will take the risk. You'll adopt the first, even though you aren't sure whether or not it is rational. And you'll do this because you want the possibility of being rational and having that higher accuracy. This seems a rational thing to do. So, it seems to me, WRD is false.

Although I think this objection to WRD works, I think it's helpful to see how it might play out for a particular motivation for permissivism. Here's the motivation: Some credence functions offer the promise of great accuracy -- for instance, assigning 0.9 to $X$ and 0.1 to $\overline{X}$ will be very accurate if $X$ is true. However, those that do so also open the possibility of great inaccuracy -- if $X$ is false, the credence function just considered is very inaccurate. Other credence functions neither offer great accuracy nor risk great inaccuracy. For instance, assigning 0.5 to both $X$ and $\overline{X}$ guarantees the same inaccuracy whether or not $X$ is true. You might say that you are more risk-averse the lower the maximum possible inaccuracy you are willing to risk. Thus, the options that are rational for you are those undominated options with maximum inaccuracy at most whatever the threshold is that you set. Now, suppose you use the Brier score to measure your inaccuracy -- so that the inaccuracy of the credence function $c(X) = p$ and $c(\overline{X}) = 1-p$ is $2(1-p)^2$ if $X$ is true and $2p^2$ if $X$ is false. And suppose you are willing to tolerate a maximum possible inaccuracy of $0.5$, which also gives you a minimum inaccuracy of $0.5$. In that case, only $c(X) = 0.5 = c(\overline{X})$ will be rational from the point of view of your risk attitudes --- since $2(1-0.5)^2 = 0.5 = 2(0.5^2)$. On the other hand, suppose you are willing to tolerate a maximum inaccuracy of $0.98$, which also gives you a minimum inaccuracy of $0.18$. In that case, any credence function $c$ with $0.3 \leq c(X) \leq 0.7$ and $c(\overline{X}) = 1-c(X)$ is rational from the point of view of your risk attitudes.
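Here is a minimal sketch in Python of the risk-threshold picture just described (the helper names are mine, not from the post):

```python
def brier_inaccuracy(p, x_is_true):
    """Brier inaccuracy of the credences c(X) = p, c(not-X) = 1 - p."""
    return 2 * (1 - p) ** 2 if x_is_true else 2 * p ** 2

def max_inaccuracy(p):
    """Worst-case Brier inaccuracy of assigning credence p to X."""
    return max(brier_inaccuracy(p, True), brier_inaccuracy(p, False))

def rational_credences(threshold):
    """Credences in X (on a 0.01 grid) whose worst-case inaccuracy is within the threshold."""
    grid = [i / 100 for i in range(101)]
    return [p for p in grid if max_inaccuracy(p) <= threshold + 1e-9]

print(rational_credences(0.5))   # [0.5]
print(rational_credences(0.98))  # [0.3, 0.31, ..., 0.69, 0.7]
```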

Now, suppose that you are in the sort of situation that Schultheis imagines. You are uncertain of the extent of the set $R_E$ of rational responses to your evidence $E$. On the account we're considering, this must be because you are uncertain of your own attitudes to epistemic risk. Let's say that the threshold of maximum inaccuracy that you're willing to tolerate is $0.98$, but you aren't certain of that --- you think it might be anything between $0.72$ and $1.28$. So you're sure that it's rational to assign anything between 0.4 and 0.6 to $X$, but unsure whether it's rational to assign $0.7$ to $X$ --- if your threshold turns out to be less than 0.98, then assigning $0.7$ to $X$ would be irrational, because it risks inaccuracy of $0.98$. In this situation, is it rational to assign $0.7$ to $X$? I think it is. Among the credence functions that you know for sure are rational, the ones that give you the lowest possible inaccuracy are the one that assigns 0.4 to $X$ and the one that assigns 0.6 to $X$. They have maximum inaccuracy of 0.72, and they open up the possibility of an inaccuracy of 0.32, which is lower than the lowest possible inaccuracy opened up by any others that you know to be rational. On the other hand, assigning 0.7 to $X$ opens up the possibility of an inaccuracy of 0.18, which is considerably lower. As a result, it doesn't seem irrational to assign 0.7 to $X$, even though you don't know whether it is rational from the point of view of your attitudes to risk, and you do know that assigning 0.6 is rational. 

There is another possible response to Schultheis' challenge for those who like this sort of motivation for permissivism. You might simply say that, if your attitudes to risk are such that you will tolerate a maximum inaccuracy of at most $t$, then regardless of whether you know this fact, indeed regardless of your level of uncertainty about it, the rational credence functions are precisely those that have maximum inaccuracy of at most $t$. This sort of approach is familiar from expected utility theory. Suppose I have credences in $X$ and in $\overline{X}$. And suppose I face two options whose utility is determined by whether $X$ is true or false. Then, regardless of what I believe about my credences in $X$ and $\overline{X}$, I should choose whichever option maximises expected utility from the point of view of my actual credences. The point is this: if what it is rational for you to believe or to do is determined by some feature of you, whether it's your credences or your attitudes to risk, being uncertain about those features doesn't change what it is rational for you to do. This introduces a certain sort of externalism to our notion of rationality. There are features of ourselves -- our credences or our attitudes to risk -- that determine what it is rational for us to believe or do, which are nonetheless not luminous to us. But I think this is inevitable. Of course, we might move up a level and create a version of expected utility theory that appeals not to our first-order credences but to our credences concerning those first-order credences -- perhaps you use the higher-order credences to define a higher-order expected value for the first-order expected utilities, and you maximize that. But it simply pushes the problem back a step. For your higher-order credences are no more luminous than your first-order ones. And to stop the regress, you must fix some level at which the credences at that level simply determine the expectation that rationality requires you to maximize, and any uncertainty concerning those does not affect rationality. And the same goes in this case. So, given this particular motivation for permissivism, which appeals to your attitudes to epistemic risk, it seems that there is another reason why WRD is false. If $c$ is in $R_E$, then it is rational for you, regardless of your epistemic attitude to its rationality.

Monday, 4 January 2021

Using a generalized Hurwicz criterion to pick your priors

Over the summer, I got interested in the problem of the priors again. Which credence functions is it rational to adopt at the beginning of your epistemic life? Which credence functions is it rational to have before you gather any evidence? Which credence functions provide rationally permissible responses to the empty body of evidence? As is my wont, I sought to answer this in the framework of epistemic utility theory. That is, I took the rational credence functions to be those declared rational when the appropriate norm of decision theory is applied to the decision problem in which the available acts are all the possible credence functions, and where the epistemic utility of a credence function is measured by a strictly proper measure. I considered a number of possible decision rules that might govern us in this evidence-free situation: Maximin, the Principle of Indifference, and the Hurwicz criterion. And I concluded in favour of a generalized version of the Hurwicz criterion, which I axiomatised. I also described which credence functions that decision rule would render rational in the case in which there are just three possible worlds between which we divide our credences. In this post, I'd like to generalize the results from that treatment to the case in which there are any finite number of possible worlds.

Here's the decision rule (where $a(w_i)$ is the utility of $a$ at world $w_i$).

Generalized Hurwicz Criterion  Given an option $a$ and a sequence of weights $0 \leq \lambda_1, \ldots, \lambda_n \leq 1$ with $\sum^n_{i=1} \lambda_i = 1$, which we denote $\Lambda$, define the generalized Hurwicz score of $a$ relative to $\Lambda$ as follows: if $$a(w_{i_1}) \geq a(w_{i_2}) \geq \ldots \geq a(w_{i_n})$$ then $$H^\Lambda(a) := \lambda_1a(w_{i_1}) + \ldots + \lambda_na(w_{i_n})$$That is, $H^\Lambda(a)$ is the weighted average of all the possible utilities that $a$ receives, where $\lambda_1$ weights the highest utility, $\lambda_2$ weights the second highest, and so on.

The Generalized Hurwicz Criterion says that you should order options by their generalized Hurwicz score relative to a sequence $\Lambda$ of weightings of your choice. Thus, given $\Lambda$,$$a \preceq^\Lambda_{ghc} a' \Leftrightarrow H^\Lambda(a) \leq H^\Lambda(a')$$And the corresponding decision rule says that you should pick your Hurwicz weights $\Lambda$ and then, having done that, it is irrational to choose $a$ if there is $a'$ such that $a \prec^\Lambda_{ghc} a'$.
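To make the rule concrete, here is a minimal sketch in Python (mine) of the generalized Hurwicz score just defined:

```python
def generalized_hurwicz_score(utilities, weights):
    """H^Lambda(a): weight the highest utility by weights[0], the next highest by weights[1], and so on."""
    assert len(utilities) == len(weights)
    ranked = sorted(utilities, reverse=True)              # a(w_{i_1}) >= ... >= a(w_{i_n})
    return sum(lam * u for lam, u in zip(weights, ranked))  # lambda_1 a(w_{i_1}) + ... + lambda_n a(w_{i_n})

# The ordinary Hurwicz criterion puts all of its weight on the best and worst cases;
# the generalized version may spread weight over every rank.
print(generalized_hurwicz_score([3, -1, 7], [0.5, 0.3, 0.2]))  # 0.5*7 + 0.3*3 + 0.2*(-1) = 4.2
```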

Now, let $\mathfrak{U}$ be an additive strictly proper epistemic utility measure. That is, it is generated by a strictly proper scoring rule. A strictly proper scoring rule is a function $\mathfrak{s} : \{0, 1\} \times [0, 1] \rightarrow [-\infty, 0]$ such that, for any $0 \leq p \leq 1$, $p\mathfrak{s}(1, x) + (1-p)\mathfrak{s}(0, x)$ is maximized, as a function of $x$, uniquely at $x = p$. And an epistemic utility measure is generated by $\mathfrak{s}$ if, for any credence function $C$ and world $w_i$,$$\mathfrak{U}(C, w_i) = \sum^n_{j=1} \mathfrak{s}(w^j_i, c_j)$$where

  • $c_j = C(w_j)$, and
  • $w^j_i = 1$ if $j=i$ and $w^j_i = 0$ if $j \neq i$

In what follows, we write the sequence $(c_1, \ldots, c_n)$ to represent the credence function $C$.

Also, given a sequence $A = (\alpha_1, \ldots, \alpha_n)$ of numbers, let$$\mathrm{av}(A) := \frac{\alpha_1 + \ldots + \alpha_n}{n}$$That is, $\mathrm{av}(A)$ is the average of the numbers in $A$. And given $1 \leq k \leq n$, let $A|_k = (\alpha_1, \ldots, \alpha_k)$. That is, $A|_k$ is the truncation of the sequence $A$ that omits all terms after $\alpha_k$. Then we say that $A$ does not exceed its average if, for each $1 \leq k \leq n$,$$\mathrm{av}(A) \geq \mathrm{av}(A|_k)$$That is, at no point in the sequence does the average of the numbers up to that point exceed the average of all the numbers in the sequence.
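To illustrate with a small example of my own: the sequence $(0.3, 0.1, 0.5)$ does not exceed its average, since its average is $0.3$ and the averages of its truncations are $0.3$, $0.2$, and $0.3$, none of which exceeds $0.3$; by contrast, $(0.5, 0.1, 0.3)$ does exceed its average, since its first truncation has average $0.5 > 0.3$.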

Theorem 1 Suppose $\Lambda = (\lambda_1, \ldots, \lambda_n)$ is a sequence of generalized Hurwicz weights. Then there is a sequence of subsequences $\Lambda_1, \ldots, \Lambda_m$ of $\Lambda$ such that

  1. $\Lambda = \Lambda_1 \frown \ldots \frown \Lambda_m$
  2. $\mathrm{av}(\Lambda_1) \geq \ldots \geq \mathrm{av} (\Lambda_m)$
  3. each $\Lambda_i$ does not exceed its average

Then, the credence function$$(\underbrace{\mathrm{av}(\Lambda_1), \ldots, \mathrm{av}(\Lambda_1)}_{\text{length of $\Lambda_1$}}, \underbrace{\mathrm{av}(\Lambda_2), \ldots, \mathrm{av}(\Lambda_2)}_{\text{length of $\Lambda_2$}}, \ldots, \underbrace{\mathrm{av}(\Lambda_m), \ldots, \mathrm{av}(\Lambda_m)}_{\text{length of $\Lambda_m$}})$$maximizes $H^\Lambda(\mathfrak{U}(-))$ among credence functions $C = (c_1, \ldots, c_n)$ for which $c_1 \geq \ldots \geq c_n$.

This is enough to give us all of the credence functions that maximise $H^\Lambda(\mathfrak{U}(-))$: they are the credence function mentioned together with any permutation of it --- that is, any credence function obtained from that one by switching around the credences assigned to the worlds.
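Here is a sketch in Python (mine) of the construction in Theorem 1, which builds the decomposition $\Lambda_1, \ldots, \Lambda_m$ in the way the inductive step below suggests: add each weight as a new block, and merge the last two blocks whenever the later one has the larger average.

```python
def average(xs):
    return sum(xs) / len(xs)

def decompose(weights):
    """Split the weights into consecutive blocks whose averages are non-increasing
    and each of which does not exceed its average."""
    blocks = []
    for w in weights:
        blocks.append([w])
        # Restore non-increasing block averages by merging from the right.
        while len(blocks) > 1 and average(blocks[-2]) < average(blocks[-1]):
            last = blocks.pop()
            blocks[-1].extend(last)
    return blocks

def ghc_prior(weights):
    """The credence function of Theorem 1: replace each weight by its block's average."""
    return [average(block) for block in decompose(weights) for _ in block]

print(ghc_prior([0.5, 0.3, 0.2]))  # weights already non-increasing: [0.5, 0.3, 0.2]
print(ghc_prior([0.2, 0.5, 0.3]))  # first two weights pooled: approximately [0.35, 0.35, 0.3]
print(ghc_prior([0.1, 0.2, 0.7]))  # everything pooled: the uniform [1/3, 1/3, 1/3]
```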

Proof of Theorem 1. Suppose $\mathfrak{U}$ is a measure of epistemic value that is generated by the strictly proper scoring rule $\mathfrak{s}$. And suppose that $\Lambda$ is a sequence of generalized Hurwicz weights $0 \leq \lambda_1, \ldots, \lambda_n \leq 1$ with $\sum^n_{i=1} \lambda_i = 1$.

First, due to a theorem that originates in Savage and is stated and proved fully by Predd, et al., if $C$ is not a probability function---that is, if $c_1 + \ldots + c_n \neq 1$---then there is a probability function $P$ such that $\mathfrak{U}(P, w_i) > \mathfrak{U}(C, w_i)$ for all worlds $w_i$. Thus, since GHC satisfies Strong Dominance, whatever maximizes $H^\Lambda(\mathfrak{U}(-))$ will be a probability function.

Now, since $\mathfrak{U}$ is generated by a strictly proper scoring rule, it is also truth-directed. That is, if $c_i > c_j$, then $\mathfrak{U}(C, w_i) > \mathfrak{U}(C, w_j)$. Thus, if $c_1 \geq c_2 \geq \ldots \geq c_n$, then$$H^\Lambda(\mathfrak{U}(C)) = \lambda_1\mathfrak{U}(C, w_1) + \ldots + \lambda_n\mathfrak{U}(C, w_n)$$This is what we seek to maximize. But notice that this is just the expectation of $\mathfrak{U}(C)$ from the point of view of the probability distribution $\Lambda = (\lambda_1, \ldots, \lambda_n)$.

Now, Savage also showed that, if $\mathfrak{s}$ is strictly proper and continuous, then there is a differentiable and strictly convex function $\varphi$ such that, if $P, Q$ are probabilistic credence functions, then
\begin{eqnarray*}
\mathfrak{D}_\mathfrak{s}(P, Q) & = & \sum^n_{i=1} \varphi(p_i) - \sum^n_{i=1} \varphi(q_i) - \sum^n_{i=1} \varphi'(q_i)(p_i - q_i) \\
& = & \sum^n_{i=1} p_i\mathfrak{U}(P, w_i) - \sum^n_{i=1} p_i\mathfrak{U}(Q, w_i)
\end{eqnarray*}
So $C$ maximizes $H^\Lambda(\mathfrak{U}(-))$ among credence functions $C$ with $c_1 \geq \ldots \geq c_n$ iff $C$ minimizes $\mathfrak{D}_\mathfrak{s}(\Lambda, -)$ among credence functions $C$ with $c_1 \geq \ldots \geq c_n$. We now use the KKT conditions to calculate which credence functions minimize $\mathfrak{D}_\mathfrak{s}(\Lambda, -)$ among credence functions $C$ with $c_1 \geq \ldots \geq c_n$.
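To fix ideas with a familiar special case (an illustration of my own, though I believe the calculation is standard): if $\mathfrak{s}$ is the Brier score, we may take $\varphi(x) = x^2$, and then$$\mathfrak{D}_\mathfrak{s}(P, Q) = \sum^n_{i=1} (p_i - q_i)^2$$So, in that case, maximizing $H^\Lambda(\mathfrak{U}(-))$ among credence functions with $c_1 \geq \ldots \geq c_n$ amounts to finding the probability function, ordered in that way, that lies closest to $\Lambda$ in squared Euclidean distance.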

Thus, if we write $x_n$ for $1 - x_1 - \ldots - x_{n-1}$, then
\begin{multline*}
f(x_1, \ldots, x_{n-1}) = \mathfrak{D}((\lambda_1, \ldots, \lambda_n), (x_1, \ldots, x_n)) = \\
\sum^n_{i=1} \varphi(\lambda_i) - \sum^n_{i=1} \varphi(x_i) - \sum^n_{i=1} \varphi'(x_i)(\lambda_i - x_i)
\end{multline*}
So
\begin{multline*}
\nabla f = \langle \varphi''(x_1) (x_1 - \lambda_1) - \varphi''(x_n)(x_n - \lambda_n), \\
\varphi''(x_2) (x_2 - \lambda_2) - \varphi''(x_n)(x_n - \lambda_n), \ldots \\
\varphi''(x_{n-1}) (x_{n-1} - \lambda_{n-1}) - \varphi''(x_n)(x_n - \lambda_n) \rangle
\end{multline*}

Let $$\begin{array}{rcccl}
g_1(x_1, \ldots, x_{n-1}) & = & x_2 - x_1&  \leq & 0\\
g_2(x_1, \ldots, x_{n-1}) & = & x_3 - x_2&  \leq & 0\\
\vdots & \vdots & \vdots & \vdots & \vdots \\
g_{n-2}(x_1, \ldots, x_{n-1}) & = & x_{n-1} - x_{n-2}&  \leq & 0 \\
g_{n-1}(x_1, \ldots, x_{n-1}) & = & 1 - x_1 - \ldots - x_{n-2} - 2x_{n-1} & \leq & 0
\end{array}$$So,
\begin{eqnarray*}
\nabla g_1 & = & \langle -1, 1, 0, \ldots, 0 \rangle \\
\nabla g_2 & = & \langle 0, -1, 1, 0, \ldots, 0 \rangle \\
\vdots & \vdots & \vdots \\
\nabla g_{n-2} & = & \langle 0, \ldots, 0, -1, 1 \rangle \\
\nabla g_{n-1} & = & \langle -1, -1, -1, \ldots, -1,  -2 \rangle \\
\end{eqnarray*}
So the KKT theorem says that $x_1, \ldots, x_n$ is a minimizer iff there are $0 \leq \mu_1, \ldots, \mu_{n-1}$ such that$$\nabla f(x_1, \ldots, x_{n-1}) + \sum^{n-1}_{i=1} \mu_i \nabla g_i(x_1, \ldots, x_{n-1}) = 0$$That is, iff there are $0 \leq \mu_1, \ldots, \mu_{n-1}$ such that
\begin{eqnarray*}
\varphi''(x_1) (x_1 - \lambda_1) - \varphi''(x_n)(x_n - \lambda_n) - \mu_1 - \mu_{n-1} & = & 0 \\
\varphi''(x_2) (x_2 - \lambda_2) - \varphi''(x_n)(x_n - \lambda_n) + \mu_1 - \mu_2 - \mu_{n-1} & = & 0 \\
\vdots & \vdots & \vdots \\
\varphi''(x_{n-2}) (x_{n-2} - \lambda_{n-2}) - \varphi''(x_n)(x_n - \lambda_n) + \mu_{n-3} - \mu_{n-2} - \mu_{n-1}& = & 0 \\
\varphi''(x_{n-1}) (x_{n-1} - \lambda_{n-1}) - \varphi''(x_n)(x_n - \lambda_n)+\mu_{n-2} - 2\mu_{n-1} & = & 0
\end{eqnarray*}
By summing these identities, we get:
\begin{eqnarray*}
\mu_{n-1} &  = & \frac{1}{n} \sum^{n-1}_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \frac{n-1}{n} \varphi''(x_n)(x_n - \lambda_n) \\
&= & \frac{1}{n} \sum^n_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \varphi''(x_n)(x_n - \lambda_n) \\
& = & \sum^{n-1}_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \frac{n-1}{n}\sum^n_{i=1} \varphi''(x_i)(x_i - \lambda_i)
\end{eqnarray*}
So, for $1 \leq k \leq n-2$,
\begin{eqnarray*}
\mu_k & = & \sum^k_{i=1} \varphi''(x_i)(x_i - \lambda_i) - k\varphi''(x_n)(x_n - \lambda_n) - \\
&& \hspace{20mm} \frac{k}{n}\sum^{n-1}_{i=1} \varphi''(x_i)(x_i - \lambda_i) + k\frac{n-1}{n} \varphi''(x_n)(x_n - \lambda_n) \\
& = & \sum^k_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \frac{k}{n}\sum^{n-1}_{i=1} \varphi''(x_i)(x_i - \lambda_i) -\frac{k}{n} \varphi''(x_n)(x_n - \lambda_n) \\
&= & \sum^k_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \frac{k}{n}\sum^n_{i=1} \varphi''(x_i)(x_i - \lambda_i)
\end{eqnarray*}
So, for $1 \leq k \leq n-1$,
$$\mu_k = \sum^k_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \frac{k}{n}\sum^n_{i=1} \varphi''(x_i)(x_i - \lambda_i)$$
Now, suppose that there is a sequence of subsequences $\Lambda_1, \ldots, \Lambda_m$ of $\Lambda$ such that

  1. $\Lambda = \Lambda_1 \frown \ldots \frown \Lambda_m$
  2. $\mathrm{av}(\Lambda_1) \geq \ldots \geq \mathrm{av}(\Lambda_m)$
  3. each $\Lambda_i$ does not exceed its average.

And let $$P = (\underbrace{\mathrm{av}(\Lambda_1), \ldots, \mathrm{av}(\Lambda_1)}_{\text{length of $\Lambda_1$}}, \underbrace{\mathrm{av}(\Lambda_2), \ldots, \mathrm{av}(\Lambda_2)}_{\text{length of $\Lambda_2$}}, \ldots, \underbrace{\mathrm{av}(\Lambda_m), \ldots, \mathrm{av}(\Lambda_m)}_{\text{length of $\Lambda_m$}})$$Then we write $i \in \Lambda_j$ if $\lambda_i$ is in the subsequence $\Lambda_j$. So, for $i \in \Lambda_j$, $p_i = \mathrm{av}(\Lambda_j)$. Then$$\frac{k}{n}\sum^n_{i=1} \varphi''(p_i)(p_i - \lambda_i) = \frac{k}{n} \sum^m_{j = 1} \sum_{i \in \Lambda_j} \varphi''(\mathrm{av}(\Lambda_j))(\mathrm{av}(\Lambda_j) - \lambda_i) = 0 $$
Now, suppose $k$ is in $\Lambda_j$. Then
\begin{multline*}
\mu_k = \sum^k_{i=1} \varphi''(p_i)(p_i - \lambda_i) = \\
\sum_{i \in \Lambda_1} \varphi''(p_i)(p_i - \lambda_i) + \sum_{i \in \Lambda_2} \varphi''(p_i)(p_i - \lambda_i) + \ldots + \\
\sum_{i \in \Lambda_{j-1}} \varphi''(p_i)(p_i - \lambda_i) + \sum_{i \in \Lambda_j|_k} \varphi''(p_i)(p_i - \lambda_i) = \\
\sum_{i \in \Lambda_j|_k} \varphi''(p_i)(p_i - \lambda_i) = \sum_{i \in \Lambda_j|_k} \varphi''(\mathrm{av}(\Lambda_j))(\mathrm{av}(\Lambda_j) - \lambda_i)
\end{multline*}
So, if $|\Lambda|$ is the length of the sequence $\Lambda$,$$\mu_k \geq 0 \Leftrightarrow |\Lambda_j|_k|\mathrm{av}(\Lambda_j) - \sum_{i \in \Lambda_j|_k} \lambda_i \geq 0 \Leftrightarrow \mathrm{av}(\Lambda_j) \geq \mathrm{av}(\Lambda_j|_k)$$But, by assumption, this is true for all $1 \leq k \leq n-1$. So $P$ minimizes $\mathfrak{D}_\mathfrak{s}(\Lambda, -)$, and therefore maximizes $H^\Lambda(\mathfrak{U}(-))$, among credence functions $C$ with $c_1 \geq \ldots \geq c_n$, as required.

We now show that there is always a series of subsequences that satisfy (1), (2), (3) from above.  We proceed by induction. 

Base Case  $n = 1$. Then it is clearly true with the subsequence $\Lambda_1 = \Lambda$.

Inductive Step  Suppose it is true for all sequences $\Lambda = (\lambda_1, \ldots, \lambda_n)$ of length $n$. Now consider a sequence $(\lambda_1, \ldots, \lambda_n, \lambda_{n+1})$. Then, by the inductive hypothesis, there is a sequence of sequences $\Lambda_1, \ldots, \Lambda_m$ such that

  1. $\Lambda \frown (\lambda_{n+1}) = \Lambda_1 \frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})$
  2. $\mathrm{av}(\Lambda_1) \geq \ldots \geq \mathrm{av} (\Lambda_m)$
  3. each $\Lambda_i$ does not exceed its average.

Now, first, suppose $\mathrm{av}(\Lambda_m) \geq \lambda_{n+1}$. Then let $\Lambda_{m+1} = (\lambda_{n+1})$ and we're done.

So, second, suppose $\mathrm{av}(\Lambda_m) < \lambda_{n+1}$. Then we find the greatest $k$ such that$$\mathrm{av}(\Lambda_k) \geq \mathrm{av}(\Lambda_{k+1}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1}))$$Then we let $\Lambda^*_{k+1} = \Lambda_{k+1}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})$ (if there is no such $k$, we take $k = 0$, so that $\Lambda^*_1$ is the whole sequence $\Lambda \frown (\lambda_{n+1})$). Then we can show that

  1. $(\lambda_1, \ldots, \lambda_n, \lambda_{n+1}) = \Lambda_1 \frown \Lambda_2 \frown \ldots \frown \Lambda_k \frown \Lambda^*_{k+1}$.
  2. Each of $\Lambda_1, \ldots, \Lambda_k, \Lambda^*_{k+1}$ does not exceed its average.
  3. $\mathrm{av}(\Lambda_1) \geq \mathrm{av}(\Lambda_2) \geq \ldots \geq \mathrm{av}(\Lambda_k) \geq \mathrm{av}(\Lambda^*_{k+1})$.

(1) and (3) are obvious. So we prove (2). In particular, we show that $\Lambda^*_{k+1}$ does not exceed its average. We assume that each subsequence $\Lambda_j$ starts with the weight $\lambda_{i_j+1}$.

  • Suppose $i \in \Lambda_{k+1}$. Then, since $\Lambda_{k+1}$ does not exceed average, $$\mathrm{av}(\Lambda_{k+1}) \geq \mathrm{av}(\Lambda_{k+1}|_i)$$But, since $k$ is the greatest number such that$$\mathrm{av}(\Lambda_k) \geq \mathrm{av}(\Lambda_{k+1}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1}))$$We know that$$\mathrm{av}(\Lambda_{k+2}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})) > \mathrm{av}(\Lambda_{k+1})$$So$$\mathrm{av}(\Lambda_{k+1}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})) > \mathrm{av}(\Lambda_{k+1})$$So$$\mathrm{av}(\Lambda_{k+1}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})) > \mathrm{av}(\Lambda_{k+1}|_i)$$
  • Suppose $i \in \Lambda_{k+2}$. Then, since $\Lambda_{k+2}$ does not exceed average, $$\mathrm{av}(\Lambda_{k+2}) \geq \mathrm{av}(\Lambda_{k+2}|_i)$$But, since $k$ is the greatest number such that$$\mathrm{av}(\Lambda_k) \geq \mathrm{av}(\Lambda_{k+1}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1}))$$We know that$$\mathrm{av}(\Lambda_{k+3}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})) > \mathrm{av}(\Lambda_{k+2})$$So$$\mathrm{av}(\Lambda_{k+1}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})) > \mathrm{av}(\Lambda_{k+2}|_i)$$But also, from above,$$ \mathrm{av}(\Lambda_{k+1}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})) > \mathrm{av}(\Lambda_{k+1})$$So$$\mathrm{av}(\Lambda_{k+1}\frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})) > \mathrm{av}(\Lambda_{k+1} \frown \Lambda_{k+2}|_i)$$
  • And so on.

This completes the proof. $\Box$