Using a generalized Hurwicz criterion to pick your priors

Over the summer, I got interested in the problem of the priors again. Which credence functions is it rational to adopt at the beginning of your epistemic life? Which credence functions is it rational to have before you gather any evidence? Which credence functions provide rationally permissible responses to the empty body of evidence? As is my wont, I sought to answer this in the framework of epistemic utility theory. That is, I took the rational credence functions to be those declared rational when the appropriate norm of decision theory is applied to the decision problem in which the available acts are all the possible credence functions, and where the epistemic utility of a credence function is measured by a strictly proper measure. I considered a number of possible decision rules that might govern us in this evidence-free situation: Maximin, the Principle of Indifference, and the Hurwicz criterion. And I concluded in favour of a generalized version of the Hurwicz criterion, which I axiomatised. I also described which credence functions that decision rule would render rational in the case in which there are just three possible worlds between which we divide our credences. In this post, I'd like to generalize the results from that treatment to the case in which there is any finite number of possible worlds.

Here's the decision rule (where $a(w_i)$ is the utility of option $a$ at world $w_i$).

Generalized Hurwicz Criterion  Given an option $a$ and a sequence of weights $0 \leq \lambda_1, \ldots, \lambda_n \leq 1$ with $\sum^n_{i=1} \lambda_i = 1$, which we denote $\Lambda$, define the generalized Hurwicz score of $a$ relative to $\Lambda$ as follows: if
$$a(w_{i_1}) \geq a(w_{i_2}) \geq \ldots \geq a(w_{i_n})$$
then
$$H_\Lambda(a) := \lambda_1 a(w_{i_1}) + \ldots + \lambda_n a(w_{i_n})$$
That is, $H_\Lambda(a)$ is the weighted average of all the possible utilities that $a$ receives, where $\lambda_1$ weights the highest utility, $\lambda_2$ weights the second highest, and so on.

The Generalized Hurwicz Criterion says that you should order options by their generalized Hurwicz score relative to a sequence $\Lambda$ of weightings of your choice. Thus, given $\Lambda$,
$$a \preceq^\Lambda_{ghc} a' \Leftrightarrow H_\Lambda(a) \leq H_\Lambda(a')$$
And the corresponding decision rule says that you should pick your Hurwicz weights $\Lambda$ and then, having done that, it is irrational to choose $a$ if there is $a'$ such that $a \prec^\Lambda_{ghc} a'$.
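To make the definition concrete, here is a minimal sketch of the score in Python (the function name is mine, not from the post):

```python
def generalized_hurwicz_score(utilities, weights):
    """H_Lambda(a): pair the k-th highest utility with the k-th weight lambda_k."""
    assert len(utilities) == len(weights)
    assert abs(sum(weights) - 1) < 1e-9
    ordered = sorted(utilities, reverse=True)  # a(w_{i_1}) >= ... >= a(w_{i_n})
    return sum(l * u for l, u in zip(weights, ordered))

# The original Hurwicz criterion is the special case in which only the best
# and worst utilities receive weight: Lambda = (alpha, 0, ..., 0, 1 - alpha).
```

Note that the score depends only on the multiset of an option's utilities, not on which world receives which; this is why, in the results below, any permutation of a maximizer is also a maximizer.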

Now, let $U$ be an additive strictly proper epistemic utility measure. That is, it is generated by a strictly proper scoring rule. A strictly proper scoring rule is a function $s : \{0, 1\} \times [0, 1] \rightarrow [-\infty, 0]$ such that, for any $0 \leq p \leq 1$, the expected score $p\, s(1, x) + (1-p)\, s(0, x)$ is maximized, as a function of $x$, uniquely at $x = p$. And an epistemic utility measure $U$ is generated by $s$ if, for any credence function $C$ and world $w_i$,
$$U(C, w_i) = \sum^n_{j=1} s(w_{ij}, c_j)$$
where

  • $c_j = C(w_j)$, and
  • $w_{ij} = 1$ if $j = i$ and $w_{ij} = 0$ if $j \neq i$.

In what follows, we write the sequence $(c_1, \ldots, c_n)$ to represent the credence function $C$.
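For a concrete instance of these definitions, here is a sketch using the Brier score, one strictly proper scoring rule, which takes values in $[-1, 0]$ (the code and names are mine):

```python
def brier(v, x):
    """s(v, x) = -(v - x)^2, for truth value v in {0, 1} and credence x in [0, 1]."""
    return -(v - x) ** 2

def epistemic_utility(credences, i, s=brier):
    """U(C, w_i) = sum_j s(w_ij, c_j), where w_ij = 1 if j = i and 0 otherwise."""
    return sum(s(1 if j == i else 0, c) for j, c in enumerate(credences))

# Strict propriety: p*s(1,x) + (1-p)*s(0,x) peaks uniquely at x = p.
```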

Also, given a sequence $A = (\alpha_1, \ldots, \alpha_k)$ of numbers, let
$$av(A) := \frac{\alpha_1 + \ldots + \alpha_k}{k}$$
That is, $av(A)$ is the average of the numbers in $A$. And given $1 \leq j \leq k$, let $A|_j = (\alpha_1, \ldots, \alpha_j)$. That is, $A|_j$ is the truncation of the sequence $A$ that omits all terms after $\alpha_j$. Then we say that $A$ does not exceed its average if, for each $1 \leq j \leq k$,
$$av(A|_j) \leq av(A)$$
That is, at no point in the sequence does the average of the numbers up to that point exceed the average of all the numbers in the sequence.
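These two notions are easy to state in code; a minimal sketch (the helper names are mine):

```python
def av(seq):
    """Average of a sequence of numbers."""
    return sum(seq) / len(seq)

def does_not_exceed_average(seq):
    """True iff no truncation seq[:j] has an average above av(seq)."""
    tol = 1e-12  # tolerance for floating-point comparison
    return all(av(seq[:j]) <= av(seq) + tol for j in range(1, len(seq) + 1))
```

For instance, $(1/4, 1/2)$ does not exceed its average: its truncation averages are $1/4$ and $3/8$, neither above $3/8$. But $(1/2, 1/4)$ does exceed its average, since its first truncation averages $1/2 > 3/8$.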

Theorem 1  Suppose $\Lambda = (\lambda_1, \ldots, \lambda_n)$ is a sequence of generalized Hurwicz weights. Then there is a sequence of subsequences $\Lambda_1, \ldots, \Lambda_m$ of $\Lambda$ such that

  1. $\Lambda = \Lambda_1 \frown \ldots \frown \Lambda_m$
  2. $av(\Lambda_1) \geq \ldots \geq av(\Lambda_m)$
  3. each $\Lambda_i$ does not exceed its average

Then, the credence function
$$(\underbrace{av(\Lambda_1), \ldots, av(\Lambda_1)}_{\text{length of } \Lambda_1}, \underbrace{av(\Lambda_2), \ldots, av(\Lambda_2)}_{\text{length of } \Lambda_2}, \ldots, \underbrace{av(\Lambda_m), \ldots, av(\Lambda_m)}_{\text{length of } \Lambda_m})$$
maximizes $H_\Lambda(U(-))$ among credence functions $C = (c_1, \ldots, c_n)$ for which $c_1 \geq \ldots \geq c_n$.

This is enough to give us all of the credence functions that maximize $H_\Lambda(U(-))$: they are the credence function just described together with its permutations --- that is, any credence function obtained from it by switching around the credences assigned to the worlds.
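Here is a small numerical check of the theorem for $n = 3$ (my example, not from the post): with $\Lambda = (1/4, 1/2, 1/4)$, the partition $\Lambda_1 = (1/4, 1/2)$, $\Lambda_2 = (1/4)$ satisfies (1)-(3), so the predicted maximizer among non-increasing credence functions is $(3/8, 3/8, 1/4)$. A brute-force grid search using the Brier score agrees, up to permutation:

```python
from itertools import product

def brier_utilities(c):
    """U(C, w_i) under the Brier score, for each world w_i."""
    return [sum(-((1.0 if j == i else 0.0) - cj) ** 2 for j, cj in enumerate(c))
            for i in range(len(c))]

def ghc(c, lam):
    """Generalized Hurwicz score of the credence function c."""
    utils = sorted(brier_utilities(c), reverse=True)
    return sum(l * u for l, u in zip(lam, utils))

lam = (0.25, 0.5, 0.25)
grid = [i / 200 for i in range(201)]  # step 0.005; 3/8 and 1/4 lie on the grid
simplex = [(a, b, 1 - a - b) for a, b in product(grid, grid) if a + b <= 1]
best = max(simplex, key=lambda c: ghc(c, lam))
# best is a permutation of (0.375, 0.375, 0.25)
```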

Proof of Theorem 1. Suppose $U$ is a measure of epistemic value that is generated by the strictly proper scoring rule $s$. And suppose that $\Lambda$ is a sequence of generalized Hurwicz weights $0 \leq \lambda_1, \ldots, \lambda_n \leq 1$ with $\sum^n_{i=1} \lambda_i = 1$.

First, due to a theorem that originates with Savage and is stated and proved fully by Predd, et al., if $C$ is not a probability function---that is, if $c_1 + \ldots + c_n \neq 1$---then there is a probability function $P$ such that $U(P, w_i) > U(C, w_i)$ for all worlds $w_i$. Thus, since the Generalized Hurwicz Criterion satisfies Strong Dominance, whatever maximizes $H_\Lambda(U(-))$ will be a probability function.

Now, since $U$ is generated by a strictly proper scoring rule, it is also truth-directed. That is, if $c_i > c_j$, then $U(C, w_i) > U(C, w_j)$. Thus, if $c_1 \geq c_2 \geq \ldots \geq c_n$, then
$$H_\Lambda(U(C)) = \lambda_1 U(C, w_1) + \ldots + \lambda_n U(C, w_n)$$
This is what we seek to maximize. But notice that this is just the expectation of the utility of $C$ from the point of view of the probability distribution $\Lambda = (\lambda_1, \ldots, \lambda_n)$.

Now, Savage also showed that, if $s$ is strictly proper and continuous, then there is a differentiable and strictly convex function $\varphi$ such that, if $P, Q$ are probabilistic credence functions, then
$$\mathfrak{D}_s(P, Q) = \sum^n_{i=1} \varphi(p_i) - \sum^n_{i=1} \varphi(q_i) - \sum^n_{i=1} \varphi'(q_i)(p_i - q_i) = \sum^n_{i=1} p_i U(P, w_i) - \sum^n_{i=1} p_i U(Q, w_i)$$
So $C$ maximizes $H_\Lambda(U(-))$ among credence functions with $c_1 \geq \ldots \geq c_n$ iff $C$ minimizes $\mathfrak{D}_s(\Lambda, -)$ among credence functions with $c_1 \geq \ldots \geq c_n$. We now use the Karush-Kuhn-Tucker (KKT) conditions to calculate which credence functions minimize $\mathfrak{D}_s(\Lambda, -)$ subject to that constraint.

Thus, if we write $x_n$ for $1 - x_1 - \ldots - x_{n-1}$, then
$$f(x_1, \ldots, x_{n-1}) = \mathfrak{D}_s((\lambda_1, \ldots, \lambda_n), (x_1, \ldots, x_n)) = \sum^n_{i=1} \varphi(\lambda_i) - \sum^n_{i=1} \varphi(x_i) - \sum^n_{i=1} \varphi'(x_i)(\lambda_i - x_i)$$
So
$$\nabla f = \left(\varphi''(x_1)(x_1 - \lambda_1) - \varphi''(x_n)(x_n - \lambda_n),\ \varphi''(x_2)(x_2 - \lambda_2) - \varphi''(x_n)(x_n - \lambda_n),\ \ldots,\ \varphi''(x_{n-1})(x_{n-1} - \lambda_{n-1}) - \varphi''(x_n)(x_n - \lambda_n)\right)$$

Let
$$\begin{aligned}
g_1(x_1, \ldots, x_{n-1}) & = x_2 - x_1 \leq 0 \\
g_2(x_1, \ldots, x_{n-1}) & = x_3 - x_2 \leq 0 \\
& \ \ \vdots \\
g_{n-2}(x_1, \ldots, x_{n-1}) & = x_{n-1} - x_{n-2} \leq 0 \\
g_{n-1}(x_1, \ldots, x_{n-1}) & = 1 - x_1 - \ldots - x_{n-2} - 2x_{n-1} \leq 0
\end{aligned}$$
These encode the constraint $x_1 \geq x_2 \geq \ldots \geq x_{n-1} \geq x_n$. So
$$\begin{aligned}
\nabla g_1 & = (-1, 1, 0, \ldots, 0) \\
\nabla g_2 & = (0, -1, 1, 0, \ldots, 0) \\
& \ \ \vdots \\
\nabla g_{n-2} & = (0, \ldots, 0, -1, 1) \\
\nabla g_{n-1} & = (-1, -1, \ldots, -1, -2)
\end{aligned}$$
So the KKT theorem says that $(x_1, \ldots, x_n)$ is a minimizer iff there are $0 \leq \mu_1, \ldots, \mu_{n-1}$ such that
$$\nabla f(x_1, \ldots, x_{n-1}) + \sum^{n-1}_{i=1} \mu_i \nabla g_i(x_1, \ldots, x_{n-1}) = 0$$
That is, iff there are $0 \leq \mu_1, \ldots, \mu_{n-1}$ such that
$$\begin{aligned}
\varphi''(x_1)(x_1 - \lambda_1) - \varphi''(x_n)(x_n - \lambda_n) - \mu_1 - \mu_{n-1} & = 0 \\
\varphi''(x_2)(x_2 - \lambda_2) - \varphi''(x_n)(x_n - \lambda_n) + \mu_1 - \mu_2 - \mu_{n-1} & = 0 \\
& \ \ \vdots \\
\varphi''(x_{n-2})(x_{n-2} - \lambda_{n-2}) - \varphi''(x_n)(x_n - \lambda_n) + \mu_{n-3} - \mu_{n-2} - \mu_{n-1} & = 0 \\
\varphi''(x_{n-1})(x_{n-1} - \lambda_{n-1}) - \varphi''(x_n)(x_n - \lambda_n) + \mu_{n-2} - 2\mu_{n-1} & = 0
\end{aligned}$$
By summing these identities, we get:
$$\begin{aligned}
\mu_{n-1} & = \frac{1}{n} \sum^{n-1}_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \frac{n-1}{n} \varphi''(x_n)(x_n - \lambda_n) \\
& = \frac{1}{n} \sum^n_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \varphi''(x_n)(x_n - \lambda_n) \\
& = \sum^{n-1}_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \frac{n-1}{n} \sum^n_{i=1} \varphi''(x_i)(x_i - \lambda_i)
\end{aligned}$$
So, for $1 \leq k \leq n-2$, summing the first $k$ identities and substituting for $\mu_{n-1}$ gives
$$\begin{aligned}
\mu_k & = \sum^k_{i=1} \varphi''(x_i)(x_i - \lambda_i) - k \varphi''(x_n)(x_n - \lambda_n) - \frac{k}{n} \sum^{n-1}_{i=1} \varphi''(x_i)(x_i - \lambda_i) + k\frac{n-1}{n} \varphi''(x_n)(x_n - \lambda_n) \\
& = \sum^k_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \frac{k}{n} \sum^{n-1}_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \frac{k}{n} \varphi''(x_n)(x_n - \lambda_n) \\
& = \sum^k_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \frac{k}{n} \sum^n_{i=1} \varphi''(x_i)(x_i - \lambda_i)
\end{aligned}$$
So, for $1 \leq k \leq n-1$,
$$\mu_k = \sum^k_{i=1} \varphi''(x_i)(x_i - \lambda_i) - \frac{k}{n} \sum^n_{i=1} \varphi''(x_i)(x_i - \lambda_i)$$
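As a sanity check on this formula (my code; the example weights are hypothetical), we can evaluate the multipliers for the Brier score, where $\varphi(x) = x^2$ and so $\varphi''(x) = 2$:

```python
def kkt_multipliers(p, lam, phi2=lambda x: 2.0):
    """mu_k = sum_{i<=k} phi''(p_i)(p_i - lam_i) - (k/n) sum_{i<=n} phi''(p_i)(p_i - lam_i)."""
    n = len(p)
    terms = [phi2(pi) * (pi - li) for pi, li in zip(p, lam)]
    total = sum(terms)
    return [sum(terms[:k]) - (k / n) * total for k in range(1, n)]

# For lam = (1/4, 1/2, 1/4) and the candidate P = (3/8, 3/8, 1/4), the
# multipliers come out as (1/4, 0): both non-negative, so P passes the KKT test.
mus = kkt_multipliers([0.375, 0.375, 0.25], [0.25, 0.5, 0.25])
```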
Now, suppose that there is a sequence of subsequences $\Lambda_1, \ldots, \Lambda_m$ of $\Lambda$ such that

  1. $\Lambda = \Lambda_1 \frown \ldots \frown \Lambda_m$
  2. $av(\Lambda_1) \geq \ldots \geq av(\Lambda_m)$
  3. each $\Lambda_i$ does not exceed its average.

And let
$$P = (\underbrace{av(\Lambda_1), \ldots, av(\Lambda_1)}_{\text{length of } \Lambda_1}, \underbrace{av(\Lambda_2), \ldots, av(\Lambda_2)}_{\text{length of } \Lambda_2}, \ldots, \underbrace{av(\Lambda_m), \ldots, av(\Lambda_m)}_{\text{length of } \Lambda_m})$$
Then we write $i \in \Lambda_j$ if $\lambda_i$ is in the subsequence $\Lambda_j$. So, for $i \in \Lambda_j$, $p_i = av(\Lambda_j)$. Then
$$\frac{k}{n} \sum^n_{i=1} \varphi''(p_i)(p_i - \lambda_i) = \frac{k}{n} \sum^m_{j=1} \sum_{i \in \Lambda_j} \varphi''(av(\Lambda_j))(av(\Lambda_j) - \lambda_i) = 0$$
since, within each block, $\sum_{i \in \Lambda_j} (av(\Lambda_j) - \lambda_i) = 0$.
Now, suppose $k$ is in $\Lambda_j$, and write $\Lambda_j|_k$ for the initial segment of $\Lambda_j$ that ends at $\lambda_k$. Then
$$\begin{aligned}
\mu_k & = \sum^k_{i=1} \varphi''(p_i)(p_i - \lambda_i) \\
& = \sum_{i \in \Lambda_1} \varphi''(p_i)(p_i - \lambda_i) + \ldots + \sum_{i \in \Lambda_{j-1}} \varphi''(p_i)(p_i - \lambda_i) + \sum_{i \in \Lambda_j|_k} \varphi''(p_i)(p_i - \lambda_i) \\
& = \sum_{i \in \Lambda_j|_k} \varphi''(p_i)(p_i - \lambda_i) \\
& = \sum_{i \in \Lambda_j|_k} \varphi''(av(\Lambda_j))(av(\Lambda_j) - \lambda_i)
\end{aligned}$$
So, if $|\Lambda_j|_k|$ is the length of the sequence $\Lambda_j|_k$, then, since $\varphi''(av(\Lambda_j)) > 0$ by the strict convexity of $\varphi$,
$$\mu_k \geq 0 \Leftrightarrow |\Lambda_j|_k| \, av(\Lambda_j) - \sum_{i \in \Lambda_j|_k} \lambda_i \geq 0 \Leftrightarrow av(\Lambda_j) \geq av(\Lambda_j|_k)$$
But, by assumption, this is true for all $1 \leq k \leq n-1$. So $P$ minimizes $\mathfrak{D}_s(\Lambda, -)$, and therefore maximizes $H_\Lambda(U(-))$, as required.

We now show that there is always a sequence of subsequences that satisfies (1), (2), and (3) from above. We proceed by induction on the length of $\Lambda$.

Base Case  $n = 1$. Then the claim clearly holds with the single subsequence $\Lambda_1 = \Lambda$.

Inductive Step  Suppose the claim is true for all sequences of length $n$, and consider a sequence $(\lambda_1, \ldots, \lambda_n, \lambda_{n+1})$. Then, by the inductive hypothesis applied to $\Lambda = (\lambda_1, \ldots, \lambda_n)$, there is a sequence of subsequences $\Lambda_1, \ldots, \Lambda_m$ such that

  1. $\Lambda = \Lambda_1 \frown \ldots \frown \Lambda_m$
  2. $av(\Lambda_1) \geq \ldots \geq av(\Lambda_m)$
  3. each $\Lambda_i$ does not exceed its average.

Now, first, suppose $av(\Lambda_m) \geq \lambda_{n+1}$. Then let $\Lambda_{m+1} = (\lambda_{n+1})$ and we're done.

So, second, suppose $av(\Lambda_m) < \lambda_{n+1}$. Then we find the greatest $k$ such that
$$av(\Lambda_k) \geq av(\Lambda_{k+1} \frown \ldots \frown \Lambda_m \frown (\lambda_{n+1}))$$
(If there is no such $k$, then the whole of $(\lambda_1, \ldots, \lambda_n, \lambda_{n+1})$ forms a single subsequence, and the argument below goes through with $k = 0$.) Then we let
$$\Lambda^*_{k+1} = \Lambda_{k+1} \frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})$$
Then we can show that

  1. $(\lambda_1, \ldots, \lambda_n, \lambda_{n+1}) = \Lambda_1 \frown \Lambda_2 \frown \ldots \frown \Lambda_k \frown \Lambda^*_{k+1}$.
  2. Each of $\Lambda_1, \ldots, \Lambda_k, \Lambda^*_{k+1}$ does not exceed its average.
  3. $av(\Lambda_1) \geq av(\Lambda_2) \geq \ldots \geq av(\Lambda_k) \geq av(\Lambda^*_{k+1})$.

(1) and (3) are obvious. So we prove (2). In particular, we need only show that $\Lambda^*_{k+1}$ does not exceed its average, since each of $\Lambda_1, \ldots, \Lambda_k$ does so by the inductive hypothesis. So consider a truncation of $\Lambda^*_{k+1}$.

  • Suppose the truncation ends inside $\Lambda_{k+1}$, say at position $i$. Then, since $\Lambda_{k+1}$ does not exceed its average,
$$av(\Lambda_{k+1}|_i) \leq av(\Lambda_{k+1})$$
But, since $k$ is the greatest number such that
$$av(\Lambda_k) \geq av(\Lambda_{k+1} \frown \ldots \frown \Lambda_m \frown (\lambda_{n+1}))$$
we know that
$$av(\Lambda_{k+2} \frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})) > av(\Lambda_{k+1})$$
So, since the average of a concatenation lies between the averages of its parts,
$$av(\Lambda_{k+1} \frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})) > av(\Lambda_{k+1})$$
So
$$av(\Lambda^*_{k+1}) > av(\Lambda_{k+1}|_i)$$
  • Suppose the truncation ends inside $\Lambda_{k+2}$, say at position $i$. Then, since $\Lambda_{k+2}$ does not exceed its average,
$$av(\Lambda_{k+2}|_i) \leq av(\Lambda_{k+2})$$
But, since $k$ is the greatest number such that
$$av(\Lambda_k) \geq av(\Lambda_{k+1} \frown \ldots \frown \Lambda_m \frown (\lambda_{n+1}))$$
we know that
$$av(\Lambda_{k+3} \frown \ldots \frown \Lambda_m \frown (\lambda_{n+1})) > av(\Lambda_{k+2})$$
So
$$av(\Lambda^*_{k+1}) > av(\Lambda_{k+2}|_i)$$
But also, from above,
$$av(\Lambda^*_{k+1}) > av(\Lambda_{k+1})$$
So
$$av(\Lambda^*_{k+1}) > av(\Lambda_{k+1} \frown \Lambda_{k+2}|_i)$$
  • And so on.

This completes the proof.
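The inductive construction is, in effect, a pooling procedure, and it can be implemented in a single pass (a sketch; the pairwise-merging strategy is mine, though it yields the same credence function as the construction above):

```python
def av(seq):
    """Average of a sequence of numbers."""
    return sum(seq) / len(seq)

def partition(lam):
    """Split lam into consecutive blocks whose averages are non-increasing.

    Each new weight starts its own block; whenever the last block's average
    exceeds the one before it, the two are merged, re-checking backwards.
    """
    blocks = []
    for l in lam:
        blocks.append([l])
        while len(blocks) > 1 and av(blocks[-2]) < av(blocks[-1]):
            tail = blocks.pop()
            blocks[-1].extend(tail)
    return blocks

def optimal_prior(lam):
    """The maximizer of H_Lambda(U(.)) among non-increasing credence functions."""
    return [av(block) for block in partition(lam) for _ in block]
```

For example, `optimal_prior([0.25, 0.5, 0.25])` gives `[0.375, 0.375, 0.25]`, while the pure worst-case weights `[0, 0, 1]` (the weights corresponding to Maximin) give the uniform prior `[1/3, 1/3, 1/3]`.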


