Comments on M-Phi: How should we measure accuracy in epistemology? A new result

Good point, Leszek! I should have been clearer in...

2014-04-08T17:21:11.069+01:00

Good point, Leszek! I should have been clearer in my gloss of DeGroot and Fienberg's theorem above. The point that underlies so much of this area is that, if $s$ is a strictly proper scoring rule, we can define a divergence as follows:$$d_s(x, y) = Exp_s(y | x) - Exp_s(x | x)$$DeG&F use that to define the measure of calibration corresponding to a given scoring rule as well as the measure of refinement. And then they prove that, if $s$ is strictly proper then the total score of a credence function $c$ at a world $w$ is the sum of the calibration of $c$ at $w$ and the refinement of $c$ at $w$.
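To make the construction concrete, here is a minimal Python sketch of the divergence $d_s$ for two familiar rules. It assumes scoring rules are written as penalties (lower is better) with $s(1, 1) = s(0, 0) = 0$, and that $Exp_s(y | x) = xs(1, y) + (1-x)s(0, y)$; the function names are illustrative only, and the logarithmic case is restricted to credences strictly between 0 and 1.

```python
import math

def brier(truth, credence):
    """Quadratic penalty for a credence in a proposition with truth value 0 or 1."""
    return (truth - credence) ** 2

def log_rule(truth, credence):
    """Logarithmic penalty: -ln of the credence given to what actually happened."""
    return -math.log(credence if truth == 1 else 1 - credence)

def expected_score(s, y, x):
    """Exp_s(y | x): expected penalty of credence y by the lights of credence x."""
    return x * s(1, y) + (1 - x) * s(0, y)

def divergence(s, x, y):
    """d_s(x, y) = Exp_s(y | x) - Exp_s(x | x)."""
    return expected_score(s, y, x) - expected_score(s, x, x)

def kl_binary(x, y):
    """Binary Kullback-Leibler divergence, for x and y strictly between 0 and 1."""
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

x, y = 0.3, 0.8
brier_div = divergence(brier, x, y)   # compare with (x - y) ** 2
log_div = divergence(log_rule, x, y)  # compare with kl_binary(x, y)
```

On these assumptions the Brier rule yields squared distance and the logarithmic rule yields the binary Kullback-Leibler divergence. Both are defined for arbitrary pairs of credences, not only for pairs in which one member is a truth value.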

Very interesting question! Were you thinking that...

2014-04-08T17:11:56.617+01:00

Very interesting question! Were you thinking that there might be situations in which $A_D$ and $C_D + R_D$ ought to come apart? Or were you thinking that there might be situations in which the underlying divergence $D$ is determined by features of the situation? I was certainly hoping that there wouldn't be any situations of the former sort -- the idea is that these are two different ways to get at the single notion of accuracy that's appropriate in credal epistemology. As I understand it, the latter sort of situation is considered quite a lot in the statistics literature: different strictly proper scoring rules are suited to different situations. A really good place to start exploring that question is Imre Csiszar's 1991 paper 'Why Least Squares and Maximum Entropy?' in Annals of Statistics. Least Squares and Maximum Entropy are two different statistical inference methods, each based on minimizing or maximizing a Bregman divergence. Csiszar begins with an axiomatization of all Bregman divergences; he then extends the axiomatization in two ways -- the first gives a characterization of the Bregman divergence that gives rise to Least Squares; the second gives a characterization of the Bregman divergence that gives rise to Maximum Entropy.
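For readers who have not met Bregman divergences, the following Python sketch may help. It uses the standard one-dimensional definition $\varphi(x) - \varphi(y) - \varphi'(y)(x - y)$ for a convex generator $\varphi$, summed over coordinates to give an additive divergence. The two generators are my own choice of illustration, not Csiszar's construction: $x^2$, which gives the squared Euclidean distance usually associated with Least Squares, and negative binary entropy, which gives a Kullback-Leibler-style divergence of the kind usually associated with Maximum Entropy.

```python
import math

def bregman(phi, dphi, x, y):
    """One-dimensional Bregman divergence generated by convex phi (derivative dphi)."""
    return phi(x) - phi(y) - dphi(y) * (x - y)

def additive_bregman(phi, dphi, xs, ys):
    """Sum the one-dimensional divergence over coordinates."""
    return sum(bregman(phi, dphi, x, y) for x, y in zip(xs, ys))

def square(x):
    return x * x

def d_square(x):
    return 2 * x

def neg_entropy(x):
    """Negative binary entropy, for x strictly between 0 and 1."""
    return x * math.log(x) + (1 - x) * math.log(1 - x)

def d_neg_entropy(x):
    return math.log(x) - math.log(1 - x)

xs = [0.2, 0.5, 0.9]
ys = [0.4, 0.5, 0.6]

squared_euclidean = additive_bregman(square, d_square, xs, ys)
kl_style = additive_bregman(neg_entropy, d_neg_entropy, xs, ys)
```

The same two-argument recipe produces both divergences; only the generating function changes.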

Sorry -- a second reply! Thinking about this a bi...

2014-04-08T16:52:30.184+01:00

Sorry -- a second reply! Thinking about this a bit more, I see what happens. It turns out that there can be no divergence $D$ such that$$A_D(c, w) = XC_D(c, w) + YR_D(c, w)$$unless $X = Y = 1$. The reason is this: Suppose $D(x, y) = \sum_i d(x_i, y_i)$ and $A_D(c, w) = XC_D(c, w) + YR_D(c, w)$. Then it follows that $d$ has a particular form (the details of this are in the PDF I linked to that includes the proof; they're a bit buried, though): if we let $s$ be the scoring rule corresponding to $d$ (that is, $s(i, x) := d(i, x)$), then$$d(x, y) = \frac{1}{X} Exp_s(y | x) - \frac{Y}{X} Exp_s(x | x)$$(where $Exp_s(y | x) = xs(1, y) + (1-x) s(0, y)$). But then, since $s(1, 1) = d(1, 1) = 0$, we have$$d(1, x) = \frac{1}{X}s(1, x) - \frac{Y}{X}s(1, 1) = \frac{1}{X}d(1,x)$$so $X = 1$. And, given $X = 1$, we also have$$0 = d(x, x) = Exp_s(x | x) - YExp_s(x | x)$$so $Y = 1$.
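A numerical illustration of the same point, for one divergence only (squared Euclidean distance), may be useful. This is a sketch and not a substitute for the proof. It assumes the definitions from the post, which are not reproduced in this thread: $c^w$ assigns to each proposition the frequency of truths among the propositions to which $c$ gives the same credence, $A_D(c, w) = -D(v_w, c)$, $C_D(c, w) = -D(c^w, c)$ and $R_D(c, w) = -D(v_w, c^w)$. The function names and the two credence functions are made up for the example.

```python
from collections import defaultdict

def calibrated_counterpart(truths, credences):
    """c^w: truth frequency among the propositions that share each credence."""
    groups = defaultdict(list)
    for t, c in zip(truths, credences):
        groups[c].append(t)
    return [sum(groups[c]) / len(groups[c]) for c in credences]

def sq_dist(xs, ys):
    """Squared Euclidean distance between two tuples."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys))

def components(truths, credences):
    """Return (accuracy, calibration, refinement) under the assumed definitions."""
    cw = calibrated_counterpart(truths, credences)
    return (-sq_dist(truths, credences),
            -sq_dist(cw, credences),
            -sq_dist(truths, cw))

truths = [1, 1, 0, 1, 0, 0]
a1, c1, r1 = components(truths, [0.8, 0.8, 0.8, 0.3, 0.3, 0.3])
a2, c2, r2 = components(truths, [0.6, 0.6, 0.6, 0.6, 0.1, 0.1])

# Solve  X*c1 + Y*r1 = a1  and  X*c2 + Y*r2 = a2  by Cramer's rule.
det = c1 * r2 - c2 * r1
X = (a1 * r2 - a2 * r1) / det
Y = (c1 * a2 - c2 * a1) / det
```

With these two credence functions the linear system has a unique solution, and it comes out at $X = Y = 1$ up to rounding.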

Thanks for a fascinating post! After reading it I ...

2014-04-08T13:18:59.238+01:00

Thanks for a fascinating post! After reading it I started wondering about the 'opposite' direction: for which scoring rules is it guaranteed that Alethic accuracy = Calibration + Refinement? Richard already addressed this in the 2nd comment. Now, alright, the Brier Score can be decomposed into calibration and refinement (this is, I guess, Murphy (1973)). Richard says "DeGroot and Fienberg (1983) [...] show that [...] strictly proper scoring rule can be decomposed into the sort of calibration and refinement measure that I give; and the decomposition involves the weightings (i.e. your X and Y) both being 1." Of course, I have to read DeGroot and Fienberg's paper carefully to see how the proof goes, but please allow me at this point to express my concerns as to whether this is the whole story. (And please correct me if what I write is completely misguided.)

The "Alethic accuracy = Calibration + Refinement" claim seems to boil down to this: for your chosen distance between n-tuples of real numbers from [0,1], which we label D:

-D(v_w,c)=-D(c^w,c)-D(v_w,c^w)

Unfortunately I don't see yet how this recipe would be applicable for some proper scoring rules. Squared Euclidean distance can be employed for two n-tuples one of which consists entirely of 0s and 1s (is a representation of a possible world): in that case it can serve to form a scoring rule. But it can also serve to form a distance measure between any two arbitrary n-tuples of real numbers from [0,1]. Therefore all 3 expressions in the above equality make sense if D is the squared Euclidean distance and we're thinking of the Brier Score as our scoring rule.

But there are other proper scoring rules for which I don't see how we could do this. Consider the logarithmic scoring rule; take a proposition A for which your credence is C(A); if in the given world A is true, take ln(C(A)); if it is false, take ln(1-C(A)). If we want to conceive of this as a distance measure between n-tuples of real numbers from [0,1], one of these n-tuples has to consist entirely of 0s and 1s. But the 'middle' expression of the above equality contains D(c^w,c), and it is by no means guaranteed that c^w will turn out to be like that. And so it would seem we cannot answer the question whether the equality holds if our scoring rule is the logarithmic one.

Since this seems to contradict what Richard wrote in the 2nd comment, and in particular the DG&F result, I must be missing something. But the worry I express above seems to be pretty basic, so I'd hope people more familiar with the topic will point me in the right direction. Thanks in advance, and thanks again for a stimulating post!
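The worry about the logarithmic rule can be probed numerically. The sketch below assumes that the rule is extended to arbitrary pairs of credences by the divergence $Exp_s(y | x) - Exp_s(x | x)$, which for the log rule is the binary Kullback-Leibler divergence with the convention $0 \ln 0 = 0$. It also assumes that c^w gives each proposition the frequency of truths among the propositions sharing its credence. The function names and numbers are hypothetical, and signs are flipped relative to the equality above, which makes no difference to whether it holds.

```python
import math
from collections import defaultdict

def xlog_ratio(x, y):
    """x * ln(x / y), using the convention that the term is 0 when x is 0."""
    return 0.0 if x == 0 else x * math.log(x / y)

def kl_binary(x, y):
    """Divergence generated by the log rule; y must be nonzero wherever x is."""
    return xlog_ratio(x, y) + xlog_ratio(1 - x, 1 - y)

def D(xs, ys):
    """Additive divergence between two tuples of credences."""
    return sum(kl_binary(x, y) for x, y in zip(xs, ys))

def calibrated_counterpart(truths, credences):
    """c^w: truth frequency among the propositions that share each credence."""
    groups = defaultdict(list)
    for t, c in zip(truths, credences):
        groups[c].append(t)
    return [sum(groups[c]) / len(groups[c]) for c in credences]

truths = [1, 1, 0, 1, 0, 0]
credences = [0.8, 0.8, 0.8, 0.3, 0.3, 0.3]
cw = calibrated_counterpart(truths, credences)

total = D(truths, credences)    # D(v_w, c)
calibration = D(cw, credences)  # D(c^w, c): neither argument is a world
refinement = D(truths, cw)      # D(v_w, c^w)

# The ordinary log penalty, computed directly from the truth values.
log_penalty = sum(-math.log(c if t == 1 else 1 - c)
                  for t, c in zip(truths, credences))
```

In this example `total` agrees with `log_penalty`, so the extended divergence reduces to the usual log score when its first argument is a world, and `total` equals `calibration + refinement` up to rounding.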

It's very interesting, but are there cases in ...

2014-04-08T01:40:26.137+01:00

It's very interesting, but are there cases in which the methodologies based on these accounts of accuracy would give different results? Could you, for example, outline a hypothetical case in which the same evidence would lead to differing measures of accuracy depending on which account of accuracy is adopted?

Thanks very much for this excellent question! I'...

2014-04-07T09:30:04.975+01:00

Thanks very much for this excellent question! I'm pretty sure that generalised alethic-calibration agreement wouldn't always give an additive Bregman divergence. The reason is the DeGroot and Fienberg 1983 result of which this is a sort of converse. They show that any strictly proper scoring rule can be decomposed into the sort of calibration and refinement measure that I give; and the decomposition involves the weightings (i.e. your X and Y) both being 1. So I have to argue that this weighting of the two is the only reasonable thing to do if I'm going to use my result to argue for strictly proper scoring rules. It's really useful to get that clear. Thanks!

This is really cool stuff! I wonder, have you inve...

2014-04-05T19:40:20.435+01:00

This is really cool stuff! I wonder, have you investigated what happens when you take accuracy not as Cd(c,w)+Rd(c,w) but as X*Cd(c,w)+Y*Rd(c,w) for constants X, Y? It seems to me that in certain situations one might value one of those measures more than the other. Would generalized alethic-calibration agreement of this kind still give you an additive Bregman divergence, and would the generated scoring rule still be strictly proper?