Diagonalization and the Diagonal Lemma

A couple of recent M-Phi posts by Catarina have discussed the topic of diagonalization and the Diagonal Lemma in Gödel's incompleteness results.
Because I can't be bothered to get out of bed, this post consists in some Sunday afternoon rambling about how diagonalization is usually understood and how the Diagonal Lemma is proved (i.e., how to define a fixed-point, or self-referential, sentence).
Gödel's original 1931 paper on incompleteness, "On Formally Undecidable Propositions in Principia Mathematica and Related Systems, I", is brimming with new ideas, of which four important ones are:
(i) the notion of what is now called a primitive recursive function;
(ii) arithmetization of syntax (aka, gödel coding);
(iii) the diagonalization operation on a string;
(iv) given any $L$-formula $\phi(x)$, the construction of a "fixed-point", or self-referential, sentence $\theta$ equivalent to $\phi(\ulcorner \theta \urcorner)$.
There are short explanations of each of these below.

1. The (Usual) First-Order Language of Arithmetic

The language of arithmetic $L$ is a first-order language with identity, with signature $\{0, S, +, \times\}$, and is intended to talk about the structure $\mathbb{N} = (\mathbb{N}, 0, S, +, \times)$. The set of $L$-terms, $Tm(L)$, is defined inductively, by starting with $0$ and all variables $x_i$, and then closing under application of the function symbols $S, +, \times$. Then, an atomic $L$-formula is defined to be of the form $t = u$, where $t, u \in Tm(L)$. And the set of $L$-formulas, $Form(L)$, is defined inductively, by starting with the atomic formulas $t = u$, and then closing under application of the unary connective $\neg$, the binary connective $\to$ and the quantifiers $\forall x_i$. (The other connectives, $\wedge$, $\vee$ and $\leftrightarrow$, and the quantifier $\exists$, can be defined.)

It is useful sometimes to abuse notation and conflate the symbol $0$ and its denotation, the number $0 \in \mathbb{N}$; or the symbol $S$ and its denotation, the successor function $S: \mathbb{N} \to \mathbb{N}$; etc. Context is usually sufficient to disambiguate. If one is considering a non-standard $L$-interpretation $\mathcal{A}$ (i.e., $\mathcal{A}$ is not isomorphic to $\mathbb{N}$), then $S^{\mathcal{A}}$ will be the denotation of $S$ in $\mathcal{A}$; $+^{\mathcal{A}}$ the denotation of $+$; and so on.

Although taking $0$, $S$, $+$ and $\times$ as undefined primitives is fairly standard, some presentations include the binary relation symbol $x < y$ and a binary function symbol $exp(x, y)$ (denoting the exponential function $x^y$) as additional primitives. But in fact both can be defined in $\mathbb{N}$: for $x < y$, this is obvious; for $exp(x, y)$, it is not so obvious.


A bounded quantifier has the form $(\forall x \leq t)$ or $(\exists x \leq t)$, where $t$ is a term, and $x \leq t$ abbreviates the formula $\exists z\, (x + z = t)$. A bounded formula $\phi$ is a formula in which all quantifiers are bounded. A $\Sigma_1$-sentence has the form $\exists x_1 \dots \exists x_k\, \phi$, where $\phi$ is bounded.

2. Canonical Numerals for Numbers

Because of the presence of $0$ and $S$ in $L$, there is a canonical way of denoting numbers, using canonical numerals. Roughly, the number $n$ is denoted by the term $S \dots S0$, with $S$ repeated $n$ times. More exactly, one defines a function
$\underline{\cdot} : \mathbb{N} \to Tm(L)$,
by primitive recursion as follows:
$\underline{0} := 0$
$\underline{n+1} := S\underline{n}$
So, for example, the number $7$ is denoted by $\underline{7}$, which is the term $SSSSSSS0$. By induction, it is easy to prove that $(\underline{n})^{\mathbb{N}} = n$.
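As a quick illustrative sketch (the helper names below are invented for this post, not part of the formalism), canonical numerals can be generated and evaluated as strings:

```python
def numeral(n: int) -> str:
    """The canonical numeral for n: 'S' repeated n times, followed by '0'."""
    return "S" * n + "0"

def numeral_value(t: str) -> int:
    """The denotation in N of a numeral term: count the occurrences of 'S'."""
    assert t.endswith("0") and set(t[:-1]) <= {"S"}
    return len(t) - 1

print(numeral(7))  # SSSSSSS0
```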

3. Robinson Arithmetic: Q

A very important weak system of arithmetic is Robinson arithmetic, Q. Its axioms are:
(Q1) $\forall x\, (Sx \neq 0)$
(Q2) $\forall x \forall y\, (Sx = Sy \to x = y)$
(Q3) $\forall x\, (x \neq 0 \to \exists y\, (x = Sy))$
(Q4) $\forall x\, (x + 0 = x)$
(Q5) $\forall x \forall y\, (x + Sy = S(x + y))$
(Q6) $\forall x\, (x \times 0 = 0)$
(Q7) $\forall x \forall y\, (x \times Sy = x \times y + x)$
It is noteworthy that Q does not have as an axiom scheme the induction scheme,
$(\phi(0) \wedge \forall x\, (\phi(x) \to \phi(Sx))) \to \forall x\, \phi(x)$
Extending Q with the induction scheme gives Peano arithmetic, PA.
The system Q is sound and complete for certain classes of sentences. That is,
$Q \vdash \phi$ if and only if $\phi$ is true in $\mathbb{N}$.
This holds when ϕ is a quantifier-free sentence, a bounded sentence or a Σ1-sentence.

In general, Q can correctly perform finite computations using successor, addition and multiplication. However, Q is not very good at algebra. For example, Q does not prove $\forall x \forall y\, (x + y = y + x)$. Importantly, although Q has axioms for addition and multiplication, it does not have axioms for exponentiation.
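The remark that Q correctly performs finite computations can be illustrated at the metalevel. The sketch below (ordinary Python, not anything formalized inside Q) computes particular sums and products purely by unwinding the recursion axioms (Q4)-(Q7):

```python
def q_add(x: int, y: int) -> int:
    """Addition by the Q axioms: (Q4) x + 0 = x; (Q5) x + Sy = S(x + y)."""
    if y == 0:
        return x                      # (Q4)
    return q_add(x, y - 1) + 1        # (Q5): the outer +1 is the successor S

def q_mul(x: int, y: int) -> int:
    """Multiplication by the Q axioms: (Q6) x*0 = 0; (Q7) x*Sy = x*y + x."""
    if y == 0:
        return 0                      # (Q6)
    return q_add(q_mul(x, y - 1), x)  # (Q7)

print(q_add(3, 4), q_mul(6, 7))  # 7 42
```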

3. Gödel Coding

Under gödel coding, each primitive symbol $\alpha$ of the alphabet of $L$ is given a code number, $c(\alpha)$. Also, each finite sequence $(n_1, \dots, n_k)$ of numbers can be coded as a number, sometimes written $\langle n_1, \dots, n_k \rangle$. For example, one might use the coding
$\langle n_1, \dots, n_k \rangle := (p_1)^{n_1 + 1} \times \dots \times (p_k)^{n_k + 1}$,
where $p_1, p_2, \dots$ are the primes in ascending order, beginning with $2$. (And we assign $0$ as the code of the empty sequence $()$.) For example, $\langle 10, 0, 1 \rangle = 2^{11} \times 3 \times 5^2$. Such numbers are then called sequence numbers.

Because $L$-strings are sequences of primitive symbols, the code of an $L$-string $(\alpha_1, \dots, \alpha_k)$ is defined as:
$\#(\alpha_1, \dots, \alpha_k) := \langle c(\alpha_1), \dots, c(\alpha_k) \rangle$.
(One needs to be a bit careful, as the code of a primitive symbol, say $\neg$, is not the same as the code of the corresponding one-element string $(\neg)$. For example, if $c(\neg) = 10$, say, then $\#(\neg) = 2^{11}$.)
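A minimal sketch of this prime-power coding follows; the symbol codes in `c` are invented for illustration (any injective assignment of codes to primitive symbols would do):

```python
def primes(k):
    """Return the first k primes, in ascending order."""
    ps, n = [], 2
    while len(ps) < k:
        if all(n % p != 0 for p in ps):
            ps.append(n)
        n += 1
    return ps

def seq_code(ns):
    """<n1,...,nk> := p1^(n1+1) * ... * pk^(nk+1); the empty sequence gets 0."""
    if not ns:
        return 0
    code = 1
    for p, n in zip(primes(len(ns)), ns):
        code *= p ** (n + 1)
    return code

def seq_decode(code):
    """Recover the sequence from a genuine sequence number (inverse of seq_code)."""
    if code == 0:
        return []
    ns, i = [], 1
    while code > 1:
        p = primes(i)[-1]        # the i-th prime
        e = 0
        while code % p == 0:
            code //= p
            e += 1
        ns.append(e - 1)
        i += 1
    return ns

# A made-up assignment of code numbers to primitive symbols:
c = {"0": 1, "S": 2, "+": 3, "*": 4, "=": 5, "(": 6, ")": 7}

def string_code(s):
    """#(a1,...,ak) := <c(a1),...,c(ak)>."""
    return seq_code([c[a] for a in s])

print(seq_code([10, 0, 1]))  # 2**11 * 3 * 5**2 = 153600
```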

Because each $L$-string $\phi$ now has a code, say $n = \#\phi$, we can think of the numeral $\underline{n}$ as a kind of name of $\phi$. So, if $n = \#\phi$, then the numeral $\underline{n}$ is written $\ulcorner \phi \urcorner$, and is sometimes called the gödel quotation name of $\phi$.

So, using coding, the interpreted language of arithmetic is able to talk (indirectly) about the syntax of $L$: each syntactical entity $\phi$ of $L$ has a kind of name $\ulcorner \phi \urcorner$ in $L$ itself. Furthermore, special sets of (and relations amongst) syntactical entities become sets of numbers under coding. If $A$ is a set of syntactical entities, then $\#(A)$ is $\{\#(\alpha) : \alpha \in A\}$, the corresponding set of numbers. For example, the set of variables of $L$ is coded as a set of numbers; the set of terms is coded as a set of numbers; etc. One can then ask whether these sets (or relations) are "expressible" in the language of arithmetic.

In general, these syntactical sets and relations come out, under coding, as either primitive recursive, or recursive, or recursively enumerable.

4. Primitive Recursive Functions

The primitive recursive functions are nowadays defined a bit differently from Gödel's original approach. (See this SEP article for more details.) First,
The basic functions:
(i) the zero function $Z: \mathbb{N} \to \mathbb{N}$, given by $Z(n) = 0$;
(ii) the successor function $S: \mathbb{N} \to \mathbb{N}$, given by $S(n) = n + 1$;
(iii) the projection functions $P^k_j: \mathbb{N}^k \to \mathbb{N}$ (with $j \leq k$), given by $P^k_j(n_1, \dots, n_k) = n_j$.
One may then build up more functions by either composition or primitive recursion. That is, given functions $g: \mathbb{N}^k \to \mathbb{N}$ and $h: \mathbb{N}^{k+2} \to \mathbb{N}$, we say that $f: \mathbb{N}^{k+1} \to \mathbb{N}$ is obtained by primitive recursion from $g$ and $h$ just if, for all $x, y_1, \dots, y_k \in \mathbb{N}$,
$f(y_1, \dots, y_k, 0) = g(y_1, \dots, y_k)$
$f(y_1, \dots, y_k, x+1) = h(x, y_1, \dots, y_k, f(y_1, \dots, y_k, x))$
The primitive recursive functions are then defined to be those functions obtainable from the basic functions by composition and primitive recursion.
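The clauses above can be transcribed directly. In the sketch below, `Z`, `S`, `P` and `prim_rec` are invented helper names, and the loop simply unwinds the two recursion equations:

```python
def Z(n):
    """The zero function."""
    return 0

def S(n):
    """The successor function."""
    return n + 1

def P(k, j):
    """The projection function P^k_j (with 1 <= j <= k)."""
    return lambda *args: args[j - 1]

def prim_rec(g, h):
    """Build f from g and h by primitive recursion:
       f(y1,...,yk, 0)   = g(y1,...,yk)
       f(y1,...,yk, x+1) = h(x, y1,...,yk, f(y1,...,yk, x))"""
    def f(*args):
        *ys, x = args
        acc = g(*ys)
        for i in range(x):
            acc = h(i, *ys, acc)
        return acc
    return f

# add(y, x) = y + x:  add(y, 0) = y;  add(y, x+1) = S(add(y, x))
add = prim_rec(P(1, 1), lambda x, y, r: S(r))
# mul(y, x) = y * x:  mul(y, 0) = 0;  mul(y, x+1) = add(y, mul(y, x))
mul = prim_rec(Z, lambda x, y, r: add(y, r))

print(add(3, 4), mul(3, 4))  # 7 12
```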

There is a further operation on functions---minimization. Closing the primitive recursive functions under that operation then yields the recursive functions.

The notion of recursiveness can be extended to subsets of, and relations on, $\mathbb{N}$ by considering the characteristic function of the set or relation. Suppose $A \subseteq \mathbb{N}$. The characteristic function $\chi_A$ is defined by:
$\chi_A(n) = 1$ if $n \in A$, and $\chi_A(n) = 0$ otherwise.
If $\chi_A$ is (primitive) recursive, we say that $A$ is (primitive) recursive.
For a relation $R \subseteq \mathbb{N}^k$, the definitions are analogous.

5. Representability in Q

There are two main notions of "expressing" a numerical function or relation. The first is semantical: given some set $A \subseteq \mathbb{N}$, the question is simply whether there is an $L$-formula $\phi(x)$ such that
$A = \{n : \mathbb{N} \models \phi(\underline{n})\}$.
If so, we say that $\phi(x)$ defines $A$.

The other notion is syntactical, and always involves a theory $T$ in $L$. Given a function $f: \mathbb{N}^k \to \mathbb{N}$, we say $f$ is representable in $T$ just when there is an $L$-formula $\phi(x_1, \dots, x_k, y)$ such that, for all $n_1, \dots, n_k \in \mathbb{N}$,
$T \vdash \forall y\, (y = \underline{f(n_1, \dots, n_k)} \leftrightarrow \phi(\underline{n_1}, \dots, \underline{n_k}, y))$
Given a theory $T$, a similar notion can be applied to subsets of, and relations on, $\mathbb{N}$. Suppose $A \subseteq \mathbb{N}$. We say that $A$ is strongly representable in $T$ just when there is some formula $\phi(x)$ such that, for all $n \in \mathbb{N}$:
(i) if $n \in A$, then $T \vdash \phi(\underline{n})$;
(ii) if $n \notin A$, then $T \vdash \neg\phi(\underline{n})$.
We say that $A$ is weakly representable in $T$ just when condition (i) holds, but with "if ... then" replaced by "... iff ..." (thanks to Kai Wehmeier for the correction).
An analogous definition can be given for a relation $R \subseteq \mathbb{N}^k$ to be strongly or weakly representable.

The most important representation property for Q is:
Representability in Q
All recursive functions are representable in Q.
To prove this, one shows first that each of the basic functions is representable in Q; and then one shows that the representable functions are closed under composition, primitive recursion and minimization. Details of how to do this are given in standard textbooks, such as H. Enderton, A Mathematical Introduction to Logic. As Aldo Antonelli notes in the comments below, in order to show that functions obtained by primitive recursion are representable, one needs to show how sequence coding and decoding can be expressed in $L$, which has only $S$, $+$ and $\times$ as primitives. The crucial result required is Gödel's Beta Function Lemma, which uses the Chinese Remainder Theorem. These results need only be proved in the metatheory, not in Q itself (which lacks induction).
The converse of the representability theorem is true as well: if f is representable in Q, then f is recursive.

A corollary of the representation theorem for Q is that all recursive sets and relations are strongly representable in Q. So, if $R \subseteq \mathbb{N}^k$ is recursive, there is a formula $\phi(x_1, \dots, x_k)$ such that, for all $n_1, \dots, n_k \in \mathbb{N}$:
(i) if $R(n_1, \dots, n_k)$, then $Q \vdash \phi(\underline{n_1}, \dots, \underline{n_k})$;
(ii) if $\neg R(n_1, \dots, n_k)$, then $Q \vdash \neg\phi(\underline{n_1}, \dots, \underline{n_k})$.
As it turns out, for a standard axiomatic theory $T$, the set of (codes of) axioms is recursive (for example, a computer program can be written to decide, of a given input string, whether or not it is an axiom); indeed, it is primitive recursive. Furthermore, the proof-in-$T$ relation, "$d$ is a proof of $\phi$ (in $T$)", is primitive recursive. So, there are $L$-formulas which strongly represent (in Q) such relations. In particular, there is a formula $Prf_T(y, x)$ which strongly represents the proof-in-$T$ relation: $y$ is the code of a proof-in-$T$ of the formula with code $x$. Then the formula $\exists y\, Prf_T(y, x)$ expresses "$x$ is a theorem of $T$", and is written $Prov_T(x)$.

6. Diagonalization

Consider the binary syntactic operation on strings,
the result of substituting the quotation of string A for all free occurrences of a variable in string B.
Although this operation is binary, one can, obviously, apply it to the same string in both argument places, i.e.,
the result of substituting the quotation of string A for all free occurrences of a variable in string A.
Informally, this is (syntactic) diagonalization.
For example, the diagonalization of
x contains 23 symbols
is
"x contains 23 symbols" contains 23 symbols
If $\phi$ is a formula of the language of arithmetic, with at most the variable $x$ free, then its diagonalization is defined to be:
$\phi(x/\ulcorner \phi \urcorner)$
I.e., the result of substituting the gödel quotation $\ulcorner \phi \urcorner$ for all free occurrences of $x$ in $\phi$.
An alternative, but equivalent (in predicate logic), is:
$\exists x\, (x = \ulcorner \phi \urcorner \wedge \phi)$
Clearly, this syntactic operation of diagonalization corresponds to a numerical function, $d: \mathbb{N} \to \mathbb{N}$, given by, for any $L$-string $\phi$:
$d(\#\phi) = \#(\phi(x/\ulcorner \phi \urcorner))$, if $\phi$ is an $L$-formula with at most $x$ free;
$d(n) = 0$, if $n$ is not the code of such an $L$-formula.
One can show (the details are tedious) that d is primitive recursive.
Because all recursive functions are representable in Q, it follows that there is an $L$-formula $Diag(x, y)$ such that, for all $n \in \mathbb{N}$,
$Q \vdash \forall y\, (y = \underline{d(n)} \leftrightarrow Diag(\underline{n}, y))$
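At the level of strings, the diagonalization operation can be sketched as follows, with two simplifying assumptions: ordinary quotation marks stand in for gödel quotation, and naive textual substitution stands in for substitution for the variable x:

```python
def quote(phi: str) -> str:
    """Stand-in for the godel quotation name of phi: ordinary quotes."""
    return '"' + phi + '"'

def diagonalize(phi: str) -> str:
    """Substitute the quotation of phi for the variable x in phi itself."""
    return phi.replace("x", quote(phi))

print(diagonalize("x contains 23 symbols"))
# "x contains 23 symbols" contains 23 symbols
```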

7. The Diagonal (or Fixed Point) Lemma

Let $\phi(x)$ be an $L$-formula with at most $x$ free. (If $\phi$ happens to have only one variable free, say $y$, but not $x$, then one can relabel variables to ensure that $\phi$ has only $x$ free.) We want to find a formula $\theta$ that is, in a suitable sense, equivalent to $\phi(\ulcorner \theta \urcorner)$.
First consider, informally, the expression,
the diagonalization of x is ϕ
If one diagonalizes this, one gets,
the diagonalization of "the diagonalization of x is ϕ" is ϕ
Call this statement θ. Now θ says that a certain thing is ϕ. This thing is the diagonalization of "the diagonalization of x is ϕ"---but this is precisely the statement θ itself. Such a statement θ is called a fixed-point sentence for the original formula ϕ(x), and it says of itself that it is ϕ.

So, to get this into the formalized language $L$, we begin with some formula $\phi$ and consider:
$\exists y\, (Diag(x, y) \wedge \phi(y))$
Call this formula $\psi(x)$. Let $\theta$ be the diagonalization of $\psi$. That is, $\theta$ is $\psi(\ulcorner \psi \urcorner)$. So, trivially, we have:
$\theta \leftrightarrow \psi(\ulcorner \psi \urcorner)$
In particular, $\#(\theta) = d(\#(\psi))$. So, the term $\ulcorner \theta \urcorner$ is the numeral $\underline{d(\#(\psi))}$. And thus, trivially, we have:
$\ulcorner \theta \urcorner = \underline{d(\#(\psi))}$
Furthermore, since Q represents $d$, we have:
$Q \vdash \forall y\, (y = \ulcorner \theta \urcorner \leftrightarrow Diag(\ulcorner \psi \urcorner, y))$
Let us now reason in Q:
$Q \vdash \theta \leftrightarrow \psi(\ulcorner \psi \urcorner)$
$Q \vdash \theta \leftrightarrow \exists y\, (Diag(\ulcorner \psi \urcorner, y) \wedge \phi(y))$
$Q \vdash \theta \leftrightarrow \exists y\, (y = \ulcorner \theta \urcorner \wedge \phi(y))$
$Q \vdash \theta \leftrightarrow \phi(\ulcorner \theta \urcorner)$
This establishes the
Diagonal Lemma
Given an $L$-formula $\phi(x)$, there is an $L$-sentence $\theta$ such that
$Q \vdash \theta \leftrightarrow \phi(\ulcorner \theta \urcorner)$
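The construction above has a simple string-level analogue, again with ordinary quotation standing in for gödel quotation, and with "x is unprovable" standing in for an arbitrary ϕ(x):

```python
def quote(s: str) -> str:
    return '"' + s + '"'

def diagonalize(phi: str) -> str:
    """Substitute the quotation of phi for the variable x in phi."""
    return phi.replace("x", quote(phi))

# psi(x) plays the role of "the diagonalization of x is phi":
psi = "the diagonalization of x is unprovable"
theta = diagonalize(psi)
print(theta)

# theta talks about the diagonalization of psi -- which is theta itself:
subject_of_theta = diagonalize(psi)
assert subject_of_theta == theta
```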

8. Diagonal Lemma using a Function Term

For the Diagonal Lemma, one needs the external syntactic operation of diagonalization on a string to be represented in the theory. Under coding, the diagonalization operation corresponds to a primitive recursive function, $d: \mathbb{N} \to \mathbb{N}$, the diagonal function.

Q has the property that all recursive functions are representable, which means that for any recursive function $f: \mathbb{N}^k \to \mathbb{N}$, there is a formula $\phi_f(x_1, \dots, x_k, y)$ which represents it. Note that the choice of such a formula is not unique. But Q is weak, and it is not always true that Q proves that this formula is functional. In particular, Q may not prove $\forall x_1 \dots \forall x_k \exists y\, \phi_f(x_1, \dots, x_k, y)$. For example, Q cannot prove the totality of exponentiation.

However, if one extends Q with the induction scheme, then this situation changes. PA is strong enough to prove the totality of any primitive recursive function. In particular, one can choose the formula $Diag(x, y)$ representing the diagonal function $d$ such that
$PA \vdash \forall x \exists! y\, Diag(x, y)$
When this is so, one can introduce, by explicit definition, a new function symbol $diag(x)$ as follows:
$y = diag(x) :\leftrightarrow Diag(x, y)$
Let $PA^+$ be the result of adding this definition to PA. By the representability of $d$, we have:
(*) $PA^+ \vdash diag(\ulcorner \phi \urcorner) = \ulcorner \phi(\ulcorner \phi \urcorner) \urcorner$
Returning to the Diagonal Lemma, suppose that one wishes to find the fixed point for a given formula $\phi$. As noted, one needs an $L$-formula expressing,
the diagonalization of x is $\phi$
which one then diagonalizes to obtain $\theta$.
Above, "the diagonalization of x is $\phi$" was expressed with
$\exists y\, (Diag(x, y) \wedge \phi(y))$
But with a function term $diag(x)$, one can instead use
$\phi(diag(x))$
One may then diagonalize this to obtain a fixed-point sentence $\theta$:
$\theta := \phi(diag(\ulcorner \phi(diag(x)) \urcorner))$
Note that $\theta$ has the form $\phi(t)$, where $t$ is the term $diag(\ulcorner \phi(diag(x)) \urcorner)$. To show that this works, note that from (*) we have
$PA^+ \vdash diag(\ulcorner \phi(diag(x)) \urcorner) = \ulcorner \phi(diag(\ulcorner \phi(diag(x)) \urcorner)) \urcorner$
and so,
$PA^+ \vdash t = \ulcorner \phi(t) \urcorner$
So, $\phi(t)$ is a self-referential sentence containing a term $t$ provably (in PA) denoting the code of $\phi(t)$. And we have a modified version of the Diagonal Lemma:
Diagonal Lemma using a Term $diag(x)$
Let $PA^+$ be PA extended with an explicit definition of a function term $diag(x)$. Let $\phi(x)$ be an $L$-formula. Then there is a term $t$ such that
$PA^+ \vdash t = \ulcorner \phi(t) \urcorner$
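The term version has the same quine-like flavour, which a string-level sketch makes vivid. Here ordinary quotes again replace gödel quotes, and `eval_diag` is an invented stand-in for the provable equation (*):

```python
def quote(s: str) -> str:
    return '"' + s + '"'

def diag_term(s: str) -> str:
    """Build the term diag("s")."""
    return 'diag(' + quote(s) + ')'

# Let phi(x) be, informally, "x is unprovable".
inner = "diag(x) is unprovable"   # phi(diag(x))
t = diag_term(inner)              # the term diag("phi(diag(x))")
theta = t + " is unprovable"      # theta := phi(t)

def eval_diag(term: str) -> str:
    """Evaluate diag("s"): substitute the quotation of s for x in s."""
    assert term.startswith('diag("') and term.endswith('")')
    body = term[6:-2]
    return body.replace("x", quote(body))

# t denotes theta itself:
assert eval_diag(t) == theta
print(theta)
```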

9. Gödel Sentence

For an arithmetic theory $T$ to which Gödel's incompleteness theorem applies, a gödel sentence $G$ is usually defined to be a fixed point of the formula $\neg Prov_T(x)$, where $Prov_T(x)$ is a formula expressing "$x$ is a theorem of $T$". That is,
$Q \vdash G \leftrightarrow \neg Prov_T(\ulcorner G \urcorner)$
The construction above shows that, if we use PA instead of Q as our underlying syntax theory, $G$ can be expressed rather vividly as follows:
$G := \neg Prov_T(diag(\ulcorner \neg Prov_T(diag(x)) \urcorner))$
So, $G$ has the form $\neg Prov_T(t)$, where $t$ denotes the code of $G$.

Comments

  1. Super stuff Jeff: must have been a rainy Sunday afternoon. The more of them you get, the better for us but the worse for you!

    The nuts and bolts of my fixation about self-reference come out here. You don't get self-reference by diagonalising alone, nor in the fixed point lemma, but you do get it directly in your final section 8 function term way (though you don't need function terms to get the self-reference). One might say there is never literally self-reference in arithmetized syntax, always reference to numbers but since we are referring to abstract expression types who is to say these aren't numbers? (Whatever the existence of numbers and types comes to, let's not go there just now!).

    Thus in your

    Let $\theta$ be the diagonalization of $\psi$. That is, $\theta$ is $\psi(\ulcorner \psi \urcorner)$.

    the formula $\psi(\ulcorner \psi \urcorner)$ makes an ascription not to itself but to the referent of $\ulcorner \psi \urcorner$, namely $\psi(x)$. And likewise the sentence on the right-hand side of the fixed point lemma, namely $\phi(\ulcorner \theta \urcorner)$, is not ascribing $\phi$ to itself but to (the code of) $\theta$. Whereas, as you say:

    $\phi(t)$ is a self-referential sentence containing a term $t$ provably (in PA) denoting the code of $\phi(t)$.

    I've always thought these self-referential loops really neat. In the dear departed comment I'd said something to the effect that you can't do that just by (ordinary) quotation or just by Godel naming alone. No sentence ["blah" is F] can ascribe F to itself, if it is a predication applying to whatever the stuff inside the quotation marks is a token of, for the blah inside the quotation marks has to be a proper part of the whole sentence. Similarly, under normal Godel coding, [$\ulcorner \phi \urcorner$ is F] cannot ascribe F to itself, since the sentence as a whole will have a larger Godel code than the code of the sub-string $\ulcorner \phi \urcorner$: a string of 'S's ending in zero whose code will be greater than the number, viz. the code of $\phi$, which it stands for. (And to get a quine program you can't just take some source code blah and write print 'blah', because the initial [print '] and final ['] will be additional.)

    So Godel's trick of diagonalising on a sentence which contains a representation of diagonalising is really clever, as is Quine's ["yields a falsehood when appended to its own quotation" yields a falsehood when appended to its own quotation]. As are those quine programs: I suppose the basic trick is to construct a routine "printwice" such that [printwice 'blah'] prints the blah stuff twice, the second time in quotes, i.e. outputs [blah 'blah'], and then [printwice 'printwice'] does the job. In the Quine case we have a context [is F] which is such that ["blah" is F] doesn't just attribute a property to the blah stuff inside the quotes but, more importantly, attributes falsehood to something generated from the stuff inside the quote marks, which happens (well, doesn't just happen) to be the sentence itself.

  2. Thanks, Alan!
    What you say on quotations is right; but when we have a formula with a *term* that denotes the formula itself, the term need not be the quotation. Rather, the term may be something like "the diagonalization of "...."". For example, in Smullyan's approach, he uses what he calls the norm of an expression. (See R. Smullyan, 1957, "Languages in Which Self-Reference is Possible", JSL 22: 55-67; reprinted in Hintikka (ed.), 1969, Philosophy of Mathematics.)
    Quine's example is very clever, and I use it in my lectures.

    Jeff

  3. Nicely done, Jeff, but the part where you nonchalantly point out that Q does not have a primitive symbol for exponentiation and then go on to say that it can nonetheless represent all (primitive) recursive functions -- well, that's a bit like being told that the murderer is the butler, but without being told what the murder weapon was.

    All the action lies precisely in bootstrapping addition and multiplication into a representation of all (primitive) recursive functions, which requires a coding of sequences (a feat famously pulled off by Gödel with his beta-function). Now, you probably don't want to go into the glorious detail of the Chinese Remainder Theorem, but your readers need to know precisely where the magic takes place.

    But I'm picking at nits -- I really liked your exposition.

  4. Thanks, Aldo! Yeah, you're right, I confess, I was being a bit sneaky about how we prove that all recursive functions are representable in Q (using sequence coding/decoding, the beta function, Chinese Remainder Theorem, etc.), given that we just have successor, plus and times as primitives in L. Maybe I'll add an update, with a couple of links.

    Jeff

  5. Hi Jeff! Neat post, thanks for that. Not that it matters to your exposition, but the definition of weak representability isn't quite right. You'll want to say that A is weakly representable if condition (i) holds with "iff" in place of "if... then". As it stands, every set of natural numbers is weakly representable, and by the same formula, at that (e.g. any truth-functional tautology).

    --Kai

  6. Oops - yes, you're right, Kai. Will update it.
    Cheers, Jeff

