A
couple of
recent M-Phi posts by Catarina have discussed the topic of diagonalization and the Diagonal Lemma in Gödel's incompleteness results.
Because I can't be bothered to get out of bed, this post consists in some Sunday afternoon rambling about how
diagonalization is usually understood and how the
Diagonal Lemma is proved (i.e., how to define a fixed-point, or self-referential, sentence).
Gödel's original 1931 paper on incompleteness, "On Formally Undecidable Propositions in
Principia Mathematica and Related Systems, I", is brimming with new ideas, of which four important ones are:
(i) the notion of what is now called a primitive recursive function;
(ii) arithmetization of syntax (aka, gödel coding);
(iii) the diagonalization operation on a string;
(iv) given any $\mathcal{L}$-formula $\phi(x)$, the construction of a "fixed-point" or self-referential sentence $\theta$ equivalent to $\phi(\ulcorner \theta
\urcorner)$.
There are short explanations of each of these below.
1.
The (Usual) First-Order Language of Arithmetic
The language of arithmetic $\mathcal{L}$ is a first-order language with identity, and with signature $\{\underline{0},S,+, \times\}$ and is intended to talk about about the structure $\mathbb{N} = (\mathbb{N}, 0, S, +, \times)$. The set of $\mathcal{L}$-terms, $Tm(\mathcal{L})$, is defined inductively, by starting with $\underline{0}$ and all variables, $x_i$, and then closing under applying the function symbols $S, +, \times$. Then, an atomic $\mathcal{L}$-formula is defined to be of the form $t=u$, where $t, u \in Tm(\mathcal{L})$. And the set of $\mathcal{L}$-formulas, $Form(\mathcal{L})$, is defined inductively, by starting with atomic formulas $t = u$, and then closing under applying the unary connective $\neg$, the binary connective $\rightarrow$ and quantifiers $\forall x_i$. (Other connectives, and $\exists$, can be defined.)
It is useful sometimes to abuse notation and conflate the symbol $\underline{0}$ and its denotation, the number $0 \in \mathbb{N}$; or the symbol $S$ and its denotation, the function $S: \mathbb{N} \rightarrow \mathbb{N}$, etc. Context is usually sufficient to disambiguate. If one is considering a non-standard $\mathcal{L}$-interpretation $\mathcal{A}$ (i.e., $\mathcal{A}$ is not isomorphic to $\mathbb{N}$), then $S^{\mathcal{A}}$ will be the denotation of $S$ in $\mathcal{A}$; $+^{\mathcal{A}}$ the denotation of $+$, and so on.
Although taking $\underline{0},S, +$ and $\times$ as undefined primitives is fairly standard, some presentations include the binary relation symbol $x < y$ and a binary function symbol $exp(x,y)$ (denoting the exponential function, $x^y$) as additional primitives. But in fact both can be defined in $\mathbb{N}$: for $x < y$, this is obvious, but for $exp(x,y)$, it is not so obvious.
A bounded quantifier has the form $(\forall x \leq t)$ or $(\exists x \leq t)$, where $t$ is a term, and $x \leq t$ is the formula $\exists z(x + z = t)$. A bounded formula $\phi$ is a formula in which all quantifiers are bounded. A $\Sigma_1$-sentence has the form $\exists x_1 \dots \exists x_k \phi$, where $\phi$ is bounded.
2.
Canonical Numerals for Numbers
Because of the presence of $\underline{0}$ and $S$ in $\mathcal{L}$, there is a canonical way of denoting numbers, using canonical numerals. Roughly, the number $n$ is denoted by the term $S...S\underline{0}$, with $S$ repeated $n$ times. More exactly, one defines a function
$\underline{}: \mathbb{N} \rightarrow Tm(\mathcal{L})$,
by primitive recursion as follows:
$\underline{0}: = \underline{0}$
$\underline{n+1} : = S\underline{n}$
So, for example, the number 7 is denoted by $\underline{7}$, which is the term $SSSSSSS\underline{0}$. By induction, it is easy to prove that $(\underline{n})^{\mathbb{N}} = n$.
3.
Robinson Arithmetic: $Q$
A very important weak system of arithmetic is Robinson arithmetic, $Q$. Its axioms are:
$(Q_1) \forall x(Sx \neq \underline{0})$
$(Q_2) \forall x \forall y(Sx = Sy \rightarrow x = y)$
$(Q_3) \forall x(x \neq \underline{0} \rightarrow \exists y(x = Sy))$
$(Q_4) \forall x(x + \underline{0} = x)$
$(Q_5) \forall x \forall y(x + Sy = S(x+y))$
$(Q_6) \forall x(x \times \underline{0} = \underline{0})$
$(Q_7) \forall x \forall y(x \times Sy = x \times y + x)$
It is noteworthy that $Q$ does not have as an axiom scheme the induction scheme,
$(\phi(\underline{0}) \wedge \forall x(\phi(x) \rightarrow \phi(Sx))) \rightarrow \forall x \phi(x)$
Extending $Q$ with the induction scheme gives Peano arithmetic.
The system $Q$ is sound and complete for certain classes of sentences. That is,
$Q \vdash \phi$ if and only if $\phi$ is true.
This holds when $\phi$ is a quantifier-free sentence, a bounded sentence or a $\Sigma_1$-sentence.
In general, $Q$ can correctly perform finite computations using successor, addition and multiplication. However, $Q$ is not very good at algebra. For example, $Q$ does not prove $\forall x \forall y(x + y = y + x)$. Importantly, although $Q$ has axioms for addition and multiplication, it does not have axioms for exponentiation.
3.
Gödel Coding
Under gödel coding, each
primitive symbol $\alpha$ of the alphabet of $\mathcal{L}$ is given a code number, $c(\alpha)$. Also, each finite sequence $(n_1, \dots, n_k)$ of numbers can be coded as a number, sometimes written as $\langle n_1, \dots, n_k \rangle$. For example, one might use the coding
$\langle n_1, \dots, n_k \rangle := (p_1)^{n_1 + 1}\times \dots \times (p_k)^{n_{k} + 1}$,
where $p_1, p_2, \dots$ are the primes in ascending order, beginning with 2. (And we assign $0$ as the code of the empty string $()$.) For example, $\langle 10, 0, 1 \rangle = 2^{11}\times 3 \times 5^2$. Such numbers are then called sequence numbers.
Because $\mathcal{L}$-strings are sequences of primitive symbols, the code of an $\mathcal{L}$-string $(\alpha_1, \dots, \alpha_k)$ is defined as:
$\#(\alpha_1, \dots, \alpha_k):= \langle c(\alpha_1), \dots, c(\alpha_k) \rangle$.
(One needs to be a bit careful as the code of a primitive symbol, say $\forall$, is not the same as the code of the corresponding one-element string $(\forall)$. For example, if $c(\forall) = 10$, say, then $\#(\forall) = 2^{11}$.)
Because each $\mathcal{L}$-string $\phi$ now has a code, say $n = \#\phi$, we can think of the numeral $\underline{n}$ as a kind of name of $\phi$. So, if $n = \# \phi$, then the numeral $\underline{n}$ is written: $\ulcorner \phi \urcorner$, and is sometimes called the gödel quotation name of $\phi$.
So, using coding, the interpreted language of arithmetic is able to talk (indirectly) about the syntax of $\mathcal{L}$. For each syntactical entity $\phi \in \mathcal{L}$ has a kind of name $\ulcorner \phi \urcorner$ in $\mathcal{L}$ itself. Furthermore, special sets of (and relations amongst) syntactical entities become sets of numbers under coding. If $A$ is a set of syntactical entities, then $\#(A)$ is $\{\#(\alpha) : \alpha \in A \}$, the corresponding set of numbers. For example, the variables of $\mathcal{L}$ are coded as a set of numbers; the set of terms is coded as a set of numbers. Etc. Then one can ask whether these sets (or relations) are "expressible" in the language of arithmetic.
In general, these syntactical sets and relations come out, under coding, as either primitive recursive, or recursive, or recursively enumerable.
4.
Primitive Recursive Functions
The primitive recursive functions are nowadays defined a bit differently from Gödel's original approach. (See
this SEP article for more details.) First,
The basic functions:
(i) the zero function $Z : \mathbb{N} \rightarrow \mathbb{N}$, by $Z(n) = 0$.
(ii) the successor function $S : \mathbb{N} \rightarrow \mathbb{N}$, by $S(n) = n+1$.
(iii) the projection functions $P^k_j : \mathbb{N}^k \rightarrow \mathbb{N}$ (with $j \leq k$), by $P^k_j(n_1, \dots, n_k) = n_j$.
One may then build up more functions by either composition, or by primitive recursion. That is, given functions $g : \mathbb{N}^k \rightarrow \mathbb{N}$ and $h : \mathbb{N}^{k+2} \rightarrow \mathbb{N}$, we say that $f : \mathbb{N}^{k+1} \rightarrow \mathbb{N}$ is obtained by primitive recursion from $g$ and $h$ just if, for all $x, y_1, \dots, y_k \in \mathbb{N}$,
$f(y_1, \dots, y_k, 0) = g(y_1, \dots, y_k)$
$f(y_1, \dots, y_k, x+1) = h(x, y_1, \dots, y_k, f(y_1, \dots, y_k, x))$
The primitive recursive functions are then defined to be those functions obtainable from the basic functions by composition and primitive recursion.
There is a further operation on functions---minimization. Closing the primitive recursive functions under that operation then yields the recursive functions.
The notion of recursiveness can be extended to subsets of, and relations on, $\mathbb{N}$ by considering the characteristic function of the set or relation. Suppose $A \subseteq \mathbb{N}$. The characteristic function $\chi_A$ is defined by:
$\chi_A(n) = 1$ if $n \in A$, and 0 otherwise.
If $\chi_A$ is (primitive) recursive, we say that $A$ is (primitive) recursive.
For a relation $R \subseteq \mathbb{N}^k$, the definitions are analogous.
5.
Representability in $Q$
There are two main notions of "expressing" a numerical function or relation. The first is
semantical: given some set $A \subseteq \mathbb{N}$, simply whether there is an $\mathcal{L}$-formula $\phi(x)$ such that
$A = \{n : \mathbb{N} \models \phi(\underline{n})\}$.
If so, we say that $\phi(x)$ defines $A$.
The other notion is
syntactical, and always involves a theory $T$ in $\mathcal{L}$. Given a function $f : \mathbb{N}^k \rightarrow \mathbb{N}$, we say $f$ is
representable in $T$ just when there is an $\mathcal{L}$-formula $\phi(x_1, \dots, x_k, y)$ such that, for all $n_1, \dots, n_k$,
$T \vdash \forall y(y = \underline{f(n_1, \dots, n_k)} \leftrightarrow \phi(\underline{n_1}, \dots, \underline{n_k}, y))$
Given a theory $T$, a similar notion can be applied to subsets, and relations on, $\mathbb{N}$. Suppose $A \subseteq \mathbb{N}$. We say that $A$ is
strongly representable in $T$ just when there is some formula $\phi(x)$, such that for all $n \in \mathbb{N}$,
(i) if $n \in A$ then $T \vdash \phi(\underline{n})$
(ii) if $n \notin A$, then $T \vdash \neg \phi(\underline{n})$
We say that $A$ is
weakly representable in $T$ just when condition (i) holds, but with "if ... then" replaced by "... iff ..." (thanks to Kai Wehmeier for the correction).
An analogous definition can be given for a relation $R \subseteq \mathbb{N}^k$ to be strongly or weakly representable.
The most important representation property for $Q$ is:
Representability in $Q$
All recursive functions are representable in $Q$.
To prove this, one shows first that each of the basic functions is representable in $Q$; and then one shows that the representable functions are closed under composition, primitive recursion and minimization. Details of how to do this are given in standard textbooks, such as H. Enderton,
A Mathematical Introduction to Logic. As Aldo Antonelli notes in the comments below, in order to show that functions obtained by primitive recursion are representable, one needs to show how sequence coding and decoding can be expressed in $\mathcal{L}$, which only has $S, +$ and $\times$ as primitives. The crucial result required is Gödel's Beta Function Lemma, using the Chinese Remainder Theorem. These results need only to be proved in the metatheory, not in $Q$ itself (which lacks induction).
The converse of the representability theorem is true as well: if $f$ is representable in $Q$, then $f$ is recursive.
A corollary of the representation theorem for $Q$ is that all recursive sets and relations are strongly representable in $Q$. So, if $R \subseteq \mathbb{N}^k$ is recursive, there is a formula $\phi(x_1, \dots, x_k)$ such that, for all $n_1, \dots, n_k \in \mathbb{N}$,
(i) if $Rn_1\dots n_k$, then $Q \vdash \phi(\underline{n_1}, \dots, \underline{n_k})$.
(ii) if $\neg Rn_1 \dots n_k$, then $Q \vdash \neg \phi(\underline{n_1}, \dots, \underline{n_k})$.
As it turns out, for a standard axiomatic theory $T$, the set of (codes of) axioms is recursive (for example, a computer program can be written to decide of a given input string whether it is an axiom or not), indeed, primitive recursive. Furthermore, the proof-in-$T$ relation, "$d$ is a proof of $\phi$ (in $T$)", is primitive recursive. So, there are $\mathcal{L}$-formulas which strongly represent (in $Q$) such relations. In particular, there is a formula $Prf_T(y,x)$ which strongly represents the proof-in-$T$ relation: $y$ is the code of the proof-in-$T$ of formula with code $x$. Then the formula $\exists y Prf_T(y,x)$ expresses "$x$ is a theorem of $T$" and is written $Prov_T(x)$.
6.
Diagonalization
Consider the binary syntactic operation on strings,
the result of substituting the quotation of string $A$ for all free occurrences of a variable in string $B$.
Although this is binary, obviously, one can take this operation on the same argument, i.e.,
the result of substituting the quotation of string $A$ for all free occurrences of a variable in string $A$.
Informally, this is (syntactic) diagonalization.
For example, the diagonalization of
$x$ contains 23 symbols
is
"$x$ contains 23 symbols" contains 23 symbols
If $\phi$ is a formula of the language of arithmetic, with at most the variable $x$ free, then its diagonalization is defined to be:
$\phi(x/\ulcorner \phi \urcorner)$
I.e., the result of substituting the gödel quotation $\ulcorner \phi \urcorner$ for all free occurrence of $x$ in $\phi$.
An alternative, but equivalent (in predicate logic) is:
$\exists x(x = \ulcorner \phi \urcorner \wedge \phi)$
Clearly, this syntactic operation of diagonalization corresponds to a numerical function, $d : \mathbb{N} \rightarrow \mathbb{N}$, given by, for any $\mathcal{L}$-string $\phi$,
$d(\#\phi) = \#(\phi(x/\ulcorner \phi \urcorner))$ if $\phi$ is an $\mathcal{L}$-formula with at most $x$ free;
$d(n) = 0$ if $n$ is not the code of such an $\mathcal{L}$-formula
One can show (the details are tedious) that $d$ is primitive recursive.
Because all recursive functions are representable in $Q$, it follows that there is an $\mathcal{L}$-formula $Diag(x,y)$ such that, for all $n \in \mathbb{N}$,
$Q \vdash \forall y(y = \underline{d(n)} \leftrightarrow Diag(\underline{n}, y))$
7.
The Diagonal (or Fixed Point) Lemma
Let $\phi(x)$ be an $\mathcal{L}$-formula with at most $x$ free. (If $\phi$ happens to have only one variables free, say $y$, but not $x$, then one can do a relabelling of variables to ensure that $\phi$ has only $x$ free.) We want to find a formula $\theta$ that is in some sense equivalent to $\phi(\ulcorner \theta \urcorner)$.
First consider informally, the expression,
the diagonalization of $x$ is $\phi$
If one diagonalizes this, one gets,
the diagonalization of "the diagonalization of $x$ is $\phi$" is $\phi$
Call this statement $\theta$. Now $\theta$ says that a certain thing is $\phi$. This thing is the diagonalization of "the diagonalization of $x$ is $\phi$"---but this is precisely the statement $\theta$ itself. Such a statement $\theta$ is called a fixed-point sentence for the original formula $\phi(x)$, and it says of itself that it is $\phi$.
So, to get this into the formalized language $\mathcal{L}$, we begin with some formula $\phi$ and we consider,
$\exists y(Diag(x, y) \wedge \phi(y))$
Call this formula $\psi(x)$. Let $\theta$ be the diagonalization of $\psi$. That is, $\theta$ is $\psi(\ulcorner \psi \urcorner)$. So, trivially, we have,
$\vdash \theta \leftrightarrow \psi(\ulcorner \psi \urcorner)$
In particular, $\#(\theta) = d(\# (\psi))$. So, the term $\ulcorner \theta \urcorner$ is the numeral $\underline{d(\# (\psi))}$. And thus, trivially, we have
$\vdash \ulcorner \theta \urcorner = \underline{d(\# (\psi))}$
Furthermore, since $Q$ represents $d$, we have,
$Q \vdash \forall y (y = \ulcorner \theta \urcorner \leftrightarrow Diag(\ulcorner \psi \urcorner, y))$
Let us now reason about $Q$:
$Q \vdash \theta \leftrightarrow \psi(\ulcorner \psi \urcorner)$
$Q \vdash \theta \leftrightarrow \exists y(Diag(\ulcorner \psi \urcorner, y) \wedge \phi(y))$
$Q \vdash \theta \leftrightarrow \exists y(y = \ulcorner \theta \urcorner \wedge \phi(y))$
$Q \vdash \theta \leftrightarrow \phi(\ulcorner \theta \urcorner)$
This establishes the
Diagonal Lemma
Given an $\mathcal{L}$-formula $\phi(x)$, there is an $\mathcal{L}$-sentence $\theta$ such that
$Q \vdash \theta \leftrightarrow \phi(\ulcorner \theta \urcorner)$
8.
Diagonal Lemma using a Function Term
For the Diagonal Lemma, one needs that the external syntactic operation of diagonalization on a string be represented in the theory. Under coding, the diagonalization operation corresponds to a primitive recursive function, $d : \mathbb{N} \rightarrow \mathbb{N}$, the diagonal function.
$Q$ has the property that all recursive functions are representable --- which means that for any recursive function $f : \mathbb{N}^k \rightarrow \mathbb{N}$, there is a formula $\phi_f(x_1, \dots, x_k, y)$ which represents it. Note that the choice of such a formula is not unique. But $Q$ is weak, and it is not always true that $Q$ proves that this formula is functional. In particular, $Q$ may not prove $\forall x_1 \dots \forall x_k \exists y \phi_f(x_1, \dots, x_k, y)$. For example, $Q$ cannot prove the totality of exponentation.
However, if one extends $Q$ with the induction scheme, then this situation changes. $PA$ is strong enough to prove the totality of any primitive recursive function. In particular, one can choose the formula $Diag(x,y)$ representing the diagonal function $d$ such that,
$PA \vdash \forall x \exists y Diag(x, y)$
When this is so, one can introduce, by explicit definition, a new function symbol $diag(x)$ as follows:
$y = diag(x): = Diag(x,y)$
Let $PA^+$ be the result of adding this definition to $PA$. By the representablity of $d$, we have,
(*) $PA^+ \vdash diag(\ulcorner \phi \urcorner) = \ulcorner \phi(\ulcorner \phi \urcorner) \urcorner$
Returning to the Diagonal Lemma, suppose that one wishes to find the fixed-point for a given formula $\phi$. As noted, one needs an $\mathcal{L}$-formula expressing,
the diagonalization of $x$ is $\phi$
which one then diagonalizes to obtain $\theta$.
Above, "the diagonalization of $x$ is $\phi$" was expressed with
$\exists y(Diag(x, y) \wedge \phi(y))$
But with a function term $diag(x)$, one can use instead
$\phi(diag(x))$
One may then diagonalize this, to obtain a fixed point sentence $\theta$
$\theta: = \phi(diag(\ulcorner \phi(diag(x)) \urcorner))$
Note that $\theta$ has the form $\phi(t)$, where $t$ is the term $diag(\ulcorner \phi(diag(x)) \urcorner)$. To show that this works, note that we have from (*),
$PA^+ \vdash diag(\ulcorner \phi(diag(x)) \urcorner) = \ulcorner \phi(diag(\ulcorner \phi(diag(x) \urcorner) \urcorner$
and so,
$PA^+ \vdash t = \ulcorner \phi(t) \urcorner$
So, $\phi(t)$ is a self-referential sentence containing a term $t$ provably (in $PA$) denoting the code of $\phi(t)$. And we have a modified version of the Diagonal Lemma,
Diagonal Lemma using a Term $diag(x)$
Let $PA^+$ be $PA$ extended with an explicit definition of a function term $diag(x)$. Let $\phi(x)$ be an $\mathcal{L}$-formula. Then there is a term $t$ such that,
$PA^+ \vdash t = \ulcorner \phi(t) \urcorner$
9.
Gödel Sentence
For an arithmetic theory $T$ to which Gödel's incompleteness theorem applies, a gödel sentence $G$ is usually defined to be a fixed-point of the formula $\neg Prov_{T}(x)$ where $Prov_{T}(x)$ is a formula expressing "$x$ is a theorem of $T$". That is,
$Q \vdash G \leftrightarrow \neg Prov_{T}(\ulcorner G \urcorner)$
The construction above shows that, if we use $PA$ instead of $Q$ as our underlying syntax theory, $G$ can be expressed rather vividly as follows:
$G: = \neg Prov_{T}(diag(\ulcorner \neg Prov_{T}(diag(x))\urcorner))$
So, $G$ has the form $\neg Prov_{T}(t)$, where $t$ denotes the code of $G$.