## Thursday, 28 March 2013

### Speech Communities and Natural Languages

Let's suppose, with Chomsky, that the basic explanatory entities of scientific linguistics are individual languages spoken by individual speakers. These are idiolects, or micro-idiolects, or Chomskyan I-languages, and are somehow "cognized" or implemented by the cognitive systems of individuals.

Then one wonders how to make sense of the collective notions of:
• "speech community"
• "natural language".
It seems that these notions are intimately connected: a natural language, such as Punjabi, French, Middle English, Euskara, etc., is usually identified in terms of some speech community of its speakers.

Here are two attempted explications:
A set $C$ of agents is a speech community if and only if, for any pair $x,y \in C$, the speech behaviour of $x$ and $y$ is mutually interpretable.
$L$ is a natural language if and only if there is a speech community $C$ each member of which uses/cognizes $L$.
The first uses the notion of "speech behaviour being pairwise mutually interpretable", and this has genuine empirical content, in terms of the observable overall ease of social co-operation between agents. The second uses the notion of "speech community" and the hard to define, but crucial, notion of "using/cognizing a language".

Both notions are vague. Mutual interpretability is a rather vague and context-dependent matter; consequently, which sets count as speech communities will inherit this vagueness. And even when there is a speech community, it is usually somewhat heterogeneous, and therefore for no idiolect pair $L_1,L_2$ do we have $L_1 = L_2$, strictly speaking. So, only at some idealized level is there some single "external" language $L$ that all members of $C$ speak. Rather, this $L$ is an idealization that somehow approximates the varying idiolects $L_1, L_2$, etc., spoken within the speech community.

There are some refinements that might be introduced to the above rough ideas (see the comments below). The most obvious would be to treat the "mutual interpretability" relation as a matter of degree. Instead of being modelled by a graph (nodes representing speakers; edges representing mutual interpretability), in which a speech community is a maximal clique, a collection of speakers might be modelled by a weighted graph; a speech community is then something like a weighted clique.
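To make the unweighted graph picture concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the four speakers, the hand-picked "mutually interpretable" edges, and the helper name `maximal_cliques`, which enumerates maximal cliques with the basic Bron-Kerbosch algorithm (no pivoting).

```python
def maximal_cliques(adj):
    """Return all maximal cliques of an undirected graph.

    `adj` maps each node to the set of its neighbours.
    Basic Bron-Kerbosch without pivoting; adequate for toy examples.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique; p: candidates to extend it; x: already-explored nodes.
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical speakers; an edge means "mutually interpretable".
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

communities = maximal_cliques(adj)
for c in communities:
    print(sorted(c))
```

On this toy data the speaker `cat` ends up in two maximal cliques, so on this model speech communities can overlap, much as one would expect for a bilingual speaker.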

Finally, probably one ought to be sceptical about attempts to make precise the notions of "natural language" or "speech community". A similar view can be found in Chomsky's work. For example, in his "Knowledge of Language" (1975):
The notion of ‘language’ as a common property of a speech community is a construct, perfectly legitimate, but a higher-order construct. In the real world, there are no homogeneous speech communities, and no doubt every speaker controls several grammars, in the strict sense in which a grammar is a formal system meeting certain fixed conditions.
[UPDATE (29th March): I've added two paragraphs at the end.]

1. Begin with replacing 'if and only if' with a notion of mutual interpretability that admits of degree.

2. Maybe one can join together vague notions with a rather tight connection? A is vague and B is vague; but we use A when and only when we use B.
Alternatively, define speech communities as fuzzy sets. But doing so would still invoke "if and only if", as in:

x is in C to degree r iff [... r ....]

But introducing fuzzy sets is probably a bit premature ... The rough idea is that members of a speech community understand each other, to some approximation.

Jeff

1. I don't know why you need fuzzy sets or a 'speech-community'; why not just make do with

C(x, y) = n

where C is the degree to which x and y communicate, with n being in the closed interval [0,1]. Of course that is extremely crude, as C will itself be vague; you'd have at least to speak of C in various respects.

2. Yes, that sounds like the right thing to consider. Then the whole thing is going to be like a weighted graph, with the speakers as nodes, and the weights between the nodes representing the "degree of mutual interpretability".

For an ordinary graph with nodes and edges, a speech community is roughly a maximal clique (a set of nodes which are all edge-related). But for a weighted graph, I'm not sure what the corresponding notion is (maybe some sort of "fuzzy clique").

But, I guess then that being a speech community is relative to a parameter in [0,1], say, r, and every pair in the community would communicate to degree greater than or equal to r.

Something like this seems the right way to remove some of the vagueness in the simplistic definition of a speech community that I give in the post.
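A rough sketch of the thresholded idea in Python, under loudly stated assumptions: the three speakers and their degrees of mutual interpretability are made up, and `is_community` is just a hypothetical helper name. It checks that every pair in a set communicates to degree at least r.

```python
from itertools import combinations


def is_community(members, weight, r):
    """True if every pair in `members` is mutually interpretable to degree >= r."""
    return all(weight[frozenset(pair)] >= r for pair in combinations(members, 2))


# Hypothetical symmetric degrees of mutual interpretability in [0, 1].
weight = {
    frozenset({"ann", "bob"}): 0.97,
    frozenset({"ann", "cat"}): 0.90,
    frozenset({"bob", "cat"}): 0.60,
}

everyone = ["ann", "bob", "cat"]
for r in (0.5, 0.95, 1.0):
    # Print whether all three, and whether just ann and bob, form a community at r.
    print(r, is_community(everyone, weight, r), is_community(["ann", "bob"], weight, r))
```

With these numbers, all three speakers count as a community at r = 0.5, only ann and bob do at r = 0.95, and neither set does at r = 1.0. A single speaker satisfies the condition for any r, since there are no pairs to check.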

Jeff

3. Yes, progress!

4. Thanks! And then, the thing is, the notion of a natural language inherits the parameter too - so, there's no such thing as, say, English simpliciter. (Unless one fixes some minimum value, say, 0.95, but this is a bit arbitrary.) To avoid values less than 1, one ends up "squeezing" the communities to individuals, and then one has only idiolects. (Which is grist to my mill!)

Still, I want to get the initial starting points (how to make sense of "speech community" and "natural language") a bit clearer, before adding the bells and whistles we're discussing - which is progress, yes.

Jeff

3. One obvious question here is why one thinks one needs these kinds of notions. Chomsky pretty clearly thinks one doesn't. To my mind, the philosophical force behind them lies in certain normative conceptions we associate with language. So the question is whether, and if so how, one can explain those from within a broadly Chomskyan perspective. I take on part of that problem in a paper entitled "Idiolects", which is on my website.

1. Many thanks, Richard
Seems like the right question and that's an interesting paper!

Jeff