Detecting design: Specification versus Likelihood

By Mark Frank

Posted July 02, 2006

Abstract

The Intelligent Design (ID) movement proposes that it is possible to detect whether something has been designed by inspecting it, assessing whether it might have been produced by either necessity or chance alone, and, if the answer is negative, concluding it must have been designed. This means there is no need to make any commitments about the nature of the designer.

This approach relies heavily on the concept of specification. The proponents of ID have made various attempts to define specification. A recent attempt is in a paper written by William Dembski in 2005 which is clearly intended to supersede previous attempts. This essay examines this revised definition of specification and highlights some issues in Dembski's paper. It also proposes that our intuitive understanding of when an outcome is implausible is much better explained by a comparison of the likelihoods of different hypotheses. Finally the essay considers some of Dembski's objections to the comparison of likelihoods and Bayesian approaches in general.

Introduction

There are a large number of refutations of the Intelligent Design movement on the Internet, in academic journals, and in books. Many of the authors have better credentials than I do. So it is reasonable to ask -- why another one? There are two reasons:
  1. William Dembski wrote a new paper in June 2005 (Dembski 2005a) which supersedes his previous explanations of concepts such as specification. As far as I know, no one has made a new assessment based on this paper. Quite possibly earlier critics have grown tired of pointing out the obvious, but I am new to this game and remain motivated.
  2. I have tried hard to make this essay readable for someone with little knowledge of statistics or probability. The downside of this is that those who know a bit will find parts childishly simple.

I also see this as an opportunity to learn more myself and welcome comments. My e-mail address for this purpose is: [email protected].

Body

Suppose you buy a new computer game that plays poker with you. It is meant to deal you a poker hand at random [1] and then play a game with you. You start it up and on the very first round it deals you these five cards:

A♠, K♠, Q♠, J♠, 10♠

(For those who don't know the rules of poker: this hand is known as a Royal Flush and it is the highest hand you can get.)

What would your reaction be? Do you accept that you were lucky, or do you decide that the program did not deal your hand randomly, conclude that it is not performing as advertised, and return it to the shop with an angry note? Almost anyone with a small knowledge of poker would make the second choice. The chances of getting a Royal Flush in spades if the cards are dealt randomly are about 2.6 million to 1. This is roughly the same as the odds of you being killed by lightning in the coming year.

Now suppose instead the computer deals you this hand:

2♣, 3♦, 7♥, 10♠, Q♦

There is nothing special about this hand in poker -- in fact it has a very low score. Most poker players would accept that this was a random deal without a second thought. But the probability of getting this hand with a random deal is the same as the probability of getting the first hand. In fact the probability of getting any named five cards is the same.
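To put a number on this, here is a minimal Python sketch (my own illustration, not part of the original argument) that counts the distinct five-card hands and hence the probability of being dealt any particular named hand at random.

```python
from math import comb

# Number of distinct 5-card hands that can be dealt from a 52-card deck
total_hands = comb(52, 5)            # 2,598,960

# Probability of being dealt any one particular named hand (Hand X, Hand Y, or any other)
p_named_hand = 1 / total_hands

print(f"distinct hands: {total_hands:,}")
print(f"P(a particular hand | random deal) = 1/{total_hands:,} ≈ {p_named_hand:.2e}")
```

Whatever five cards you name in advance, the random-deal probability is the same 1 in 2,598,960 -- the "about 2.6 million to 1" quoted above.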

These examples crop up repeatedly in this essay so I will give them names. I will call the first one Hand X and the second one Hand Y.

Why do we count Hand X as an incredible coincidence -- so much so that we dismiss the possibility of a random deal -- while we accept Hand Y as quite normal? Hand X is in some sense special; whereas Hand Y is ordinary.

This is an important question in statistics, and there is some dispute over the answer. It is also an important question for the intelligent design movement and its proponents believe they have the answer. They would claim the first hand is not just improbable but also that it is specified. That is, it conforms to a pattern and this is what makes it so special. (They then go on to say that some living systems are also specified and that the conventional explanation of evolution based on random mutation and natural selection can be dismissed, just as the random deal can be dismissed as an explanation of the Royal Flush.)

The leading figure in explaining the intelligent design concept of specification is William Dembski. He has written about the subject many times as he tries to refine and justify the concept. His most recent explanation is "Specification: The Pattern That Signifies Intelligence" (Dembski 2005a), a paper to be found on his web site. He says in addendum 1, referring to previous explanations of specification in his books, that: "The changes in my account of these concepts here should be viewed as a simplification, clarification, extension, and refinement of my previous work..." So it seems we are pretty safe in treating this paper as superseding all previous explanations of specification and representing the definitive view of the intelligent design movement.

Dembski's paper is 41 pages long, and his definition of specification is complex, so I will try to extract the key points. He approaches specification in stages, starting with Fisherian significance testing. This form of testing a hypothesis is a useful stepping stone to his concept of specification because it is familiar to many readers and is similar to specification in some important respects. It is an alternative way of looking at an event and deciding whether that event could plausibly be the result of a given hypothesis. Fisherian significance testing guides us to make this decision based on whether the observed data are extreme [2]. For example, suppose a car is rated by the government as having an average fuel consumption of 40 miles per gallon (mpg). A consumer organisation wishes to check this figure and monitors the fuel consumption of a sample of 10 cars from different owners. They find that the average fuel consumption of the 10 cars is 30 mpg. It is possible that the government is right and that this was just a sample of cars that for various reasons (driving habits, poor servicing, urban environment) had a worse than average fuel consumption. However, the more the measured fuel consumption of the sample differs from 40 mpg the more we would be inclined to say that the government was wrong and the hypothesis that the average fuel consumption is 40 mpg can be rejected. This applies if the sample fuel consumption is very high or very low. This is similar to the example of Hand X in that a hypothesis (random deal, 40 mpg average fuel consumption) is rejected because the results (Royal Flush, very high or low sample average fuel consumption) are very unlikely given the hypothesis. But in the case of the vehicle fuel consumption there is no specification. High or low sample averages are not special in the way that a Royal Flush is. They are just extreme. They are at the edges of what is possible given the hypothesis that the average fuel consumption is 40 mpg.
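For concreteness, here is a small Python sketch of the kind of significance test described above. The mpg figures are invented for illustration; the essay quotes no raw data.

```python
# Hypothetical data: measured mpg for a sample of 10 cars. The further the
# sample mean lies from the claimed 40 mpg, in either direction, the stronger
# the evidence against the claim.
from statistics import mean, stdev
from math import sqrt

claimed_mean = 40.0
sample = [31.2, 29.5, 33.0, 28.7, 30.1, 32.4, 29.9, 30.8, 27.6, 31.5]  # invented figures

n = len(sample)
t = (mean(sample) - claimed_mean) / (stdev(sample) / sqrt(n))  # one-sample t statistic

print(f"sample mean = {mean(sample):.1f} mpg, t = {t:.1f}")
# With 9 degrees of freedom a |t| anywhere near this size gives a p-value far
# below any conventional significance level, so the 40 mpg claim is rejected.
```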

Dembski also includes an extension of this concept. Suppose that all 10 cars in the sample returned a fuel consumption that was exactly 40 mpg within the accuracy of our measurement system. He would claim that this is also evidence against the hypothesis because the observed data are too good [3]. This is a curious observation and not, as far as I know, part of Fisher's approach to significance testing. I will come back to it later in the essay.

Dembski then goes on to treat specification as an extension of classical hypothesis testing, only now the reason for rejecting a hypothesis is not defined in terms of the extremity of the observed data; instead he defines specification in terms of conforming to patterns. In the two examples of poker hands above, Hand X could be said to conform to a pattern and that is the reason it is special. In fact Hand X conforms to many patterns. Here are just a few of them:

Patterns which Hand X (A♠, K♠, Q♠, J♠, 10♠) conforms to, each followed by an example of another hand that conforms to the same pattern:

  • All 5 cards have different values -- e.g. 2♣, 3♦, 7♥, 10♠, Q♦
  • All 5 cards are in a numerical sequence -- e.g. 7♣, 8♦, 9♥, 10♠, J♦
  • All 5 cards are of the same suit -- e.g. 2♣, 3♣, 7♣, 10♣, Q♣
  • Is a Royal Flush -- e.g. A♣, K♣, Q♣, J♣, 10♣
  • Is a Royal Flush in the suit of spades -- no other hand matches this pattern

To get to his definition of specification Dembski concentrates on two properties of a pattern. One property is "the probability that the chance hypothesis under investigation will match the pattern". He doesn't have a word for this, but you could say that some patterns are more demanding of the hypothesis than others, in the sense that the hypothesis is less likely to produce a result that matches a more demanding pattern. So the pattern "All 5 cards have different values" is not very demanding of a random deal, because many, many different hands conform to that pattern. The pattern "Is a Royal Flush" is extremely demanding, as only four hands match it.
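How demanding each pattern in the table is can be estimated directly by simulation. The following sketch is my own illustration, using an arbitrary 200,000 random deals.

```python
import random

RANKS = list(range(13))                      # 0..12 standing for 2..A
DECK = [(r, s) for r in RANKS for s in range(4)]

def deal():
    return random.sample(DECK, 5)            # 5 distinct cards, i.e. a random deal

def all_different_values(hand):
    return len({r for r, _ in hand}) == 5

def same_suit(hand):
    return len({s for _, s in hand}) == 1

def royal_flush(hand):
    return same_suit(hand) and sorted(r for r, _ in hand) == [8, 9, 10, 11, 12]  # 10, J, Q, K, A

TRIALS = 200_000
hands = [deal() for _ in range(TRIALS)]
for name, test in [("all 5 different values", all_different_values),
                   ("all 5 same suit", same_suit),
                   ("royal flush", royal_flush)]:
    print(f"{name}: {sum(test(h) for h in hands) / TRIALS:.5f}")
# "All different values" comes up in roughly half of all deals; "same suit" in
# about 0.2% of deals; a royal flush will almost never appear in 200,000 deals.
```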

The other property is the simplicity of describing the pattern. This is a subtler property. It has to do with the minimum number of concepts that are needed to describe the pattern. For example, the string of numbers:

2, 2, 2, 2, 2, 2, 2, 2, 2, 2

can be described as "repeat 2 ten times". However, it is hard to see how to describe the following string so succinctly:

2, 7, 132, 41, 13, 1006, 18, 25, 99, 7

Dembski defines this idea more formally on page 16 of the paper. I don't think the concept of simplicity is as clear as he suggests -- but it makes rough intuitive sense that the pattern "Is a Royal Flush" is simpler than the pattern "Includes a 2, 3, 7, 10 and Q".

Dembski then defines the specification of an observed event in terms of the simplest pattern which that event conforms to and the list of all equally simple or simpler patterns that it might have conformed to, but didn't. He calls the number of such patterns the specificational resources of the event. This is something of a mouthful so I will try to expand on it. Hand X conforms to the pattern "Is a Royal Flush". Let us assume that we are clear about what "simple" means and that "Is a Royal Flush" is the simplest pattern that this hand matches. Then the specificational resources associated with Hand X is the number of other patterns that a poker hand might conform to that are equally simple or simpler. For example, the pattern "comprises 2, 3, 4, 5, 6 all of the same suit" would appear to be equally simple and should be included in the list of patterns that go to make up the specificational resources of Hand X. It seems that the intuition that Dembski is trying to express more precisely is that when we say "what a coincidence" on receiving a Royal Flush we should recognise that there are a number of other hands we might have received that would have produced a similar level of surprise. Before ruling out the hypothesis of a random deal we should consider the total probability of producing an outcome that is at least as surprising.

Actually there is further refinement to the concept of specificational resources which creeps into Dembski's paper without comment. A pattern may be simpler than another but also less demanding, i.e. easier to achieve by chance. The pattern "contains no aces" is extremely simple -- surely at least as simple as a Royal Flush -- but there is a high probability of a random deal having no aces. On page 18 of the paper Dembski appears to define specificational resources as including any pattern at least as simple as the observed pattern. This would mean the specificational resources of the Royal Flush included patterns such as "has no aces" which are not at all demanding. Then on page 19 he switches to the probability of matching any pattern which is simpler and more demanding than the observed pattern. This makes sense in terms of satisfying our intuitive feeling that a Royal Flush is an extraordinary coincidence although there is no attempt to justify the revised definition. Let us, however, accept the amended definition: the specificational resources of an event is a number which is calculated by counting the simplest pattern which the event conforms to plus all the other patterns it might have conformed to which are at least as simple and no more probable.

His final step is to define the specificity of the outcome by multiplying the specificational resources (the number of possible patterns of equal or greater simplicity) by the probability of the observed simplest pattern and taking the negative logarithm to base 2 of this figure. The less mathematically inclined can ignore the last part -- the important thing is that specificity is defined in terms of the probability of matching the observed pattern and the number of possible patterns of similar simplicity. The final bit of mathematical manipulation just makes specificity a number that increases as the probability gets smaller and ranges between zero and infinity rather than one and zero.
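Translating that description into a line of arithmetic may make it more concrete. The sketch below follows the essay's wording of the definition; the count of 100 specificational resources is purely a placeholder, since (as discussed next) it is far from obvious how such patterns would be counted in practice.

```python
from math import comb, log2

# Probability that a random deal matches the pattern "is a Royal Flush"
p_pattern = 4 / comb(52, 5)

# Hypothetical count of patterns at least as simple and no more probable
spec_resources = 100

# Specificity as the essay describes it: -log2(resources x pattern probability)
specificity = -log2(spec_resources * p_pattern)
print(f"specificity ≈ {specificity:.1f} bits")
# The smaller the product, the larger the specificity.
```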

There are a lot of issues with this paper -- for example, how in practice do you determine the specificational resources for a real event, such as the development of the immune system, as opposed to the controlled world of poker hands and coin tosses? However, I want to focus on one particular concern. What is the justification for Dembski's approach? Why should we reject a hypothesis because the data conform to a simple but demanding pattern? Dembski justifies his approach by essentially saying it is an extension of Fisherian significance testing. So let's return to hypothesis testing.

Dembski describes Fisherian significance testing, but nowadays this is not common practice; modern hypothesis testing owes more to Neyman and Pearson, who were strongly opposed to Fisher's approach. If you open almost any introductory statistics textbook and turn to the section on hypothesis testing you will see that the student is told to define two hypotheses -- the null hypothesis (the one being tested) and an alternative hypothesis. In fact hypothesis testing is often considered to be a balance of two risks -- the risk of rejecting the null hypothesis when it is true versus the risk of rejecting the alternative hypothesis when it is true.

The need to do this can be illustrated by the common question of whether to use one-tailed or two-tailed testing. Let's go back to the fuel consumption test, but let us now assume that the 40 mpg average fuel consumption claim came from the manufacturer and it is a key part of their advertising campaign because they have been under heavy pressure to produce a vehicle with a better fuel consumption. In this context we would dismiss the possibility that the average fuel consumption was actually better than stated because in that case the manufacturer would certainly have said so. The only alternative hypothesis we would entertain is that the manufacturer is exaggerating its claim and the average fuel consumption is worse than advertised. So we would accept extremely low miles per gallon as evidence for dismissing the 40 mpg claim, but not extremely high miles per gallon. In statistical parlance we would use a one-tailed test and we are only going to reject the null hypothesis if the results fall into one extreme -- but not if they fall into the other. The reason is clear. When the results are extremely low the alternative hypothesis provides a better explanation of the results than the null hypothesis. When the results are extremely high they are still extreme but the null hypothesis provides a better explanation than the alternative hypothesis.

With a little imagination it is possible to devise even more compelling illustrations of the role of the alternative hypothesis. Suppose that the problem now is that the old models of the vehicle had an average mpg that varied between 30 and 50 mpg and on rare occasions even more -- although the overall average of all vehicles was 40 mpg. The manufacturer introduces an innovation that it claims will give a much more consistent mpg that will typically remain between 42 and 45 mpg (yes I know it is an engineering miracle). The consumer organisation wants to verify that this is true. It takes a sample of five recent models and finds the average mpg for all five models is between 42 and 45 mpg. These results are not at all extreme according to the old hypothesis and had we not had an alternative hypothesis in mind we would probably have accepted the similarity of the figures as a fairly unremarkable coincidence. However, the alternative hypothesis provides a much better explanation for the figures and therefore we would consider them as strong evidence against the null hypothesis that the fuel consumption pattern had not changed. This is nothing to do with extremes. If the data had all been very high or very low the null hypothesis would have been the best explanation and we would not have rejected it.
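A rough calculation shows how lopsided this comparison is. In the sketch below the spread of mpg under each hypothesis is invented (the essay gives only the ranges): the old models are treated as widely scattered around 40 mpg, the new ones as tightly clustered around 43.5 mpg.

```python
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def p_in_band(mu, sigma, lo=42.0, hi=45.0):
    """Probability a single car's average mpg falls between 42 and 45."""
    return norm_cdf(hi, mu, sigma) - norm_cdf(lo, mu, sigma)

p_old = p_in_band(mu=40.0, sigma=5.0) ** 5    # five cars under the old, variable models
p_new = p_in_band(mu=43.5, sigma=1.0) ** 5    # five cars under the claimed consistent models

print(f"P(all five between 42 and 45 mpg | old hypothesis) ≈ {p_old:.2g}")
print(f"P(all five between 42 and 45 mpg | new hypothesis) ≈ {p_new:.2g}")
print(f"likelihood ratio ≈ {p_new / p_old:,.0f} to 1 in favour of the new hypothesis")
```

On these made-up numbers the claimed innovation explains the observations thousands of times better than the old hypothesis, even though none of the observations is extreme.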

Dembski himself provides a further example when he writes of some data being too good. Returning to our original example -- suppose we are determining if the average fuel consumption is 40 mpg and all the cars in the sample return almost exactly 40 mpg. He would say this is too good a fit and the null hypothesis should be rejected. But why? Again the answer appears obvious. We know a bit about human nature and it seems likely that someone is either fiddling the data or unconsciously altering it. Alternatively the sample may not be truly random -- maybe the owners all had very similar lifestyles. There are a number of plausible explanations that explain the data better than the null hypothesis -- even if they have not been explicitly stated.

How does this approach extend to specification and the example of the Royal Flush? Most people would accept that the Royal Flush is a literally incredible coincidence. But there is a much simpler explanation of why it is incredible than using the concept of specification. There are alternative hypotheses that provide a better explanation for meeting this pattern than a random deal. They include:

  • The writer of the program knows the rules of poker and fixed it so that the first deal was not random but the highest possible hand, to encourage users to keep using the program. This would mean the observed outcome was virtually certain.
  • There is a bug in the program so that under certain conditions, instead of dealing randomly, it chooses a card at random and then deals the subsequent four cards in the same suit. This would have a 1 in 13 chance of producing a Royal Flush.

The two hands that began this essay:

Hand X: A♠, K♠, Q♠, J♠, 10♠

and

Hand Y: 2♣, 3♦, 7♥, 10♠, Q♦

are equally probable under a random deal, each with a probability of about 1 in 2.6 million. But they are not equally probable under either of the hypotheses above. In fact Hand Y is impossible under both hypotheses, while Hand X has a probability of 1 under the first hypothesis and 1/13 under the second.
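Writing the comparison out explicitly makes the point. The probabilities below are the ones given in the text: exactly 1/C(52,5) for the random deal, and the essay's stated figures for the other two hypotheses.

```python
from math import comb

p_random = 1 / comb(52, 5)     # any specific hand under a genuinely random deal

likelihoods = {
    "Hand X (A-K-Q-J-10 of spades)": {
        "random deal":           p_random,
        "programmer fixed deal": 1.0,       # first deal deliberately the best possible hand
        "same-suit run bug":     1 / 13,    # figure stated in the essay
    },
    "Hand Y (2,3,7,10,Q, mixed suits)": {
        "random deal":           p_random,
        "programmer fixed deal": 0.0,       # impossible: not the best possible hand
        "same-suit run bug":     0.0,       # impossible: not five cards of one suit
    },
}

for hand, hyps in likelihoods.items():
    print(hand)
    for hyp, p in hyps.items():
        print(f"  P(hand | {hyp}) = {p:.3g}")
```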

This approach explains why in many circumstances, such as dealing playing cards, outcomes that conform to simple patterns suggest a human designer might be involved. It is a matter of human psychology that we are more interested in producing outcomes that conform to simple patterns than outcomes that are complex and conform to no recognisable pattern. So if we consider the very generic hypothesis "someone designed this outcome" then under that hypothesis outcomes that conform to simple patterns are more likely than outcomes that do not conform to such patterns.

It might appear that I am arguing that there are patterns in nature that suggest design. In a sense that is true. Some patterns do suggest human design. But the key point is that this is based on a comparison of possible explanations of the observed results and what we know about humans as designers. Some of the explanations include design (and take into account, among other things, the human desire and ability to conform to patterns) and others do not, and each has to be evaluated as an alternative. Dembski and the intelligent design movement generally cannot accept this approach because it requires comparing the designer explanation of living systems with chance explanations, and this means saying something about who the designer is, how the designer implements their design, and why the designer is doing it. It is fundamental to the ID thesis that it is possible to detect design simply by looking at the chance hypothesis and the observed outcome without saying anything about the designer.

So far we have established that the use of specifications to reject a chance hypothesis has some problems of interpretation and has no justification, while comparing likelihoods seems to account for our intuitions and is justified. Dembski is well aware of the likelihood approach and has tried to refute it by raising a number of objections elsewhere, notably in chapter 33 of his book "The Design Revolution", which is reproduced on his web site (Dembski 2005b). But there is one objection that he raises which he considers the most damning of all and which he repeats virtually word for word in the more recent paper. He believes that the approach of comparing likelihoods presupposes his own account of specification.

He illustrates his objection with another well-worn example in this debate -- the case of the New Jersey election commissioner Nicholas Caputo, who was accused of rigging ballot lines. It was Caputo's task to decide which candidate came first on the ballot paper in an election, and he was meant to do this without bias towards one party or another. Dembski does not have the actual data but assumes a hypothetical example where the party of the first candidate on the ballot paper follows this pattern for 41 consecutive elections (where D is Democrat and R is Republican):

DDDDDDDDDDDDDDDDDDDDDDRDDDDDDDDDDDDDDDDDD

This clearly conforms to a pattern which is very demanding for the hypothesis that Caputo was equally likely to put a Republican or a Democrat first. In fact it conforms to a number of such patterns for 41 elections, for example:

  1. There is only one Republican as first candidate.
  2. One party is represented only once.
  3. There are two or fewer Republicans.
  4. There is just one Republican and it falls between the 15th and 30th elections.
  5. Includes 40 or more Democrats.

And so on.

Dembski has decided that the relevant pattern is the last one. (This is interesting in itself, as it amounts to a one-tailed test and assumes the hypothesis that Caputo was biased towards Democrats. Another alternative might simply have been that Caputo was biased -- direction unknown -- in which case the pattern should have been "one party is represented at least 40 times".) His argument is that when comparing the likelihoods of two hypotheses (Caputo was biased towards Democrats or Caputo was unbiased) generating this sequence, we would not compare the probability of the two hypotheses generating this specific event but the probability of the two hypotheses generating an event which conforms to the pattern. And we have to use his concept of a specification to know what the pattern is. But this just isn't true. We can justify the choice of pattern simply by saying "this is a set of outcomes which are more probable under the alternative hypothesis (Caputo is biased towards Democrats) than under the hypothesis that Caputo is unbiased". There is no reference to specification or even patterns in this statement.
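The likelihood comparison itself is straightforward to write down. The sketch below treats each election as an independent pick; the bias value of 0.95 towards Democrats is an illustrative assumption of mine, not a figure from Dembski or the essay, and the single Republican is placed arbitrarily since its position does not affect this calculation.

```python
# 41 selections with a single Republican, as in the essay's hypothetical sequence
seq = "D" * 22 + "R" + "D" * 18

def likelihood(seq, p_d):
    """P(this exact sequence | each election independently puts a Democrat first with probability p_d)."""
    p = 1.0
    for c in seq:
        p *= p_d if c == "D" else (1 - p_d)
    return p

print(f"P(seq | unbiased, p_d = 0.5)  = {likelihood(seq, 0.5):.2g}")
print(f"P(seq | biased,   p_d = 0.95) = {likelihood(seq, 0.95):.2g}")
# The biased hypothesis makes the observed sequence billions of times more
# probable than the unbiased one -- and saying so requires no mention of
# specification or of patterns.
```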

This is clearer if we consider a different alternative hypothesis. Suppose that instead of suspecting Caputo of favouring one party or another we suspect him of being lazy and simply not changing the order from one election to another -- with the occasional exception. The "random" hypothesis remains the same - he selects the party at random each time. The same outcome:

DDDDDDDDDDDDDDDDDDDDDDRDDDDDDDDDDDDDDDDDD

counts against the random hypothesis but for a different reason -- it has only two changes of party. The string:

DDDDDDDDDDDDDDDDDDDDDDRRRRRRRRRRRRRRRRRRRR

would now count even more heavily against the random hypothesis - whereas it would have been no evidence for Caputo being biased.

So now we have two potential patterns that the outcome matches and could be used against the random hypothesis. How do we decide which one to use? On the basis of the alternative hypothesis that might better explain the outcomes that conform to the pattern.
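The same machinery handles the "lazy Caputo" hypothesis, and shows how the choice of relevant pattern falls out of the alternative hypotheses under consideration. Again the parameter values (0.95 for both the bias and the laziness) are my own illustrative assumptions, and the two strings are constructed to match the essay's descriptions (one lone Republican; a single change of party).

```python
def fair_likelihood(seq):
    return 0.5 ** len(seq)

def biased_likelihood(seq, p_d=0.95):
    """Each pick independently favours Democrats with probability p_d."""
    p = 1.0
    for c in seq:
        p *= p_d if c == "D" else (1 - p_d)
    return p

def lazy_likelihood(seq, q_stay=0.95):
    """First pick is 50/50; each later pick repeats the previous party with probability q_stay."""
    p = 0.5
    for prev, cur in zip(seq, seq[1:]):
        p *= q_stay if cur == prev else (1 - q_stay)
    return p

seq_lone_r = "D" * 22 + "R" + "D" * 18     # the first string: one lone Republican
seq_switch = "D" * 22 + "R" * 19           # the second string: a single change of party

for name, seq in [("one lone R", seq_lone_r), ("single switch to R", seq_switch)]:
    print(f"{name}:  fair = {fair_likelihood(seq):.2g}   "
          f"biased to D = {biased_likelihood(seq):.2g}   "
          f"lazy = {lazy_likelihood(seq):.2g}")
# The first string is best explained by pro-Democrat bias, the second by
# laziness; which "pattern" matters depends entirely on the alternatives in play.
```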

The comparison of likelihoods approach is so compelling that Dembski himself inadvertently uses it elsewhere in the same chapter of The Design Revolution. When trying to justify the use of specification he writes "If we can spot an independently given pattern.... in some observed outcome and if possible outcomes matching that pattern are, taken jointly, highly improbable ...., then it's more plausible that some end-directed agent or process produced the outcome by purposefully conforming it to the pattern than that it simply by chance ended up conforming to the pattern."

We do come across incredible coincidences (like Hand X) and the rational thing to do is to reject the underlying chance hypothesis (like a random deal). However, this decision is not based on the convoluted and loosely defined concept of specification. It is based on the simple fact there are better explanations.

References

Dembski, W. A., 2005a. Specification: The Pattern That Signifies Intelligence. http://www.designinference.com/documents/2005.06.Specification.pdf (accessed 1/6/2006)

Dembski, W. A., 2005b. Design by Elimination versus Design by Comparison. http://www.designinference.com/documents/2005.09.Fisher_vs_Bayes.pdf (accessed 4/6/2006)


[1] A common fault in the ID debate is to use the word "random" without further definition, as though it were obvious what it means. So I shall be more explicit. In this context a "random deal" means that as each of the 5 cards that make up the hand is chosen, it is equally likely to be any of the cards remaining in the pack.

[2] Technically speaking, Dembski talks about rejecting a chance hypothesis when the observed outcome falls in a region where the probability density function (pdf) is below a defined value. I use the word "extreme" as shorthand for this.

[3] Technically speaking, he writes about classical hypothesis testing rejecting a hypothesis when the outcome falls in a region where the pdf is above a defined value.
