(Null) hypothesis = "The tested chemical has no effect."

Test statistic = # of mutated colonies on several Petri dishes

The study is phony because **each laboratory
assumes** (perhaps because they are told) that they, and only
they, are testing the new mutagen; all **the other laboratories
are merely controls**, testing water. In reality, every
laboratory is testing inert, healthy water.

Nonetheless, there is a certain rate of mutations even among
control colonies, and it has a random element, so some labs will
report more mutations than others. At the end of the experiment, put
yourself in the shoes of laboratory L_{1} and ask what is the
*p*-value for the null hypothesis.

Lab L_{1} came up with some test statistic value – say 7942.

The *p*-value answers the question:
What is the chance of observing a test statistic at least this high if the null
hypothesis is really true?

Restated: What is the chance of observing at least 7942 mutant colonies if the chemical has no effect?

Since water has no effect, and "chance" means "probability," that is the same as asking:

What is the probability of observing at least 7942 mutant colonies when the bacteria are treated with water?

That is: What is the probability of observing at least 7942 mutant colonies,
*given exactly the experiment that was performed 1000 times, by
laboratories L_{2}, L_{3}, ...,
L_{1001}*? In other words, from
L

Since "probability" means long-run frequency given repeated
trials, and each laboratory's work can be regarded as a trial, that's
essentially the same as asking: What % of the 1000 test statistics
reported are greater than or equal to 7942?^{(1)}

Imagine that we arrange all the test statistics in order of size, and assign ordinal ranks from largest to smallest:

Lab: | L_{211} | L_{592} | L_{88} | ... | L_{1} | ... | L_{916} | L_{147} | L_{18} | L_{666} |

Test stat: | 7 | 120 | 249 | ... | 7942 | ... | 12122 | 14229 | 21000 | 92929 |

Ordinal position: | 1000 | 999 | 998 | ... | 108 | ... | 3 | 2 | 1 | 0 |

p-value: | p=1 | .999 | .998 | ... | .108 | ... | .003 | .002 | .001 | p=0 |

In this way we get the empirical estimate *p*=0.108 as the
*p*-value for L_{1}'s score. It just means that
L_{1}'s score lies at the 10.8th percentile (counting from
0 = largest) among a large set of "control" scores (scores obtained or
expected assuming the null hypothesis).
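The counting rule can be written out as a short sketch. The function name `empirical_p_value` and the ten-element list of control scores are illustrative assumptions, not part of the story (which has 1000 control labs). The lab's own score is passed separately, so it is not counted among the controls.

```python
def empirical_p_value(own_stat, control_stats):
    """Fraction of control statistics greater than or equal to own_stat."""
    at_least_as_large = sum(1 for s in control_stats if s >= own_stat)
    return at_least_as_large / len(control_stats)


# Hypothetical toy data: 10 control scores instead of the story's 1000.
controls = [7, 120, 249, 800, 3000, 9000, 12122, 14229, 21000, 92929]

# 5 of the 10 control scores are >= 7942, so the estimate is 0.5.
print(empirical_p_value(7942, controls))
```

With the 1000 control scores of the story, the same count would give 108/1000 = 0.108.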

**But what goes for L_{1} goes
for every other lab as well.** Remember, they all used
water. Each of those labs ends up with a

Five percent of the labs occupy the 5% extreme right end of the
picture, and all of these labs therefore necessarily have a
*p*-value≤5%. So, assuming the null hypothesis, a
*p*-value≤5% occurs 5% of the time – which is what was to be
shown. (If the null hypothesis is false, then small *p*-values
occur even more often.)
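The argument can also be checked with a small simulation. This is only a sketch under stated assumptions: the mutation counts are drawn from an arbitrary normal distribution (mean 5000, standard deviation 2000) chosen purely for illustration, and the names `simulate_p_values` and `fraction_at_most` are hypothetical. Each lab's *p*-value is the fraction of the *other* labs whose test statistic is greater than or equal to its own.

```python
import bisect
import random


def simulate_p_values(n_labs=1001, seed=0):
    """Every lab tests water; return each lab's empirical p-value."""
    rng = random.Random(seed)
    # Assumed null distribution of the test statistic (illustrative only).
    stats = [rng.gauss(5000.0, 2000.0) for _ in range(n_labs)]
    ascending = sorted(stats)
    p_values = []
    for s in stats:
        # Labs with a statistic >= s, minus the lab itself.
        others_at_least = n_labs - bisect.bisect_left(ascending, s) - 1
        p_values.append(others_at_least / (n_labs - 1))
    return p_values


def fraction_at_most(p_values, x):
    """Fraction of labs reporting a p-value <= x."""
    return sum(1 for p in p_values if p <= x) / len(p_values)


p_values = simulate_p_values()
for x in (0.05, 0.25, 0.5):
    # Each printed fraction should be close to x.
    print(x, round(fraction_at_most(p_values, x), 3))
```

The choice of distribution should not matter: only the rank order of the scores enters the calculation, which is the point of the argument.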

Of course, what is true for 5% is equally true for any other
number. When the null hypothesis is true, the % of labs that will
report a *p*-value ≤ *x* is exactly *x*, for any
probability *x*. Or in symbols,

Pr(*p* ≤ *x* | null hypothesis) = *x*.

1. To avoid ascertainment bias, we shouldn't count the lab itself as one of the labs with a "greater or equal test statistic."

*Go to home page of Charles H. Brenner*