
Statistical Concepts

Lecture series by Charles Brenner x5300, visitor in Dept of Genetics

LG26, 9:30am, Thursdays

Prospectus

The series will begin by discussing basic terms and issues, such as: population, parameter, sample, statistic, probability, p-value, hypothesis testing, likelihood ratio, comparing hypotheses, bias.

I will not be so interested in teaching a catalogue of statistical tests, but prefer to discuss questions like:

What should one think or do when p<5%? when p>5%? How should a statistical conclusion be phrased?
Why are measurement errors normally distributed?
How can you recognize a standard deviation on sight?

Beyond these simple ideas, there are several options. One is to discuss probability, with a view to coming to an understanding of (a) probability, and (b) the "probability" or so-called "exact" tests. This in turn forms a foundation for answering questions like:

What is the right test for a given situation? What's wrong with the wrong test?
Another reason to understand probability (a) is to think about likelihood ratios, in order to consider:
What is the meaning of rejecting a hypothesis? What do we then accept?

Definitions

Thursday, 2 September 1999

Statistics

Another site, funnelweb.utcc.utk.edu (now defunct), turned up a definition that looks more sensible.

"Statistics is [the theory and method of analyzing quantitative data obtained from samples of observations in order to study and compare sources of variation of phenomena, to help make decisions to accept or reject hypothesized relations between phenomena, and to aid in] making [reliable] inferences from empirical observations" (Kerlinger, 1986, p. 175).

Let's condense that to

making inferences about populations from samples

Example: if the mean height of people in the sample is 2 m, we infer that the mean height of people in the population is close to 2 m.

population – a set of objects of interest (may be infinite or otherwise unobservable).

Population of people, of haploid cells, of 100-item samples, of measurements of a person

sample – an observable subset of a population.

50 people, 60 haploids, 70 100-person samples, 80 repeated measurements
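The condensed definition (making inferences about populations from samples) can be sketched in a few lines of Python. Everything numeric here is an assumption made for illustration: the simulated heights (mean 1.70 m, standard deviation 0.10 m), the population size, and the function name `population_and_sample_means`. The sample size of 50 echoes the "50 people" above.

```python
import random
import statistics


def population_and_sample_means(sample_size=50, seed=1999):
    """Return (population mean, sample mean) for a simulated population.

    The heights are hypothetical: 100,000 values drawn from a normal
    distribution with mean 1.70 m and standard deviation 0.10 m.
    """
    rng = random.Random(seed)
    population = [rng.gauss(1.70, 0.10) for _ in range(100_000)]
    # The population mean is normally unobservable; we can compute it
    # here only because the population is simulated.
    population_mean = statistics.fmean(population)
    # The sample is the observable subset.
    sample = rng.sample(population, sample_size)
    sample_mean = statistics.fmean(sample)
    return population_mean, sample_mean


if __name__ == "__main__":
    pop_mean, samp_mean = population_and_sample_means()
    print(f"population mean: {pop_mean:.3f} m")
    print(f"sample mean:     {samp_mean:.3f} m")
```

The sample mean is computed from 50 observations only, yet it lands close to the mean of all 100,000 heights. That closeness is what licenses the inference.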

parameter

Comment: 2N

statistic

Robbins example: An experiment has the possible outcomes E₁, E₂, ... with unknown probabilities p₁, p₂, ... . In n independent trials suppose that E_i occurs x_i times. How can we "estimate" u, the total probability of unobserved outcomes? (The quotation marks appear because u is not a parameter in the usual statistical sense.)
Comment (and homework): What does Robbins' parenthetical statement mean?

estimate – infer a parameter from a sample

Answer – Perform an (n+1)st trial. Count the outcomes that occurred exactly once in the n+1 trials and divide by n+1. This proportion estimates u: the expected total probability of the outcomes unobserved in the n-sample equals the expected proportion of once-observed outcomes in the (n+1)-sample.
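A small simulation can make the Robbins answer more concrete. This is only a sketch under stated assumptions: the four outcome probabilities, the choice n = 10, and the function names are all made up for illustration. The code averages, over many repeated experiments, both the true probability of the outcomes unobserved in the first n trials and the once-observed proportion in n+1 trials.

```python
import random
from collections import Counter


def singleton_proportion(outcomes):
    """Proportion of trials whose outcome occurred exactly once."""
    counts = Counter(outcomes)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(outcomes)


def unobserved_mass(probs, observed):
    """Total probability of the outcomes that were never observed."""
    return sum(p for outcome, p in probs.items() if outcome not in observed)


def simulate(probs, n, repeats=20_000, seed=0):
    """Average u and its estimate over many simulated experiments."""
    rng = random.Random(seed)
    labels = list(probs)
    weights = [probs[label] for label in labels]
    total_u = 0.0
    total_estimate = 0.0
    for _ in range(repeats):
        draws = rng.choices(labels, weights=weights, k=n + 1)
        # u refers to the first n trials only.
        total_u += unobserved_mass(probs, set(draws[:n]))
        # The estimate uses all n+1 trials.
        total_estimate += singleton_proportion(draws)
    return total_u / repeats, total_estimate / repeats


if __name__ == "__main__":
    # Hypothetical probabilities, unknown to the experimenter in practice.
    probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
    mean_u, mean_estimate = simulate(probs, n=10)
    print(f"average unobserved probability: {mean_u:.4f}")
    print(f"average singleton proportion:   {mean_estimate:.4f}")
```

The two averages should agree closely. Note that u itself changes from one experiment to the next, which bears on the homework question.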

expected – average (over a specified range)

hypothesis – an assumption about population(s), from which parameters can be inferred. In effect, an assumption about parameters.

Comment: a declarative sentence!

test statistic – a statistic calculated with a view to deciding a hypothesis

p-value – "probability"-value of a test statistic. The probability of obtaining so large (or small, or extreme) a test statistic if the hypothesis is true. Occasional small p-values are unavoidable.
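To see why occasional small p-values are unavoidable, one can simulate many samples for which the hypothesis really is true and count how often p < 0.05 anyway. The sketch below makes several assumptions: the hypothesis is a fair coin, the test is a chi² test with one degree of freedom, and the sample size, number of repeats, and function names are arbitrary illustrative choices.

```python
import math
import random


def p_value_one_df(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    # For 1 df, chi2 is the square of a standard normal deviate,
    # so P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def fraction_of_small_p_values(sample_size=1000, repeats=5000, alpha=0.05, seed=0):
    """Fraction of true-hypothesis samples that still give p < alpha."""
    rng = random.Random(seed)
    expected = sample_size / 2
    small = 0
    for _ in range(repeats):
        # Each random bit is one fair coin flip; count the ones.
        heads = bin(rng.getrandbits(sample_size)).count("1")
        # Two categories with equal expected counts.
        chi2 = 2 * (heads - expected) ** 2 / expected
        if p_value_one_df(chi2) < alpha:
            small += 1
    return small / repeats


if __name__ == "__main__":
    print(f"fraction with p < 0.05: {fraction_of_small_p_values():.3f}")
```

The fraction comes out near 5% (not exactly, because the counts are discrete), even though the hypothesis is true in every simulated sample.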

Hypothesis: The universe is half male, half female.

Sample: 10000 individuals, of whom 5100 are female.

Test statistic: chi² = 4

p-value ≈ 0.046 (two-tailed test)

Comment: if 5200 were female, chi² = 16 and p ≈ 0.00006. If 60 of 100 were female, chi² = 4 again, so p ≈ 0.046.
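The arithmetic behind the three cases in this example can be checked with a short sketch. It assumes the usual chi² statistic for two categories with one degree of freedom, and the function names are mine. For one degree of freedom the p-value can be computed with the standard library's `math.erfc`, so no statistics package is needed.

```python
import math


def chi_square_two_categories(observed_a, total, expected_fraction=0.5):
    """Chi-square statistic for a two-category count against a hypothesized fraction."""
    expected_a = total * expected_fraction
    expected_b = total - expected_a
    observed_b = total - observed_a
    return ((observed_a - expected_a) ** 2 / expected_a
            + (observed_b - expected_b) ** 2 / expected_b)


def p_value_one_df(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    # For 1 df, chi2 is the square of a standard normal deviate,
    # so P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    for females, total in [(5100, 10000), (5200, 10000), (60, 100)]:
        chi2 = chi_square_two_categories(females, total)
        p = p_value_one_df(chi2)
        print(f"{females}/{total} female: chi2 = {chi2:.1f}, p = {p:.5f}")
```

The first and third cases give the same chi² of 4, and hence the same p-value of about 0.046, although 51% and 60% female are very different proportions. The p-value depends on the sample size as well as on the size of the departure from the hypothesis.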

Probability

Thursday, 9 September 1999

Discussion: accept/reject paradigm

Example: DNA forensics analysts are happy if the population is in Hardy-Weinberg equilibrium (HWE). A test statistic is calculated on a population sample and converted to a p-value. A small p-value, e.g. < 0.05, tends to indicate that the population may not be in HWE.

An analyst proudly testified that out of a large number of such population studies, in only 1% was p<0.05. What's wrong with that?

I said that there must be publication bias. He said, no, the lack of low p-values was perhaps due to the samples being rather small.

What's wrong with that?

condition
hypothesis
repeatable experiment

conceptually repeatable experiment

We must remember, that the probability of an event is not a property of the event itself, but a mere name for the degree of ground which we, or someone else, have for expecting it. ... Every event is in itself certain, not probable: if we knew all, we should either know positively that it will happen, or positively that it will not. But its probability to us means the degree of expectation of its occurrence, which we are warranted in entertaining by our present evidence. — J.S. Mill

a probability is a summary of whatever information we may possess

Some experiments

Flip a coin
Chance of rain
Life on Mars
Is there a dog?

"Two kinds of probability"