Chi-Square Analysis: Introduction to Statistics

A simple Chi-square calculator for two samples is here (Tutorial)

A Chi-square calculator for 2-6 samples is here (Tutorial)


Some processes in biology are clear-cut and work out nicely. If you're sick with the flu, then a blood test will show that you have the flu virus. Other things are not so clear-cut and your understanding of the data requires a statistical analysis to determine how probable your observations are. As an example, assume that you flip a coin 5 times and get the following results:


[Figure: results of five coin flips, labeled Flip 1 through Flip 5]

Hmmmmm... four heads and one tail. The question you might ask is "Is this a fair coin or is it a trick coin that is weighted to come up mostly heads?". You might decide to cut the coin open to see if there's a weight in the coin that causes it to come up mostly heads or you might want to run a statistical test first to determine if four out of five heads is really that improbable. A simple test you could run is known as a Chi-square test. Before we get to that though, you need a little background in statistics.

Hypothesis: A hypothesis is a statistical claim. In its simplest form, any statistical test has two hypotheses: a null hypothesis (H0) and an alternative hypothesis (H1). The null hypothesis is the hypothesis of no difference, in this case that a 4:1 heads:tails ratio is no different from a 50:50 chance (the coin is fair), while the alternative hypothesis states that the observed 4:1 ratio is different enough that it can't be due to chance alone (the coin is not fair). In every statistical test you are always testing the null hypothesis, even if you don't believe it to be true. As an example, you might test whether the heights of adult males and females in your class are equal (the null hypothesis of no difference) versus the obviously true alternative hypothesis that there is a difference in the height of males and females. To show there is a difference in height, you must statistically reject the null hypothesis in order to accept the alternative hypothesis. A simple statistical test you could use is a Chi-square analysis.

The formula for a Chi-square test is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed count and E is the expected count for each category, and the sum runs over all categories.

This isn't nearly as bad as it looks. The biggest problem is figuring out our expected values. Since H0 expects that there is no difference in the number of heads and tails, the expected value for each is 2.5 (half of five flips). Obviously we can't have 2 1/2 heads or tails in the real world, but statisticians and mathematicians have no problems with their reality. So, plugging the numbers into the formula we get:

$$\chi^2 = \frac{(4 - 2.5)^2}{2.5} + \frac{(1 - 2.5)^2}{2.5} = 0.9 + 0.9 = 1.8$$

The χ² value of 1.8 is compared to a table to determine if we should accept or reject H0. Here's the table:

df    P = 0.05    P = 0.01    P = 0.001
1     3.84        6.64        10.83
2     5.99        9.21        13.82
3     7.82        11.35       16.27
4     9.49        13.28       18.47
5     11.07       15.09       20.52

To enter the table, you need to know what row to use under "df" (degrees of freedom). The number of independent pieces of information that are used to estimate a parameter is the degrees of freedom. For most situations it is simply N-1 where N is the number of independent scores (or attributes). In this example, the number of attributes is 2 (heads or tails), so the degrees of freedom is 1 (df=N-1). So, you enter the table at df=1 and read across the row. To reject the null hypothesis, your Chi-square value must be more than 3.84, 6.64, or 10.83, with each increasing value representing successively higher significance levels (yes, I know the P values actually get smaller). The significance levels indicate the probability of rejecting a null hypothesis if it is true. Thus, for P=0.05, there is a 5% chance that you'll accept the alternative hypothesis (that there is a difference in your groups) over the true null hypothesis. For this example, our Chi-square of 1.8 is smaller than all the values for one degree of freedom, so we accept the null hypothesis that the 4:1 head:tail result is not inconsistent with a 1:1 ratio.
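If you'd rather let a computer do the arithmetic, the steps above can be sketched in a few lines of Python. This is only a minimal illustration, not an official calculator: the names `chi_square`, `CRITICAL_VALUES`, and `smallest_significant_p` are made up for this example, and the critical values are simply copied from the table above.

```python
# Critical values copied from the table above, keyed by degrees of freedom.
CRITICAL_VALUES = {
    1: {0.05: 3.84, 0.01: 6.64, 0.001: 10.83},
    2: {0.05: 5.99, 0.01: 9.21, 0.001: 13.82},
    3: {0.05: 7.82, 0.01: 11.35, 0.001: 16.27},
    4: {0.05: 9.49, 0.01: 13.28, 0.001: 18.47},
    5: {0.05: 11.07, 0.01: 15.09, 0.001: 20.52},
}


def chi_square(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def smallest_significant_p(chi2, df):
    """Smallest tabled P level whose critical value chi2 exceeds, or None."""
    passed = [p for p, crit in CRITICAL_VALUES[df].items() if chi2 > crit]
    return min(passed) if passed else None


observed = [4, 1]        # four heads, one tail
expected = [2.5, 2.5]    # H0: a fair coin, half of five flips each
df = len(observed) - 1   # two attributes, so df = 1

chi2 = chi_square(observed, expected)
level = smallest_significant_p(chi2, df)

print(f"chi-square = {chi2:.2f}, df = {df}")
print("not significant at any tabled level" if level is None
      else f"significant at P < {level}")
```

For the five-flip example, the statistic comes out below every critical value in the df=1 row, so the lookup returns `None`, which matches the conclusion in the text.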

What happens if we flip the coin more times and get the following results?


[Figure: results of 20 coin flips, labeled Flip 1 through Flip 20]

Sixteen heads to four tails. That still works out to a 4:1 ratio, but what happens if we run the Chi-square again? In this case, we would expect 10 heads and 10 tails. Run the Chi-square again, this time with the new values:

$$\chi^2 = \frac{(16 - 10)^2}{10} + \frac{(4 - 10)^2}{10} = 3.6 + 3.6 = 7.2$$

A quick check with our table shows that the results of the analysis are now significant at P<0.01, so we should reject the null hypothesis and accept the alternative hypothesis (i.e., the coin is NOT fair and the ratio of heads to tails is not 1:1). This changed because we increased our sample size (the number of separate runs or experiments). Generally, the larger the sample size, the better your confidence in the analysis.
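To see the sample-size effect in numbers, here is a small Python sketch that runs the same test on both coin-flip results. The function name `chi_square_fair_coin` is hypothetical, and the two cutoffs are the df=1 values for P=0.05 and P=0.01 from the table; treat it as an illustration rather than a finished tool.

```python
def chi_square_fair_coin(heads, tails):
    """Chi-square for a heads/tails count against a 1:1 expectation."""
    expected = (heads + tails) / 2  # H0: half heads, half tails
    return ((heads - expected) ** 2 + (tails - expected) ** 2) / expected


# Critical values for df = 1 at P = 0.05 and P = 0.01, from the table.
CRIT_05, CRIT_01 = 3.84, 6.64

results = {}
for heads, tails in [(4, 1), (16, 4)]:  # both are 4:1 ratios
    chi2 = chi_square_fair_coin(heads, tails)
    results[(heads, tails)] = chi2
    print(f"{heads} heads, {tails} tails: chi-square = {chi2:.2f}, "
          f"exceeds 3.84: {chi2 > CRIT_05}, exceeds 6.64: {chi2 > CRIT_01}")
```

With the ratio held at 4:1, four times as many flips gives a Chi-square four times as large (1.8 versus 7.2), which is why the same ratio goes from not significant to significant at P<0.01.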

Let's apply this to the results of a Punnett Square. We'll make this a dihybrid cross between two individuals, both of which are heterozygous for the brown eye color and vestigial wing and have the genotype BrbrVgvg. The brown eye color (br) is recessive to the normal wild-type eye which is reddish in color (wild-type=Br). The small vestigial wing allele (vg) is recessive to the normal wing allele (Vg).  After mating your flies (BrbrVgvg X BrbrVgvg) you return after a few weeks and count the offspring. Here's what you got:

Checking your Punnett Square, you discover that you should have the familiar 9:3:3:1 ratio (9 wild-type : 3 wild-eyed, vestigial wing : 3 brown-eyed, normal wing : 1 brown-eyed, vestigial wing). The Punnett Square is shown below.

With a total of 155 flies you would therefore expect 87.19 wild-eyed, normal-winged flies [(9/16)*155], 29.06 wild-eyed, vestigial wing flies [(3/16)*155], 29.06 brown-eyed, normal wing flies [(3/16)*155], and 9.69 double mutant flies [(1/16)*155]. Your null hypothesis is that your counted flies fit a 9:3:3:1 ratio. The Chi-square calculations follow:

With 3 degrees of freedom (four fly types and df=N-1), we consult our table and find that 21.99 exceeds all Chi-square table values for 3 df and that P<0.001, so we reject our null hypothesis. A quick eyeballing of the data suggests that we have too many of the brown-eyed, vestigial wing flies and too few of the double mutants. Perhaps this should be looked at in more detail to determine why we got our skewed ratios.
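The expected counts and the table lookup for the fly cross can be checked with a short Python sketch. The helper name `expected_counts` is an assumption made for this example. The observed fly counts are not reproduced here, so the Chi-square of 21.99 is taken from the text rather than recomputed.

```python
def expected_counts(ratio, total):
    """Split a total count according to a ratio such as 9:3:3:1."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


phenotypes = [
    "wild eye, normal wing",
    "wild eye, vestigial wing",
    "brown eye, normal wing",
    "brown eye, vestigial wing",
]
expected = expected_counts([9, 3, 3, 1], 155)
for name, count in zip(phenotypes, expected):
    print(f"{name}: {count:.2f}")

# Degrees of freedom: four fly types, df = N - 1.
df = len(phenotypes) - 1

# Row of the critical-value table for df = 3.
CRITICAL_DF3 = {0.05: 7.82, 0.01: 11.35, 0.001: 16.27}

reported_chi2 = 21.99  # value quoted in the text, not recomputed here
significant_at = [p for p, crit in CRITICAL_DF3.items() if reported_chi2 > crit]
print(f"df = {df}, significant at P < {min(significant_at)}")
```

To recompute the Chi-square itself, you would pair each expected count with its observed count and sum (O - E)^2 / E across the four fly types.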


Bush supporters vs. Kerry: Who's more confused and less informed?

Question                                   Bush  Kerry  Expected  Chi2   p value
Iraq had WMDs                              72    26     49        21.59  <0.001
Duelfer Report concluded Iraq had WMD      56    18     37        19.51  <0.001
Evidence of 9/11 support found             63    30     46.5      11.71  <0.001
Most experts believe al Qaeda connection   60    30     45        10.00  <0.01
Foreign countries support the war          66    25     45.5      18.47  <0.001
Foreign countries want Bush re-elected     57    33     45        6.40   <0.05
Admin claimed Iraq had WMD                 82    84     83        0.02   ns
Admin claimed Iraq had ties with al Qaeda  75    74     74.5      0.01   ns

Admin supports...
Comprehensive Test Ban Treaty              69
Land mine treaty                           72
Kyoto Protocol                             51
International Criminal Court               66
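The Chi-square column in the Bush/Kerry table can be reproduced with the same formula. The Python sketch below assumes, as the expected column suggests, that the expected value for each question is the average of the two groups' numbers and that df=1. The function names `two_group_chi_square` and `p_label` are invented for this example.

```python
# (question, Bush, Kerry, Chi-square reported in the table)
rows = [
    ("Iraq had WMDs", 72, 26, 21.59),
    ("Duelfer Report concluded Iraq had WMD", 56, 18, 19.51),
    ("Evidence of 9/11 support found", 63, 30, 11.71),
    ("Most experts believe al Qaeda connection", 60, 30, 10.00),
    ("Foreign countries support the war", 66, 25, 18.47),
    ("Foreign countries want Bush re-elected", 57, 33, 6.40),
    ("Admin claimed Iraq had WMD", 82, 84, 0.02),
    ("Admin claimed Iraq had ties with al Qaeda", 75, 74, 0.01),
]


def two_group_chi_square(a, b):
    """Expected value (the average of a and b) and the Chi-square against it."""
    expected = (a + b) / 2
    chi2 = ((a - expected) ** 2 + (b - expected) ** 2) / expected
    return expected, chi2


def p_label(chi2):
    """Label a df = 1 Chi-square using the critical values from the table."""
    for crit, label in [(10.83, "<0.001"), (6.64, "<0.01"), (3.84, "<0.05")]:
        if chi2 > crit:
            return label
    return "ns"


computed = []
for question, bush, kerry, reported in rows:
    expected, chi2 = two_group_chi_square(bush, kerry)
    computed.append((expected, chi2, p_label(chi2)))
    print(f"{question}: exp = {expected:g}, chi2 = {chi2:.2f} "
          f"(table: {reported:.2f}), p {p_label(chi2)}")
```

Only the first eight questions have both a Bush and a Kerry number in the table, so the "Admin supports..." rows are left out of the calculation.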