
chi-square test

The chi-square test statistic is designed to test the null hypothesis that there is no association between the rows and columns of a contingency table. For example, to determine whether there is an association between a particular SNP variant and a phenotype (case/control), a researcher might collect data that can be assembled into a 2×2 table. In this case, the two columns are defined by whether the subject has the disease (case) or not (control), while the rows represent the two alleles of the SNP. The cells of the table contain the number of observations or patients defined by these two variables.

For every SNP, the chi-square test builds a 2×2 contingency table by counting the number of times each possible allele of the SNP appears in the case and in the control samples. We then check whether the allele proportions differ between the two levels of the phenotype variable (case and control).
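The allele counting described above can be sketched in R as follows. This is a minimal illustration, assuming the genotype counts from the Fisher example later in this post are ordered as (case, control); the object names `genotypes` and `allele_table` are hypothetical.

```r
# Genotype counts per phenotype group, ordered as (case, control).
# Assumption: same numbers as the AA, AT and TT vectors used later in this post.
genotypes <- rbind(
  AA = c(case = 40, control = 15),
  AT = c(case = 25, control = 10),
  TT = c(case = 20, control = 30)
)

# Each individual carries two alleles: AA contributes two A alleles,
# AT contributes one A and one T, and TT contributes two T alleles.
allele_table <- rbind(
  "allele A" = 2 * genotypes["AA", ] + genotypes["AT", ],
  "allele T" = 2 * genotypes["TT", ] + genotypes["AT", ]
)
allele_table
```

The total of the allele table is twice the number of individuals, which is a quick sanity check on the counting.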

The statistic is the sum, over all cells of the table, of the squared difference between the observed and expected counts divided by the expected count. When the observed counts deviate substantially from the expected counts, the statistic is large, the null hypothesis is unlikely to be true, and a row-column association is likely. Conversely, a small chi-square value indicates that the observed values are close to the expected values, leading us to conclude that the null hypothesis is plausible.
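As a rough sketch of that calculation, the statistic can be computed by hand in R and compared with the built-in `chisq.test()`. The counts below are illustrative only, and the continuity correction is switched off (`correct = FALSE`) so that the built-in result matches the plain formula.

```r
# Illustrative 2x2 allele counts (rows: alleles, columns: phenotype).
observed <- matrix(c(105, 65, 40, 70), nrow = 2,
                   dimnames = list(Allele = c("A", "T"),
                                   Phenotype = c("Case", "Control")))

n <- sum(observed)

# Expected count of each cell under independence:
# (row total * column total) / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / n

# Sum over all cells of (observed - expected)^2 / expected.
chi_sq <- sum((observed - expected)^2 / expected)
chi_sq

# Cross-check against chisq.test() without the Yates continuity correction.
builtin <- chisq.test(observed, correct = FALSE)
all.equal(chi_sq, unname(builtin$statistic))
```

By default `chisq.test()` applies the Yates correction to 2×2 tables, so its statistic will be somewhat smaller than the hand-computed value unless `correct = FALSE` is set.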

In terms of p-values, a chi-square p-value of 0.05 or less is conventionally interpreted as justification for rejecting the null hypothesis that the row variable is unrelated to the column variable.
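The decision rule can be sketched in R with `pchisq()`. The statistic value below is hypothetical and chosen only for illustration; the degrees of freedom follow the usual (rows - 1) × (columns - 1) rule for a contingency table.

```r
chi_sq <- 6.5              # hypothetical value of the test statistic
df <- (2 - 1) * (2 - 1)    # (rows - 1) * (columns - 1) for a 2x2 table

# Upper-tail probability of the chi-square distribution.
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
p_value

# Reject the null hypothesis of no association at the 0.05 level?
reject_null <- p_value <= 0.05
reject_null

# Equivalent view: the statistic must exceed this critical value.
critical_value <- qchisq(0.95, df = df)
critical_value
```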

2×2 contingency table
Columns represent phenotype, rows represent genotype (allele)

            Case    Control    Total
allele A    a       b          a+b
allele T    c       d          c+d
Total       a+c     b+d        n
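For reference, with the cell labels a, b, c, d and total n from the table above, the uncorrected chi-square statistic can be written in closed form. The symbols O and E (observed and expected counts) are notation introduced here for illustration.

```latex
% Expected count of a cell: (row total x column total) / n,
% for example E_a = (a+b)(a+c)/n for the cell holding a.
\chi^2 \;=\; \sum_{\text{cells}} \frac{(O - E)^2}{E}
       \;=\; \frac{n\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}
```

The statistic is zero exactly when ad = bc, that is, when the allele proportions are identical in cases and controls.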

HINT: When the counts in the table are small, the chi-square test statistic may not be appropriate. Specifically, it has been recommended that this test not be used if any cell in the table has an expected count of less than one, or if more than 20 percent of the cells have an expected count of less than five. In that situation, Fisher's exact test is recommended for testing the hypothesis.
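One possible way to apply this rule of thumb in R is sketched below. The table `small` is invented for illustration, and the variable names are hypothetical.

```r
# Hypothetical small-sample table, invented for illustration only.
small <- matrix(c(3, 1, 2, 6), nrow = 2,
                dimnames = list(Allele = c("A", "T"),
                                Phenotype = c("Case", "Control")))

# Expected counts under independence.
expected <- outer(rowSums(small), colSums(small)) / sum(small)

# Rule of thumb: avoid the chi-square test if any expected count is below 1,
# or if more than 20% of the cells have an expected count below 5.
any_below_one <- any(expected < 1)
share_below_five <- mean(expected < 5)
use_fisher <- any_below_one || share_below_five > 0.20

result <- if (use_fisher) fisher.test(small) else chisq.test(small)
result$method
```

For this table every expected count is below five, so the sketch falls back to Fisher's exact test.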

SNP-phenotype association test
1. Use of contingency tables
2. Studying association between genotype and phenotype
3. Sample sizes are small

# Genotype counts per phenotype group, in the order (ASD, Normal)
AA <- c(40, 15)
AT <- c(25, 10)
TT <- c(20, 30)

# Carriers of allele A (AA or AT) versus TT
SNP <- matrix(c(AA + AT, TT), nrow = 2,
              dimnames = list(Phenotype = c("ASD", "Normal"),
                              Genotype = c("AA/AT", "TT")))
SNP
fisher.test(SNP, alternative = "two.sided")

# AA versus carriers of allele T (AT or TT)
SNP <- matrix(c(AA, AT + TT), nrow = 2,
              dimnames = list(Phenotype = c("ASD", "Normal"),
                              Genotype = c("AA", "TT/AT")))
SNP
fisher.test(SNP, alternative = "two.sided")

# Full 2x3 genotype table. fisher.test() uses the `alternative` argument
# only for 2x2 tables, so it is omitted here.
SNP <- matrix(c(AA, AT, TT), nrow = 2,
              dimnames = list(Phenotype = c("ASD", "Normal"),
                              Genotype = c("AA", "AT", "TT")))
SNP
fisher.test(SNP)
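To work with the test result rather than just printing it, the object returned by `fisher.test()` can be stored and inspected. The sketch below repeats the first table (AA/AT versus TT) so that it runs on its own; the name `res` is hypothetical.

```r
# Genotype counts per phenotype group, in the order (ASD, Normal).
AA <- c(40, 15)
AT <- c(25, 10)
TT <- c(20, 30)

SNP <- matrix(c(AA + AT, TT), nrow = 2,
              dimnames = list(Phenotype = c("ASD", "Normal"),
                              Genotype = c("AA/AT", "TT")))

res <- fisher.test(SNP, alternative = "two.sided")

res$p.value    # two-sided p-value of the exact test
res$estimate   # conditional maximum likelihood estimate of the odds ratio
res$conf.int   # confidence interval for the odds ratio (95% by default)
```

For 2×2 tables the reported odds ratio is a conditional estimate, so it will generally differ slightly from the simple cross-product ratio of the four cells.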
