Multiple Testing

[Figure: regional association plot; x-axis: chromosome 6 position (kb), 118100 to 119500; sentinel SNP rs11153730 labeled]

Fig. 29.4 Regional association plot for the SLC35F1/c6orf204/PLN locus on chromosome 6. Shown is the region extending 500 kb to either side of the most associated SNP, rs11153730. SNPs are plotted on a -log10(P) scale as a function of chromosomal position (NCBI build 36.3). The sentinel SNP is shown in blue. Surrounding SNPs are colored according to their r² with rs11153730 (red indicates an r² >0.8, orange an r² of 0.5-0.8, yellow an r² of 0.2-0.5, and gray an r² of less than 0.2). (From Nolte et al, 2009)

The next crucial question is which p-values can be considered significant in a GWA study. In contrast to hypothesis-driven candidate gene association studies, GWA studies are hypothesis-free: every SNP is tested without any prior assumption about its association with disease, so a substantial multiple testing correction must be applied. For example, with 500,000 SNPs tested for association, the standard significance level of 0.05 would already be expected to yield 25,000 false-positive results (500,000 × 0.05), assuming independence between all SNPs. In addition to the large number of SNPs, a multiple testing correction should also be applied if multiple phenotypes or multiple genetic models are tested. However, because there is correlation between SNPs, between phenotypes, and between models, it is difficult to assess the true number of independent tests. Correction techniques for multiple comparisons based on the original Bonferroni criterion are in general too conservative (Manly et al, 2004). Newer procedures based on the false discovery rate control the expected proportion of false discoveries among the reported findings while retaining more power to detect true associations (Benjamini et al, 2001).
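The contrast between the two kinds of correction can be made concrete with a small sketch. It computes the expected number of false positives for the 500,000-SNP example and then compares a Bonferroni cut-off with the standard Benjamini-Hochberg step-up procedure. The step-up procedure is used here only as a familiar example of false discovery rate control and is not necessarily the exact method of the cited reference; the function names and the ten p-values are made up for illustration.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of null tests with p < alpha when no test is truly associated."""
    return n_tests * alpha


def bonferroni(p_values, alpha=0.05):
    """Indices of tests that pass the Bonferroni cut-off alpha / m."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def benjamini_hochberg(p_values, q=0.05):
    """Indices of tests rejected by the Benjamini-Hochberg step-up procedure.

    Sort the p-values, find the largest rank k with p_(k) <= k * q / m,
    and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            largest_k = rank
    return sorted(order[:largest_k])


# Hypothetical p-values, chosen only to illustrate the difference.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

print(expected_false_positives(500_000, 0.05))  # about 25,000
print(bonferroni(example_p))                    # [0]
print(benjamini_hochberg(example_p))            # [0, 1]
```

With these made-up values, Bonferroni keeps only the smallest p-value, whereas the step-up procedure also keeps the second smallest. This is the sense in which false discovery rate procedures are less conservative.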

Recently, 5 × 10⁻⁸ has emerged as the consensus threshold for declaring genome-wide significance (International HapMap Consortium, 2005). This threshold maintains a 5% genome-wide type I error rate, based on estimates of the number of independent tests for common sequence variation (1 million tests, so that 0.05/1,000,000 = 5 × 10⁻⁸), at least in Caucasians (Dudbridge and Gusnanto, 2008; Pe'er et al, 2008). Stricter thresholds are needed for populations with lower LD. For example, the genome-wide testing burden in Africans was estimated to be 2 million independent tests, which translates into a genome-wide significance threshold of 2.5 × 10⁻⁸ (Pe'er et al, 2008).
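The arithmetic behind these thresholds is a division of the genome-wide type I error rate by the estimated number of independent tests. A minimal sketch follows, assuming the testing burdens quoted above; the function names, the dictionary, and the p-value of 3e-8 are illustrative only.

```python
def genome_wide_threshold(n_independent_tests, family_wise_alpha=0.05):
    """Per-SNP significance threshold that keeps the genome-wide type I error at alpha."""
    if n_independent_tests <= 0:
        raise ValueError("n_independent_tests must be positive")
    return family_wise_alpha / n_independent_tests


def is_genome_wide_significant(p_value, n_independent_tests, family_wise_alpha=0.05):
    """True if p_value falls below the threshold for the given testing burden."""
    return p_value < genome_wide_threshold(n_independent_tests, family_wise_alpha)


# Estimated numbers of independent tests as quoted in the text.
TESTING_BURDEN = {"Caucasian": 1_000_000, "African": 2_000_000}

for population, n_tests in TESTING_BURDEN.items():
    print(f"{population}: {genome_wide_threshold(n_tests):.1e}")
# Caucasian: 5.0e-08
# African: 2.5e-08

# A hypothetical p-value of 3e-8 passes the first threshold but not the second.
print(is_genome_wide_significant(3e-8, TESTING_BURDEN["Caucasian"]))  # True
print(is_genome_wide_significant(3e-8, TESTING_BURDEN["African"]))    # False
```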

The downside of setting a stringent threshold for the type I error in order to avoid false-positive findings is that the probability of missing a true positive association (type II error or false-negative finding) becomes large, especially in small samples. A SNP is therefore only considered a true positive result once it has been replicated in other, independent samples. In practice, a more lenient significance threshold (e.g., 10⁻⁶) is often used to select the SNPs taken forward for replication. Alternatively, the top 100 or 1000 SNPs are selected for follow-up and can be genotyped cost-effectively using custom-made chips.
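The two selection strategies described above (a lenient p-value cut-off, or a fixed number of top-ranked SNPs) can be sketched as a single helper. This is an illustration under simple assumptions, not a prescribed pipeline: the function name, the `(snp_id, p_value)` input format, and the SNP labels and p-values are all hypothetical.

```python
def select_for_replication(results, p_threshold=None, top_n=None):
    """Pick SNPs from the discovery stage to carry forward for replication.

    results     : iterable of (snp_id, p_value) pairs from the discovery analysis
    p_threshold : keep only SNPs with p_value below this lenient cut-off (e.g. 1e-6)
    top_n       : keep at most this many of the best-ranked SNPs (e.g. 100 or 1000)
    """
    ranked = sorted(results, key=lambda item: item[1])  # smallest p-value first
    if p_threshold is not None:
        ranked = [item for item in ranked if item[1] < p_threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [snp_id for snp_id, _ in ranked]


# Hypothetical discovery-stage results.
discovery = [("snp_d", 0.03), ("snp_b", 4e-7), ("snp_a", 3e-9), ("snp_c", 2e-5)]

print(select_for_replication(discovery, p_threshold=1e-6))  # ['snp_a', 'snp_b']
print(select_for_replication(discovery, top_n=3))           # ['snp_a', 'snp_b', 'snp_c']
```

Here `snp_b` would not reach genome-wide significance at 5 × 10⁻⁸, yet it is still carried forward under the lenient cut-off. Whether it counts as a true positive is then decided by the replication samples.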
