Continuous Quantitative Traits

For individual diallelic polymorphisms such as SNPs, genotype is the unit of analysis and serves as the independent variable. Covariates and additional predictors of the dependent variable may also be incorporated. Within a regression framework, the most general model for genetic effects at a single locus includes a term for linear effects of a given allele and an additional parameter for the deviation from this linear effect, i.e., a dominance term (Cordell and Clayton, 2005). The general regression framework for a diallelic locus is given by

$$Y = \alpha + \beta_a A + \beta_d D + \beta_c C + \varepsilon$$

where $Y$ is a quantitative trait, $\alpha$ is the baseline mean of $Y$, $A$ and $D$ are dummy variables reflecting coding for linear (additive) and nonlinear (dominance) effects of the underlying genotype at a single locus, $C$ represents other covariates such as age or sex, and $\varepsilon$ is a residual error term assumed to be normally distributed.

For the linear term, genotypes (e.g., GG, GC, and CC) are assumed to function in an additive manner, and the corresponding regression variable A is coded as 0, 1, and 2, reflecting the dose of the C allele. The associated beta estimate is the additive effect of the C allele. This linear model alone predicts that the mean of the heterozygotes (GC) will lie at the midpoint between the means of the two types of homozygotes (GG, CC); in practice, this may not be the case. Deviation of the heterozygote mean from the midpoint between the homozygote means suggests that one allele is dominant over the other. To quantify this effect, an additional regression term is necessary. Specifically, the regression variable D for the dominance effect is coded 0, 1, and 0, with the associated beta estimate reflecting the deviation of the heterozygotes from the midpoint of the two homozygous groups. One degree of freedom (df) is required to test each of the linear and nonlinear terms.

Writing $\beta_a$ and $\beta_d$ for the beta estimates of A and D, the additive model corresponds to $\beta_d = 0$. Other specific genetic models with only 1 df are the (completely) dominant model, in which the effect is the same for GC and CC ($\beta_d = \beta_a$), and the recessive model, in which only CC differs from GG ($\beta_d = -\beta_a$). In practice, when no hypothesis exists on the nature of the trait model, the 2 df model is often tested first, and the most appropriate and more powerful 1 df model is applied subsequently. Simply testing all 1 df models is inefficient because of the multiple testing penalty (see Section 4.4).
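The coding described above can be illustrated with a minimal Python sketch. It assumes the model without the covariate term C, in which case three genotype groups and three parameters make the model saturated and the least-squares estimates follow directly from the genotype group means. The toy data and the names `SAMPLES`, `fit_two_df`, and `predict` are hypothetical and serve only as illustration.

```python
from statistics import mean

# Hypothetical toy data: (genotype, trait value) pairs, invented for illustration.
SAMPLES = [
    ("GG", 10.0), ("GG", 12.0),
    ("GC", 15.0), ("GC", 17.0),
    ("CC", 14.0), ("CC", 16.0),
]

# Coding from the text: A counts C alleles, D flags heterozygotes.
ADDITIVE = {"GG": 0, "GC": 1, "CC": 2}
DOMINANCE = {"GG": 0, "GC": 1, "CC": 0}


def fit_two_df(samples):
    """Return (alpha, beta_a, beta_d) for the covariate-free 2 df model.

    Assumption: no covariates, so the model is saturated and the
    least-squares estimates are simple functions of the group means.
    """
    means = {
        g: mean(y for geno, y in samples if geno == g)
        for g in ("GG", "GC", "CC")
    }
    alpha = means["GG"]                                   # baseline (GG) mean
    beta_a = (means["CC"] - means["GG"]) / 2              # effect per C allele
    beta_d = means["GC"] - (means["GG"] + means["CC"]) / 2  # deviation from midpoint
    return alpha, beta_a, beta_d


def predict(genotype, alpha, beta_a, beta_d):
    """Predicted trait mean for one genotype under the fitted model."""
    return alpha + beta_a * ADDITIVE[genotype] + beta_d * DOMINANCE[genotype]


alpha, beta_a, beta_d = fit_two_df(SAMPLES)
print(alpha, beta_a, beta_d)
```

In this toy data set the heterozygote mean lies above the midpoint of the homozygote means, so the dominance estimate is positive. With covariates in the model, the estimates would have to come from a full regression fit rather than from group means.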

In genetic association studies of quantitative traits, assuming a simple additive model (i.e., no dominance effect), the effect size of a locus is a function of the mean trait difference between homozygotes (e.g., the CC versus the GG genotype) and the allele frequency (Blangero, 2004). It is usually described by the coefficient of determination $R^2$, which in a regression analysis is the percentage of trait variance explained by the genetic variant. An $R^2$ value above 5% for a single gene is considered a large effect in genetic epidemiology; for complex diseases, values below 2% are expected for each contributing gene.
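The dependence of the explained variance on both the homozygote difference and the allele frequency can be made concrete with a short sketch. It rests on an assumption not stated above, namely Hardy-Weinberg proportions, under which the 0/1/2 allele count has variance 2p(1 - p) for allele frequency p. The function name `additive_r_squared` and the numeric inputs are hypothetical.

```python
def additive_r_squared(allele_freq, beta_a, trait_variance):
    """Fraction of trait variance explained by a purely additive locus.

    Assumption: Hardy-Weinberg proportions, so the 0/1/2 allele count
    has variance 2 * p * (1 - p). beta_a is the per-allele effect,
    i.e. half the mean difference between the two homozygotes.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must lie in [0, 1]")
    if trait_variance <= 0:
        raise ValueError("trait_variance must be positive")
    genotype_variance = 2 * allele_freq * (1 - allele_freq)
    return beta_a ** 2 * genotype_variance / trait_variance


# Same homozygote difference (2 * beta_a = 0.4 trait standard deviations),
# evaluated at a common and at a rare allele frequency.
common = additive_r_squared(0.5, 0.2, 1.0)
rare = additive_r_squared(0.05, 0.2, 1.0)
print(f"common allele: {common:.4f}, rare allele: {rare:.4f}")
```

For an identical per-allele effect, the rare variant explains a much smaller share of the variance than the common one, which is why effect size in this sense cannot be read off the genotype means alone.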
