In the design stage, one needs to select a test statistic, a significance level, and a desired power, and to determine the sample size required to adequately test the hypothesis.

A simple approach is to choose a statistical test according to the type of data to be collected. We will discuss this in greater detail in the next section. In short, an independent t-test is appropriate for comparing the lifespans of the two mouse lines in our example if two conditions hold. First, the lifespans of both mouse lines must be approximately normally distributed (or transformable to an approximately normal distribution). Second, the variances for the two lines must be approximately equal.
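The statistic behind the independent (Student's) t-test can be sketched with only the Python standard library. The function name `pooled_t_test` and the lifespan values are hypothetical. The function pools the two sample variances, which is only appropriate under the equal-variance assumption. It also reports the ratio of the larger to the smaller sample variance as a rough check of that assumption. It does not compute a p-value.

```python
import math
import statistics


def pooled_t_test(a, b):
    """Student's two-sample t statistic using a pooled variance (assumes equal population variances)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1 denominator)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the difference in means
    t = (statistics.mean(a) - statistics.mean(b)) / se
    df = n1 + n2 - 2
    variance_ratio = max(v1, v2) / min(v1, v2)  # values near 1 support the equal-variance assumption
    return t, df, variance_ratio


# Hypothetical lifespans (days) for mouse lines A and B
line_a = [812, 765, 901, 840, 778, 856, 799, 823]
line_b = [742, 701, 788, 760, 695, 770, 731, 749]
t, df, ratio = pooled_t_test(line_a, line_b)
print(f"t = {t:.2f}, df = {df}, variance ratio = {ratio:.2f}")
```

A large variance ratio would suggest using a test that does not assume equal variances.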

An investigator always runs the risk of observing significant-looking results by chance alone. The significance level reflects the investigator's tolerance for committing a type I error. A type I error occurs when a true null hypothesis is rejected by chance, resulting in a false positive. The significance level (sometimes called the alpha (α) level) is the probability of committing such an error. Because the researcher sets it directly before the study begins, it is conventionally preset at 0.05 or 0.01.

It is also possible to accept the null hypothesis when the alternative is true. This is known as a type II error, or false negative. The probability of committing a type II error is called the beta (β) level. One minus β is the power of a statistical test. Power is the probability of rejecting the null hypothesis when the alternative hypothesis is true (i.e., finding a difference when there actually is one). In general, with all other factors held constant, the smaller α is, the lower the power is; and the larger α is, the higher the power is.
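The effect of α on power can be illustrated with a normal-approximation power formula for comparing two group means. This is a sketch under stated assumptions: equal group sizes, equal standard deviations, and a z-based approximation rather than the exact t-test. The function name `approx_power` and the design values are hypothetical.

```python
from statistics import NormalDist


def approx_power(delta, sd, n_per_group, alpha):
    """Approximate power of a two-sided, two-sample comparison (normal approximation, equal n and SD)."""
    z = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5   # standard error of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    # Ignores the negligible chance of rejecting in the wrong direction
    return z.cdf(delta / se - z_crit)


# Hypothetical design: detect an 80-day difference, SD 150 days, 20 mice per line
for alpha in (0.05, 0.01):
    print(alpha, round(approx_power(80, 150, 20, alpha), 3))
```

With these inputs the approximate power is about 0.39 at α = 0.05 but only about 0.19 at α = 0.01. Tightening α therefore roughly halves the power here.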

Sometimes the true type I error rate of certain statistical tests may not be equal to the preset α level. When it is smaller than the preset α level, we say that the statistical test is conservative, and, as a result, the type II error rate increases and the power of the test decreases. When the type I error rate is larger than the α level, which may be due to a violation of the test assumptions, we may consider the statistical test to be invalid. Readers can find further discussion of these issues in Statistics for Maximum Lifespan.
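Whether a test's actual type I error rate matches the preset α can be checked by simulation. The sketch below (function name and settings hypothetical) generates both groups from the same normal population, so the null hypothesis is true by construction. It then counts how often a pooled t statistic exceeds a chosen critical value. Comparing that statistic with the normal critical value 1.96, when each group has only 5 observations, gives a test whose actual type I error rate exceeds α.

```python
import math
import random


def empirical_type_i_error(n_per_group, crit, reps=20000, seed=1):
    """Monte Carlo estimate of P(reject H0 | H0 true) for a pooled t statistic compared with `crit`."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Both groups come from the same normal population, so H0 is true
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        ma, mb = sum(a) / n_per_group, sum(b) / n_per_group
        ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
        sp2 = ss / (2 * n_per_group - 2)  # pooled variance
        t = (ma - mb) / math.sqrt(sp2 * 2 / n_per_group)
        if abs(t) > crit:
            rejections += 1
    return rejections / reps


# df = 8 here; the two-sided 5% t critical value is about 2.306, not 1.96
print(empirical_type_i_error(5, 1.96))   # well above 0.05: the test is too liberal
print(empirical_type_i_error(5, 2.306))  # close to the nominal 0.05
```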

3. Sample size calculation and power analysis

Sample size calculation is a procedure for determining the sample size required to achieve a desired degree of statistical power. The calculation is based on a selected statistical test, an estimate of the variation in the population, an established significance level, and an expected effect size. Power analysis estimates the power of a study given the sample size used, the selected statistical test, the significance level, the variance of the outcome variables, and the effect size. A researcher may have limited resources, or some other reason to conduct research with a fixed number of subjects. In that case, the researcher will need to conduct a power analysis to determine whether the study would be adequately powered. If the study would have low power (typically 1 − β < 0.80), the researcher may decide to alter the significance level and/or the expected effect size to increase the power.
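For a fixed number of subjects, one way to carry out the power analysis is to ask what effect size the study can detect with the desired power. The sketch below uses a normal approximation for two equal-sized groups with a common standard deviation. The function name and the inputs (20 mice per line, SD of 150 days) are hypothetical.

```python
from statistics import NormalDist


def min_detectable_difference(sd, n_per_group, alpha=0.05, power=0.80):
    """Smallest true difference in means detectable with the given power (normal approximation)."""
    z = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5  # standard error of the difference in means
    return se * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power))


# Hypothetical fixed budget: 20 mice per line, lifespan SD assumed to be 150 days
print(round(min_detectable_difference(150, 20), 1))
```

With these assumed inputs, the smallest difference detectable with 80% power is roughly 133 days. If the biologically relevant difference is smaller, the planned study would be underpowered.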

In practice, we usually calculate the sample size instead. When we calculate the sample size, the statistical power is chosen, somewhat arbitrarily, between 0.80 and 0.95, and the type I error rate is set at 0.05 (or 0.01). Deciding on the effect size to be detected is sometimes challenging. For example, holding all other factors constant, a test may have very good power to detect a 20% difference in mean lifespan between mouse lines A and B. The same test, with the same sample size, may have very low power to detect a 10% difference. A practical guideline is to choose the effect size according to biological or clinical importance. Suppose the researcher thinks a 10% difference in average lifespan is not important, or does not mind if the study cannot detect it. Then a smaller sample size, sufficient to detect the larger (20%) difference, can be used. If a 10% difference is biologically or clinically meaningful, the sample size must be recalculated for the 10% effect size.
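The dependence of sample size on effect size can be sketched with the normal-approximation formula for two equal-sized groups. The mean lifespan of 800 days and the standard deviation of 150 days are hypothetical values chosen only for illustration, as is the function name `n_per_group`.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison (normal approximation)."""
    z = NormalDist()
    n = 2 * sd**2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 / delta**2
    return math.ceil(n)  # round up to a whole animal


mean_lifespan, sd = 800, 150  # hypothetical values (days)
for pct in (0.20, 0.10):
    print(f"{pct:.0%} difference: {n_per_group(pct * mean_lifespan, sd)} mice per line")
```

Under these assumptions, the 20% difference needs 14 mice per line and the 10% difference needs 56. Halving the effect size roughly quadruples the sample size, because n is proportional to 1/δ².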

As previously stated, to calculate sample size a researcher needs reasonable variance estimates for the outcome variables of interest. These estimates may be based on pilot studies or other sources, but they can be highly inaccurate and can thus result in underpowered studies. Suppose, for example, that a group of researchers underestimated the population variance at the outset of their study. They would then be apt to underestimate the sample size required to detect the difference they are looking for. Suppose further that there truly was a difference, yet their underpowered study yielded p-values close to, but still greater than, the significance level. Because their sample was insufficient, the researchers failed to detect the true difference. For this reason, it may be necessary to adjust the sample size during and/or at the planned end of a study, using information from the observed sample.
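The consequence of underestimating the variance can be quantified. The sketch below uses a normal approximation, and all design values in it are hypothetical. It plans the per-group sample size assuming a standard deviation of 120 days. It then computes the power actually achieved if the true standard deviation is 150 days.

```python
import math
from statistics import NormalDist

z = NormalDist()
alpha, target_power, delta = 0.05, 0.80, 80  # hypothetical design values (delta in days)
z_alpha = z.inv_cdf(1 - alpha / 2)
z_sum = z_alpha + z.inv_cdf(target_power)

sd_assumed, sd_true = 120, 150  # variance underestimated at the planning stage
n = math.ceil(2 * sd_assumed**2 * z_sum**2 / delta**2)  # per-group n planned from the wrong SD

# Power actually achieved with that n when the true SD is larger
actual_power = z.cdf(delta / (sd_true * math.sqrt(2 / n)) - z_alpha)
print(n, round(actual_power, 2))
```

Here the study is planned with 36 mice per line for 80% power, but the achieved power is only about 0.62.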

Researchers can recalculate the sample size based on conditional power. This method uses the power of the proposed statistical tests, calculated conditionally on the data from the current sample, to suggest the extension of the study to collect a larger and/or more comprehensive sample. The original data can then be combined with the new data to retest the hypothesis.
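Conditional power is often computed with a Brownian-motion (B-value) formulation of the test statistic. The sketch below uses that formulation. It is one common approach, not necessarily the method of any particular reference. Three further elements are assumptions made for illustration: the function name, the interim z value, and estimating the drift from the current trend.

```python
import math
from statistics import NormalDist


def conditional_power(z_interim, info_fraction, alpha=0.05, drift=None):
    """Conditional power of the final test (reject if Z(1) > z_{1 - alpha/2}) given interim data.

    B-value formulation: B(t) = Z(t) * sqrt(t), where t is the fraction of information observed.
    If `drift` is None, it is estimated from the current trend as B(t) / t.
    """
    z = NormalDist()
    t = info_fraction
    b = z_interim * math.sqrt(t)
    theta = b / t if drift is None else drift
    z_final = z.inv_cdf(1 - alpha / 2)
    # Remaining increment B(1) - B(t) ~ Normal(theta * (1 - t), 1 - t)
    return 1 - z.cdf((z_final - b - theta * (1 - t)) / math.sqrt(1 - t))


# Hypothetical interim look halfway through the study with observed z = 1.5
print(round(conditional_power(1.5, 0.5), 2))
```

An interim z of 1.5 at the halfway point gives a conditional power of roughly 0.59 under the current trend. A value this low might prompt an extension of the study.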

Under certain circumstances, more advanced sampling methods can be useful and cost-effective. Some studies require the use of expensive animals, such as nonhuman primates, and investigators may not have the resources to recruit the entire sample at one time. Other studies are longitudinal, with data collected two or more times on each subject followed over time. In such cases, it is common practice to do an interim data analysis or a group sequential power analysis. Its purpose is to determine whether the study can be terminated early because enough evidence has been obtained either to accept or to reject the null hypothesis.

It is important to note that both conditional power analysis and group sequential analysis are complicated methods in which one must control for the inflated type I error rate resulting from the multiple comparisons. Motivated readers are referred to texts by Proschan and Hunsberger (1995); Brannath and Bauer (2004); DeMets and Lan (1994); and Jennison and Turnbull (1999).
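The inflation of the type I error rate from repeated looks can be seen in a small simulation. The sketch below (names and settings hypothetical) tests the same null hypothesis at an interim analysis with half the data and again at the end. Using 1.96 at both looks gives an overall type I error rate of roughly 0.08 rather than 0.05. A Pocock-type constant boundary of 2.178 for two equally spaced looks brings it back to about 0.05.

```python
import math
import random


def overall_type_i_error(crit, reps=100000, seed=2):
    """Monte Carlo type I error when H0 is tested at an interim look (half the data) and at the end."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        z1 = rng.gauss(0, 1)                        # z statistic from the first half of the data
        z2 = (z1 + rng.gauss(0, 1)) / math.sqrt(2)  # z statistic from all the data under H0
        if abs(z1) > crit or abs(z2) > crit:
            rejections += 1
    return rejections / reps


print(overall_type_i_error(1.96))   # about 0.08: naive testing at each look inflates alpha
print(overall_type_i_error(2.178))  # constant boundary for two looks: about 0.05
```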

Bootstrapping, a procedure based on resampling with replacement, can be a useful approach to studying the properties (e.g., variance) of estimates and test statistics. In the bootstrap approach, a sample of size m is drawn with replacement from the observed sample of size n, which was originally drawn from the population of interest. This sample may then be combined with the original sample, and the m + n observations may be analyzed together. If this resampling procedure is repeated a sufficient number of times, one can obtain empirical estimates of the power of proposed tests based on samples of size m + n from the population while controlling for type I errors. For more information and instruction on bootstrapping, the interested reader is referred to Mooney and Duval (1993).
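The resampling procedure can be sketched as follows. This is a minimal illustration under several assumptions. The pilot data are simulated. The test statistic is a Welch-type z statistic compared with the normal critical value 1.96, a large-sample approximation. No type I error adjustment is included, so the sketch does not implement the type I error control mentioned in the text. All names are hypothetical.

```python
import math
import random


def welch_z(a, b):
    """Difference in means divided by its unpooled standard error."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))


def bootstrap_power(obs_a, obs_b, m, reps=2000, crit=1.96, seed=3):
    """Estimate power for n + m observations per group by resampling the observed data.

    Each replicate draws m extra values with replacement from each observed sample,
    appends them to the original observations, and applies the test to all n + m values.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = obs_a + rng.choices(obs_a, k=m)
        b = obs_b + rng.choices(obs_b, k=m)
        if abs(welch_z(a, b)) > crit:
            rejections += 1
    return rejections / reps


# Hypothetical pilot data: 15 lifespans (days) per mouse line
pilot_rng = random.Random(0)
pilot_a = [pilot_rng.gauss(800, 150) for _ in range(15)]
pilot_b = [pilot_rng.gauss(720, 150) for _ in range(15)]
for m in (15, 45):
    print(m, bootstrap_power(pilot_a, pilot_b, m))
```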

Many formulas, tables, nomograms (for example, Machin et al., 1987; Schoenfeld and Richter, 1982), and software packages are available for power and sample size calculations. Consider a type I error rate of α, an expected power of 1 − β, and two groups of equal sample size n with equal standard deviation s. The sample size needed for each group to detect a difference in average lifespan of size δ can be calculated as

$$n = \frac{2s^2\,(t_{\alpha/2} + t_{\beta})^2}{\delta^2}$$

where $t_{\alpha/2}$ and $t_{\beta}$ are critical values from the t-distribution. Some of these equations are self-explanatory, while others are not so straightforward. Applied researchers are encouraged to consult with statisticians when necessary.
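The sample size formula for two equal groups, n = 2s²(t_{α/2} + t_β)²/δ², can be evaluated with only the Python standard library. The t critical values depend on the degrees of freedom, which in turn depend on n. The sketch therefore starts from the normal approximation and iterates until n stops changing. The helper names are hypothetical. The t quantiles are computed by numerical integration (Simpson's rule) and bisection rather than by a statistics package. Taking df = 2n − 2 assumes the pooled-variance t-test.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """P(T <= x): 0.5 plus the density integrated from 0 to x by Simpson's rule (steps even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_upper(p, df):
    """Critical value c with P(T > c) = p, for 0 < p < 0.5, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > p:
            lo = mid  # upper tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


def sample_size_t(delta, s, alpha=0.05, power=0.80, max_iter=20):
    """Per-group n from n = 2 s^2 (t_{alpha/2} + t_beta)^2 / delta^2, iterated since df = 2n - 2."""
    z = NormalDist()
    # Starting value from the normal approximation
    n = math.ceil(2 * s**2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 / delta**2)
    n = max(n, 2)
    for _ in range(max_iter):
        df = 2 * n - 2
        crit = t_upper(alpha / 2, df) + t_upper(1 - power, df)
        new_n = max(2, math.ceil(2 * s**2 * crit**2 / delta**2))
        if new_n == n:
            break
        n = new_n
    return n


# Hypothetical inputs: detect an 80-day difference with SD 150 days, alpha 0.05, power 0.80
print(sample_size_t(delta=80, s=150))
```

With these hypothetical inputs, the t-based answer is slightly larger than the normal-approximation answer. The t critical values exceed the corresponding normal values at finite degrees of freedom.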
