STATISTICAL PAGES
Year: 2015, Volume: 1, Issue: 2, Page: 185-188

Commonly used t-tests in medical research

RM Pandey
Department of Biostatistics, All India Institute of Medical Sciences, New Delhi, India

Correspondence Address:

Student's t-test is a method of testing hypotheses about the mean of a small sample drawn from a normally distributed population when the population standard deviation is unknown. In 1908, William Sealy Gosset, an Englishman publishing under the pseudonym Student, developed the t-test. This article discusses the types of t-test and shows a simple way of doing a t-test.
Introduction

To draw conclusions about a population parameter (the true result of any phenomenon in the population) using the information contained in a sample, two approaches to statistical inference are used: the confidence interval (the range of results likely to be obtained, usually 95% of the time) and hypothesis testing, which assesses how often the observed finding could be due to chance alone. The latter is reported as the P value, which is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true.

Statistical tests used for hypothesis testing are broadly classified into two groups: parametric tests and non-parametric tests. In parametric tests, some assumption is made about the distribution of the population from which the sample is drawn. In the parametric tests discussed here, the quantitative variable is assumed to be normally distributed in the population. As one does not have access to the population values to say whether the distribution is normal or non-normal, the assumption of normality is judged from the sample values. Non-parametric statistical methods are also known as distribution-free methods or methods based on ranks, where no assumptions are made about the distribution of the variable in the population.

The family of t-tests falls in the category of parametric statistical tests where the mean value(s) is (are) compared against a hypothesized value. In hypothesis testing of any statistic (summary), for example, a mean or a proportion, the hypothesized value of the statistic is specified while the population variance is not; in such a situation, the only available information on variability comes from the sample. Therefore, to compute the standard error (the measure of variability of the statistic of interest, which is always in the denominator of the test statistic), it is considered reasonable to use the sample standard deviation. William Sealy Gosset, a chemist working for a brewery in Dublin, Ireland, introduced the t-statistic.
As per company policy, chemists were not allowed to publish their findings, so Gosset published his mathematical work under the pen name “Student.” The Student's t-test was published in the journal Biometrika in 1908.[1],[2]

In medical research, t-tests and Chi-square tests are the two types of statistical tests most commonly used. In any statistical hypothesis testing situation, if the test statistic follows a Student's t-distribution under the null hypothesis, the test is a t-test. The most frequently used t-tests are those for comparing the mean in a single sample, in two related samples, and in two unrelated samples, and for testing a correlation coefficient or a regression coefficient against a hypothesized value, which is usually zero. In the one-sample location test, it is tested whether or not the mean of the population has the value specified in the null hypothesis. In the two independent sample location test, the equality of the means of two populations is tested. In the paired t-test, also known as the repeated-measures t-test, the mean delta (the mean of the differences between two related samples) is compared against a hypothesized value of zero. Finally, a t-test is used to test whether or not the slope of a regression line differs significantly from zero.

For a binary variable (such as cure, relapse, hypertension, or diabetes), which is either yes or no for a subject, if we take 1 for yes and 0 for no and consider this as a score attached to each study subject, then the sample proportion (p) and the sample mean are the same. Therefore, the approach of the t-test for a mean can be used for a proportion as well.

The focus here is on describing the situations in which a particular t-test would be used. These are divided into t-tests used for testing: (a) mean/proportion in one sample, (b) mean/proportion in two unrelated samples, (c) mean/proportion in two related samples, (d) correlation coefficient, and (e) regression coefficient.
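For reference, the test statistics behind cases (a) to (e) can be written in their usual textbook forms. These expressions are not given in the article itself; the notation is assumed here for illustration: a sample mean (x-bar), a sample standard deviation (s), a sample size (n), a hypothesized mean (mu_0), within-pair differences (d), a correlation coefficient (r), a regression coefficient (b) with standard error SE(b), and the number of covariates (k).

```latex
% Requires the amsmath package. Standard textbook forms; notation assumed.
\begin{align*}
\text{(a) one sample:} \quad
  & t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}},
  & \mathrm{df} &= n - 1 \\[4pt]
\text{(b) two unrelated samples, equal variances:} \quad
  & t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
    \quad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},
  & \mathrm{df} &= n_1 + n_2 - 2 \\[4pt]
\text{(b) two unrelated samples, unequal variances:} \quad
  & t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}},
  & \mathrm{df} &= \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}
                        {\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} \\[4pt]
\text{(c) two related samples ($n$ pairs):} \quad
  & t = \frac{\bar{d}}{s_d/\sqrt{n}},
  & \mathrm{df} &= n - 1 \\[4pt]
\text{(d) correlation coefficient ($n$ pairs):} \quad
  & t = r\sqrt{\frac{n - 2}{1 - r^2}},
  & \mathrm{df} &= n - 2 \\[4pt]
\text{(e) linear regression coefficient:} \quad
  & t = \frac{b}{\mathrm{SE}(b)},
  & \mathrm{df} &= n - k - 1
\end{align*}
```

In every case the numerator is the estimate minus its hypothesized value and the denominator is the standard error of that estimate. For a Spearman's rank correlation coefficient, the form in (d) is commonly used as an approximation.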
The process of hypothesis testing is the same for any statistical test: formulation of the null and alternate hypotheses; identification and computation of the test statistic based on sample values; choice of the alpha level and of a one-tailed or two-tailed test; and rejection or non-rejection of the null hypothesis by comparing the computed test statistic with the theoretical value of “t” from the t-distribution table corresponding to the given degrees of freedom. In hypothesis testing, the P value is reported as P < 0.05. However, in significance testing, the exact P value is reported so that the reader is in a better position to judge the level of statistical significance.

t-test for one sample: For example, in a random sample of 30 hypertensive males, the observed mean body mass index (BMI) is 27.0 kg/m² and the standard deviation is 4.0 kg/m². Also, suppose it is known that the mean BMI in non-hypertensive males is 25 kg/m². The question is whether or not these 30 observations could have come from a population with a mean of 25 kg/m². To answer this, a one-sample t-test is used with the null hypothesis H0: Mean = 25, against the alternate hypothesis H1: Mean ≠ 25. Since the population standard deviation is not known, the t-test is appropriate; otherwise, the Z-test would have been used.

t-test for two related samples: Two samples can be regarded as related in a pre- and post-design (self-pairing) or in two groups where the subjects have been matched on a third factor, a known confounder (artificial pairing). In a pre- and post-design, each subject is used as his or her own control. For example, an investigator wants to assess the effect of an intervention in reducing systolic blood pressure (SBP) in a pre- and post-design. Here, for each patient, there would be two observations of SBP, that is, before and after.
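Before continuing with the paired design, the one-sample BMI example above can be checked numerically. The following is a minimal Python sketch, not the article's method: the helper names (`t_pdf`, `t_cdf`, `t_quantile`, `one_sample_t`) are hypothetical, and the t-distribution is evaluated by simple numerical integration using only the standard library so that the block runs on its own. In practice, a statistical package would be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF by Simpson's rule on [0, |x|], using symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(prob, df):
    """Inverse CDF by bisection (adequate for illustration only)."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def one_sample_t(n, mean, sd, mu0, conf=0.95):
    """One-sample t-test from summary statistics (two-tailed)."""
    se = sd / math.sqrt(n)                    # standard error of the mean
    df = n - 1                                # degrees of freedom
    t = (mean - mu0) / se                     # test statistic
    p = 2 * (1 - t_cdf(abs(t), df))           # two-tailed P value
    crit = t_quantile(1 - (1 - conf) / 2, df) # critical t for the interval
    diff = mean - mu0
    return t, df, p, (diff - crit * se, diff + crit * se)

# Values from the BMI example: n = 30, mean = 27.0, SD = 4.0, H0 mean = 25.
t, df, p, ci = one_sample_t(n=30, mean=27.0, sd=4.0, mu0=25.0)
print(f"t = {t:.4f}, df = {df}, P = {p:.4f}")
print(f"95% CI of (sample mean - 25): {ci[0]:.2f} to {ci[1]:.2f}")
```

The printed values should agree with the calculator results reported later in this article (t = 2.7386, P = 0.0104, 95% confidence interval 0.51 to 3.49). The same function applies to a paired design if `n`, `mean`, and `sd` are taken from the within-pair differences and `mu0` is set to 0.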
Here, instead of the individual observations, the difference between each pair of observations is of interest, and the problem reduces to the one-sample situation, where the null hypothesis that the mean difference in SBP is equal to zero is tested against the alternate hypothesis that the mean difference in SBP is not equal to zero. The underlying assumption for using the paired t-test is that, under the null hypothesis, the population of differences is normally distributed, and this can be judged using the sample values. Using the mean difference and the standard error of the mean difference, the 95% confidence interval can be computed. The other situation in which two samples are related is the two-group matched design. For example, in a case-control study to assess the association between smoking and hypertension, hypertensive and non-hypertensive subjects are matched on some third factor, say obesity, in a pairwise manner. The same approach of paired analysis would be used. In this situation, cases and controls are different subjects. However, they are related by the matching factor.

t-test for two independent samples: To test the null hypothesis that the means of two populations are equal, Student's t-test is used, provided the variances of the two populations are equal and the two samples are assumed to be random samples. When this assumption of equality of variance is not fulfilled, the form of the test used is a modified t-test. These tests are also known as two-sample independent t-tests with equal variance or unequal variance, respectively. The two tests differ mainly in the denominator, that is, in whether or not a pooled variance is used to compute the standard error. Prior to choosing the t-test for equal or unequal variance, very often a test of variance is carried out to compare the two variances.
It is recommended that this should be avoided.[3] The modified t-test retains high power even when the variances are equal; therefore, to compare the means in two unrelated groups, using the modified t-test is sufficient.[4] When there are more than two groups, use of multiple t-tests (one for each pair of groups) is incorrect because it inflates the chance of a false-positive result. In such situations, one-way analysis of variance (ANOVA) is used to test the null hypothesis of equality of more than two means, followed, if required, by post hoc pairwise comparisons with a correction to the P value for multiple comparisons, ensuring that the overall type I error across all the pairwise comparisons does not exceed 0.05.

t-test for correlation coefficient: To quantify the strength of the relationship between two quantitative variables, the correlation coefficient is used. When both variables follow a normal distribution, Pearson's correlation coefficient is computed; when one or both of the variables are non-normal or ordinal, Spearman's rank correlation coefficient (based on ranks) is used. For both these measures, the null value in the case of no correlation is zero, and under the null hypothesis the test statistic follows a t-distribution; therefore, a t-test is used to find out whether or not the Pearson's/Spearman's rank correlation coefficient is significantly different from zero.

Regression coefficient: Regression methods are used to model the relationship between a factor and its potential predictors. The type of regression method to be used depends on the type of dependent/outcome/effect variable. The three most commonly used regression methods are multiple linear regression, multiple logistic regression, and Cox regression. The form of the dependent variable in these three methods is quantitative, categorical, and time to an event, respectively. A multiple linear regression would be of the form Y = a + b1X1 + b2X2 + ..., where Y is the outcome and the X's are the potential covariates.
In logistic and Cox regression, the equation is non-linear, and a transformation is used to convert it into a linear equation because it is easier to estimate the unknowns of a linear equation from sample observations. The computed values of a and b vary from sample to sample. Therefore, to test the null hypothesis that there is no relationship between X and Y, a t-test, in which the test statistic is the coefficient divided by its standard error, is used to determine the P value. This is also commonly referred to as the Wald t-test, and using the numerator and denominator of the Wald t-statistic, the 95% confidence interval is computed as coefficient ± 1.96 × (standard error of the coefficient).

The above is an illustration of the most common situations in which the t-test is used. With the availability of software, computation is no longer an issue. Any software that provides basic statistical methods will have these tests. All one needs to do is identify the t-test to be used in a given situation, arrange the data in the manner required by the particular software, run the test, and report the following: number of observations, summary statistic, P value, and the 95% confidence interval of the summary statistic of interest.

Using an Online Calculator to Compute T Statistics

In addition to statistical software, you can also use online calculators for calculating t-statistics, P values, 95% confidence intervals, etc. Various online calculators are available on the World Wide Web; a brief description of how to use one of them is given below. A link to one such online calculator is http://www.graphpad.com/quickcalcs/.
Step 1: Typing this URL in the address bar will bring up a first screen somewhat like the one shown in [Figure 1].
Step 2: Select the continuous data option as shown in [Figure 1] and press continue.
Step 3: On pressing the continue tab, you will be guided to another screen as shown in [Figure 2].
Step 4: For calculating the one-sample t-statistic, click on the “One-sample t-test. Compare observed and expected means” option as shown in [Figure 2] and press continue. For comparing two means, as is usually done in the paired t-test for related samples and the two-sample independent t-test, click on the t-test to compare two means option.
Step 5: After pressing the continue tab, you will be guided to another screen as shown in [Figure 3]. Choose the data entry format. For the example of BMI in hypertensive males given for the one-sample t-test, we have n, mean, and standard deviation of the sample, which has to be compared with the hypothetical mean value of 25 kg/m². Enter the values in the calculator, set the hypothetical value to 25, and then press the calculate now tab. Refer to [Figure 3] for details.
Step 6: On pressing the calculate now tab, you will be guided to the next screen as shown in [Figure 4], which will give you the results of your one-sample t-test. It can be seen from the results given in [Figure 4] that the P value for our one-sample t-test is 0.0104, the 95% confidence interval of the difference between the sample mean and the hypothetical mean is 0.51 to 3.49, and the one-sample t-statistic is 2.7386.

{Figure 1} {Figure 2} {Figure 3} {Figure 4}

Similarly, online t-test calculators can be used to calculate the paired t-test (t-test for two related samples) and the t-test for two independent samples. You only need to know the format in which your data are available, which test applies in which condition, and the correct form for entering the data in the calculator.

Financial support and sponsorship
Nil.

Conflicts of interest
There are no conflicts of interest.

References


