Year: 2015 | Volume: 1 | Issue: 2 | Pages: 185-188
Commonly used t-tests in medical research
Department of Biostatistics, All India Institute of Medical Sciences, New Delhi, India
Date of Web Publication: 30-Sep-2015
Dr. R M Pandey
Department of Biostatistics, All India Institute of Medical Sciences, New Delhi
Source of Support: None, Conflict of Interest: None
Student's t-test is a method of testing hypotheses about the mean of a normally distributed population, based on a small sample, when the population standard deviation is unknown. In 1908, William Sealy Gosset, an Englishman publishing under the pseudonym “Student,” developed the t-test. This article discusses the types of t-test and shows a simple way of performing one.
Keywords: Student's t-test, method, William Gosset
How to cite this article:
Pandey R M. Commonly used t-tests in medical research. J Pract Cardiovasc Sci 2015;1:185-8
Introduction
To draw conclusions about a population parameter (the true value of a quantity in the population) from the information contained in a sample, two approaches to statistical inference are used: the confidence interval, a range of values within which the population parameter is likely to lie (constructed so that, over repeated sampling, a stated proportion of such intervals, usually 95%, contain the true value), and hypothesis testing, which assesses how likely the observed finding is to have arisen by chance alone. The result of a hypothesis test is reported as a P value, the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true. Statistical tests used for hypothesis testing are broadly classified into two groups: parametric tests and nonparametric tests. In parametric tests, some assumption is made about the distribution of the population from which the sample is drawn; in the parametric tests discussed here, the quantitative variable is assumed to be normally distributed in the population. As one does not have access to the population values to judge whether they are normal, the assumption of normality is assessed using the sample values. Nonparametric statistical methods, also known as distribution-free or rank-based methods, make no assumption about the distribution of the variable in the population.
The family of t-tests falls in the category of parametric statistical tests in which one or more mean values are compared against a hypothesized value. In hypothesis testing of any statistic (summary), for example, a mean or a proportion, the hypothesized value of the statistic is specified while the population variance is not; in such a situation, the only available information about variability comes from the sample. Therefore, to compute the standard error (the measure of variability of the statistic of interest, which always appears in the denominator of the test statistic), it is considered reasonable to use the sample standard deviation. William Sealy Gosset, a chemist working for the Guinness brewery in Dublin, Ireland, introduced the t-statistic. As company policy did not allow its chemists to publish their findings, Gosset published his mathematical work under the pen name “Student.” The Student's t-test was published in the journal Biometrika in 1908.
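As a worked equation (standard textbook notation, not taken from the original article), let x̄ be the sample mean, μ0 the hypothesized mean, s the sample standard deviation, and n the sample size. The standard error of the mean is estimated as s/√n, and the one-sample t-statistic is

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad \text{degrees of freedom} = n - 1.$$

For the BMI example given for the one-sample t-test in this article (x̄ = 27.0, μ0 = 25, s = 4.0, n = 30), this gives t = 2/(4/√30) ≈ 2.7386 with 29 degrees of freedom.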
In medical research, t-tests and Chi-square tests are the two types of statistical tests most commonly used. In any hypothesis testing situation, if the test statistic follows Student's t-distribution under the null hypothesis, the test is a t-test. The most frequently used t-tests are those for comparing a mean in a single sample, in two related samples, and in two unrelated samples, and for testing a correlation coefficient or a regression coefficient against a hypothesized value, which is usually zero. In the one-sample location test, it is tested whether or not the population mean has the value specified in the null hypothesis; in the two independent sample location test, the equality of the means of two populations is tested; in the paired t-test (also known as the repeated-measures t-test), the mean delta (the difference between two related samples) is compared against the hypothesized value of zero; and in the test for a regression coefficient, it is tested whether or not the slope of a regression line differs significantly from zero. For a binary variable (such as cure, relapse, hypertension, or diabetes) that is either yes or no for a subject, if 1 is taken for yes and 0 for no and this is considered a score attached to each study subject, then the sample proportion (p) and the sample mean are the same. Therefore, the t-test approach for a mean can be used for a proportion as well.
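To see why the sample mean of a 0/1 score equals the sample proportion, here is a minimal Python sketch using hypothetical data for ten subjects (1 = hypertensive, 0 = not hypertensive); the data and variable names are illustrative only.

```python
from statistics import mean

# Hypothetical data: 1 = yes (hypertensive), 0 = no, one score per subject
hypertension = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]

p = sum(hypertension) / len(hypertension)   # proportion of subjects with a "yes"
score_mean = mean(hypertension)             # ordinary mean of the 0/1 scores
print(p, score_mean)                        # 0.4 0.4
```

Because the sum of the scores is simply the number of "yes" answers, dividing by n gives the same number whether it is called a mean or a proportion.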
The focus here is on describing the situations in which a particular t-test would be used. These are divided into t-tests for testing: (a) a mean/proportion in one sample, (b) means/proportions in two unrelated samples, (c) means/proportions in two related samples, (d) a correlation coefficient, and (e) a regression coefficient. The process of hypothesis testing is the same for any statistical test: formulation of the null and alternative hypotheses; identification and computation of the test statistic from the sample values; choice of the alpha level and of a one-tailed or two-tailed test; and rejection or nonrejection of the null hypothesis by comparing the computed test statistic with the theoretical value of “t” from the t-distribution table for the given degrees of freedom. In hypothesis testing, the P value is reported as P < 0.05. However, in significance testing, the exact P value is reported so that the reader is in a better position to judge the level of statistical significance.
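The decision step can be sketched in pure Python. This is an illustration under stated assumptions, not the author's procedure: instead of a printed table, the two-sided P value is computed from the regularized incomplete beta function (a continued-fraction evaluation in the style of Numerical Recipes), and the two-sided critical value is found by bisection. The helper names `betainc`, `t_two_sided_p`, and `t_critical` are hypothetical; in practice a statistics package would be used.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies one even and one odd term of the continued fraction
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def t_two_sided_p(t, df):
    """Two-sided P value: probability that |T| >= |t| when T follows t(df) under H0."""
    return betainc(df / 2.0, 0.5, df / (df + t * t))

def t_critical(df, alpha=0.05):
    """Two-sided critical value t* with P(|T| >= t*) = alpha, by bisection on [0, 1000]."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if t_two_sided_p(mid, df) > alpha:
            lo = mid        # P still too large: t* lies further out
        else:
            hi = mid
    return (lo + hi) / 2.0

# One-sample BMI example from the text: t = 2.7386 with 29 degrees of freedom
t_stat, df = 2.7386, 29
p = t_two_sided_p(t_stat, df)
t_star = t_critical(df)     # should agree with the tabulated 2.045 for df = 29
decision = "reject H0" if abs(t_stat) > t_star else "do not reject H0"
print(f"t = {t_stat}, df = {df}, P = {p:.4f}, critical t = {t_star:.3f}: {decision}")
```

The bisection assumes the critical value lies below 1000, which holds for the usual alpha levels; the same P value function serves every t-test below once the statistic and its degrees of freedom are known.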
- t-test for one sample: For example, in a random sample of 30 hypertensive males, the observed mean body mass index (BMI) is 27.0 kg/m² and the standard deviation is 4.0 kg/m². Suppose also that the mean BMI in nonhypertensive males is known to be 25 kg/m². The question is whether or not these 30 observations could have come from a population with a mean of 25 kg/m². To determine this, the one-sample t-test is used with the null hypothesis H0: mean = 25 against the alternative hypothesis H1: mean ≠ 25. Since the standard deviation of the hypothesized population is not known, the t-test is appropriate; had it been known, the Z-test would have been used.
- t-test for two related samples: Two samples can be regarded as related in a pre- and post-design (self-pairing) or in two groups where the subjects have been matched on a third factor, such as a known confounder (artificial pairing). In a pre- and post-design, each subject is used as his or her own control. For example, an investigator wants to assess the effect of an intervention in reducing systolic blood pressure (SBP) in a pre- and post-design. Here, for each patient, there are two observations of SBP, that is, before and after. Instead of the individual observations, the difference within each pair of observations is of interest, and the problem reduces to the one-sample situation: the null hypothesis is that the mean difference in SBP equals zero, against the alternative hypothesis that the mean difference is not equal to zero. The underlying assumption for using the paired t-test is that, under the null hypothesis, the population of differences is normally distributed, and this can be judged using the sample values. Using the mean difference and the standard error of the mean difference, a 95% confidence interval can be computed. The other situation in which two samples are related is the two-group matched design. For example, in a case–control study to assess the association between smoking and hypertension, hypertensive and nonhypertensive subjects are matched pair-wise on some third factor, say obesity. The same approach of paired analysis would be used. In this situation, cases and controls are different subjects; however, they are related by the matching factor.
- t-test for two independent samples: To test the null hypothesis that the means of two populations are equal, Student's t-test is used, provided that the variances of the two populations are equal and the two samples are random samples. When the assumption of equal variances is not fulfilled, a modified t-test (Welch's t-test) is used. These tests are also known as two-sample independent t-tests with equal and unequal variances, respectively. The two tests differ only in how the standard error in the denominator is computed (from the pooled variance for the equal-variance test, from the two separate variances for the modified test) and in the degrees of freedom used. Before choosing between the t-tests for equal and unequal variances, a test comparing the two variances is very often carried out; it is recommended that this be avoided. The modified t-test loses little power even when the variances are equal; therefore, to compare the means in two unrelated groups, using the modified t-test is sufficient. When there are more than two groups, using multiple t-tests (one for each pair of groups) is incorrect because it inflates the chance of a false-positive result. In such situations, one-way analysis of variance (ANOVA) is used to test the null hypothesis that all the means are equal, followed, if required, by post hoc pairwise comparisons with P values corrected for multiple comparisons, so that the overall probability of a false-positive result across all pairwise comparisons does not exceed 0.05.
- t-test for correlation coefficient: To quantify the strength of the relationship between two quantitative variables, a correlation coefficient is used. When both variables follow a normal distribution, Pearson's correlation coefficient is computed; when one or both variables are nonnormal or ordinal, Spearman's rank correlation coefficient (based on ranks) is used. For both measures, the null value corresponding to no correlation is zero, and under the null hypothesis the test statistic follows a t-distribution (approximately, for Spearman's coefficient) with n − 2 degrees of freedom; therefore, a t-test is used to find out whether or not the Pearson's or Spearman's rank correlation coefficient is significantly different from zero.
- Regression coefficient: Regression methods are used to model the relationship between an outcome and its potential predictors. The type of regression method used depends on the type of dependent (outcome/effect) variable. The three most commonly used regression methods are multiple linear regression, multiple logistic regression, and Cox regression, for which the dependent variable is quantitative, categorical, and time to an event, respectively. A multiple linear regression has the form Y = a + b1X1 + b2X2 + ..., where Y is the outcome and the X's are the potential covariates. In logistic and Cox regression, the model is nonlinear, and a transformation converts it into a linear equation, because the unknowns of a linear equation are easy to estimate from sample observations. The computed values of a and b vary from sample to sample. Therefore, to test the null hypothesis that there is no relationship between X and Y, a t-test, in which the test statistic is the coefficient divided by its standard error, is used to determine the P value (in logistic and Cox regression, the same ratio is usually referred to the standard normal distribution). This is commonly referred to as the Wald test, and using the numerator and denominator of the Wald statistic, the 95% confidence interval is computed in large samples as coefficient ± 1.96 × (standard error of the coefficient).
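The five situations above can be sketched in Python using only the standard library. This is a minimal illustration, not the author's method: the function names are hypothetical, the SBP values and the X/Y data are made up, and each function returns the t statistic with its degrees of freedom (the P value then comes from the t-distribution with those degrees of freedom, from a table or software). The regression part covers only simple linear regression with one predictor.

```python
import math
from statistics import mean, stdev, variance

def one_sample_t(n, sample_mean, sample_sd, mu0):
    """(a) One sample, from summary data; H0: population mean = mu0."""
    se = sample_sd / math.sqrt(n)               # standard error of the mean
    return (sample_mean - mu0) / se, n - 1      # (t, degrees of freedom)

def paired_t(before, after):
    """(b) Two related samples: one-sample t on within-pair differences; H0: mean difference = 0."""
    d = [pre - post for pre, post in zip(before, after)]
    se = stdev(d) / math.sqrt(len(d))           # standard error of the mean difference
    return mean(d) / se, len(d) - 1

def pooled_t(x, y):
    """(c) Two independent samples, Student's t assuming equal population variances."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, n1 + n2 - 2

def welch_t(x, y):
    """(c) Two independent samples, modified (Welch) t; variances not assumed equal."""
    v1, v2 = variance(x) / len(x), variance(y) / len(y)
    se = math.sqrt(v1 + v2)                     # separate, not pooled, variances
    # Welch-Satterthwaite degrees of freedom (usually not a whole number)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x) - 1) + v2 ** 2 / (len(y) - 1))
    return (mean(x) - mean(y)) / se, df

def correlation_t(x, y):
    """(d) Pearson correlation coefficient r; H0: population correlation = 0."""
    n, mx, my = len(x), mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r), n - 2

def slope_t(x, y):
    """(e) Simple linear regression Y = a + bX; Wald t = b / SE(b), H0: b = 0."""
    n, mx, my = len(x), mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    se_b = math.sqrt(rss / (n - 2) / sxx)
    ci = (b - 1.96 * se_b, b + 1.96 * se_b)     # large-sample 95% CI, as in the text
    return b / se_b, n - 2, ci

# BMI example from the text: n = 30, mean 27.0, SD 4.0, H0: mean = 25
print(one_sample_t(30, 27.0, 4.0, 25.0))        # t ≈ 2.7386 with 29 df

# Hypothetical SBP (mm Hg) in six patients before and after an intervention
print(paired_t([150, 142, 160, 155, 148, 162], [140, 138, 152, 150, 145, 150]))

# Hypothetical SBP in two unrelated groups of unequal size
group_a = [128, 135, 142, 130, 138, 145, 133]
group_b = [120, 125, 131, 118, 127]
print(pooled_t(group_a, group_b), welch_t(group_a, group_b))

# Hypothetical paired measurements X and Y
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(correlation_t(x, y), slope_t(x, y))       # both t = sqrt(4.5) ≈ 2.1213 with 3 df
```

In simple linear regression with one predictor, the t statistic for the slope is identical to the t statistic for Pearson's correlation, which is why the last two calls give the same t. With equal group sizes, the pooled and Welch statistics also coincide; only their degrees of freedom differ.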
The above illustrates the most common situations in which a t-test is used. With the availability of software, computation is no longer an issue; any software that provides basic statistical methods will include these tests. All one needs to do is identify the t-test to be used in a given situation, arrange the data in the manner required by the particular software, perform the test, and report the following: the number of observations, the summary statistic, the P value, and the 95% confidence interval of the summary statistic of interest.
Using an Online Calculator to Compute t-Statistics
In addition to statistical software, online calculators can be used to calculate t-statistics, P values, 95% confidence intervals, etc. Various online calculators are available on the World Wide Web; a brief description of how to use one of them is given below. A link to one such calculator is http://www.graphpad.com/quickcalcs/.
- Step 1: The first screen that appears on typing this URL in the address bar will be similar to that shown in [Figure 1].
- Step 2: Select the continuous data option as shown in [Figure 1] and press Continue.
- Step 3: On pressing Continue, you will be taken to another screen as shown in [Figure 2].
- Step 4: For calculating the one-sample t-statistic, click on the one-sample t-test option (compare observed and expected means) as shown in [Figure 2] and press Continue. For comparing two means, as in the paired t-test for related samples and the two-sample independent t-test, click on the option for a t-test to compare two means.
- Step 5: After pressing Continue, you will be taken to another screen as shown in [Figure 3]. Choose the data entry format. For the BMI example in hypertensive males given for the one-sample t-test, we have the n, mean, and standard deviation of the sample, which is to be compared with the hypothetical mean value of 25 kg/m². Enter these values in the calculator, set the hypothetical value to 25, and press Calculate Now. Refer to [Figure 3] for details.
- Step 6: On pressing Calculate Now, you will be taken to the next screen as shown in [Figure 4], which gives the results of the one-sample t-test. It can be seen from the results in [Figure 4] that the P value for our one-sample t-test is 0.0104, the one-sample t-statistic is 2.7386 (with 29 degrees of freedom), and the 95% confidence interval for the difference between the sample mean and the hypothetical mean is 0.51–3.49 kg/m².
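The calculator output can be checked by hand. The sketch below (Python, standard library only) recomputes the t-statistic and the 95% confidence interval for the difference between the sample mean and the hypothetical mean. The critical value 2.045 is assumed from a t-distribution table (two-sided 5% point for 29 degrees of freedom) and is entered by hand rather than computed.

```python
import math

# Summary data entered into the calculator (BMI example from the text)
n, sample_mean, sample_sd, hypothetical_mean = 30, 27.0, 4.0, 25.0
t_table = 2.045   # two-sided 5% critical value of t for df = n - 1 = 29, from a t table

se = sample_sd / math.sqrt(n)                 # standard error of the mean
diff = sample_mean - hypothetical_mean        # observed minus hypothetical mean
t_stat = diff / se
ci_low, ci_high = diff - t_table * se, diff + t_table * se
print(f"t = {t_stat:.4f}, df = {n - 1}, 95% CI of the difference: {ci_low:.2f} to {ci_high:.2f}")
# prints: t = 2.7386, df = 29, 95% CI of the difference: 0.51 to 3.49
```

Adding the hypothetical mean back to both limits (25.51 to 28.49 kg/m²) gives the corresponding interval for the population mean BMI itself.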
Similarly, online t-test calculators can be used for the paired t-test (t-test for two related samples) and the t-test for two independent samples. One just needs to know the format of the available data, which test is appropriate in which situation, and the correct form for entering the data into the calculator.
Financial support and sponsorship
None.
Conflicts of interest
There are no conflicts of interest.
References
Mankiewicz R. The Story of Mathematics (Paperback ed.). Princeton, NJ: Princeton University Press; 2004. p. 158.
Fisher Box J. Guinness, Gosset, Fisher, and small samples. Stat Sci 1987;2:45-52.
Markowski CA, Markowski EP. Conditions for the effectiveness of a preliminary test of variance. Am Stat 1990;44:322-6.
Moser BK, Stevens GR. Homogeneity of variance in the two sample t-test. Am Stat 1992;46:19-21.
[Figure 1], [Figure 2], [Figure 3], [Figure 4]