STATISTICAL PAGES
Year : 2015  Volume : 1  Issue : 1  Page : 69-71

Chi-square test and its application in hypothesis testing

Rakesh Rana, Richa Singhal
Statistical Section, Central Council for Research in Ayurvedic Sciences, Ministry of AYUSH, GOI, New Delhi, India

Medical research studies often collect data on categorical variables that can be summarized as a series of counts. These counts are commonly arranged in a tabular format known as a contingency table. The Chi-square test statistic can be used to evaluate whether there is an association between the rows and columns of a contingency table. More specifically, this statistic can be used to determine whether the study groups differ in the proportions of the risk factor of interest. The Chi-square test was developed by Karl Pearson. This article describes what a Chi-square test is, the type of data on which it is used, the assumptions associated with its application, how to calculate it manually, and how to use an online calculator to obtain the Chi-square statistic and its associated P value.
Assumptions Underlying a Chi-square Test

- The data are randomly drawn from a population.
- The values in the cells are considered adequate when expected counts are not <5 and there are no cells with zero count [4],[5].
- The sample size is sufficiently large. Applying the Chi-square test to a small sample could lead to a type II error (i.e., failing to reject the null hypothesis when it is actually false). There is no fixed cutoff for the sample size; however, the recommended minimum sample size varies from 20 to 50.
- The categories of the variables under consideration must be mutually exclusive. This means that each subject must be counted only once, in one particular category, and must not appear in any other category. In other words, no item shall be counted twice.

How to Calculate a Chi-square Statistic?

The formula for calculating a Chi-square statistic is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O stands for the observed frequency and E stands for the expected frequency. The expected count is subtracted from the observed count to find the difference between the two. The difference is then squared to get rid of negative values (as the squares of 2 and −2 are, of course, both 4). The squared difference is then divided by the expected count, so that a given difference counts for more in a cell with a small expected count than in a cell with a large one. The sigma sign in front denotes that these values, calculated for each cell, are summed.

As an example, suppose we want to find out whether there is an association between smoking and lung disease. The null and alternative hypotheses will be:

H0: There is no association between smoking and lung disease.
H1: There is an association between smoking and lung disease.
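As a minimal sketch of the formula described above, the following Python function sums (O − E)²/E over all cells. The function name and the counts are illustrative assumptions and are not taken from the tables of this article.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all cells.

    `observed` and `expected` are flat sequences with one entry per cell,
    listed in the same order.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for the four cells of a 2 x 2 table (assumption,
# chosen only to make the arithmetic easy to follow).
observed = [30, 20, 20, 30]
expected = [25.0, 25.0, 25.0, 25.0]

# Each cell contributes (5)^2 / 25 = 1, so the statistic is 4.0.
print(chi_square_statistic(observed, expected))
```

Squaring keeps positive and negative differences from cancelling, which is why cells with O above E and cells with O below E both add to the total.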
The first step in calculating a Chi-square test is setting up a 2 × 2 contingency table [Table 2].{Table 2}

The general formula for calculating the expected count of a particular cell from the observed counts is [(corresponding row total × corresponding column total) / total number of patients] [Table 3].{Table 3}

Before we proceed further, we need to know how many degrees of freedom (df) we have. When a comparison is made between one sample and another, a simple rule is that the df equals (number of columns − 1) × (number of rows − 1), excluding the row and column containing the totals. Hence, in our example, df = (2 − 1) × (2 − 1) = 1.

Hypothetical data for calculating the Chi-square test for our example of testing an association between smoking and lung disease are given in [Table 4]. The Chi-square test can be calculated manually by using the formula described above. Refer to [Table 5] and [Table 6] for the manual calculations. The Chi-square value for our example, as shown in [Table 6], is 3.42, with df = 1. If we want to test our hypothesis at the 5% level of significance, then our predetermined alpha level of significance is 0.05. Looking into the Chi-square distribution table [Table 7] with 1 degree of freedom and reading along the row, we find that our value of χ2 (3.42) lies between 2.706 and 3.841. The corresponding probability is between the 0.10 and 0.05 probability levels. That means that the P value is above 0.05 (it is actually 0.065). Since a P value of 0.065 is greater than the conventionally accepted significance level of 0.05 (i.e., P > 0.05), we fail to reject the null hypothesis and conclude that the data do not provide sufficient evidence of an association between smoking and lung disease.{Table 4}{Table 5}{Table 6}{Table 7}

How to Use a Chi-square Distribution Table to Approximate P Value?

Scientists and statisticians use large tables of values to approximate the P value for their experiment.
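The manual steps described above (expected counts, df, and locating the Chi-square value in a distribution table) can be sketched in Python as follows. This is an illustration under stated assumptions: the function names and the 2 × 2 counts are hypothetical and are not the data of [Table 4]; the df = 1 critical values 2.706 and 3.841 come from the text, and 6.635 is the standard value for P = 0.01. The exact P value uses the fact that a Chi-square variable with 1 df is the square of a standard normal variable, so this shortcut is valid for df = 1 only.

```python
import math


def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def degrees_of_freedom(table):
    """(number of rows - 1) * (number of columns - 1), totals excluded."""
    return (len(table) - 1) * (len(table[0]) - 1)


def p_value_df1(chi_square):
    """Upper-tail P value for a Chi-square statistic with df = 1 only."""
    return math.erfc(math.sqrt(chi_square / 2.0))


# One row of a distribution table for df = 1: (P value, critical value).
CRITICAL_DF1 = [(0.10, 2.706), (0.05, 3.841), (0.01, 6.635)]


def table_lookup_df1(chi_square):
    """Read the df = 1 row left to right until a larger value is found.

    Returns (lower, upper) bounds between which the P value lies.
    """
    previous_p = 1.0
    for p, critical in CRITICAL_DF1:
        if critical > chi_square:
            return (p, previous_p)
        previous_p = p
    return (0.0, previous_p)


# Hypothetical observed counts (assumption, not the article's Table 4).
table = [[30, 20], [20, 30]]
print(expected_counts(table))      # every expected count is 25.0
print(degrees_of_freedom(table))   # 1

# The statistic quoted in the text for the article's own example.
print(table_lookup_df1(3.42))      # (0.05, 0.1): P lies between 0.05 and 0.10
print(p_value_df1(3.42))           # close to the 0.065 quoted in the text
```

The table lookup only brackets the P value, whereas the closed-form expression gives it directly; both lead to the same decision at the 0.05 level for this example.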
Such tables are generally set up with the vertical axis on the left corresponding to df and the horizontal axis on the top corresponding to the P value. To use them, first find the row for our df, then read that row from left to right until we find the first value bigger than our Chi-square value, and look at the corresponding P value at the top of that column. Chi-square distribution tables are available from a variety of sources; they can easily be found online or in science and statistics textbooks.

Using an Online Chi-square Calculator

The Chi-square statistic and its associated P value can also be calculated with online calculators, which are easily available on the internet. For a user-friendly online calculator, you may visit www.socscistatistics.com/tests/chisquare/default2.aspx. Many more online calculators are available on the World Wide Web. The basic step in using an online calculator is to fill in your data correctly. The step-by-step procedure is described below:

Step 1: For our example of finding an association between smoking and lung disease, fill in the observed values in the cells of the online calculator [Figure 1].{Figure 1}
Step 2: Click on the next button. Another screen will pop up, as shown in [Figure 2].{Figure 2}
Step 3: Click on the Calculate Chi^2 button, and the calculation is done.

The output of the Chi-square test will be as shown in [Figure 3].{Figure 3} It shows the Chi-square value as 3.4177 and its associated P value as 0.0645, which is greater than the significance level of 0.05; hence, no significant association has been observed. To conclude, the data do not provide sufficient evidence of an association between smoking and lung disease.

What Does a Chi-square Test Tell Us and What Does it Not?
It should be clearly understood that the Chi-square test only assesses whether the observed data are consistent with independence; in simple terms, it only tests whether two variables are associated with each other or not. It does not tell us how closely they are associated. For instance, in the above example, the Chi-square test only tells us whether there is any relation between smoking and lung disease. It does not tell us how much more likely smokers are to develop lung disease. However, once we know that there is a relation between these two variables, we can explore other methods to quantify the strength of the association between them.

References


