REVIEW ARTICLE
Year : 2020  |  Volume : 6  |  Issue : 1  |  Page : 4-6

Types of variables in medical research

Shivam Pandey
Department of Biostatistics, All India Institute of Medical Sciences, New Delhi, India

Shivam Pandey
Department of Biostatistics, All India Institute of Medical Sciences, New Delhi - 110 029
India

Abstract

Formulation of a research question is the initial and an important step in any research work. Depending on the research questions to be answered, researchers decide which statistical methods to use for the analysis. Researchers have to be acquainted with the types of variables involved in their study in order to choose appropriate diagrams/graphs and summary measures for presentation and to select valid statistical tests for the analysis of data. The type of variable also decides the type of statistical analysis to be performed, whether parametric or nonparametric. Parametric methods, such as t-tests, ANOVA, Pearson correlation, and regression, require that the data follow a normal distribution. Frequently used nonparametric methods, such as the Mann–Whitney (Wilcoxon rank-sum) test and rank correlation, make no assumptions about the distribution of the data. Failure to pay attention to assumptions may lead to an increase in type I or type II error rates.

How to cite this article: Pandey S. Types of variables in medical research. J Pract Cardiovasc Sci 2020;6:4-6

How to cite this URL: Pandey S. Types of variables in medical research. J Pract Cardiovasc Sci [serial online] 2020 [cited 2020 Aug 14];6:4-6. Available from: http://www.j-pcs.org/text.asp?2020/6/1/4/282800

Introduction

Formulation of a research question is the initial and an important step in any research work. Depending on the research questions to be answered, researchers decide which statistical methods to use for the analysis. Researchers have to be acquainted with the types of variables involved in their study in order to choose appropriate diagrams/graphs and summary measures for presentation and to select valid statistical tests for the analysis of data.

The information collected about a sample of subjects comprises characteristics which vary among subjects. Any characteristic that varies from individual to individual is called a variable.[1] Age, sex, height, weight, body mass index (BMI), blood group, heart rate, and number of teeth are some examples of biological variables in research. A basic distinction among these variables is whether they are measured quantitatively or qualitatively (categorically).

Quantitative Variables

Quantitative variables are those characteristics that can be counted or measured numerically. They can be continuous or discrete. Continuous variables can theoretically take infinitely many values in a given range. This means that we can always find an intermediate value between any two values, however close they are. For example, in a weight range of 50–60 kg, one can write infinitely many values such as 50, 50.1, 50.12, 50.01, and 50.004 kg, depending on the accuracy desired by the researcher. Height, age, mid-arm circumference, blood pressure, and BMI are some examples of continuous variables. Here, the obtained measurements can take any value in a given range.

Discrete (discontinuous) variables can take only a specified number of values in a given range. For example, the number of members per family in the range 0–6 can only be 0, 1, 2, 3, 4, 5, or 6; no other values in this range are possible. Number of visits to the hospital in a year, number of children in a family, number of children admitted to a hospital ward, and number of missing teeth are some examples of discrete variables. Discrete variables are usually counts.

Qualitative Variables

Qualitative (categorical) variables are those characteristics that cannot be measured numerically. Usually, for the purpose of data entry, categories are coded by assigning them numerical values. These variables are either nominal (no natural ordering) or ordinal (ordered categories).

Nominal variables allow only for classification or categorization based on some distinctly different characteristics; we cannot rank order these categories. Some examples of nominal variables are sex, religion, and blood group. Numerical values assigned to the different categories are useful for the purpose of identification only. When a qualitative variable has only two categories, it is called a binary or dichotomous variable. Nominal variables are summarized by counting (frequency) and by expressing the proportion in each category (percentage).

Ordinal variables allow us to rank order the categories in terms of which category has more or less of the characteristic. A typical example of an ordinal variable in medicine is the stage of a disease. We know that “Stage I” is less severe than “Stage II” of a disease, but we cannot tell the exact difference between the two stages. Socioeconomic status and pain scores are other examples of ordinal variables. Ordinal variables are also summarized by counting (frequency) and by expressing the proportion in each category (percentage).

Categorizing a Continuous Variable

Quantitative variables are often converted to qualitative ones using “cut-points.” Instead of presenting the mean fasting glucose level of male and female subjects, one may prefer to present the proportion of diabetics in the male and female populations, using a fasting glucose level of 110 mg/dL as the cut-point to categorize the subjects as diabetic/nondiabetic. However, categorizing a continuous variable leads to loss of information.[2] For example, subjects with fasting glucose levels of 108 and 80 mg/dL are both treated as normal and classified as nondiabetic. Similarly, subjects with glucose levels of 115 and 150 mg/dL are both classified as diabetic. The difference between the values within each category is no longer visible when we present only the number of diabetic and nondiabetic cases.
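The loss of information can be illustrated with a minimal Python sketch. The four glucose values and the 110 mg/dL cut-point are taken from the example above; the function name `classify` and the choice to treat a value exactly equal to the cut-point as diabetic are assumptions made only for illustration.

```python
CUT_POINT = 110  # mg/dL, the cut-point used in the example above


def classify(glucose_mg_dl, cut_point=CUT_POINT):
    """Label a subject using a single cut-point.

    Assumption: a value equal to the cut-point counts as diabetic.
    """
    return "diabetic" if glucose_mg_dl >= cut_point else "nondiabetic"


glucose = [80, 108, 115, 150]  # fasting glucose values from the text
labels = [classify(g) for g in glucose]

for g, label in zip(glucose, labels):
    print(f"{g} mg/dL -> {label}")

# Only two distinct labels remain for four distinct measurements.
print("distinct values:", len(set(glucose)), "distinct labels:", len(set(labels)))
```

After categorization, 80 and 108 mg/dL are indistinguishable, as are 115 and 150 mg/dL, which is exactly the information that a later analysis can no longer use.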

Presentation of Data

Qualitative variables: Categorical data

Qualitative data may be presented in the form of frequency tables. We count the number of subjects in each category and present the numbers and percentages in a table. For example, we can summarize the gender distribution in a table showing the categories with their frequencies and percentages. If we have data for two categorical variables, these may be summarized in the form of a contingency table showing frequencies and percentages. For ordinal variables, descriptive measures such as frequency and percentage should be reported when the number of categories is small. In addition, the median and interquartile range, along with the minimum and maximum values, are considered appropriate for summarizing ordinal variables. Nominal and ordinal data with a limited number of categories can also be presented in diagrammatic form, such as a bar chart or pie chart. In a bar chart, the length of each bar represents the frequency of the corresponding category of the variable. A pie chart is essentially a circle divided into segments, with the area of each segment proportional to the observed frequency in that category of the variable. The total area represents the total frequency.
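As a rough sketch of how such tables can be built, the Python snippet below counts categories with the standard library. The ten records, the variable names, and the helper `frequency_table` are hypothetical and serve only to illustrate a frequency table for one variable and a contingency table for two.

```python
from collections import Counter

# Hypothetical records: (sex, disease stage) for ten subjects.
records = [
    ("male", "Stage I"), ("female", "Stage II"), ("male", "Stage II"),
    ("female", "Stage I"), ("female", "Stage I"), ("male", "Stage III"),
    ("female", "Stage II"), ("male", "Stage I"), ("female", "Stage III"),
    ("male", "Stage I"),
]


def frequency_table(values):
    """Return {category: (count, percentage)} for one categorical variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}


# Frequency table for a single nominal variable (sex).
sex_table = frequency_table(sex for sex, _ in records)
for category, (n, pct) in sex_table.items():
    print(f"{category:<8}{n:>3} ({pct:.1f}%)")

# Contingency table for two categorical variables (sex by stage).
contingency = Counter(records)
stages = ["Stage I", "Stage II", "Stage III"]
for sex in ("male", "female"):
    row = [contingency[(sex, stage)] for stage in stages]
    print(sex, dict(zip(stages, row)))
```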

Quantitative variables: Continuous and discrete data

Mean and standard deviation are appropriate summary measures for continuous variables with symmetrical distributions. Median and interquartile range should be computed to summarize quantitative variables with skewed distributions. Range is an informative summary measure and is normally used as a supplement to the standard deviation or interquartile range. Discrete variables may be summarized and analyzed either as continuous variables, if they take many distinct values, or as ordinal variables, if they take only a few.
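The summary measures named above can be computed with Python's `statistics` module, as in the sketch below. The data are hypothetical (lengths of hospital stay in days, with one long stay), and `summarize` is an illustrative helper. Quartiles depend on the interpolation method, so the values returned by `statistics.quantiles` may differ slightly from those of other software.

```python
import statistics


def summarize(values):
    """Return the summary measures discussed in the text for one variable."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": statistics.median(values),
        "iqr": (q1, q3),
        "range": (min(values), max(values)),
    }


# Hypothetical right-skewed data: length of hospital stay in days.
stay_days = [1, 2, 2, 3, 3, 4, 5, 30]
summary = summarize(stay_days)

for name, value in summary.items():
    print(name, value)
```

In this sample the single long stay pulls the mean well above the median, which is why the median and interquartile range describe such data better than the mean and standard deviation.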

Quantitative data can be represented graphically by means of a histogram. A histogram is useful to decide about the shape of the distribution, whether symmetrical or skewed. Histograms may not be a very good method of identifying the shape of the distribution for small samples. As a rule of thumb, if the mean is smaller than twice the standard deviation, the data are likely to be skewed.[1],[2],[3] Quantitative data can also be displayed as stem and leaf plots, dot plots, and box and whisker plots.[2]
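The rule of thumb above is simple to apply, as the following sketch shows. The helper name `likely_skewed` and both data sets are hypothetical. The check is only a rough screen, and it is assumed here that the measurements cannot be negative.

```python
import statistics


def likely_skewed(values):
    """Rule of thumb from the text: mean smaller than twice the SD.

    Assumption: the measurements cannot take negative values.
    """
    return statistics.mean(values) < 2 * statistics.stdev(values)


weights_kg = [50, 52, 54, 56, 58]        # hypothetical, roughly symmetrical
stay_days = [1, 2, 2, 3, 3, 4, 5, 30]    # hypothetical, right-skewed

print("weights flagged as skewed:", likely_skewed(weights_kg))
print("hospital stay flagged as skewed:", likely_skewed(stay_days))
```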

Analysis of Data

The type of variable decides the type of statistical analysis to be performed, whether parametric or nonparametric. Parametric methods, such as t-tests, ANOVA, Pearson correlation, and regression, require that the data follow a normal distribution. Frequently used nonparametric methods, such as the Mann–Whitney (Wilcoxon rank-sum) test and rank correlation, make no assumptions about the distribution of the data; they use the rank order of the measurements rather than the actual measurements.[4] The usual method to compare two categorical variables is the Chi-square test, or Fisher's exact test if the cell frequencies are very small. Failure to pay attention to assumptions may lead to an increase in type I or type II error rates.
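To illustrate what "using the rank order rather than the actual measurements" means, the sketch below computes the Mann–Whitney U statistic from pooled ranks using only the standard library. The two samples and the helper names `midranks` and `mann_whitney_u` are hypothetical, and no P value is computed. Because only ranks enter the calculation, a monotone transformation such as the logarithm, or an extreme outlier, leaves U unchanged.

```python
import math


def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """U statistic for sample a: rank sum of a minus n_a * (n_a + 1) / 2."""
    ranks = midranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2


# Hypothetical measurements in two groups.
group_a = [1.2, 3.4, 2.2, 8.9]
group_b = [0.8, 1.0, 2.0, 1.5]

u_raw = mann_whitney_u(group_a, group_b)
u_log = mann_whitney_u([math.log(x) for x in group_a],
                       [math.log(x) for x in group_b])
u_outlier = mann_whitney_u([1.2, 3.4, 2.2, 890.0], group_b)

print(u_raw, u_log, u_outlier)
```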

Data from similar studies are analyzed completely differently depending on the type of variable involved. For example, suppose our target is people above 50 years of age in a certain population, we have measured BMI in a sample of 30 male and 30 female subjects, and our null hypothesis is “Male and female populations have the same BMI.” We would compare the mean BMI in males and females with a two-sample t-test (a parametric test). If BMI is instead converted to obesity (BMI >29.9), a nominal variable, we would compare the frequency of obesity in males and females with a Chi-square test (a nonparametric test). We would usually find a smaller P value with the t-test than with the Chi-square test. The important message conveyed by this example is that statistical tests have more power for a continuous variable than for the corresponding nominal or ordinal variable.[5] In other words, a nonparametric test requires a larger sample size than a parametric test to achieve the same power.
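The power argument can be explored with a small simulation, sketched below under stated assumptions. The sample size of 30 per group and the obesity cut-point of 29.9 come from the example above. The population means (27 and 30), the common standard deviation (4), the number of simulated studies, and all function names are hypothetical choices made for illustration. The t statistic uses the pooled variance, the Chi-square statistic is computed without continuity correction, and both are compared with their 5% critical values.

```python
import random
import statistics

T_CRIT = 2.0017      # two-sided 5% critical value of t with 58 degrees of freedom
CHI2_CRIT = 3.841    # 5% critical value of Chi-square with 1 degree of freedom
OBESITY_CUT = 29.9   # BMI cut-point from the text


def t_statistic(x, y):
    """Pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    standard_error = (pooled_var * (1 / nx + 1 / ny)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / standard_error


def chi_square_2x2(a, b, c, d):
    """Pearson Chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def simulate_power(n_sims=2000, n=30, mean_m=27.0, mean_f=30.0, sd=4.0, seed=1):
    """Share of simulated studies in which each test rejects the null hypothesis."""
    rng = random.Random(seed)
    reject_t = reject_chi = 0
    for _ in range(n_sims):
        males = [rng.gauss(mean_m, sd) for _ in range(n)]
        females = [rng.gauss(mean_f, sd) for _ in range(n)]
        # Continuous BMI: two-sample t-test.
        if abs(t_statistic(males, females)) > T_CRIT:
            reject_t += 1
        # Dichotomized BMI (obese / not obese): Chi-square test.
        obese_m = sum(bmi > OBESITY_CUT for bmi in males)
        obese_f = sum(bmi > OBESITY_CUT for bmi in females)
        if chi_square_2x2(obese_m, n - obese_m, obese_f, n - obese_f) > CHI2_CRIT:
            reject_chi += 1
    return reject_t / n_sims, reject_chi / n_sims


power_t, power_chi = simulate_power()
print(f"estimated power, t-test on BMI: {power_t:.2f}")
print(f"estimated power, Chi-square on obesity: {power_chi:.2f}")
```

Under these assumed settings the t-test on the continuous BMI rejects the null hypothesis more often than the Chi-square test on the dichotomized variable. The size of the gap depends on the chosen means, standard deviation, and cut-point.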

Detailed discussion of the various tests is beyond the scope of this article. Campbell and Swinscow have summarized the tests for various types of variables.[6] For computational procedures and more details about the various parametric tests, researchers may refer to standard textbooks.[1],[3],[7],[8] For a good discussion of a number of nonparametric tests, readers may refer to Siegel and Castellan and to Conover.[9],[10]

Ethics clearance

Since the study does not involve human or animal subjects, ethical clearance is not required.