REVIEW ARTICLE
Year : 2020  |  Volume : 6  |  Issue : 1  |  Page : 4-6

Types of variables in medical research


Department of Biostatistics, All India Institute of Medical Sciences, New Delhi, India

Date of Submission: 08-Mar-2020
Date of Acceptance: 25-Mar-2020
Date of Web Publication: 17-Apr-2020

Correspondence Address:
Shivam Pandey
Department of Biostatistics, All India Institute of Medical Sciences, New Delhi - 110 029
India

Source of Support: None, Conflict of Interest: None


DOI: 10.4103/jpcs.jpcs_14_20

  Abstract 


Formulation of a research question is the first and an important step in any research work. Depending on the research questions to be answered, researchers decide which statistical methods to use for the analysis. Researchers have to be acquainted with the types of variables involved in their study in order to choose appropriate diagrams/graphs and summary measures for presentation and to select valid statistical tests for the analysis of data. The type of variable also decides the type of statistical analysis to be performed, whether parametric or nonparametric. Parametric methods, such as t-tests, ANOVA, Pearson correlation, and regression, require that the data follow a normal distribution. Frequently used nonparametric methods, such as the Mann–Whitney (Wilcoxon rank-sum) test and rank correlation, make no assumptions about the distribution of the data. Failure to pay attention to these assumptions may lead to an increase in type I or type II error rates.

Keywords: Categorical, continuous, variables


How to cite this article:
Pandey S. Types of variables in medical research. J Pract Cardiovasc Sci 2020;6:4-6

How to cite this URL:
Pandey S. Types of variables in medical research. J Pract Cardiovasc Sci [serial online] 2020 [cited 2020 Aug 4];6:4-6. Available from: http://www.j-pcs.org/text.asp?2020/6/1/4/282800




  Introduction


Formulation of a research question is the first and an important step in any research work. Depending on the research questions to be answered, researchers decide which statistical methods to use for the analysis. Researchers have to be acquainted with the types of variables involved in their study in order to choose appropriate diagrams/graphs and summary measures for presentation and to select valid statistical tests for the analysis of data.

The information collected about a sample of subjects comprises characteristics which vary among subjects. Any characteristic that varies from individual to individual is called a variable.[1] Age, sex, height, weight, body mass index (BMI), blood group, heart rate, and number of teeth are some examples of biological variables in research. The basic distinction among these variables is whether they are quantitative or qualitative (categorical).


  Quantitative Variables


Quantitative variables are those characteristics that can be counted or measured numerically. They can be continuous or discrete. Continuous variables can theoretically take infinitely many values in a given range. This means that we can always find an intermediate value between any two values, however close they are. For example, in a given range of 50–60 kg weight, one can write infinitely many values such as 50, 50.1, 50.12, 50.01, and 50.004 kg, depending on the accuracy desired by the researcher. Height, age, mid-arm circumference, blood pressure, and BMI are some examples of continuous variables. Here, the obtained measurements can take any value in a given range.

Discrete variables (discontinuous variables) can take only a specified number of values in a given range. For example, the number of members per family in a given range of 0–6 can only be 0, 1, 2, 3, 4, 5, or 6; no other values in this range are possible. The number of visits to the hospital in a year, number of children in a family, number of children admitted to a hospital ward, and number of missing teeth are some examples of discrete variables. Discrete variables are usually counts.


  Qualitative Variables


Qualitative (categorical) variables are those characteristics that cannot be measured numerically. Usually, for the purpose of data entry, categories are coded by assigning them numerical values. These variables are either nominal (no natural ordering) or ordinal (ordered categories).

Nominal variables allow only classification or categorization based on some distinctly different characteristics; we cannot rank order these categories. Some examples of nominal variables are sex, religion, and blood group. Numerical values assigned to the categories are useful for the purpose of identification only. When a qualitative variable has only two categories, it is called a binary or dichotomous variable. Nominal variables are summarized by counting (frequency) and expressing the proportion in each category (percentage).

Ordinal variables allow us to rank order the categories in terms of which category has more or less of the characteristic. A typical example of an ordinal variable in medicine is the stage of a disease. We know that “Stage I” is less severe than “Stage II” of a disease, but we cannot tell the exact difference between the two stages. Socioeconomic status and pain scores are other examples of ordinal variables. Ordinal variables are also summarized by counting (frequency) and expressing the proportion in each category (percentage).
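As a minimal sketch of the frequency and percentage summary described above, the following Python snippet tabulates a nominal variable. The blood-group values are hypothetical and serve only as an illustration; the function name `frequency_table` is likewise an assumption, not something taken from the article.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (count, percent)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}


# Hypothetical blood groups of eight subjects (illustration only).
blood_groups = ["A", "B", "O", "O", "AB", "A", "O", "B"]

table = frequency_table(blood_groups)
for category, (count, percent) in sorted(table.items()):
    print(f"{category}: n = {count} ({percent:.1f}%)")
```

The same summary applies to an ordinal variable; only the order in which the categories are listed should then follow their natural ranking rather than alphabetical order.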


  Categorizing a Continuous Variable


Quantitative variables are often converted to qualitative ones using “cut-points.” Instead of presenting the mean fasting glucose level of male and female subjects, one may prefer to present the proportion of diabetics in the male and female populations, using a fasting glucose level of 110 mg/dL as the cut-point to categorize the subjects as diabetic/nondiabetic. However, categorizing a continuous variable leads to loss of information.[2] For example, subjects with fasting glucose levels of 108 and 80 mg/dL are both treated as normal and classified as nondiabetic. Similarly, subjects with glucose levels of 115 and 150 mg/dL are both classified as diabetic. The differences between these values are no longer visible when we present only the number of diabetic and nondiabetic cases.
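The loss of information can be made concrete with a short Python sketch that applies the 110 mg/dL cut-point to the four glucose values mentioned above. Whether a value of exactly 110 mg/dL counts as diabetic is not stated in the text; treating it as diabetic here is an assumption, as is the function name `classify`.

```python
CUT_POINT_MG_DL = 110  # cut-point used in the text


def classify(glucose_mg_dl, cut_point=CUT_POINT_MG_DL):
    """Label a fasting glucose value relative to the cut-point.

    Assumption: a value equal to the cut-point is labelled diabetic.
    """
    return "diabetic" if glucose_mg_dl >= cut_point else "nondiabetic"


glucose_values = [80, 108, 115, 150]  # mg/dL, the values quoted in the text
labels = [classify(g) for g in glucose_values]

for value, label in zip(glucose_values, labels):
    print(f"{value} mg/dL -> {label}")

# Four distinct measurements collapse into only two distinct labels.
print(len(set(glucose_values)), "distinct values,", len(set(labels)), "distinct labels")
```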


  Presentation of Data


Qualitative variables: Categorical data

Qualitative data may be presented in the form of frequency tables. We count the number of subjects in each category and present the numbers and percentages in a table. For example, we can summarize the gender distribution in a table showing categories and percentages. If we have data for two categorical variables, these may be summarized in the form of a contingency table showing frequencies and percentages. For ordinal variables with few categories, descriptive measures such as frequency and percentage should be reported. In addition, the median and interquartile range, along with the minimum and maximum values, are considered appropriate for summarizing ordinal variables. Nominal and ordinal data with a limited number of categories can also be presented in diagrammatic form, such as a bar chart or pie chart. In a bar chart, the length of each bar represents the frequency of the corresponding category of the variable. A pie chart is essentially a circle divided into segments, with the area of each segment proportional to the observed frequency in that category of the variable. The total area represents the total frequency.
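A contingency table of two categorical variables can be sketched in a few lines of Python. The sex and obesity values below are hypothetical, and `contingency_table` is an illustrative helper rather than a function from any particular statistics package.

```python
from collections import Counter


def contingency_table(rows, cols):
    """Cross-tabulate two categorical variables measured on the same subjects."""
    if len(rows) != len(cols):
        raise ValueError("both variables need one value per subject")
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


# Hypothetical data for eight subjects (illustration only).
sex = ["M", "M", "F", "F", "M", "F", "M", "F"]
obese = ["yes", "no", "no", "yes", "no", "no", "yes", "no"]

crosstab = contingency_table(sex, obese)
for row_level, cells in crosstab.items():
    row_total = sum(cells.values())
    summary = ", ".join(
        f"{col_level}: {n} ({100.0 * n / row_total:.0f}%)"
        for col_level, n in cells.items()
    )
    print(f"{row_level}: {summary}")
```

Row percentages are printed here; column or overall percentages may be more appropriate depending on the research question.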

Quantitative variables: Continuous and discrete data

Mean and standard deviation are appropriate summary measures for continuous variables with symmetrical distributions. Median and interquartile range should be computed to summarize quantitative variables with skewed distributions. The range is an informative summary measure and is normally used as a supplement to the standard deviation or interquartile range. Discrete variables may be summarized and analyzed either as continuous variables if they take many distinct values or as ordinal variables if they take only a few.
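The summary measures named above can be computed with Python's standard `statistics` module, as in the sketch below. The data are hypothetical lengths of hospital stay chosen so that one large value skews the distribution; note that several conventions exist for computing quartiles, and `statistics.quantiles` uses its default ("exclusive") method here.

```python
import statistics


def summarize(values):
    """Return the summary measures discussed in the text for a numeric variable."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),      # sample standard deviation
        "median": statistics.median(values),
        "iqr": (q1, q3),
        "range": (min(values), max(values)),
    }


# Hypothetical lengths of stay in days (illustration only); 36 is an outlier.
length_of_stay = [2, 3, 4, 5, 36]

summary = summarize(length_of_stay)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this example the mean is pulled far above the median by the single long stay, which is why the median and interquartile range describe such data better.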

Quantitative data can be represented graphically by means of a histogram. A histogram is useful for judging the shape of the distribution, whether symmetrical or skewed. For small samples, however, a histogram may not be a very good method of identifying the shape of the distribution. As a rule of thumb, for measurements that cannot be negative, if the mean is smaller than twice the standard deviation, the data are likely to be skewed.[1],[2],[3] Quantitative data can also be displayed as stem and leaf plots, dot plots, and box and whisker plots.[2]
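The rule of thumb is simple enough to express as a one-line check. The sketch below assumes the measurements cannot be negative and uses two hypothetical data sets; `likely_skewed` is an illustrative name, and the check is a screening heuristic rather than a formal test of skewness.

```python
import statistics


def likely_skewed(values):
    """Rule of thumb for non-negative data: mean < 2 * SD suggests skewness."""
    return statistics.mean(values) < 2 * statistics.stdev(values)


# Hypothetical data (illustration only).
days_in_hospital = [1, 1, 2, 2, 3, 3, 4, 30]   # one very long stay
heights_cm = [160, 165, 170, 175, 180]          # roughly symmetrical

print("days in hospital likely skewed:", likely_skewed(days_in_hospital))
print("heights likely skewed:", likely_skewed(heights_cm))
```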


  Analysis of Data


The type of variable decides the type of statistical analysis to be performed, whether parametric or nonparametric. Parametric methods, such as t-tests, ANOVA, Pearson correlation, and regression, require that the data follow a normal distribution. Frequently used nonparametric methods, such as the Mann–Whitney (Wilcoxon rank-sum) test and rank correlation, make no assumptions about the distribution of the data; they use the rank order of the measurements rather than the actual measurements.[4] The Chi-square test, or Fisher's exact test when the cell frequencies are very small, is the usual method for comparing two categorical variables. Failure to pay attention to these assumptions may lead to an increase in type I or type II error rates.
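To illustrate what "using the rank order rather than the actual measurements" means, the sketch below computes mid-ranks and the Mann–Whitney U statistic by hand in Python. The function names and the data are hypothetical, and no P value is computed; in practice a statistics package would be used for the full test.

```python
def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return the Mann-Whitney U statistic for group_a."""
    ranks = midranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    rank_sum_a = sum(ranks[:n_a])
    return rank_sum_a - n_a * (n_a + 1) / 2


# Hypothetical measurements (illustration only).
group_a = [3, 5, 8]
u_original = mann_whitney_u(group_a, [9, 12, 40])
u_extreme = mann_whitney_u(group_a, [9, 12, 400])  # largest value made extreme
print("U with 40:", u_original, "| U with 400:", u_extreme)
```

Replacing 40 by 400 changes the mean of the second group considerably but leaves every rank, and therefore U, unchanged.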

Data from similar studies are analyzed completely differently depending on the type of variable involved. For example, suppose our target is people above 50 years of age in a certain population, we have measured the BMI in a sample of 30 male and 30 female subjects, and our null hypothesis is “The male and female populations have the same BMI.” We would compare the mean BMI in males and females with a two-sample t-test (a parametric test). If BMI is instead converted to obesity (BMI >29.9), a nominal variable, we would compare the frequency of obesity in males and females with a Chi-square test (a nonparametric test). We would typically find a smaller P value for the t-test than for the Chi-square test, because dichotomizing discards the information about how far each BMI value lies from the cut-point. An important message conveyed by this example is that statistical tests have more power for a continuous variable than for the corresponding nominal or ordinal variable.[5] In other words, when the assumptions of the parametric test hold, a nonparametric test requires a larger sample size to achieve the same power.
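The power argument can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: BMI is simulated as normally distributed with hypothetical means of 27 and 29 and a common standard deviation of 4, there are 30 subjects per group as in the example, and significance is judged at the 5% level using fixed critical values (about 2.002 for t with 58 degrees of freedom and 3.841 for Chi-square with 1 degree of freedom). None of these numbers come from real data.

```python
import math
import random
import statistics

T_CRITICAL_58_DF = 2.002    # approx. two-sided 5% critical value, valid for n = 30 per group
CHI2_CRITICAL_1_DF = 3.841  # 5% critical value, Chi-square with 1 df
OBESITY_CUT_POINT = 29.9    # BMI above this counts as obese, as in the text


def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * statistics.variance(a)
                  + (n_b - 1) * statistics.variance(b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def chi_square_2x2(obese_a, n_a, obese_b, n_b):
    """Pearson Chi-square statistic (no continuity correction) for a 2x2 table."""
    total_obese = obese_a + obese_b
    total = n_a + n_b
    if total_obese == 0 or total_obese == total:
        return 0.0  # an empty margin: the table shows no association
    statistic = 0.0
    for observed, n in ((obese_a, n_a), (obese_b, n_b)):
        expected_obese = n * total_obese / total
        expected_not = n - expected_obese
        statistic += (observed - expected_obese) ** 2 / expected_obese
        statistic += ((n - observed) - expected_not) ** 2 / expected_not
    return statistic


def estimate_power(n_sim=2000, n=30, mean_m=27.0, mean_f=29.0, sd=4.0, seed=1):
    """Fraction of simulated studies in which each test rejects the null."""
    rng = random.Random(seed)
    reject_t = reject_chi2 = 0
    for _ in range(n_sim):
        males = [rng.gauss(mean_m, sd) for _ in range(n)]
        females = [rng.gauss(mean_f, sd) for _ in range(n)]
        if abs(pooled_t_statistic(males, females)) > T_CRITICAL_58_DF:
            reject_t += 1
        obese_m = sum(x > OBESITY_CUT_POINT for x in males)
        obese_f = sum(x > OBESITY_CUT_POINT for x in females)
        if chi_square_2x2(obese_m, n, obese_f, n) > CHI2_CRITICAL_1_DF:
            reject_chi2 += 1
    return reject_t / n_sim, reject_chi2 / n_sim


power_t, power_chi2 = estimate_power()
print(f"Estimated power, t-test on BMI: {power_t:.2f}")
print(f"Estimated power, Chi-square on obesity: {power_chi2:.2f}")
```

Under these assumptions the t-test on the continuous BMI values should reject the null hypothesis more often than the Chi-square test on the dichotomized variable; the exact proportions depend on the simulated means, standard deviation, and seed.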

A detailed discussion of the various tests is beyond the scope of this article. Campbell and Swinscow have summarized the tests for various types of variables.[6] For computational procedures and more details about various parametric tests, researchers may refer to standard textbooks.[1],[3],[7],[8] For a good discussion of a number of nonparametric tests, readers may refer to Siegel and Castellan and to Conover.[9],[10]

Ethics clearance

Since the study does not involve human or animal subjects, ethical clearance is not required.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.



 
  References

1. Daniel WW. Biostatistics: A Foundation for Analysis in the Health Sciences. 6th ed. New York: John Wiley and Sons; 1995.
2. Freeman JV, Walters SJ, Campbell MJ, editors. How to Display Data. Oxford: Blackwell; 2008.
3. Altman DG, Bland JM. Detecting skewness from summary information. BMJ 1996;313:1200.
4. Altman DG, Bland JM. Parametric vs. non-parametric methods for data analysis. BMJ 2009;338:a3167.
5. Altman DG, Bland JM. The cost of dichotomizing continuous variables. BMJ 2006;332:1080.
6. Campbell MJ, Swinscow TD. Statistics at Square One. 11th ed. Oxford: Wiley-Blackwell; 2009.
7. McDonald JH. Handbook of Biological Statistics. Baltimore, MD: Sparky House Publishing; 2009.
8. Bland M. An Introduction to Medical Statistics. 3rd ed. UK: Oxford University Press; 2000.
9. Siegel S, Castellan NJ. Nonparametric Statistics for the Behavioral Sciences. 2nd ed. New York: McGraw-Hill; 1988.
10. Conover WJ. Practical Nonparametric Statistics. 3rd ed. New York: John Wiley; 1998.




 
