CURRICULUM IN CARDIOLOGY - STATISTICS
Year: 2019  |  Volume: 5  |  Issue: 2  |  Page: 108-110

How to read a forest plot

Ushmita Seth
Technology Consultant, B Tech ( Delhi Technological University), Delhi, India

Abstract

As data from evidence-based practice began to accumulate, forest plots were introduced to harness the collective power of those data. A forest plot, also known as a blobbogram, is a graphical representation of a meta-analysis. It lets you view and compare the sample statistics from multiple similar studies in one place, along with a summary statistic at the bottom. The plot shows the point estimate of each study's statistic together with its confidence interval (usually taken as 95%).

How to cite this article: Seth U. How to read a forest plot. J Pract Cardiovasc Sci 2019;5:108-110.

How to cite this URL: Seth U. How to read a forest plot. J Pract Cardiovasc Sci [serial online] 2019 [cited 2023 Feb 2];5:108-110. Available from: https://www.j-pcs.org/text.asp?2019/5/2/108/264631

Full Text

A forest plot is a graphical display of one common statistic estimated by a number of studies addressing the same question. It tackles the complexity of drawing a collective inference from separate experiments, which together can support a more powerful conclusion than any single study.

In 1990, the statistician and epidemiologist Richard Peto joked that the plot was named after the breast cancer researcher Pat Forrest, which has led to the name sometimes being misspelled as "Forrest plot." In fact, the plot is so named because it resembles a forest when rotated through a right angle [Figure 1]. The plot consists of lines, each with a large dot somewhere along it: the line represents a tree and the dot corresponds to the leaf cover.[1]{Figure 1}

Let us understand the different branches of a forest plot with the given example [Table 1].{Table 1}

Here is a common representation of the raw data[2] for the plot. The first column signifies the name of the study. The second and third columns describe the experimental results for treatment and control groups, respectively. “n” stands for the number of patients who had the outcome, and “N” stands for the total number of people in the group.

The fourth column generally indicates the point estimate of the common statistic that is being used to compare all the studies. It could be a relative statistic, such as odds ratio (OR) or relative risk (RR), or it could be a difference-based statistic such as standardized mean difference or absolute risk reduction. The fifth and sixth columns represent the upper and lower bounds of the confidence interval (CI), respectively.
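As an illustration of how the point estimate and CI columns can be derived from the "n" and "N" columns, here is a minimal Python sketch for the OR. It assumes the common log-scale (Woolf) approximation for the CI and a 0.5 continuity correction when a cell is zero; neither choice is stated in the article, the function name is hypothetical, and the counts are made up for illustration.

```python
import math

def odds_ratio_ci(n_treat, N_treat, n_ctrl, N_ctrl, z=1.96):
    """Return (OR, lower, upper) for one study.

    n_* = patients with the outcome, N_* = group size.
    z = 1.96 gives an approximate 95% CI.
    """
    a = n_treat               # events in the treatment group
    b = N_treat - n_treat     # non-events in the treatment group
    c = n_ctrl                # events in the control group
    d = N_ctrl - n_ctrl       # non-events in the control group
    if min(a, b, c, d) == 0:
        # Assumed convention: add 0.5 to every cell to avoid division by zero.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))

# Hypothetical study: 10/100 events on treatment, 20/100 on control.
or_, lower, upper = odds_ratio_ci(10, 100, 20, 100)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The CI is computed on the log scale and then exponentiated, which is why the interval is not symmetric around the OR.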

The results of the individual studies are pooled using one of two methods: a fixed-effects model or a random-effects model.[3] The random-effects pooling model has been recommended for clinical psychology and the health sciences.[4] The fixed-effects model assumes that all studies are conducted on a single homogeneous population. To pool the effect sizes, a weighted average of the sample statistic is calculated, in which a study with smaller variance (i.e., greater precision) is given a larger weight.
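The weighted average described above can be sketched as follows. This assumes inverse-variance weights (weight = 1/SE²), which is the usual way of giving more precise studies a larger weight, and assumes that ratio statistics such as the OR are pooled on the log scale. The function name and the numbers are hypothetical.

```python
import math

def pool_fixed(effects, std_errors, z=1.96):
    """Fixed-effects pooling with inverse-variance weights.

    effects    : per-study effect sizes (e.g., log odds ratios)
    std_errors : per-study standard errors on the same scale
    Returns (pooled, lower, upper, relative_weights).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    se_pooled = math.sqrt(1.0 / total)
    relative = [w / total for w in weights]  # each study's share of the total weight
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled, relative

# Hypothetical log odds ratios and standard errors for three studies.
log_ors = [-0.81, -0.36, -0.22]
ses = [0.42, 0.25, 0.15]
pooled, lower, upper, rel_w = pool_fixed(log_ors, ses)
print("Pooled OR:", round(math.exp(pooled), 2))
print("Relative weights:", [round(w, 2) for w in rel_w])
```

The study with the smallest standard error receives the largest share of the weight, and the pooled CI is narrower than that of any single study.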

However, in practice, studies are almost never all drawn from the same population, so the random-effects model offers an alternative. Here, we assume that the studies are conducted not on one single population but on a "diverse" set of populations. Consequently, there is not one true effect size but a distribution of true effect sizes, and the goal is to estimate the mean of this distribution.

θ_k = θ_F + ϵ_k + ζ_k

θ_k = Observed effect size of an individual study k

θ_F = True effect size of the population

ϵ_k = Sampling error

ζ_k = Second source of error, which arises because the true effect size of each study is itself drawn from a distribution of true effect sizes (of the universe of populations)

To take ζ_k into account, we have to estimate the variance of the distribution of true effect sizes, which is denoted by τ² (tau-squared). There are several estimators for τ².
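As a hedged illustration, the sketch below uses the DerSimonian-Laird estimator of the between-study variance. It is only one of the several available estimators and the article does not say which one it has in mind. Function names and numbers are hypothetical.

```python
import math

def dersimonian_laird_tau2(effects, std_errors):
    """DerSimonian-Laird estimate of the between-study variance (tau-squared)."""
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    return max(0.0, (q - df) / c)  # truncated at zero

def pool_random(effects, std_errors, z=1.96):
    """Random-effects pooling: each weight is 1 / (SE^2 + tau^2)."""
    tau2 = dersimonian_laird_tau2(effects, std_errors)
    w = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sw = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    se_pooled = math.sqrt(1.0 / sw)
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled, tau2

# Hypothetical, deliberately inconsistent studies (effects on an additive scale).
effects = [-1.0, 0.0, 1.0]
ses = [0.1, 0.1, 0.1]
pooled, lower, upper, tau2 = pool_random(effects, ses)
print(f"tau^2 = {tau2:.2f}, pooled = {pooled:.2f}, CI {lower:.2f} to {upper:.2f}")
```

When the estimated τ² is zero, the weights reduce to the fixed-effects weights. When it is large, the weights become more equal across studies and the pooled CI widens.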

As in the fixed-effects model, each study must be assigned a weight that determines its influence on the overall meta-analysis. The choice of estimator determines the final calculation of the variance and, therefore, leads to different pooled effect size estimates and CIs. An article by Veroniki et al.[5] provides a summary of various estimators and their biases.

Let us now draw the forest plot corresponding to the above data [Figure 1].

First, we look at the two axes. The X-axis is the scale for the statistic being displayed (OR in our case). The vertical line is not a Y-axis as such; it is the line of "null effect" for the statistic being used, that is, the value of the point statistic which signifies no difference between treatment and control groups. It is placed at 1 for a ratio statistic such as OR or RR and at 0 for a difference-based statistic.

Next, the results of each study are placed one below the other on the plot. For each study, the position of the square along the X-axis marks its point estimate, the size of the square reflects the weight of the study in the meta-analysis (larger, more precise studies get larger squares), and the length of the horizontal line on which the square lies represents the CI for the point estimate. If the horizontal line crosses the line of null effect, the null value lies within the CI and could even be the true value. Therefore, the result of that study is not statistically significant. The forest plot thus shows the estimated effect and its CI for each individual study, as well as the overall estimated effect and its CI.
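The rule for reading each horizontal line can be expressed as a small check. This is a minimal sketch; the function name and the study values are hypothetical.

```python
def crosses_null(lower, upper, relative=True):
    """True if the CI includes the null value (1 for ratios, 0 for differences)."""
    null_value = 1.0 if relative else 0.0
    return lower <= null_value <= upper

# Hypothetical studies: (name, lower bound, upper bound) of the OR.
studies = [("Study A", 0.20, 1.01), ("Study B", 0.45, 0.80)]
for name, lo, hi in studies:
    verdict = "not significant" if crosses_null(lo, hi) else "significant"
    print(f"{name}: CI {lo} to {hi} -> {verdict}")
```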

The diamond at the bottom represents the summary statistic and its CI based on the meta-analysis. The center of the diamond (or the vertical line joining its top and bottom points) represents the point estimate. The left and right points mark the bounds of the CI. As the diamond combines all the individual studies, its CI is usually the narrowest on the plot (a larger sample size means a smaller standard error and hence a narrower CI, and vice versa).

The final point about analyzing a forest plot is its "heterogeneity." Heterogeneity is the variation between study results beyond what chance alone would produce; it arises because the individual studies have been conducted using different methods across different populations. Therefore, an additional commonly used metric called I² (I-squared)[6] is reported at the bottom of the plot. As a rule of thumb, if I² is <50%, the individual studies fall within an acceptable range of inconsistency. If it is >50%, the heterogeneity is substantial, and pooling the studies together in one meta-analysis becomes questionable.
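For illustration, I-squared can be obtained from Cochran's Q using the standard definition I² = max(0, (Q − df)/Q) × 100%, where df is the number of studies minus one. The sketch below assumes inverse-variance weights for Q; the function name and data are hypothetical.

```python
def i_squared(effects, std_errors):
    """Return (Q, I-squared in percent) for a set of study estimates."""
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    if q <= 0:
        return q, 0.0  # identical estimates: no observed heterogeneity
    return q, max(0.0, (q - df) / q) * 100.0

# Hypothetical data: one consistent set of studies and one inconsistent set.
q_low, i2_low = i_squared([0.50, 0.52, 0.48], [0.2, 0.2, 0.2])
q_high, i2_high = i_squared([-1.0, 0.0, 1.0], [0.1, 0.1, 0.1])
print(f"Consistent studies:   I^2 = {i2_low:.0f}%")
print(f"Inconsistent studies: I^2 = {i2_high:.0f}%")
```

I² is truncated at zero, so a set of studies that agree more closely than chance would predict is reported as 0%.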

Conclusion

A forest plot is a graphical display of results from a number of studies addressing the same question. It is called a forest plot [Figure 2] because it represents a forest of lines. It was developed as a means of graphically representing a meta-analysis. Forest plots are commonly presented with two columns. The left-hand column lists the names of the studies. The right-hand column is a plot of the measure of effect (e.g., RR) for each of these studies, represented by a square, with CIs represented by horizontal lines. The overall measure of effect is plotted as a diamond, the lateral points of which indicate the CI for this estimate, and is often also marked by a dashed vertical line. A vertical line representing no effect is also plotted, and if the points of the diamond overlap the line of no effect, the overall result cannot be said to differ from no effect at the given level of confidence.{Figure 2}