Clinical heterogeneity | Due to variability in participants, interventions, and outcomes |
Methodological heterogeneity | Due to variability in study design, methods of assessment, and risk of bias |
Statistical heterogeneity | Variation in effect estimates greater than would be expected by chance alone; a consequence of clinical and methodological diversity |
Statistical test or analysis | Explanation | Interpretation and limitations |
--- | --- | --- |
Cochran’s Q test | Tests the null hypothesis of homogeneity, ie, whether the observed differences among study results are likely due to chance alone. Q, a chi-square (χ²) statistic with k − 1 degrees of freedom (df) for k studies, is typically reported at the bottom of the forest plot. A low P value, corresponding to a large χ² relative to its df, provides statistical evidence of heterogeneity. | Low power (frequent failure to reject the null hypothesis) when the number of studies is small, so a P value of .10 rather than .05 is often used as the threshold of statistical significance. With a large number of studies, may detect clinically unimportant heterogeneity. |
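Tau² (τ²) | Absolute measure of between-study heterogeneity: the estimated variance of the true effects across studies in a random-effects meta-analysis, expressed in the squared units of the effect measure. | Higher values suggest greater between-study heterogeneity. Imprecise when the number of studies is low. |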
I² statistic | Derived from Cochran’s Q as I² = (Q − df)/Q × 100%, set to 0 when Q is smaller than df. Relative measure of heterogeneity: the percentage of total variability in effect estimates that is due to true between-study differences (variation beyond chance) rather than to chance. | Imprecise when the number of studies is low, so both the point estimate and a measure of its precision, the 95% CI, should be reported. I² tends to be low (often 0) when the 95% CIs of the effect sizes of all the studies in a meta-analysis overlap substantially. |
Subgroup/moderator analysis | Asks whether subgroups of studies with different attributes (eg, participant or intervention characteristics, risk of bias) have different effect estimates. | Often used to identify sources of heterogeneity. Subgroups are best specified a priori. |
Metaregression | Describes the association between effect size and study-level covariates, addressing the question: Are there factors that modify the effectiveness of a treatment? | Describes variation between studies; results are observational in nature. Limited by the number of studies relative to the number of covariates. Multiple metaregression is often challenging because different studies report different covariates, resulting in missing data. |
Sensitivity analysis | Examples include testing how sensitive the summary effect estimate is to (1) omitting each study from the model in turn, (2) omitting studies that are outliers (in results or size), and (3) omitting studies based on quality aspects such as risk of bias. | Often used to identify sources of heterogeneity. |
CI of summary mean treatment effect estimate | Describes the uncertainty of the summary (mean) treatment effect estimate, a single numerical estimate generated from the meta-analysis. | A measure of the precision of the mean treatment effect estimate generated from multiple studies. Does not describe the degree of heterogeneity among studies. |
Prediction interval | Describes the spread of the underlying true effects across the studies included in a random-effects meta-analysis, ie, the range within which the true effect of a new, similar study would be expected to lie. The prediction interval therefore conveys the degree of heterogeneity in the meta-analysis by describing the width of the distribution of effects. | Based on the assumption that the underlying true effects are normally distributed. A measure of the dispersion of individual study effects; it therefore provides a wider range of expected treatment effects than the CI of the mean effect estimate. |
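The main heterogeneity statistics in the table above can all be computed from each study's effect estimate and its variance. The Python sketch below is a minimal, standard-library-only illustration under stated assumptions: effects are on an additive scale (eg, log odds ratios), τ² is estimated with the DerSimonian-Laird method-of-moments estimator (one of several estimators; REML and Paule-Mandel are common alternatives), the CI of the summary effect uses a normal quantile, and the prediction interval uses the common t-based formula μ̂ ± t(k − 2) × √(τ² + SE(μ̂)²). The function names `chi2_sf`, `t_quantile`, and `heterogeneity_summary` are hypothetical, and the χ² and t distribution functions are numerical approximations written to avoid third-party dependencies.

```python
import math
from statistics import NormalDist


def chi2_sf(x, df):
    """Upper-tail P(X >= x) for a chi-square variable, via the lower-gamma series."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2, x / 2
    term = total = 1 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = math.exp(a * math.log(half_x) - half_x - math.lgamma(a)) * total
    return max(0.0, 1 - lower)  # loses precision for extremely small P values


def t_quantile(p, nu, n=2000):
    """Quantile of Student's t (p > 0.5) by bisection on a Simpson-rule CDF."""
    const = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)

    def pdf(t):
        return const * (1 + t * t / nu) ** (-(nu + 1) / 2)

    def half_cdf(x):
        # P(0 <= T <= x) by composite Simpson's rule with n (even) panels
        h = x / n
        s = pdf(0) + pdf(x)
        s += 4 * sum(pdf((2 * i - 1) * h) for i in range(1, n // 2 + 1))
        s += 2 * sum(pdf(2 * i * h) for i in range(1, n // 2))
        return s * h / 3

    target, lo, hi = p - 0.5, 0.0, 1.0
    while half_cdf(hi) < target:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if half_cdf(mid) < target else (lo, mid)
    return (lo + hi) / 2


def heterogeneity_summary(effects, variances, level=0.95):
    """Q and its P value, DerSimonian-Laird tau^2, I^2, random-effects mean, CI, and PI."""
    k = len(effects)
    if k < 3:
        raise ValueError("a prediction interval needs at least 3 studies")
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    mu_fe = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - mu_fe) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)  # DerSimonian-Laird estimate, truncated at 0
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0  # I^2 in percent
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    mu = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))  # SE of the summary (mean) effect
    z = NormalDist().inv_cdf(0.5 + level / 2)
    t = t_quantile(0.5 + level / 2, k - 2)
    half_pi = t * math.sqrt(tau2 + se ** 2)  # spread of true effects plus uncertainty in mu
    return {
        "Q": q, "df": df, "p_Q": chi2_sf(q, df), "tau2": tau2, "I2": i2, "mu": mu,
        "ci": (mu - z * se, mu + z * se),
        "pi": (mu - half_pi, mu + half_pi),
    }
```

For five hypothetical studies with log-scale effects 0.1, 0.3, 0.5, 0.7, and 0.2, each with variance 0.01, `heterogeneity_summary([0.1, 0.3, 0.5, 0.7, 0.2], [0.01] * 5)` gives Q = 23.2 on 4 df (P ≈ .0001), τ² = 0.048, I² ≈ 83%, a 95% CI for the mean effect of about 0.15 to 0.57, and a 95% prediction interval of about −0.42 to 1.14. The prediction interval is much wider than the CI because it reflects the spread of the true effects, not only the precision of their mean.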
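The first sensitivity analysis listed in the table, omitting each study in turn, amounts to re-pooling the remaining studies k times and watching how the summary estimate and heterogeneity change. The Python sketch below assumes DerSimonian-Laird random-effects pooling and at least 3 studies; `dl_pool` and `leave_one_out` are hypothetical helper names, not a library API.

```python
def dl_pool(effects, variances):
    """DerSimonian-Laird random-effects mean, tau^2, and I^2 (percent)."""
    k = len(effects)
    w = [1 / v for v in variances]  # inverse-variance weights
    sw = sum(w)
    mu_fe = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - mu_fe) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    i2 = 100 * max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    mu = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return mu, tau2, i2


def leave_one_out(effects, variances):
    """Re-pool the meta-analysis k times, omitting one study each time.

    Returns a list of (omitted_index, mu, tau2, i2) tuples.
    """
    if len(effects) < 3:
        raise ValueError("need at least 3 studies so that 2 remain after omission")
    results = []
    for i in range(len(effects)):
        ys = effects[:i] + effects[i + 1:]
        vs = variances[:i] + variances[i + 1:]
        results.append((i, *dl_pool(ys, vs)))
    return results
```

For hypothetical effects [0.1, 0.3, 0.5, 0.7, 0.2], each with variance 0.01, omitting the study with effect 0.7 (index 3) moves the summary estimate from 0.36 to 0.275 and lowers I² from about 83% to about 66%, flagging that study as an influential contributor to heterogeneity.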
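A subgroup analysis is often accompanied by a test for subgroup differences. One common version partitions Cochran's Q into a within-subgroup part and a between-subgroup part, Q_between = Σ W_g (μ_g − μ)², where W_g is the total inverse-variance weight of subgroup g; Q_between is then compared with a χ² distribution with G − 1 df for G subgroups. The Python sketch below assumes inverse-variance fixed-effect pooling within subgroups (random-effects versions instead pool each subgroup with its own τ² before comparing subgroup means), and `fe_pool` and `subgroup_test` are hypothetical names.

```python
from collections import defaultdict


def fe_pool(effects, variances):
    """Inverse-variance fixed-effect mean, its total weight, and Cochran's Q."""
    w = [1 / v for v in variances]
    sw = sum(w)
    mu = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - mu) ** 2 for wi, y in zip(w, effects))
    return mu, sw, q


def subgroup_test(effects, variances, groups):
    """Partition Cochran's Q into within- and between-subgroup parts (fixed-effect)."""
    by_group = defaultdict(lambda: ([], []))
    for y, v, g in zip(effects, variances, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(v)
    _, _, q_total = fe_pool(effects, variances)
    pooled = {g: fe_pool(ys, vs) for g, (ys, vs) in by_group.items()}
    q_within = sum(q for _, _, q in pooled.values())
    # Overall mean as the weight-averaged subgroup means (equals the fixed-effect mean)
    total_w = sum(sw for _, sw, _ in pooled.values())
    mu_all = sum(sw * mu for mu, sw, _ in pooled.values()) / total_w
    # Q_between: weighted squared deviations of subgroup means from the overall mean
    q_between = sum(sw * (mu - mu_all) ** 2 for mu, sw, _ in pooled.values())
    return {
        "subgroup_means": {g: mu for g, (mu, _, _) in pooled.items()},
        "Q_total": q_total,
        "Q_within": q_within,
        "Q_between": q_between,
        "df_between": len(pooled) - 1,
    }
```

For hypothetical studies with effects [0.1, 0.3, 0.5, 0.7, 0.2], variances 0.01, and subgroup labels ["A", "A", "B", "B", "A"], the total Q of 23.2 splits into Q_within = 4 and Q_between = 19.2 on 1 df, suggesting that the subgroup attribute accounts for most of the observed heterogeneity in this made-up example.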