Guidance for reporting a systematic review (with or without a meta-analysis)

This advice is based on the PRISMA statement, an evidence-based minimum set of items for reporting systematic reviews and meta-analyses. Although the advice focuses on the reporting of reviews evaluating randomized trials, it can also be used for reporting systematic reviews of other types of research, particularly evaluations of interventions.


Title

1. Title

Identify the report as a systematic review.

Explanation

Inclusion of “systematic review” in the title facilitates identification by potential users (patients, healthcare providers, policy makers, etc) and appropriate indexing in databases. Terms such as “review,” “literature review,” “evidence synthesis,” or “knowledge synthesis” are not recommended because they do not distinguish systematic and non-systematic approaches. We also discourage using the terms “systematic review” and “meta-analysis” interchangeably because a systematic review refers to the entire set of processes used to identify, select, and synthesise evidence, whereas meta-analysis refers only to the statistical synthesis. Furthermore, a meta-analysis can be done outside the context of a systematic review (for example, when researchers meta-analyse results from a limited set of studies that they have conducted).

Essential elements

Identify the report as a systematic review in the title. Report an informative title that provides key information about the main objective or question that the review addresses (for reviews of interventions, this usually includes the population and the intervention(s) that the review addresses).

Additional elements

Consider providing additional information in the title, such as the method of analysis used (for example, “a systematic review with meta-analysis”), the designs of included studies (for example, “a systematic review of randomised trials”), or an indication that the review is an update of an existing review or a continually updated (“living”) systematic review.

Examples:

Example 1

Comparison of the therapeutic effects of rivaroxaban versus warfarin in antiphospholipid syndrome: a systematic review

Example 2

Repetitive transcranial magnetic stimulation for the treatment of lower limb dysfunction in patients poststroke: a systematic review with meta-analysis

Example 3

Efficacy, tolerability and safety of cannabis-based medicines for cancer pain: A systematic review with meta-analysis of randomised controlled trials

Example 4

Does routine anti-osteoporosis medication lower the risk of fractures in male subjects? An updated systematic review with meta-analysis of clinical trials.

Abstract

2. Abstract

Report an abstract addressing each item in the PRISMA 2020 for Abstracts checklist.

  1. Identify the report as a systematic review
  2. Provide an explicit statement of the main objective(s) or question(s) the review addresses
  3. Specify the inclusion and exclusion criteria for the review
  4. Specify the information sources (such as databases, registers) used to identify studies and the date when each was last searched
  5. Specify the methods used to assess risk of bias in the included studies
  6. Specify the methods used to present and synthesise results
  7. Give the total number of included studies and participants and summarise relevant characteristics of studies
  8. Present results for main outcomes, preferably indicating the number of included studies and participants for each. If meta-analysis was done, report the summary estimate and confidence/credible interval. If comparing groups, indicate the direction of the effect (that is, which group is favoured)
  9. Provide a brief summary of the limitations of the evidence included in the review (such as study risk of bias, inconsistency, and imprecision)
  10. Provide a general interpretation of the results and important implications
  11. Specify the primary source of funding for the review
  12. Provide the register name and registration number

Explanation

An abstract providing key information about the main objective(s) or question(s) that the review addresses, methods, results, and implications of the findings should help readers decide whether to access the full report. For some readers, the abstract may be all that they have access to. Therefore, it is critical that results are presented for all main outcomes for the main review objective(s) or question(s) regardless of the statistical significance, magnitude, or direction of effect. Terms presented in the abstract will be used to index the systematic review in bibliographic databases. Therefore, reporting keywords that accurately describe the review question (such as population, interventions, outcomes) is recommended.

Examples:

Example 1

Title: Psychological interventions for common mental disorders in women experiencing intimate partner violence in low-income and middle-income countries: a systematic review and meta-analysis.

Background: Evidence on the effectiveness of psychological interventions for women with common mental disorders (CMDs) who also experience intimate partner violence is scarce. We aimed to test our hypothesis that exposure to intimate partner violence would reduce intervention effectiveness for CMDs in low-income and middle-income countries (LMICs).

Methods: For this systematic review and meta-analysis, we searched MEDLINE, Embase, PsycINFO, Web of Knowledge, Scopus, CINAHL, LILACS, SciELO, Cochrane, PubMed databases, trials registries, 3ie, Google Scholar, and forward and backward citations for studies published between database inception and Aug 16, 2019. All randomised controlled trials (RCTs) of psychological interventions for CMDs in LMICs which measured intimate partner violence were included, without language or date restrictions. We approached study authors to obtain unpublished aggregate subgroup data for women who did and did not report intimate partner violence. We did separate random-effects meta-analyses for anxiety, depression, post-traumatic stress disorder (PTSD), and psychological distress outcomes. Evidence from randomised controlled trials was synthesised as differences between standardised mean differences (SMDs) for change in symptoms, comparing women who did and who did not report intimate partner violence via random-effects meta-analyses. The quality of the evidence was assessed with the Cochrane risk of bias tool. This study is registered on PROSPERO, number CRD42017078611.

Findings: Of 8122 records identified, 21 were eligible and data were available for 15 RCTs, all of which had a low to moderate risk of overall bias. Anxiety (five interventions, 728 participants) showed a greater response to intervention among women reporting intimate partner violence than among those who did not (difference in standardised mean differences [dSMD] 0.31, 95% CI 0.04 to 0.57, I2=49.4%). No differences in response to intervention were seen in women reporting intimate partner violence for PTSD (eight interventions, n=1436; dSMD 0.14, 95% CI −0.06 to 0.33, I2=42.6%), depression (12 interventions, n=2940; 0.10, −0.04 to 0.25, I2=49.3%), and psychological distress (four interventions, n=1591; 0.07, −0.05 to 0.18, I2=0.0%, p=0.681).

Interpretation: Psychological interventions treat anxiety effectively in women with current or recent intimate partner violence exposure in LMICs when delivered by appropriately trained and supervised health-care staff, even when not tailored for this population or targeting intimate partner violence directly. Future research should investigate whether adapting evidence-based psychological interventions for CMDs to address intimate partner violence enhances their acceptability, feasibility, and effectiveness in LMICs.

Funding: UK National Institute for Health Research ASSET and King's IoPPN Clinician Investigator Scholarship.
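
Example 1 pools standardised mean differences with random-effects meta-analyses. For readers unfamiliar with the mechanics being reported, the following is a minimal sketch of DerSimonian-Laird random-effects pooling of study-level SMDs; all input values are invented for illustration and are not taken from the review.

```python
# Minimal sketch of DerSimonian-Laird random-effects pooling of
# standardised mean differences (SMDs). All input values are hypothetical.
import math

# (SMD, standard error) for each study -- illustrative numbers only
studies = [(0.42, 0.15), (0.18, 0.12), (0.31, 0.20), (0.25, 0.10)]

# Fixed-effect weights, fixed-effect estimate, and Cochran's Q
w = [1 / se**2 for _, se in studies]
fixed = sum(wi * smd for wi, (smd, _) in zip(w, studies)) / sum(w)
q = sum(wi * (smd - fixed) ** 2 for wi, (smd, _) in zip(w, studies))
df = len(studies) - 1

# Between-study variance (tau^2) by the DerSimonian-Laird estimator
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)

# Random-effects weights, pooled SMD, 95% CI, and I^2
w_re = [1 / (se**2 + tau2) for _, se in studies]
pooled = sum(wi * smd for wi, (smd, _) in zip(w_re, studies)) / sum(w_re)
se_pooled = math.sqrt(1 / sum(w_re))
ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

print(f"Pooled SMD {pooled:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}, I2 = {i2:.1f}%")
```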

Example 2

Background: Cognitive bias modification (CBM) therapies, including attention bias modification, interpretation bias modification, or approach and avoidance training, are prototypical examples of mechanistically derived treatments, but their effectiveness is contentious. We aimed to assess the relative effectiveness of various CBM interventions for anxious and depressive symptomatology.

Methods: For this systematic review and network meta-analysis, we searched PubMed, PsycINFO, Embase, and Cochrane Central Register from database inception up until Feb 7, 2020. We included randomised controlled trials of CBM versus control conditions or other forms of CBM for adults aged 18 years and older with clinical or subclinical anxiety or depression measured with a diagnostic interview or a validated clinical scale. We excluded studies comparing CBM with a non-CBM active intervention. Two researchers independently selected studies and evaluated risk of bias with the Cochrane Collaboration tool. Primary outcomes encompassed anxiety and depressive symptoms measured with validated clinical scales. We computed standardised mean differences (SMDs) with a restricted maximum likelihood random effects model. This study is registered with PROSPERO, CRD42018086113.

Findings: From 2125 records we selected 85 trials, 65 (n=3897) on anxiety and 20 (n=1116) on depression. In a well-connected network of anxiety trials, interpretation bias modification outperformed waitlist (SMD -0.55, 95% CI -0.91 to -0.19) and sham training (SMD -0.30, -0.50 to -0.10) for the primary outcome. Attention bias modification showed benefits only in post-hoc sensitivity analyses excluding post-traumatic stress disorder trials. Prediction intervals for all findings were large, including an SMD of 0. Networks of depression trials displayed evidence of inconsistency. Only four randomised controlled trials had low risk of bias on all six domains assessed.

Interpretation: CBM interventions showed consistent but small benefits; however, heterogeneity and risk of bias undermine the reliability of these findings. Larger, definitive trials for interpretation bias modification for anxiety might be warranted, but insufficient evidence precludes conclusions for depression.

Funding: Romanian Ministry of Research and Innovation, The National Council for Scientific Research-The Executive Agency for Higher Education, Research, Development and Innovation Funding.

Example 3

Objectives: There are high levels of inappropriate antibiotic use in long-term care facilities (LTCFs). Our objective was to examine evidence of the effectiveness of interventions designed to reduce antibiotic use and/or inappropriate use in LTCFs.

Design: Systematic review and meta-analysis.

Data sources: MEDLINE, Embase and CINAHL from 1997 until November 2018.

Eligibility criteria: Controlled and uncontrolled studies in LTCFs measuring intervention effects on rates of overall antibiotic use and/or appropriateness of use were included. Secondary outcomes were intervention implementation barriers from process evaluations.

Data extraction and synthesis: Two reviewers independently applied the Cochrane Effective Practice and Organisation of Care group’s resources to classify interventions and assess risk of bias. Meta-analyses used random effects models to pool results.

Results: Of the included studies (n=19), 10 had a control group and 17 had a high risk of bias. All interventions had multiple components. Eight studies (with high risk of bias) showed positive impacts on outcomes and included one of the following interventions: audit and feedback, introduction of care pathways or an infectious disease team. Meta-analyses on change in the percentage of residents on antibiotics (pooled relative risk (RR) (three studies, 6862 residents): 0.85, 95% CI: 0.61 to 1.18), appropriateness of decision to treat with antibiotics (pooled RR (three studies, 993 antibiotic orders): 1.10, 95% CI: 0.64 to 1.91) and appropriateness of antibiotic selection for respiratory tract infections (pooled RR (three studies, 292 orders): 1.15, 95% CI: 0.95 to 1.40), showed no significant intervention effects. However, meta-analyses only included results from intervention groups since most studies lacked a control group. Insufficient data prevented meta-analysis on other outcomes. Process evaluations (n=7) noted poor intervention adoption, low physician engagement and high staff turnover as barriers.

Conclusions: There is insufficient evidence that interventions employed to date are effective at improving antibiotic use in LTCFs. Future studies should use rigorous study designs and tailor intervention implementation to the setting.

Example 4

Objective: To examine the dose-response relation between reduction in dietary sodium and blood pressure change and to explore the impact of intervention duration.

Design: Systematic review and meta-analysis following PRISMA guidelines.

Data sources: Ovid MEDLINE(R), EMBASE, and Cochrane Central Register of Controlled Trials (Wiley) and reference lists of relevant articles up to 21 January 2019.

Inclusion criteria: Randomised trials comparing different levels of sodium intake undertaken among adult populations with estimates of intake made using 24 hour urinary sodium excretion.

Data extraction and analysis: Two of three reviewers screened the records independently for eligibility. One reviewer extracted all data and the other two reviewed the data for accuracy. Reviewers performed random effects meta-analyses, subgroup analyses, and meta-regression.

Results: 133 studies with 12,197 participants were included. The mean reductions (reduced sodium v usual sodium) of 24 hour urinary sodium, systolic blood pressure (SBP), and diastolic blood pressure (DBP) were 130 mmol (95% confidence interval 115 to 145, P<0.001), 4.26 mm Hg (3.62 to 4.89, P<0.001), and 2.07 mm Hg (1.67 to 2.48, P<0.001), respectively. Each 50 mmol reduction in 24 hour sodium excretion was associated with a 1.10 mm Hg (0.66 to 1.54; P<0.001) reduction in SBP and a 0.33 mm Hg (0.04 to 0.63; P=0.03) reduction in DBP. Reductions in blood pressure were observed in diverse population subsets examined, including hypertensive and non-hypertensive individuals. For the same reduction in 24 hour urinary sodium there was greater SBP reduction in older people, non-white populations, and those with higher baseline SBP levels. In trials of less than 15 days’ duration, each 50 mmol reduction in 24 hour urinary sodium excretion was associated with a 1.05 mm Hg (0.40 to 1.70; P=0.002) SBP fall, less than half the effect observed in studies of longer duration (2.13 mm Hg; 0.85 to 3.40; P=0.002). Otherwise, there was no association between trial duration and SBP reduction.

Conclusions: The magnitude of blood pressure lowering achieved with sodium reduction showed a dose-response relation and was greater for older populations, non-white populations, and those with higher blood pressure. Short term studies underestimate the effect of sodium reduction on blood pressure.

Systematic review registration: PROSPERO CRD42019140812.
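
Example 4 reports a dose-response relation estimated by meta-regression (change in blood pressure per 50 mmol reduction in sodium excretion). The sketch below shows one simple form such an analysis can take, an inverse-variance weighted regression through the origin; the study-level values are hypothetical, not taken from the review.

```python
# Minimal sketch of an inverse-variance weighted meta-regression through the
# origin: blood pressure change (mm Hg) regressed on sodium reduction (mmol).
# All study-level values are hypothetical.
import math

# (sodium reduction in mmol/24h, SBP change in mm Hg, SE of SBP change)
studies = [(100, -2.1, 0.8), (150, -3.4, 1.1), (60, -1.0, 0.6), (130, -2.9, 0.9)]

w = [1 / se**2 for _, _, se in studies]
sum_wxx = sum(wi * x * x for wi, (x, _, _) in zip(w, studies))
slope = sum(wi * x * y for wi, (x, y, _) in zip(w, studies)) / sum_wxx
se_slope = math.sqrt(1 / sum_wxx)

# Re-express the slope per 50 mmol reduction, with a 95% CI
per_50 = 50 * slope
ci = (per_50 - 1.96 * 50 * se_slope, per_50 + 1.96 * 50 * se_slope)
print(f"SBP change per 50 mmol reduction: {per_50:.2f} mm Hg "
      f"(95% CI {ci[0]:.2f} to {ci[1]:.2f})")
```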

Introduction

3. Background/rationale

Describe the rationale for the review in the context of existing knowledge.

Explanation

Describing the rationale should help readers understand why the review was conducted and what the review might add to existing knowledge.

Essential elements

Describe the current state of knowledge and its uncertainties.

Articulate why it is important to do the review.

If other systematic reviews addressing the same (or a largely similar) question are available, explain why the current review was considered necessary (for example, previous reviews are out of date or have discordant results; new review methods are available to address the review question; existing reviews are methodologically flawed; or the current review was commissioned to inform a guideline or policy for a particular organisation). If the review is an update or replication of a particular systematic review, indicate this and cite the previous review.

If the review examines the effects of interventions, also briefly describe how the intervention(s) examined might work.

Additional elements

If there is complexity in the intervention or context of its delivery, or both (such as multi-component interventions, interventions targeting the population and individual level, equity considerations), consider presenting a logic model (sometimes referred to as a conceptual framework or theory of change) to visually display the hypothesised relationship between intervention components and outcomes.

Examples:

Example 1

To contain widespread infection and to reduce morbidity and mortality among health-care workers and others in contact with potentially infected people, jurisdictions have issued conflicting advice about physical or social distancing. Use of face masks with or without eye protection to achieve additional protection is debated in the mainstream media and by public health authorities, in particular the use of face masks for the general population; moreover, optimum use of face masks in health-care settings, which have been used for decades for infection prevention, is facing challenges amid personal protective equipment (PPE) shortages. Any recommendations about social or physical distancing, and the use of face masks, should be based on the best available evidence. Evidence has been reviewed for other respiratory viral infections, mainly seasonal influenza, but no comprehensive review is available of information on SARS-CoV-2 or related betacoronaviruses that have caused epidemics, such as severe acute respiratory syndrome (SARS) or Middle East respiratory syndrome (MERS). We, therefore, systematically reviewed the effect of physical distance, face masks, and eye protection on transmission of SARS-CoV-2, SARS-CoV, and MERS-CoV.

Example 2

A recent Campbell systematic review and network meta-analysis (NMA) by members of our team (V.W., P.T., G.A.W., E.G., Z.B.), with 47 randomised trials and >1 million children, found little to no overall effect on growth, attention and school attendance (Welch et al., 2016). With NMA, we were able to explore the size of effect with different types and frequency of drugs and their combination with food or micronutrients; none of which contributed to larger effects. Our review also did not find larger effects in subgroups of children at the aggregate level across characteristics such as age, baseline nutritional status, prevalence or intensity of infection that have been postulated to be important (Welch et al., 2016). These analyses were conducted at the study level, rather than using data for each individual child, which limits the power to detect effect modification by individual participant characteristics. This review was therefore unable to identify whether mass deworming was more effective for children with certain characteristics. There was substantial unexplained heterogeneity between studies, with some studies finding larger effects than others, and no single individual-level, setting-level or methodology characteristic explaining this variation. Thus, we concluded that our analysis of effect modifiers was limited by the aggregate level data…We decided in collaboration with several authors of primary trials that there would be value in conducting an individual participant data (IPD) meta-analysis to explore the question of whether mass deworming is more effective for subgroups of children defined by characteristics such as infection intensity or status, age or nutritional status. This understanding could help to develop targeted strategies to reach these children better with deworming and guide policy regarding deworming.

Example 3

Currently there is no clear evidence to indicate which surgery is the best choice. It is unclear if the older operations that were previously available (such as anterior repair and colposuspension) really result in equivalent or better outcomes than the polypropylene mid-urethral sling. However, the feeling of our clinical experts who used to offer colposuspension and traditional slings is that these techniques had more frequent and severe associated complications and returning to them may be detrimental to women. To enable women to make an evidence-based choice and inform practice guidelines, it is essential to collect reliable evidence in a transparent, concise manner to allow impartial counselling of women regarding the benefits and risks of the alternative surgical operations for the management of stress urinary incontinence. The wide range of surgical operations available, the different techniques used to perform these operations and the lack of a consensus among surgeons make it challenging to establish which procedure is the most effective. The existing evidence base, including the Cochrane systematic reviews, has focused on discrete two-way comparisons, with no attempt being made to collate all of the evidence on the surgical options available and rank them in terms of clinical effectiveness, safety and cost-effectiveness. This has resulted in a piecemeal evidence base that is difficult for women and clinicians to interpret. This assessment includes an evidence synthesis of all available randomized controlled trials to determine the relative clinical effectiveness and safety of interventions, a discrete choice experiment (DCE) to explore women’s preferences, an economic decision model to determine the most cost-effective treatment and a value-of-information (VOI) analysis to help inform the focus of further research.

Example 4

…it is well known that the organic nitrates lower blood pressure in hypertensive individuals, which brings about the question of whether inorganic nitrates have the same ability. This review focuses on the dietary alteration component of lifestyle modifications by the use of inorganic nitrate in the treatment of hypertension. The appraisal of the evidence was completed to ultimately help providers make informed decisions regarding interventions to address one of the nation's biggest killers. There was a systematic review published in 2013 that addressed the effects of dietary inorganic nitrate on blood pressure with an overrepresentation of healthy, normotensive participants. That review found that inorganic nitrates decrease blood pressure. For this reason, this review examines studies published from 2013 through 2018 with blood pressure greater than 120/80 mmHg in participants, which would be considered elevated according to the guidelines published by the American College of Cardiology (ACC) and American Heart Association (AHA). The results of this review will contribute towards a greater understanding of possible treatments for hypertension, in turn resulting in less morbidity and mortality from cardiovascular diseases. At the time of this systematic review, there was no systematic review that evaluated the effects of inorganic nitrate specifically on adults with blood pressure greater than 120/80 mmHg.

4. Objectives

Provide an explicit statement of the objective(s) or question(s) the review addresses.

Explanation

An explicit and concise statement of the review objective(s) or question(s) will help readers understand the scope of the review and assess whether the methods used in the review (such as eligibility criteria, search methods, data items, and the comparisons used in the synthesis) adequately address the objective(s). Such statements may be written in the form of objectives (“the objectives of the review were to examine the effects of…”) or as questions (“what are the effects of…?”).

Essential elements

Provide an explicit statement of all objective(s) or question(s) the review addresses, expressed in terms of a relevant question formulation framework.

If the purpose is to evaluate the effects of interventions, use the Population, Intervention, Comparator, Outcome (PICO) framework or one of its variants to state the comparisons that will be made.

Examples:

Example 1

Objectives: To evaluate the benefits and harms of down‐titration (dose reduction, discontinuation, or disease activity‐guided dose tapering) of anti‐tumour necrosis factor-blocking agents (adalimumab, certolizumab pegol, etanercept, golimumab, infliximab) on disease activity, functioning, costs, safety, and radiographic damage compared with usual care in people with rheumatoid arthritis and low disease activity.

Example 2

Key Questions:

  1. What are the benefits of PrEP in persons without pre-existing HIV infection versus placebo or no PrEP (including deferred PrEP) on the prevention of HIV infection and quality of life?
     a. How do the benefits of PrEP differ by population subgroups?
     b. How do the benefits of PrEP differ by dosing strategy or regimen?

  2. What is the diagnostic accuracy of provider or patient risk assessment tools in identifying persons at increased risk of HIV acquisition who are candidates for PrEP?

  3. What are rates of adherence to PrEP in U.S. primary care–applicable settings?

  4. What is the association between adherence to PrEP and effectiveness for preventing HIV acquisition?

  5. What are the harms of PrEP versus placebo or no PrEP when used for the prevention of HIV infection?

Example 3

The primary objective of this review was to determine the impact of mother-targeted mHealth educational interventions during the perinatal period in low- and middle-income countries on maternal and neonatal outcomes. Thus, this quantitative review aimed to answer the following questions:

i. What is the impact of mother-targeted mHealth educational interventions on maternal knowledge, self-efficacy and antenatal/postnatal care clinic attendance in low- and middle-income countries?

ii. What is the impact of mother-targeted mHealth educational interventions on neonatal mortality and morbidity in low- and middle-income countries?

Example 4

In order to determine the effectiveness of screening for esophageal adenocarcinoma among gastroesophageal reflux disease patients, the following key questions were addressed:

1a. In adults (≥ 18 years) with chronic gastroesophageal reflux disease with or without other risk factors, what is the effectiveness (benefits and harms) of screening for esophageal adenocarcinoma and precancerous conditions (Barrett’s Esophagus and low- and high-grade dysplasia)? What are the effects in relevant subgroup populations?

1b. If there is evidence of effectiveness, what is the optimal time to initiate and to end screening, and what is the optimal screening interval (includes single and multiple tests and ongoing ‘surveillance’)?

Methods

5. Eligibility criteria

Specify the inclusion and exclusion criteria for the review and how studies were grouped for the syntheses.

Explanation

Specifying the criteria used to decide what evidence was eligible or ineligible in sufficient detail should enable readers to understand the scope of the review and verify inclusion decisions. The PICO framework is commonly used to structure the reporting of eligibility criteria for reviews of interventions. In addition to specifying the review PICO, the intervention, outcome, and population groups that were used in the syntheses need to be identified and defined. For example, in a review examining the effects of psychological interventions for smoking cessation in pregnancy, the authors specified intervention groups (counselling, health education, feedback, incentive-based interventions, social support, and exercise) and the defining components of each group.

Essential elements

Specify all study characteristics used to decide whether a study was eligible for inclusion in the review, that is, components described in the PICO framework or one of its variants, and other characteristics, such as eligible study design(s) and setting(s) and minimum duration of follow-up.

Specify eligibility criteria with regard to report characteristics, such as year of dissemination, language, and report status (for example, whether reports such as unpublished manuscripts and conference abstracts were eligible for inclusion). Clearly indicate if studies were ineligible because the outcomes of interest were not measured, or ineligible because the results for the outcome of interest were not reported. Reporting that studies were excluded because they had “no relevant outcome data” is ambiguous and should be avoided.

Specify any groups used in the synthesis (such as intervention, outcome, and population groups) and link these to the comparisons specified in the objectives (item #4).

Additional elements

Consider providing rationales for any notable restrictions to study eligibility. For example, authors might explain that the review was restricted to studies published from 2000 onward because that was the year the device was first available.
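
Holding the eligibility criteria in a structured, machine-readable form can help reviewers apply and report them consistently. The following is a minimal sketch using an invented review question; the field names and the screening helper are illustrative assumptions, not part of the PRISMA guidance.

```python
# Minimal sketch: PICO eligibility criteria held as a structured record so
# that inclusion decisions can be logged and reported consistently.
# The criteria and the example study are invented for illustration.
from dataclasses import dataclass

@dataclass
class EligibilityCriteria:
    population: str
    intervention: str
    comparator: str
    outcomes: list
    designs: list
    languages: list          # empty list = no language restriction
    years: tuple             # (from, to) publication years

criteria = EligibilityCriteria(
    population="adults (>=18 years) with hypertension",
    intervention="dietary sodium reduction",
    comparator="usual sodium intake",
    outcomes=["systolic blood pressure", "diastolic blood pressure"],
    designs=["randomised controlled trial"],
    languages=[],
    years=(2000, 2024),
)

def check_design_and_year(design: str, year: int) -> bool:
    """Screen one study record against the design and year criteria."""
    return design in criteria.designs and criteria.years[0] <= year <= criteria.years[1]

print(check_design_and_year("randomised controlled trial", 2015))  # True
```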

Examples:

Example 1

Types of studies: We included all published or unpublished randomised controlled trials (RCTs). We would also have included cluster-randomised controlled trials and cross-over trials, but we found none. There were no language restrictions, nor did we exclude studies on the basis of the date of publication.

Types of participants: We included people of any age or gender with a primary clinical diagnosis of anorexia nervosa (AN), either or both purging or restricting subtypes, based on DSM (APA 2013) or ICD criteria (WHO 1992) or clinicians' judgement, and of any severity. We included those with chronic AN. We included those with psychiatric comorbidity, with the details of comorbidity documented. Participants may have received the intervention in any setting (including in-, day- or outpatient) and may have started in the trial at the beginning of treatment or part-way through (e.g. after discharge from hospital or some other indication/definition of stabilisation). We included those living in a family unit (of any nature, as described/defined by study authors), and those living outside of a family unit.

Types of interventions: Trials where the intervention describes inclusion of the family in some way and is labelled 'family therapy'. These interventions may have been delivered as a monotherapy or in conjunction with other interventions (including standard care, which may or may not be in the context of an inpatient admission). The main categories of family therapy approaches considered were:

• Structural family therapy
• Systems (systemic) family therapy
• Strategic family therapy
• Family-based therapy and its variants (including short-term, long-term, and separated) and behavioural family systems therapy (these two therapies were grouped together, given the similarity of approach)
• Other (including other approaches that use family involvement in therapy but are less specific about the theoretical underpinning of the therapy and its procedures).

Family therapy approaches were compared with:

• Standard care or treatment as usual
• Biological interventions (for example, antidepressants, antipsychotics, mood stabilisers, anxiolytics, nutraceuticals, and other agents such as anti-glucocorticoids)
• Educational interventions (for example, nutritional interventions and dietetics)
• Psychological interventions (for example, cognitive behavioural therapy (CBT) and its derivatives, cognitive analytical therapy, interpersonal therapy, supportive therapy, psychodynamic therapy, play therapy, other)
• Alternative or complementary interventions (for example, massage, exercise, light therapies).

Additionally, different types of family therapy approaches were compared to each other. The addition of a family therapy approach to other interventions (including standard care) was also compared to other interventions alone. We would also have included the following comparisons: Family therapy approaches versus biological interventions; and Family therapy approaches versus alternative/complementary interventions; however, we had neither the relevant trials nor useable data from these.

Types of outcome measures:

Primary outcomes included:

• Remission (by DSM or ICD or trialist-defined cut-off on standardised scale measure for remission versus no remission)
• All-cause mortality

Secondary outcomes included:

• Family functioning as measured on standardised, validated and reliable measures, e.g. Family Environment Scale (Moos 1994), Expressed Emotions (Vaughn 1976), FACES III (Olson 1985)
• General functioning, measured by return to school or work, or by general mental health functioning measures, e.g. Global Assessment of Functioning (GAF) (APA 1994)
• Dropout (by rates per group during treatment)
• Eating disorder psychopathology (evidence of ongoing preoccupation with weight/shape/food/eating by eating-disorder symptom measures using any recognised validated eating disorders questionnaire or interview schedule, e.g. the Morgan-Russell Assessment Schedule (Morgan 1988), Eating Attitudes Test (EAT, Garner 1979), Eating Disorders Inventory (Garner 1983; Garner 1991))
• Weight, including all representations of this measure such as kilograms, body mass index (BMI, kg/m2) and average body weight (ABW) calculations. We included this measure after the finalisation of our protocol, due to the lack of universal reporting on remission, and the differing definitions used for remission
• Relapse (by DSM or ICD or trialist-defined criteria for relapse or hospitalisation)

Example 2

Population: We included randomized controlled trials of adult (age ≥18 years) patients undergoing non-cardiac surgery, excluding organ transplantation surgery (as findings in patients who need immunosuppression may not be generalisable to others).

Intervention: We considered all perioperative care interventions identified by the search if they were protocolised (therapies were systematically provided to patients according to pre-defined algorithm or plan) and were started and completed during the perioperative pathway (that is, during preoperative preparation for surgery, intraoperative care, or inpatient postoperative recovery). Examples of interventions that we did or did not deem perioperative in nature included long term preoperative drug treatment (not included, as not started and completed during the perioperative pathway) and perioperative physiotherapy interventions (included, as both started and completed during the perioperative pathway). We excluded studies in which the intervention was directly related to surgical technique.

Outcomes: To be included, a trial had to use a defined clinical outcome relating to postoperative pulmonary complications, such as “pneumonia” diagnosed according to the Centers for Disease Control and Prevention’s definition. Randomized controlled trials reporting solely physiological (for example, lung volumes and flow measurements) or biochemical (for example, lung inflammatory markers) outcomes are valuable but neither patient centric nor necessarily clinically relevant, and we therefore excluded them. We applied no language restrictions. Our primary outcome measure was the incidence of postoperative pulmonary complications, with postoperative pulmonary complications being defined as the composite of any of respiratory infection, respiratory failure, pleural effusion, atelectasis, or pneumothorax…Where a composite postoperative pulmonary complication was not reported, we contacted corresponding authors via email to request additional information, including primary data.

Example 3

The eligible studies had to meet all of the following criteria: 1) adults 18 years and older with exacerbations of chronic obstructive pulmonary disease (ECOPD); 2) received pharmacologic intervention or nonpharmacologic interventions; 3) compared with placebo, standard care, for antibiotics and systemic corticosteroids: different types of agents, different delivery modes, and different durations of treatments; 4) reported outcomes of interest; 5) conducted in outpatient, inpatient, and emergency department settings; 6) randomized controlled trials (RCTs); and 7) published in English. We excluded studies conducted in the intensive care unit, or chronic ventilator unit or respiratory care unit; studies of patients with exacerbation of chronic bronchitis if they did not have any evidence of airflow limitation on spirometry (at any time, including during a stable state); and studies of health service interventions (e.g. hospital in the home as alternative to hospitalization). We focused only on interventions during the initial acute phase of an exacerbation of COPD and not during the convalescence period. We did not restrict study location or sample size. The detailed inclusion and exclusion criteria are listed in Table 1. All outcomes were final health outcomes except for the intermediate outcome, “forced expiratory volume in one second” (FEV1). FEV1 was included because it is a commonly used outcome in COPD studies and has been shown to be highly predictive of final health outcomes during ECOPD (including mortality, need for intubation, or hospital admission for COPD).

6. Information sources

Specify all databases, registers, websites, organisations, reference lists, and other sources searched or consulted to identify studies. Specify the date when each source was last searched or consulted.

Explanation

Authors should provide a detailed description of the information sources, such as bibliographic databases, registers and reference lists that were searched or consulted, including the dates when each source was last searched, to allow readers to assess the completeness and currency of the systematic review, and facilitate updating. Authors should fully report the “what, when, and how” of the sources searched; the “what” and “when” are covered in item #6, and the “how” is covered in item #7. Further guidance and examples about searching can be found in PRISMA-Search, an extension to the PRISMA statement for reporting literature searches in systematic reviews.

Essential elements

Specify the date when each source (such as database, register, website, organisation) was last searched or consulted.

If bibliographic databases were searched, specify for each database its name (such as MEDLINE, CINAHL), the interface or platform through which the database was searched (such as Ovid, EBSCOhost), and the dates of coverage (where this information is provided).

If study registers (such as ClinicalTrials.gov), regulatory databases (such as Drugs@FDA), and other online repositories (such as SIDER Side Effect Resource) were searched, specify the name of each source and any date restrictions that were applied.

If websites, search engines, or other online sources were browsed or searched, specify the name and URL (uniform resource locator) of each source.

If organisations or manufacturers were contacted to identify studies, specify the name of each source.

If individuals were contacted to identify studies, specify the types of individuals contacted (such as authors of studies included in the review or researchers with expertise in the area).

If reference lists were examined, specify the types of references examined (such as references cited in study reports included in the systematic review, or references cited in systematic review reports on the same or a similar topic).

If cited or citing reference searches (also called backwards and forward citation searching) were conducted, specify the bibliographic details of the reports to which citation searching was applied, the citation index or platform used (such as Web of Science), and the date the citation searching was done.

If journals or conference proceedings were consulted, specify the names of each source, the dates covered and how they were searched (such as handsearching or browsing online).
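
A simple way to capture the "what" and "when" for every source is to maintain a structured log during searching, from which the Methods text can later be written. The sketch below is illustrative only; the sources, platforms, and dates are invented.

```python
# Minimal sketch: a structured log of information sources and last-searched
# dates, from which the Methods section can be reported. All values invented.
import csv

sources = [
    {"source": "MEDLINE", "platform": "Ovid", "coverage": "1946-", "last_searched": "2024-01-15"},
    {"source": "CINAHL", "platform": "EBSCOhost", "coverage": "1981-", "last_searched": "2024-01-15"},
    {"source": "ClinicalTrials.gov", "platform": "web", "coverage": "", "last_searched": "2024-01-16"},
]

# Write the log to a CSV file that can be archived with the review materials
with open("information_sources.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=sources[0].keys())
    writer.writeheader()
    writer.writerows(sources)
```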

Examples:

Example 1

We conducted electronic searches for eligible studies within each of the following databases:

• Cochrane Central Register of Controlled Trials (CENTRAL) (1992 to 23rd July 2018);
• MEDLINE (including MEDLINE In-Process) (OvidSP) (1946 to 23rd July 2018);
• Embase (OvidSP) (1980 to 23rd July 2018);
• PsycINFO (OvidSP) (1806 to 23rd July 2018);
• Applied Social Sciences Index and Abstracts (ASSIA) (ProQuest) (1987 to 24th July 2018);
• Science Citation Index Expanded (Web of Science) (1900 to 24th July 2018);
• Social Sciences Citation Index (Web of Science) (1956 to 24th July 2018); and
• Trials Register of Promoting Health Interventions (EPPI Centre) (2004 to 27th July 2018).

We conducted electronic searches of the following grey literature databases using search strategies adapted from the final MEDLINE search strategy, as described above:

• Conference Proceedings Citation Index - Science (Web of Science) (1990 to 24th July 2018);
• Conference Proceedings Citation Index - Social Science & Humanities (Web of Science) (1990 to 24th July 2018); and
• OpenGrey (1997 to 24th July 2018).

We searched trial registers (US National Institutes of Health Ongoing Trials Register ClinicalTrials.gov (www.clinicaltrials.gov/), the World Health Organization International Clinical Trials Registry Platform (apps.who.int/trialsearch/), and the EU Clinical Trials Register (www.clinicaltrialsregister.eu/)) to identify registered trials (up to 25th July 2018), and the websites of key organisations in the area of health and nutrition, including the following:

• UK Department of Health;
• Centers for Disease Control and Prevention (CDC), USA;
• World Health Organization (WHO);
• International Obesity Task Force; and
• EU Platform for Action on Diet, Physical Activity and Health.

In addition, we searched the reference lists of all eligible study reports and undertook forward citation tracking (using Google Scholar) to identify further eligible studies or study reports (up to 25th July 2018).

Example 2

On 21 December 2017, MAJ searched 16 health, social care, education and legal databases, the names and date coverage of which are given in Table 2. […] We also carried out a ‘snowball’ search to identify additional studies by searching the reference lists of publications eligible for full-text review and using Google Scholar to identify and screen studies citing them. […] On 26 April 2018, we conducted a search of Google Scholar and additional supplementary searches for publications on websites of 10 relevant organisations (including government departments, charities, think-tanks and research institutes). Full details of these supplementary searches can be found in the Additional file 2. Finally, we updated the database search on 7 May 2019, and the snowball and additional searches on 10 May 2019 as detailed in Additional file 3. We used the same search method, except that we narrowed the searches to 2017 onwards.

Example 3

We performed searches in the following databases:

• Health:
  o MEDLINE
  o Embase (Excerpta Medica dataBASE)
  o CENTRAL (Cochrane Central Register of Controlled Trials)

• Multidisciplinary:
  o Scopus
  o Google Scholar
  o Social Science Citation Index

• Public health, health promotion and occupational health databases:
  o BiblioMap (EPPI-Centre database of health promotion research)
  o TRoPHI (EPPI-Centre Trials Register of Promoting Health Interventions)

• Nutrition:
  o eLENA (WHO e-Library of Evidence for Nutrition Actions)

• Sources for grey literature:
  o openGrey (formerly openSIGLE)

• Unpublished studies:
  o ClinicalTrials.gov
  o ICTRP (International Clinical Trials Registry Platform)

• Databases with a regional focus:
  o LILACS
  o SciELO Citation Index

We used the Ovid search interface for MEDLINE, Embase and CENTRAL. In addition, we searched the websites of key organisations in the area of health, health promotion and nutrition, including the following:

• EU platform for action on diet, physical activity and health (ec.europa.eu/health/ph_determinants/life_style/nutrition/platform/database/dsp_search.cfm).
• U.S. Centers for Disease Control and Prevention (www.cdc.gov/nutrition/data-statistics/sugar-sweetened-beveragesintake.html).
• Rudd Center for Food Policy and Obesity (www.uconnruddcenter.org/publications).
• Harvard TH Chan School of Public Health Obesity Prevention Source (www.hsph.harvard.edu/obesity-prevention-source).
• World Obesity (www.worldobesity.org/what-we-do/policy-prevention).

We handsearched reference lists of included studies and previously published reviews, and contacted the corresponding author of included studies and previously published reviews as well as the members of the Review Advisory Group to identify additional studies. We also conducted a citing studies search with Scopus, i.e. we searched for studies that have cited included studies and previously published reviews. The studies used for these forward and backward citation searches are provided in Appendix 6…We updated searches to 24 January 2018. We included SciELO, Google Scholar, Open Grey and Bibliomap in our original search (conducted on 27 - 28 June 2016), but not in our 2018 search update.

Example 4

…we searched several resources to maximise the inclusion of all relevant studies. A list of sources that were searched with their brief description is presented in table 1.

7. Search strategy

Present the full search strategies for all databases, registers, and websites, including any filters and limits used.

Explanation

Reporting the full details of all search strategies (such as the full, line by line search strategy as run in each database) should enhance the transparency of the systematic review, improve replicability, and enable a review to be more easily updated. Presenting only one search strategy from among several hinders readers’ ability to assess how comprehensive the searches were and does not provide them with the opportunity to detect any errors. Furthermore, making only one search strategy available limits replication or updating of the searches in the other databases, as the search strategies would need to be reconstructed through adaptation of the one(s) made available. As well as reporting the search strategies, a description of the search strategy development process can help readers judge how far the strategy is likely to have identified all studies relevant to the review’s inclusion criteria. The description of the search strategy development process might include details of the approaches used to identify keywords, synonyms, or subject indexing terms used in the search strategies, or any processes used to validate or peer review the search strategies. Empirical evidence suggests that peer review of search strategies is associated with improvements to search strategies, leading to retrieval of additional relevant records. Further guidance and examples of reporting search strategies can be found in PRISMA-Search (https://www.equator-network.org/reporting-guidelines/prisma-s/).

Essential elements

Provide the full line by line search strategy as run in each database with a sophisticated interface (such as Ovid), or the sequence of terms that were used to search simpler interfaces, such as search engines or websites.

Describe any limits applied to the search strategy (such as date or language) and justify these by linking back to the review’s eligibility criteria. If published approaches, such as search filters designed to retrieve specific types of records (for example, a filter for randomised trials) or search strategies from other systematic reviews, were used, cite them. If published approaches were adapted—for example, if existing search filters were amended—note the changes made.

If natural language processing or text frequency analysis tools were used to identify or refine keywords, synonyms, or subject indexing terms to use in the search strategy, specify the tool(s) used.

If a tool was used to automatically translate search strings for one database to another, specify the tool used.

If the search strategy was validated—for example, by evaluating whether it could identify a set of clearly eligible studies—report the validation process used and specify which studies were included in the validation set.

If the search strategy was peer reviewed, report the peer review process used and specify any tool used, such as the Peer Review of Electronic Search Strategies (PRESS) checklist.

If the search strategy structure adopted was not based on a PICO-style approach, describe the final conceptual structure and any explorations that were undertaken to achieve it (for example, use of a multi-faceted approach that uses a series of searches, with different combinations of concepts, to capture a complex research question, or use of a variety of different search approaches to compensate for when a specific concept is difficult to define).
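
Where text frequency analysis is used to identify candidate search terms (as described above), the underlying idea can be as simple as counting words in the titles and abstracts of known relevant records. The sketch below illustrates this with invented titles; it is not a replacement for dedicated tools such as PubMed PubReMiner.

```python
# Minimal sketch of text frequency analysis to suggest candidate search
# terms from titles of known relevant records. The titles are invented.
from collections import Counter
import re

titles = [
    "Mirabegron for overactive bladder: a randomised controlled trial",
    "Sacral neuromodulation in refractory overactive bladder syndrome",
    "Botulinum toxin injection for detrusor overactivity: randomised study",
]

stopwords = {"a", "for", "in", "of", "the", "and"}
words = Counter(
    w for title in titles
    for w in re.findall(r"[a-z]+", title.lower())
    if w not in stopwords
)
print(words.most_common(5))  # the most frequent terms are candidate keywords
```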

Examples:

Example 1

Note: the following is an abridged version of an example presented in full in supplementary table S1 on bmj.com. (https://www.bmj.com/content/bmj/suppl/2021/03/29/bmj.n160.DC1/pagm061901.w1.pdf)

MEDLINE(R) In-Process & Other Non-Indexed Citations and Ovid MEDLINE were searched via OvidSP. The database coverage was 1946 to present and the databases were searched on 29 August 2013.

  1. Urinary Bladder, Overactive/
  2. ((overactiv$ or over-activ$ or hyperactiv$ or hyper-activ$ or unstable or instability or incontinen$) adj3 bladder$).ti,ab.
  3. (OAB or OABS or IOAB or IOABS).ti,ab.
  4. (urge syndrome$ or urge frequenc$).ti,ab.
  5. ((overactiv$ or over-activ$ or hyperactiv$ or hyper-activ$ or unstable or instability) adj3 detrusor$).ti,ab.
  6. Urination Disorders/
  7. exp Urinary Incontinence/
  8. Urinary Bladder Diseases/
  9. (urge$ adj3 incontinen$).ti,ab.
  10. (urin$ adj3 (incontinen$ or leak$ or urgen$ or frequen$)).ti,ab.
  11. (urin$ adj3 (disorder$ or dysfunct$)).ti,ab.
  12. (detrusor$ adj3 (hyperreflexia$ or hyper-reflexia$ or hypertoni$ or hyper-toni$)).ti,ab.
  13. (void$ adj3 (disorder$ or dysfunct$)).ti,ab.
  14. (micturition$ adj3 (disorder$ or dysfunct$)).ti,ab.
  15. exp Enuresis/
  16. Nocturia/
  17. (nocturia or nycturia or enuresis).ti,ab.
  18. or/1-17
  19. (mirabegron or betmiga$ or myrbetriq$ or betanis$ or YM-178 or YM178 or 223673-61-8 or “223673618” or MVR3JL3B2V).ti,ab,rn.
  20. exp Electric Stimulation Therapy/
  21. Electric Stimulation/
  22. ((sacral or S3) adj3 (stimulat$ or modulat$)).ti,ab.
  23. (neuromodulat$ or neuro-modulat$ or neural modulat$ or electromodulat$ or electro-modulat$ or neurostimulat$ or neuro-stimulat$ or neural stimulat$ or electrostimulat$ or electro-stimulat$).ti,ab.
  24. (InterStim or SNS).ti,ab.
  25. ((electric$ or nerve$1) adj3 (stimulat$ or modulat$)).ti,ab.
  26. (electric$ therap$ or electrotherap$ or electro-therap$).ti,ab.
  27. TENS.ti,ab.
  28. exp Electrodes/
  29. electrode$1.ti,ab.
  30. ((implant$ or insert$) adj3 pulse generator$).ti,ab.
  31. ((implant$ or insert$) adj3 (neuroprosthe$ or neuro-prosthe$ or neural prosthe$)).ti,ab.
  32. PTNS.ti,ab.
  33. (SANS or Stoller Afferent or urosurg$).ti,ab.
  34. (evaluat$ adj3 peripheral nerve$).ti,ab.
  35. exp Botulinum Toxins/
  36. (botulinum$ or botox$ or onabotulinumtoxin$ or 1309378-01-5 or “1309378015”).ti,ab,rn.
  37. or/19-36
  38. 18 and 37
  39. randomized controlled trial.pt.
  40. controlled clinical trial.pt.
  41. random$.ti,ab.
  42. placebo.ti,ab.
  43. drug therapy.fs.
  44. trial.ti,ab.
  45. groups.ab.
  46. or/39-45
  47. 38 and 46
  48. animals/ not humans/
  49. 47 not 48
  50. limit 49 to english language

Search strategy development process: Five known relevant studies were used to identify records within databases. Candidate search terms were identified by looking at words in the titles, abstracts and subject indexing of those records. A draft search strategy was developed using those terms and additional search terms were identified from the results of that strategy. Search terms were also identified and checked using the PubMed PubReMiner word frequency analysis tool. The MEDLINE strategy makes use of the Cochrane RCT filter reported in the Cochrane Handbook v5.2. As per the eligibility criteria the strategy was limited to English language studies. The search strategy was validated by testing whether it could identify the five known relevant studies and also three further studies included in two systematic reviews identified as part of the strategy development process. All eight studies were identified by the search strategies in MEDLINE and Embase. The strategy was developed by an information specialist and the final strategies were peer reviewed by an experienced information specialist within our team. Peer review involved proofreading the syntax and spelling and overall structure but did not make use of the PRESS checklist.
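
The validation step described in this example (testing whether the strategy retrieves a set of known relevant studies) amounts to a set comparison between known and retrieved record identifiers. A minimal sketch with hypothetical IDs follows.

```python
# Minimal sketch: validate a search strategy by checking that the IDs of
# known relevant studies appear among the retrieved records.
# All IDs are hypothetical placeholders.
known_relevant = {"10000001", "10000002", "10000003", "10000004", "10000005"}
retrieved = {"10000001", "10000002", "10000003", "10000004", "10000005", "20000001"}

missing = known_relevant - retrieved
if missing:
    print(f"Strategy missed {len(missing)} known studies: {sorted(missing)}")
else:
    print("All known relevant studies retrieved.")
```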

Example 2

For clinicaltrials.gov we used the advanced search interface, and used the search syntax “(sugar-sweetened beverage) OR SSB OR soda” to run searches in the following fields:

• Condition or disease
• Other terms
• Intervention/treatment
• Title/Acronym
• Outcome Measure

The search yielded 646 records, which we collated and de-duplicated in MS Excel. After de-duplication, 282 unique records remained.

For the International Clinical Trials Registry Platform (ICTRP) we used the advanced search interface, and used the search syntax “sugar-sweetened beverage OR SSB OR soda” to run searches in the following fields (with synonyms, all recruitment status):

• Title
• OR Condition
• OR Intervention

The search resulted in 171 hits.

Based on the search, we identified two completed studies eligible for inclusion in our review (Collins 2016 SNAP; Collins 2016 WIC), which we found through clinicaltrials.gov. Moreover, we identified 10 ongoing studies which we judged likely to meet our eligibility criteria upon completion. We present details of these in Characteristics of ongoing studies. We found eight of these through our search in clinicaltrials.gov, and two through our search in the ICTRP. We ran trial register searches on 21 June 2018.
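
Example 2 collated and de-duplicated register records in MS Excel; the same step can be scripted so that it is reproducible. The sketch below collapses records sharing a normalised registration ID; the records shown are invented.

```python
# Minimal sketch: de-duplicate trial register records on a normalised
# registration ID. The records are invented for illustration.
records = [
    {"id": "NCT01234567", "title": "SSB taxation trial"},
    {"id": "nct01234567 ", "title": "SSB taxation trial"},  # duplicate with a messy ID
    {"id": "NCT07654321", "title": "Soda labelling trial"},
]

seen = set()
unique = []
for rec in records:
    key = rec["id"].strip().upper()  # normalise before comparing
    if key not in seen:
        seen.add(key)
        unique.append(rec)

print(f"{len(records)} records, {len(unique)} unique after de-duplication")
```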

8. Selection process

Specify the methods used to decide whether a study met the inclusion criteria of the review, including how many reviewers screened each record and each report retrieved, whether they worked independently, and, if applicable, details of automation tools used in the process.

Explanation

Study selection is typically a multi-stage process in which potentially eligible studies are first identified from screening titles and abstracts, then assessed through full text review and, where necessary, contact with study investigators. Increasingly, a mix of screening approaches might be applied (such as automation to eliminate records before screening or prioritise records during screening). In addition to automation, authors increasingly have access to screening decisions that are made by people independent of the author team (such as crowdsourcing). Authors should describe in detail the process for deciding how records retrieved by the search were considered for inclusion in the review, to enable readers to assess the potential for errors in selection.

Essential elements for systematic reviews regardless of the selection processes used

Report how many reviewers screened each record (title/abstract) and each report retrieved, whether multiple reviewers worked independently (that is, were unaware of each other’s decisions) at each stage of screening or not (for example, records screened by one reviewer and exclusions verified by another), and any processes used to resolve disagreements between screeners (for example, referral to a third reviewer or by consensus).

Report any processes used to obtain or confirm relevant information from study investigators.

If abstracts or articles required translation into another language to determine their eligibility, report how these were translated (for example, by asking a native speaker or by using software programs).

Essential elements for systematic reviews using automation tools in the selection process

Report how automation tools were integrated within the overall study selection process; for example, whether records were excluded based solely on a machine assessment or whether machine assessments were used to double-check human decisions.

If an externally derived machine learning classifier was applied (such as Cochrane RCT Classifier), either to eliminate records or to replace a single screener, include a reference or URL to the version used. If the classifier was used to eliminate records before screening, report the number eliminated in the PRISMA flow diagram as “Records marked as ineligible by automation tools.”

If an internally derived machine learning classifier was used to assist with the screening process, identify the software/classifier and version, describe how it was used (such as to remove records or replace a single screener) and trained (if relevant), and what internal or external validation was done to understand the risk of missed studies or incorrect classifications. For example, authors might state that the classifier was trained on the set of records generated for the review in question (as may be the case when updating reviews) and specify which thresholds were applied to remove records.

If machine learning algorithms were used to prioritise screening (whereby unscreened records are continually re-ordered based on screening decisions), state the software used and provide details of any screening rules applied (for example, screening stopped altogether leaving some records to be excluded based on automated assessment alone, or screening switched from double to single screening once a pre-specified number or proportion of consecutive records was eliminated).

Essential elements for systematic reviews using crowdsourcing or previous “known” assessments in the selection process

If crowdsourcing was used to screen records, provide details of the platform used and specify how it was integrated within the overall study selection process.

If datasets of already-screened records were used to eliminate records retrieved by the search from further consideration, briefly describe the derivation of these datasets. For example, if prior work has already determined that a given record does not meet the eligibility criteria, it can be removed without manual checking. This is the case for Cochrane’s Screen4Me service, in which an increasingly large dataset of records that are known not to represent randomised trials can be used to eliminate any matching records from further consideration.

Example/s:

Example 1

Three researchers (AP, HB-R, FG) independently reviewed titles and abstracts of the first 100 records and discussed inconsistencies until consensus was obtained. Then, in pairs, the researchers independently screened titles and abstracts of all articles retrieved. In case of disagreement, consensus on which articles to screen full-text was reached by discussion. If necessary, the third researcher was consulted to make the final decision. Next, two researchers (AP, HB-R) independently screened full-text articles for inclusion. Again, in case of disagreement, consensus was reached on inclusion or exclusion by discussion and if necessary, the third researcher (FG) was consulted.

Example 2

Citations identified from the literature searches and reference list checking were imported to EndNote and duplicates were removed. Three reviewers independently screened a sample of 109 citations to pre-test and refine coding guidance based on the inclusion criteria. Disagreements about eligibility were resolved through discussion. The three reviewers (SB, JR, and SM) then each screened about a third of the remaining citations (grouped by year of publication) for inclusion in the review using the pre-tested coding guidance.

Full texts of all potentially eligible studies were retrieved. A sample of full-text studies was independently screened by two reviewers (SB and JR) until concordance was achieved (~15%; 37/228 of full-text studies screened). The remaining full-text studies were screened by one reviewer (SB or JR). All included studies, and those for which eligibility was uncertain, were screened by a second reviewer (JR or SB). Disagreements or uncertainty about eligibility were resolved through discussion, with advice from the review biostatisticians (JM, AF, or both) to confirm eligibility based on study design and analysis methods. Further information was sought from the authors of two studies (Piumatti 2018, Wardzala 2018) to clarify methods and interpretation of the analysis.

Citations that did not meet the inclusion criteria were excluded and the reason for exclusion was recorded at the full-text screening. Cohort names, author names, and study locations, dates and sample characteristics were used to identify multiple reports arising from the same study (deemed to be a ‘cohort’). These reports were matched and data extracted only from the report that provided the most relevant analysis and complete information for the review. In most cases, the decision was based on the outcome reported (global function was prioritised).

Example 3

We imported titles and abstracts retrieved by the searches into EPPI Reviewer v.4.10.2 (ER4) systematic review software. Duplicate records were identified, manually reviewed and then removed using ER4’s automatic de-duplication feature, with the similarity threshold initially set to 0.85 and then to 0.80. Due to the large number of records retrieved, we developed a semiautomated screening workflow in ER4 that used machine learning to assign title-abstract records for duplicate manual screening.

This workflow was designed to maximise the recall of eligible studies while reducing the overall screening workload to match the resources available. We planned for duplicate manual screening to apply to up to a third of records retrieved. In developing the workflow, we first screened a random sample of 500 title-abstract records to calculate inter-rater reliability and establish an initial estimate of the baseline inclusion rate (sample size determined as per Shemilt 2014). Second, title-abstract records were prioritised for manual screening using active learning to distinguish between relevant and irrelevant records in conjunction with manual user input. This phase of the workflow stopped when each review author had completed 15 hours of duplicate screening without identifying any further potentially eligible studies. In practice, this equated to 1700 title-abstract records.

When we found non-English language articles, we used Google Translate in the first instance to determine potential eligibility. We intended that if an article appeared to be eligible, we would have the article translated by a native language speaker or professional translation service, however no articles needed translating.

Example 4 [Drafted by Steve McDonald and James Thomas, March 2020]

Study selection followed a three-stage process that involved machine learning classifiers, crowdsourcing and manual screening. After removing duplicates, we applied Cochrane’s RCT machine learning classifier (Thomas 2020) and removed from further consideration any record classified as highly unlikely to report a randomized trial (i.e. below the externally calibrated recall threshold of 99%). Records that remained were then screened by Cochrane Crowd (Noel-Storr 2020), a crowdsourcing platform that has consistently been shown to be over 99% accurate. In Cochrane Crowd, every record is screened by at least two crowd members, with all disagreements resolved by two expert screeners. Records rejected by the crowd were removed from further consideration. Finally, records the crowd deemed likely to be reports of randomized trials were screened independently by two members of the review team in Covidence.

9. Data collection process

Specify the methods used to collect data from reports, including how many reviewers collected data from each report, whether they worked independently, any processes for obtaining or confirming data from study investigators, and, if applicable, details of automation tools used in the process.

Explanation

Authors should report the methods used to collect data from reports of included studies, to enable readers to assess the potential for errors in the data presented.

Essential elements

Report how many reviewers collected data from each report, whether multiple reviewers worked independently or not (for example, data collected by one reviewer and checked by another), and any processes used to resolve disagreements between data collectors.

Report any processes used to obtain or confirm relevant data from study investigators (such as how they were contacted, what data were sought, and success in obtaining the necessary information).

If any automation tools were used to collect data, report how the tool was used (such as machine learning models to extract sentences from articles relevant to the PICO characteristics), how the tool was trained, and what internal or external validation was done to understand the risk of incorrect extractions.

If articles required translation into another language to enable data collection, report how these articles were translated (for example, by asking a native speaker or by using software programs).

If any software was used to extract data from figures, specify the software used.

If any decision rules were used to select data from multiple reports corresponding to a study, and any steps were taken to resolve inconsistencies across reports, report the rules and steps used.

Example/s:

Example 1

We designed a data extraction form based on that used by Lumley 2009, which two review authors (RC and TC) used to extract data from eligible studies. Extracted data were compared, with any discrepancies being resolved through discussion. RC entered data into Review Manager 5 software (Review Manager 2014), double checking this for accuracy. When information regarding any of the above was unclear, we contacted authors of the reports to provide further details.

Example 2

We developed a standardized data extraction form to extract study characteristics...The standardized form was pilot-tested by all study team members using five randomly selected studies. Reviewers worked independently to extract study details. A third reviewer reviewed data extraction and resolved conflicts.

Example 3

A data extraction sheet was developed, pilot tested on ten randomly selected included articles and then refined. After finalizing the data extraction sheet, one reviewer performed the initial data extraction for all included articles and a second reviewer checked all proceedings…. Corresponding authors were asked for additional information in cases where data provided in the published articles were insufficient.

10a Data items

List and define all outcomes for which data were sought. Specify whether all results that were compatible with each outcome domain in each study were sought (for example, for all measures, time points, analyses), and, if not, the methods used to decide which results to collect.

Explanation

Defining outcomes in systematic reviews generally involves specifying outcome domains (such as pain, quality of life, adverse events such as nausea) and the time frame of measurement (such as less than six months). Included studies may report multiple results that are eligible for inclusion within the review outcome definition. For example, a study may report results for two measures of pain (such as the McGill Pain Questionnaire and the Brief Pain Inventory), at two time points (such as four weeks and eight weeks), all of which are compatible with a review outcome defined as “pain <6 months.” Multiple results compatible with an outcome domain in a study might also arise when study investigators report results based on multiple analysis populations (such as all participants randomised, all participants receiving a specific amount of treatment), methods for handling missing data (such as multiple imputation, last-observation-carried-forward), or methods for handling confounding (such as adjustment for different covariates).

Reviewers might seek all results that were compatible with each outcome definition from each study or use a process to select a subset of the results. Examples of processes to select results include selecting the outcome definition that (a) was most common across studies, (b) the review authors considered “best” according to a prespecified hierarchy (for example, which prioritises measures included in a core outcome measurement set), or (c) the study investigators considered most important (such as the study’s primary outcome). It is important to specify the methods that were used to select the results when multiple results were available so that users are able to judge the appropriateness of those methods and whether there is potential for bias in the selection of results.

Reviewers may make changes to the inclusion or definition of the outcome domains or to the importance given to them in the review (for example, an outcome listed as “important” in the protocol is considered “critical” in the review). Providing a rationale for the change allows readers to assess the legitimacy of the change and whether it has potential to introduce bias in the review process.

Essential elements

List and define the outcome domains and time frame of measurement for which data were sought.

Specify whether all results that were compatible with each outcome domain in each study were sought, and, if not, what process was used to select results within eligible domains.

If any changes were made to the inclusion or definition of the outcome domains or to the importance given to them in the review, specify the changes, along with a rationale.

If any changes were made to the processes used to select results within eligible outcome domains, specify the changes, along with a rationale.

Additional elements

Consider specifying which outcome domains were considered the most important for interpreting the review’s conclusions (such as “critical” versus “important” outcomes) and provide rationale for the labelling (such as “a recent core outcome set identified the outcomes labelled ‘critical’ as being the most important to patients”).

Example/s:

Example 1

Eligible outcomes were broadly categorised as follows:

Cognitive function

• Global cognitive function
• Domain-specific cognitive function (especially domains that reflect specific alcohol-related neuropathologies, such as psychomotor speed and working memory)

Clinical diagnoses of cognitive impairment

• Mild cognitive impairment (also referred to as mild neurocognitive disorders)

These conditions were ‘characterised by a decline from a previously attained cognitive level’.

Major cognitive impairment (also referred to as major neurocognitive disorders; including dementia) was excluded.

We expected that definitions and diagnostic criteria would vary across studies, so we accepted a range of definitions as noted under the ‘Methods of outcome assessment’ section. Table 1 provides an example of specific domains of cognitive function used in the diagnosis of mild and major cognitive impairment in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5).

Method of outcome measurement: Any measure of cognitive function was eligible for inclusion. The tests or diagnostic criteria used in each study should have had evidence of validity and reliability for the assessment of mild cognitive impairment, but studies were not excluded on this basis.

We anticipated that many different methods would be used to assess cognitive functioning across studies. These include the following.

Clinical diagnoses of:
• Mild cognitive impairment using explicit criteria (e.g. National Institute on Aging and the Alzheimer’s Association (United States; NIA-AA) criteria; any of the definitions of mild cognitive impairment described in Matthews et al. 2008)

Neuropsychological tests used to assess global cognitive function, for example the:
• Mini-Mental State Examination (MMSE)
• Addenbrooke’s Cognitive Examination-Revised (ACE-R), which “incorporates the MMSE and assesses attention, orientation, fluency, language, visuospatial function, and memory, yielding subscale scores for each domain”
• Montreal Cognitive Assessment (MOCA), which provides measures for specific cognitive abilities and may be more suitable for assessing mild cognitive impairment than the MMSE

Neuropsychological tests for assessing domain-specific cognitive function, for example, tests of:
• Attention and processing speed, for example, the Trail making test (TMT-A)
• Memory, for example, the Hopkins verbal learning test (HVLT-R; immediate, delay)
• Visuospatial ability, for example the Block design test
• Executive function, for example, the Controlled Oral Word Association Test (COWAT)

Results could be reported as an overall test score that provides a composite measure across multiple areas of cognitive ability (i.e. global cognitive function), sub-scales that provide a measure of domain-specific cognitive function or cognitive abilities (e.g. processing speed, memory), or both.

Timing of outcome assessment: Studies with a minimum follow-up of 6 months were eligible, a time frame chosen to ensure that studies were designed to examine more persistent effects of alcohol consumption. This threshold was based on previous reviews examining the association between long-term cognitive impairment and alcohol consumption (e.g. Anstey 2009 specified 12 months) and guidance from the Cochrane Dementia and Cognitive Improvement Group, which suggests a minimum follow-up of 9 months for studies examining progression from mild cognitive impairment to dementia. We deliberately specified a shorter period to ensure studies reporting important long-term effects were not missed.

No restrictions were placed on the number of points at which the outcome was measured, but the length of follow-up and number of measurement points (including a baseline measure of cognition) was considered when interpreting study findings and in deciding which outcomes were similar enough to combine for synthesis. Since long-term cognitive impairment is characterised as a decline from a previous level of cognitive function and implies a persistent effect, studies with longer-term outcome follow up at multiple time points should provide the most direct evidence.

Selection of cognitive outcomes where multiple were reported: We anticipated that individual studies would report data for multiple cognitive outcomes. Specifically, a single study may report results:
• For multiple constructs related to cognitive function, for example, global cognitive function and cognitive ability on specific domains (e.g. memory, attention, problem-solving, language);
• Using multiple methods or tools to measure the same or similar outcome, for example reporting measures of global cognitive function using both the Mini-Mental State Examination and the Montreal Cognitive Assessment;
• At multiple time points, for example, at 1, 5, and 10 years.

Where multiple cognition outcomes were reported, we selected one outcome for inclusion in analyses and for reporting the main outcomes (e.g. for GRADEing), choosing the result that provided the most complete information for analysis. Where multiple results remained, we listed all available outcomes (without results) and asked our content expert to independently rank these based on relevance to the review question, and the validity and reliability of the measures used. Measures of global cognitive function were prioritised, followed by measures of memory, then executive function.

In the circumstance where results from multiple multivariable models were presented, we extracted associations from the most fully adjusted model, except where an analysis adjusted for a possible intermediary along the causal pathway (i.e. post-baseline measures of prognostic factors such as smoking, drug use, or hypertension).

Example 2

We presented the major outcomes below in the ‘Summary of findings’ tables.
• Participant-reported pain relief of 30% or greater.
• Mean pain score, or mean change in pain score on VAS or Numerical Rating Scale (NRS) or categorical rating scale (in that order of preference).
• Disability or function.
• Composite endpoints measuring ‘success’ of treatment such as participants feeling no further symptoms.
• Quality of life.
• Number of participant withdrawals, for example, due to adverse events or intolerance to treatment.
• Number of participants experiencing any adverse event.

We extracted outcome measures assessing benefits of treatment (e.g. pain, function, success, quality of life) at the time points:
• up to six weeks;
• greater than six weeks to three months (this was the primary time point);
• greater than three months to up to six months;
• greater than six months to 12 months;
• greater than 12 months.

If data were available in a trial at multiple time points within each of the above periods (e.g. at four, five and six weeks), we only extracted data at the latest possible time point of each period. We extracted adverse events, calcification resolution and treatment success at the end of the trial.

For a particular systematic review outcome there may be a multiplicity of results available in the trial reports (e.g. multiple scales, time points and analyses). To prevent selective inclusion of data based on the results, we used the following a priori defined decision rules to select data from trials.

• Where trialists reported both final values and change from baseline values for the same outcome, we extracted final values.
• Where trialists reported both unadjusted and adjusted values for the same outcome, we extracted unadjusted values.
• Where trialists reported data analysed based on the intention-to-treat (ITT) sample and another sample (e.g. per-protocol, as-treated), we extracted ITT-analysed data.

Where trials did not include a measure of overall pain but included one or more other measures of pain, for the purpose of combining data for the primary analysis of overall pain, we combined overall pain with other types of pain in the following hierarchy:

• overall or unspecified pain;
• pain at rest;
• pain with activity;
• daytime pain;
• night-time pain.

Where trials included more than one measure of disability or function, we extracted data from the one function scale that was highest on the following a priori defined list:

• Shoulder Pain And Disability Index (SPADI);
• Shoulder Disability Questionnaire (SDQ);
• Constant score;
• Disabilities of the Arm, Shoulder and Hand (DASH);
• Health Assessment Questionnaire (HAQ);
• any other function scale.

Where trials included more than one measure of treatment success, we extracted data from the one measure that was highest on the following a priori defined list:

• participant-defined measures of success, such as asking participants if treatment was successful;
• trialist-defined measures of success, such as a 30-point increase on the Constant score.

Example 3

We reported measures of treatment effect from included studies that were adjusted for potential confounding variables over reported estimates that were not adjusted for potential confounding. Where studies used multiple follow-up periods, we used data from the final (most recent) study follow-up. We included data from the primary implementation outcome in meta-analyses. In instances where the authors of included studies did not identify a primary implementation outcome, we used the outcome on which the study sample size and power calculation was based. In its absence, for studies using score-based measures of implementation, and reporting total and subscale scores, we assumed the total score represented the primary implementation outcome. Otherwise, we attempted to calculate a relative effect size for each implementation outcome measure, ranked these based on effect size, and used the measure reporting the median effect size to include in any pooled analysis. We calculated the effect size by subtracting the change from baseline of the primary implementation outcome for the control or comparison group from the change from baseline in the experimental or intervention group. If data to enable calculation of the change from baseline were unavailable, we used the differences between groups post-intervention. For score-based measures, we calculated a standardised (‘d’) measure of effect size for each outcome to rank the effect size. Where there was an even number of implementation outcomes, one of the two measures at the median was randomly selected and used for inclusion in meta-analysis.
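To make the ranking rule above concrete, here is a minimal sketch in R. It is our own illustration, not code from the review: the outcome names and effect sizes are hypothetical. It ranks score-based implementation outcomes by standardised effect size and selects the measure at the median, breaking the tie for an even number of outcomes by random selection, as the review describes.

    # All values hypothetical; illustrates the median-effect-size selection rule.
    outcomes <- data.frame(
      name = c("total score", "subscale A", "subscale B", "subscale C"),
      d    = c(0.42, 0.18, 0.55, 0.31)   # standardised ('d') effect sizes
    )
    outcomes <- outcomes[order(outcomes$d), ]         # rank by effect size
    mid <- (nrow(outcomes) + 1) / 2                   # median position
    if (mid %% 1 != 0) {                              # even number of outcomes:
      mid <- sample(c(floor(mid), ceiling(mid)), 1)   # randomly pick one of the two
    }
    outcomes$name[mid]                                # measure used in the pooled analysis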

Example 4

Twelve dementia care partners (nurses, allied health professionals, physicians, and a caregiver) selected our study outcomes (18) by independently ranking a group of commonly reported neuropsychiatric symptoms (for example, aggression, agitation, and sleep disturbances) in descending order of importance. The care partners selected change in aggression as our main outcome and change in agitation as our secondary outcome… For all of our NMAs, we preferentially abstracted a scale (e.g. Neuropsychiatric Inventory (NPI) agitation subscale, CMAI) reported by study authors before abstracting an individual aggressive or agitated behaviour (e.g. kicking, biting, screaming). Only in the case of our NMA for the outcome of overall agitation and aggression were there cases where study authors reported more than one scale for the same outcome (e.g. NPI-agitation subscale and CMAI). The CMAI was the most commonly reported scale for the outcome of overall agitation and aggression. The NPI-agitation subscale was the second most common scale for the outcome of overall agitation and aggression. Other scales were reported much less frequently. Therefore, the CMAI was always preferentially abstracted, where reported. If the CMAI was not reported, but the NPI-agitation subscale was reported, then it was preferentially abstracted before any other scales used to report the outcome of overall agitation and aggression.

10b Data items

List and define all other variables for which data were sought (such as participant and intervention characteristics, funding sources). Describe any assumptions made about any missing or unclear information.

Explanation

Authors should report the data and information collected from the studies so that readers can understand the type of the information sought and to inform data collection in other similar reviews. Variables of interest might include characteristics of the study (such as countries, settings, number of centres, funding sources, registration status), characteristics of the study design (such as randomised or non-randomised), characteristics of participants (such as age, sex, socioeconomic status), number of participants enrolled and included in analyses, the results (such as summary statistics, estimates of effect and measures of precision, factors adjusted for in analyses), and competing interests of study authors. For reviews of interventions, authors may also collect data on characteristics of the interventions (such as what interventions and comparators were delivered, how they were delivered, by whom, where, and for how long).

Essential elements

List and define all other variables for which data were sought. It may be sufficient to report a brief summary of information collected if the data collection and dictionary forms are made available (for example, as additional files or deposited in a publicly available repository).

Describe any assumptions made about any missing or unclear information from the studies. For example, in a study that includes “children and adolescents,” for which the investigators did not specify the age range, authors might assume that the oldest participants would be 18 years, based on what was observed in similar studies included in the review, and should report that assumption.

If a tool was used to inform which data items to collect (such as the Tool for Addressing Conflicts of Interest in Trials (TACIT) or a tool for recording intervention details), cite the tool used.

Example/s:

Example 1

We extracted information relating to the characteristics of included studies and results as follows.

  1. Study identifiers and characteristics of the study design
     • Study references (multiple publications arising from the same study were matched to an index reference, which is the study from which results were selected for analysis or summary)
     • Study or cohort name, location, and commencement date
     • Study design (categorised as ‘prospective cohort study’, ‘nested case-control study’, or ‘other’ using the checklist of study design features developed by Reeves and colleagues)
     • Funding sources and funder involvement in the study

  2. Characteristics of the exposure and comparator groups
     • Levels of alcohol consumption as defined in the study, including details of how consumption was measured and categorised, and information required to convert data for reporting and analysis
       o Qualitative descriptors of each category, if used (e.g. never or non-drinker, abstainer, former drinker, low/moderate/heavy consumption)
       o Upper and lower boundaries of each category (e.g. 1 to 29 g per day; 5.1 to 10 units per week based on a standard drink in the UK)
       o Group used as referent category (comparator) in analyses and how defined
       o Units of measurement (e.g. standard units of alcohol per day and definition of unit)
       o Method of collecting alcohol consumption data (e.g. retrospective survey involving recall of alcohol consumption over different periods of life; intake diaries to measure current alcohol consumption); time points at which exposure data were collected
       o Sample size for each exposure group at each measurement point and included in analysis; number lost to follow up [these data were used in the analysis and risk of bias assessment]
       o Any additional parameters used to derive each category or exposure measure (e.g. alcohol consumption at each drinking occasion; frequency of drinking; recall period)
     • Patterns of exposure
       o Any additional data not listed above that characterises and quantifies different patterns of alcohol exposure (e.g. consumption on heaviest drinking day; diagnosis of an alcohol-use disorder such as dependence or harmful drinking, and the method of assessment; definition of other frequency-based categories used to characterise patterns of drinking such as occasional drinking or infrequent consumption)
       o Duration/length of exposure period at study baseline and follow-up (directly reported or data that can be used to calculate)
       o Age at commencement of drinking (initial exposure)

  3. Characteristics of participants
     • Age at baseline and follow up, sex, ethnicity, co-morbidities, socio-economic status (including education), use of licit or illicit drugs, family history of alcohol dependence
     • Other characteristics of importance within the context of each study
     • Eligibility criteria used in the study (26)

Example 2

We collected data on:
• the report: author, year, and source of publication;
• the study: sample characteristics, social demography, and definition and criteria used for depression;
• the participants: stroke sequence (first ever vs recurrent), social situation, time elapsed since stroke onset, history of psychiatric illness, current neurological status, current treatment for depression, and history of coronary artery disease;
• the research design and features: sampling mechanism, treatment assignment mechanism, adherence, non-response, and length of follow up;
• the intervention: type, duration, dose, timing, and mode of delivery. (32)

Example 3

When trial authors reported child grade rather than age, we assumed the following age distributions: kindergarten, four to six years; first grade, five to seven years; second grade, six to eight years; third grade, seven to nine; fourth grade, 8 to 10; fifth grade, 9 to 11; sixth grade, 10 to 12; seventh grade, 11 to 13; eighth grade, 12 to 14; ninth grade, 13 to 15; tenth grade, 14 to 16; eleventh grade, 15 to 17; and twelfth grade, 16 to 18.

11. Study risk of bias assessment

Specify the methods used to assess risk of bias in the included studies, including details of the tool(s) used, how many reviewers assessed each study and whether they worked independently, and, if applicable, details of automation tools used in the process.

Explanation

Users of reviews need to know the risk of bias in the included studies to appropriately interpret the evidence. Numerous tools have been developed to assess study limitations for various designs. However, many tools have been criticised because of their content (which may extend beyond assessing study limitations that have the potential to bias findings) and the way in which the items are combined (such as scales where items are combined to yield a numerical score) (see box 4). Reporting details of the selected tool enables readers to assess whether the tool focuses solely on items that have the potential to bias findings. Reporting details of how studies were assessed (such as by one or two authors) allows readers to assess the potential for errors in the assessments. Reporting how risk of bias assessments were incorporated into the analysis is addressed in Items #13e and #13f.

Essential elements

Specify the tool(s) (and version) used to assess risk of bias in the included studies.

Specify the methodological domains/components/items of the risk of bias tool(s) used.

Report whether an overall risk of bias judgment that summarised across domains/components/items was made, and if so, what rules were used to reach an overall judgment.

If any adaptations to an existing tool to assess risk of bias in studies were made (such as omitting or modifying items), specify the adaptations.

If a new risk of bias tool was developed for use in the review, describe the content of the tool and make it publicly accessible.

Report how many reviewers assessed risk of bias in each study, whether multiple reviewers worked independently (such as assessments performed by one reviewer and checked by another), and any processes used to resolve disagreements between assessors.

Report any processes used to obtain or confirm relevant information from study investigators.

If an automation tool was used to assess risk of bias in studies, report how the automation tool was used (such as machine learning models to extract sentences from articles relevant to risk of bias), how the tool was trained, and details on the tool’s performance and internal validation.

Example/s:

Example 1

We assessed risk of bias in the included studies using the revised Cochrane ‘Risk of bias’ tool for randomised trials (RoB 2.0) (Higgins 2016a), employing the additional guidance for cluster-randomised and cross-over trials (Eldridge 2016; Higgins 2016b). RoB 2.0 addresses five specific domains: (1) bias arising from the randomisation process; (2) bias due to deviations from intended interventions; (3) bias due to missing outcome data; (4) bias in measurement of the outcome; and (5) bias in selection of the reported result. Two review authors independently applied the tool to each included study, and recorded supporting information and justifications for judgments of risk of bias for each domain (low; high; some concerns). Any discrepancies in judgments of risk of bias or justifications for judgments were resolved by discussion to reach consensus between the two review authors, with a third review author acting as an arbiter if necessary. Following guidance given for RoB 2.0 (Section 1.3.4) (Higgins 2016a), we derived an overall summary 'Risk of bias' judgement (low; some concerns; high) for each specific outcome, whereby the overall RoB for each study was determined by the highest RoB level in any of the domains that were assessed.
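The summary rule in this example (the overall judgement equals the most severe domain-level judgement) can be expressed in a few lines of R. The sketch below is our own illustration with invented domain judgements, not code used by the review.

    # Ordered severity levels and hypothetical domain-level judgements.
    rob_levels <- c("low", "some concerns", "high")
    domains <- c(randomisation = "low",
                 deviations    = "some concerns",
                 missing_data  = "low",
                 measurement   = "low",
                 selection     = "some concerns")
    # Overall judgement = highest severity reached in any domain.
    overall <- rob_levels[max(match(domains, rob_levels))]   # "some concerns"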

Example 2

The expanded risk of bias analysis was based on six dimensions that focused on the design of the study, the analysis of the data, and the contents of the study report. These six dimensions, which conform to the requirements set forth by the UK Economic and Social Research Council (ESRC), are:

  1. Selection and matching of intervention and control areas
  2. Blinding of data collection and analysis
  3. Pre- and post-intervention data collection periods
  4. Reporting of results
  5. Control of confounders
  6. Control of other potential sources of bias

See Appendix G for a list of the 17 specific criteria included in each dimension. Each individual criterion statement was scored on whether it was True, False, or Unclear and these were used to assess each study on whether it presented a high, low, or unclear risk of bias across the six domains.

Risk of bias assessment was performed independently by three review authors (E.G.C., S.K., and C.P.). For the studies identified in the previous review, the same three review authors independently assessed the risk of bias of the included studies. Any discrepancies were resolved by deferment to further review authors (R.S. and P.E.). All disagreements were resolved by consensus.

12. Effect measures

Specify for each outcome the effect measure(s) (such as risk ratio, mean difference) used in the synthesis or presentation of results.

Explanation

To interpret a synthesised or study result, users need to know what effect measure was used. Effect measures refer to statistical constructs that compare outcome data between two groups. For instance, a risk ratio is an example of an effect measure that might be used for dichotomous outcomes [89]. The chosen effect measure has implications for interpretation of the findings and might affect the meta-analysis results (such as heterogeneity [90]). Authors might use one effect measure to synthesise results and then re-express the synthesised results using another effect measure. For example, for meta-analyses of standardised mean differences, authors might re-express the combined results in units of a well known measurement scale, and for meta-analyses of risk ratios or odds ratios, authors might re-express results in absolute terms (such as risk difference) [91]. Furthermore, authors need to interpret effect estimates in relation to whether the effect is of importance to decision makers. For a particular outcome and effect measure, this requires specification of thresholds (or ranges) used to interpret the size of effect (such as minimally important difference; ranges for no/trivial, small, moderate, and large effects).
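As a worked illustration of such re-expression (all numbers below are hypothetical and the calculations are standard, not drawn from any particular review), in R:

    # Re-express a pooled risk ratio in absolute terms, given an assumed
    # comparator risk (hypothetical values throughout).
    rr  <- 0.80    # pooled risk ratio
    acr <- 0.10    # assumed comparator (control-group) risk
    risk_difference <- acr - rr * acr   # absolute risk reduction = 0.02
    nnt <- 1 / risk_difference          # number needed to treat = 50

    # Re-express a pooled standardised mean difference in the units of a
    # familiar instrument.
    smd      <- -0.35   # pooled standardised mean difference
    sd_scale <- 12      # SD of the familiar instrument in its natural units
    md <- smd * sd_scale   # about -4.2 points on that instrument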

Essential elements

Specify for each outcome or type of outcome (such as binary, continuous) the effect measure(s) (such as risk ratio, mean difference) used in the synthesis or presentation of results.

State any thresholds or ranges used to interpret the size of effect (such as minimally important difference; ranges for no/trivial, small, moderate, and large effects) and the rationale for these thresholds.

If synthesised results were re-expressed to a different effect measure, report the methods used to re-express results (such as meta-analysing risk ratios and computing an absolute risk reduction based on an assumed comparator risk).

Additional elements

Consider providing justification for the choice of effect measure. For example, a standardised mean difference may have been chosen because multiple instruments or scales were used across studies to measure the same outcome domain (such as different instruments to assess depression).

Example/s:

Example 1

We planned to analyse dichotomous outcomes by calculating the risk ratio (RR) of a successful outcome (i.e. improvement in relevant variables) for each trial…Because the included resilience-training studies used different measurement scales to assess resilience and related constructs, we used standardised mean difference (SMD) effect sizes (Cohen's d) and their 95% confidence intervals (CIs) for continuous data in pair-wise meta-analyses.

Example 2

We estimated the risk ratio (RR) and its 95% confidence interval (CI) after surgery (pars plana vitrectomy combined with scleral buckle vs pars plana vitrectomy alone) for the following dichotomous outcomes with information obtained from the included studies.

• Primary surgical success.
• Second surgery for retinal reattachment.
• Development of adverse events such as retinal detachment recurrence, elevation of intraocular pressure above 21 mmHg, choroidal detachment, cystoid macular edema, macular pucker, proliferative vitreoretinopathy, progression of cataract in initially phakic eyes, and any other adverse events reported by included trials at any time from day one up to the last reported follow-up visit after surgery.

Example 3

For survival outcomes (e.g. regression of endometrial hyperplasia, recurrence of endometrial hyperplasia, progression to endometrial carcinoma), we planned to calculate hazard ratios if data were available. Otherwise, we would calculate rates at a set time point, using the Mantel-Haenszel odds ratio (OR) and the numbers of events in control and intervention groups.

13a Synthesis methods

Describe the processes used to decide which studies were eligible for each synthesis (such as tabulating the study intervention characteristics and comparing against the planned groups for each synthesis (item #5)).

Explanation

Before undertaking any statistical synthesis (item #13d), decisions must be made about which studies are eligible for each planned synthesis (item #5). These decisions will likely involve subjective judgments that could alter the result of a synthesis, yet the processes used and information to support the decisions are often absent from reviews. Reporting the processes (whether formal or informal) and any supporting information is recommended for transparency of the decisions made in grouping studies for synthesis. Structured approaches may involve the tabulation and coding of the main characteristics of the populations, interventions, and outcomes. For example, in a review examining the effects of psychological interventions for smoking cessation in pregnancy, the main intervention component of each study was coded as one of the following based on pre-specified criteria: counselling, health education, feedback, incentive-based interventions, social support, and exercise. This coding provided the basis for determining which studies were eligible for each planned synthesis (such as incentive-based interventions versus usual care). Similar coding processes can be applied to populations and outcomes.

Essential elements

Describe the processes used to decide which studies were eligible for each synthesis.

Example/s:

Example 1

Given the complexity of the interventions being investigated, we attempted to categorize the included interventions along four dimensions: (1) was housing provided to the participants as part of the intervention; (2) to what degree was the tenants’ residence in the provided housing dependent on, for example, sobriety, treatment attendance, etc.; (3) if housing was provided, was it segregated from the larger community, or scattered around the city; and (4) if case management services were provided as part of the intervention, to what degree of intensity. We created categories of interventions based on the above dimensions:

  1. Case management only
  2. Abstinence-contingent housing
  3. Non-abstinence-contingent housing
  4. Housing vouchers
  5. Residential treatment with case management

Some of the interventions had multiple components (e.g. abstinence-contingent housing with case management). These interventions were categorized according to the main component (the component that the primary authors emphasized). They were also placed in separate analyses. We then organized the studies according to which comparison intervention was used (any of the above interventions, or usual services).

13b Synthesis methods

Describe any methods required to prepare the data for presentation or synthesis, such as handling of missing summary statistics or data conversions.

Explanation

Authors may need to prepare the data collected from studies so that it is suitable for presentation or to be included in a synthesis. This could involve algebraic manipulation to convert reported statistics to required statistics (such as converting standard errors to standard deviations) [89], transforming effect estimates (such as converting standardised mean differences to odds ratios [93]), or imputing missing summary data (such as missing standard deviations for continuous outcomes, intra-cluster correlations in cluster randomised trials) [94, 95, 96]. Reporting the methods required to prepare the data will allow readers to judge the appropriateness of the methods used and the assumptions made and aid in attempts to replicate the synthesis.

Essential elements

Report any methods required to prepare the data collected from studies for presentation or synthesis, such as handling of missing summary statistics or data conversions.

Example/s:

Example 1

In cases where the means, number of participants and test statistics for the t-test were reported, but not the standard deviations, and there was the opportunity to include results in a meta-analysis, we calculated standard deviations, assuming the same standard deviation for each of the two groups (intervention and control).
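As an illustration of this kind of back-calculation (a minimal R sketch with invented numbers, assuming a two-sample t-test and equal group SDs, as in the example):

    # Recover a pooled SD from reported means, sample sizes, and t statistic.
    m1 <- 24.1; n1 <- 30   # intervention group (hypothetical)
    m2 <- 21.4; n2 <- 28   # control group (hypothetical)
    t  <- 2.10             # reported two-sample t statistic (hypothetical)

    se_diff   <- (m1 - m2) / t                # SE of the mean difference
    sd_pooled <- se_diff / sqrt(1/n1 + 1/n2)  # common SD assumed for both groups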

Example 2

Where we were interested in an intervention and it was compared to two or more comparison interventions that were considered to be within the realm of “usual services”, we combined the comparison arms into one comparison group and compared the means of the combined control groups to the intervention for a given outcome (for Morse 1992). In one study we combined two intervention arms that both employed slightly differing versions of an intervention (assertive community treatment) into one intervention group and compared that to the usual services comparison condition (for Morse 1997).
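Combining two arms into a single group is usually done with the Cochrane Handbook formulae for the pooled sample size, mean, and SD. The R sketch below is a generic illustration (the function name combine_arms and all inputs are ours, not the review's):

    # Pool two comparison arms into one group (Cochrane Handbook formulae).
    combine_arms <- function(n1, m1, sd1, n2, m2, sd2) {
      n  <- n1 + n2
      m  <- (n1 * m1 + n2 * m2) / n
      sd <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2 +
                  (n1 * n2 / n) * (m1 - m2)^2) / (n - 1))
      c(n = n, mean = m, sd = sd)
    }

    combine_arms(25, 10.2, 3.1, 27, 11.0, 2.8)   # two hypothetical "usual services" arms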

Example 3

We used cluster-adjusted estimates from cluster randomised controlled trials (c-RCTs) where available. If the studies had not adjusted for clustering, we attempted to adjust their standard errors using the methods described in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2019), using an estimate of the intra-cluster correlation coefficient (ICC) derived from the trial. If the trial did not report the cluster-adjusted estimate or the ICC, we imputed an ICC from a similar study included in the review, adjusting if the nature or size of the clusters was different (e.g. households compared to classrooms). We assessed any imputed ICCs using sensitivity analysis.
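The standard-error adjustment described here typically uses the design effect from the Cochrane Handbook. A minimal R sketch, with all values invented:

    # Inflate the SE of a c-RCT analysed without adjustment for clustering.
    icc <- 0.02   # intra-cluster correlation, here imputed from a similar trial
    m   <- 24     # average cluster size (e.g. pupils per classroom)

    design_effect <- 1 + (m - 1) * icc     # = 1.46
    se_unadjusted <- 0.15                  # SE from the analysis ignoring clustering
    se_adjusted   <- se_unadjusted * sqrt(design_effect)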

Example 4

Some studies targeted quality problems that involve ‘underuse’, so that improvements in quality correspond to increases in the percentage of patients who receive a target process of care (for example, increasing the percentage of patients who receive the influenza vaccine). However, other studies targeted ‘overuse’, so that improvements correspond to reductions in the percentage of patients receiving inappropriate or unnecessary processes of care (for example, reducing the percentage of patients who receive antibiotics for viral upper respiratory tract infections). In order to standardise the direction of effects, we defined all process adherence outcomes so that higher values represented an improvement. For example, data from a study aimed at reducing the percentage of patients receiving inappropriate medications would be captured as the complementary percentage of patients who did not receive inappropriate medications. Increasing this percentage of patients for whom providers did not prescribe the medications would thus represent an improvement. Each outcome can then be interpreted as compliance with desired practice.

13c Synthesis methods

Describe any methods used to tabulate or visually display results of individual studies and syntheses.

Explanation

Presentation of study results using tabulation and visual display is important for transparency (particularly so for reviews or outcomes within reviews where a meta-analysis has not been undertaken) and facilitates the identification of patterns in the data. Tables may be used to present results from individual studies or from a synthesis (such as Summary of Findings table; see item #22). The purpose of tabulating data varies but commonly includes the complete and transparent reporting of the results or comparing the results across study characteristics. Different purposes will likely lead to different table structures. Reporting the chosen structure(s), along with details of the data presented (such as effect estimates), can aid users in understanding the basis and rationale for the structure (such as, “Tables have been structured by outcome domain, within which studies are ordered from low to high risk of bias to increase the prominence of the most trustworthy evidence.”).

The principal graphical method for meta-analysis is the forest plot, which displays the effect estimates and confidence intervals of each study and often the summary estimate. Similar to tabulation, ordering the studies in the forest plot based on study characteristics (such as by size of the effect estimate, year of publication, study weight, or overall risk of bias) rather than alphabetically (as is often done) can reveal patterns in the data [101]. Other graphs that aim to display information about the magnitude or direction of effects might be considered when a forest plot cannot be used due to incompletely reported effect estimates (such as no measure of precision reported) [28, 102]. Careful choice and design of graphs is required so that they effectively and accurately represent the data.
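For instance, with the metafor package in R (shown here with metafor's built-in BCG vaccine example data, not data from any review discussed above), the rows of a forest plot can be ordered by observed effect size rather than alphabetically:

    library(metafor)

    # Log risk ratios and sampling variances from the built-in example data.
    dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos, di = cneg,
                  data = dat.bcg)
    res <- rma(yi, vi, data = dat, method = "REML")

    forest(res, order = "obs")   # rows ordered by observed effect size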

Essential elements

Report chosen tabular structure(s) used to display results of individual studies and syntheses, along with details of the data presented.

Report chosen graphical methods used to visually display results of individual studies and syntheses.

Additional elements

If studies are ordered or grouped within tables or graphs based on study characteristics (such as by size of the study effect, year of publication), consider reporting the basis for the chosen ordering/grouping.

If non-standard graphs were used, consider reporting the rationale for selecting the chosen graph.

Example/s:

Example 1

… in line with the review protocol we synthesized evidence narratively as well as graphically using harvest plots. Harvest plots have been shown to be an effective, clear and transparent way to summarize evidence of effectiveness for complex interventions (Ogilvie 2008; Turley 2013). We created eight separate harvest plots, one for health outcomes and one for air quality outcomes for each intervention category.

Example 2

Meta-analyses could not be undertaken due to the heterogeneity of interventions, settings, study designs and outcome measures. Albatross plots were created to provide a graphical overview of the data for interventions with more than five data points for an outcome. Albatross plots are scatter plots of p-values against the total number of individuals in each study. Small p-values from negative associations appear at the left of the plot, small p-values from positive associations at the right, and studies with null results towards the middle. The plot allows p-values to be interpreted in the context of the study sample size; effect contours show a standardised effect size (expressed as relative risk, RR) for a given p-value and study size, providing an indication of the overall magnitude of any association. We estimated an overall magnitude of association from these contours, but this should be interpreted cautiously.

Example 3

We developed ‘Summary of findings’ tables using GRADEpro GDT. These tables comprise summaries of the estimated intervention effect and the number of participants and studies for each primary outcome, and include justifications underpinning GRADE assessments. We planned to present separate summary effect sizes and certainty of evidence ratings for food, alcohol, and tobacco products, and for availability and proximity interventions within each of these product types, but in practice no eligible alcohol or tobacco studies were identified. Results of random-effects meta-analyses are presented as SMDs with 95% CIs. To facilitate interpretation of these estimated effect sizes, we re-expressed them employing selected familiar metrics of selection or consumption using observational data from a population-representative sample.

13d Synthesis methods

Describe any methods used to synthesise results and provide a rationale for the choice(s). If meta-analysis was performed, describe the model(s), method(s) to identify the presence and extent of statistical heterogeneity, and software package(s) used.

Explanation

Various statistical methods are available to synthesise results, the most common of which is meta-analysis of effect estimates (see box 5). Meta-analysis is used to synthesise effect estimates across studies, yielding a summary estimate. Different meta-analysis models are available, with the random-effects and fixed-effect models being in widespread use. Model choice can importantly affect the summary estimate and its confidence interval; hence the rationale for the selected model should be provided (see box 5). For random-effects models, many methods are available, and their performance has been shown to differ depending on the characteristics of the meta-analysis (such as the number and size of the included studies).

When study data are not amenable to meta-analysis of effect estimates, alternative statistical synthesis methods (such as calculating the median effect across studies, combining P values) or structured summaries might be used [28, 115]. Additional guidance for reporting alternative statistical synthesis methods is available (see Synthesis Without Meta-analysis (SWiM) reporting guideline).

Regardless of the chosen synthesis method(s), authors should provide sufficient detail such that readers are able to assess the appropriateness of the selected methods and could reproduce the reported results (with access to the data).
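To indicate the level of detail intended, the following is a generic R sketch with the metafor package and its built-in example data (a template, not a prescribed analysis). It makes explicit the model, the heterogeneity variance estimator, the confidence interval method, and the heterogeneity statistics that this item asks authors to report.

    library(metafor)   # report the version, e.g. via packageVersion("metafor")

    dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos, di = cneg,
                  data = dat.bcg)    # built-in example data

    res <- rma(yi, vi, data = dat,
               method = "REML",   # between-study (heterogeneity) variance estimator
               test   = "knha")   # Hartung-Knapp(-Sidik-Jonkman) CI method

    res                           # prints tau2, I2, and the Q test for heterogeneity
    predict(res, transf = exp)    # summary risk ratio with CI and prediction interval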

Essential elements

If statistical synthesis methods were used, reference the software, packages, and version numbers used to implement synthesis methods (such as metan in Stata 16, metafor (version 2.1-0) in R [118]).

If it was not possible to conduct a meta-analysis, describe and justify the synthesis method (for example, combining P values because no or minimal information beyond P values and direction of effect was reported in the studies) or summary approach used.

If meta-analysis was done, specify:

• the meta-analysis model (fixed-effect, fixed-effects, or random-effects) and provide rationale for the selected model.
• the method used (such as Mantel-Haenszel, inverse-variance).
• any methods used to identify or quantify statistical heterogeneity (such as visual inspection of results, a formal statistical test for heterogeneity, heterogeneity variance (τ2), inconsistency (such as I2 [119]), and prediction intervals).

If a random-effects meta-analysis model was used, specify:

• the between-study (heterogeneity) variance estimator used (such as DerSimonian and Laird, restricted maximum likelihood (REML)).
• the method used to calculate the confidence interval for the summary effect (such as Wald-type confidence interval, Hartung-Knapp-Sidik-Jonkman).

If a Bayesian approach to meta-analysis was used, describe the prior distributions about quantities of interest (such as intervention effect being analysed, amount of heterogeneity in results across studies).

If multiple effect estimates from a study were included in a meta-analysis (as may arise, for example, when a study reports multiple outcomes eligible for inclusion in a particular meta-analysis), describe the method(s) used to model or account for the statistical dependency (such as multivariate meta-analysis, multilevel models, or robust variance estimation).

If a planned synthesis was not considered possible or appropriate, report this and the reason for that decision.

Additional elements

If a random-effects meta-analysis model was used, consider specifying other details about the methods used, such as the method for calculating confidence limits for the heterogeneity variance.

Example/s:

Example 1

As the effects of functional appliance treatment were deemed to be highly variable according to patient age, sex, individual maturation of the maxillofacial structures, and appliance characteristics, a random-effects model was chosen to calculate the average distribution of treatment effects that can be expected. A restricted maximum likelihood random-effects variance estimator was used instead of the older DerSimonian-Laird one, following recent guidance. Random-effects 95% prediction intervals were to be calculated for meta-analyses with at least three studies to aid in their interpretation by quantifying expected treatment effects in a future clinical setting. The extent and impact of between-study heterogeneity were assessed by inspecting the forest plots and by calculating the tau-squared and the I-squared statistics, respectively. The 95% CIs (uncertainty intervals) around tau-squared and I-squared were calculated to judge our confidence about these metrics. We arbitrarily adopted an I-squared threshold of > 75% as a sign of considerable heterogeneity, but we also judged the evidence for this heterogeneity (through the uncertainty intervals) and the localization on the forest plot… All analyses were run in Stata SE 14.0 (StataCorp, College Station, TX) by one author.

Example 2

Diverse interventions, settings, and participants characterise the field of smoking cessation. We judged it likely that the included studies would show heterogeneity in treatment effect (the observed intervention effects being more different from each other than one would expect because of random error alone). As such, the assumptions of a fixed-effect meta-analysis (that all studies in the meta-analysis share a common overall effect size and that all factors that could influence the effect size are the same across studies), were unlikely to hold…In random-effects meta-analysis models (restricted maximum-likelihood method), we calculated pooled risk ratios (RRs) with 95% confidence intervals (CIs) for both socioeconomic-position-tailored and non-socioeconomic-position-tailored interventions as the weighted average of each individual study's estimated intervention effect. All computations were done on a log scale with the log RR, its variance, and standard error (SE), before exponentiating the summary effect for interpretation. We explored heterogeneity by observation of forest plots and use of the χ2 test to show whether observed differences in results were compatible with chance alone. We calculated I2 statistics to examine the level of inconsistency across study findings…Analyses were done in the RStudio development environment version 1.1.463 using R version 3.5.2 and the metafor package.

Example 3

We based our primary analyses upon consideration of dichotomous process adherence measures (for example, the proportion of patients managed according to evidence-based recommendations). In order to provide a quantitative assessment of the effects associated with reminders without resorting to numerous assumptions or conveying a misleading degree of confidence in the results, we used the median improvement in dichotomous process adherence measures across studies…With each study represented by a single median outcome, we calculated the median effect size and interquartile range across all included studies for that comparison.
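
The median-and-interquartile-range synthesis in this example needs no modelling assumptions at all; a short Python sketch with hypothetical per-study improvements (in percentage points):

```python
import numpy as np

# Hypothetical median improvement in adherence for each study
study_effects = np.array([3.8, 16.0, 5.2, 9.1, 12.4, 1.0, 7.5])

# Median effect size and interquartile range across studies
median_effect = np.median(study_effects)
q1, q3 = np.percentile(study_effects, [25, 75])
print(f"Median effect {median_effect:.1f} (IQR {q1:.1f} to {q3:.1f})")
```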

Example 4

The statistical approach used, therefore, was the combination of the significance levels (P values). The rationale for this choice is that all the trials explored the same broad question, i.e. “is homeopathic treatment efficacious?”, even if, for individual trials, the question asked was expressed in more specific terms and focused on a given treatment of a particular disease. Thus, unlike in the conventional meta-analytical methods, the method used does not involve pooling the numerical estimates of treatment effect sizes obtained, in our case, in very different situations. Using this approach, the null hypothesis tested is that the effect of interest (in this case, the efficacy of homeopathic treatment) is not present in any of the trials considered. If the null hypothesis is rejected, we can conclude that in at least one trial there is a non-null effect…Thus, we used eight methods: the sum of logs, the sum of Z, the weighted sum of Z, the sum of t, the mean Z, the mean P, the count test and the logit procedure. We present the results obtained with the method that gave the most conservative (least optimistic) results. A two-sided approach was adopted because of the format of the tested hypothesis (i.e. the effect could be either “negative” or “positive”).
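
Two of the combination methods named in this example, the sum of logs (Fisher's method) and the sum of Z (Stouffer's method), are available in SciPy. A minimal sketch with hypothetical P values follows; note that combined P value methods are conventionally applied to one-sided P values to preserve direction, whereas the example adopted a two-sided approach.

```python
from scipy import stats

# Hypothetical one-sided P values from individual trials
pvals = [0.04, 0.20, 0.01, 0.36, 0.08]

# Fisher's method ("sum of logs") and Stouffer's method ("sum of Z")
chi2_stat, p_fisher = stats.combine_pvalues(pvals, method='fisher')
z_stat, p_stouffer = stats.combine_pvalues(pvals, method='stouffer')

# The null hypothesis is that the effect is absent in every trial;
# rejection implies a non-null effect in at least one trial
print(f"Fisher: P = {p_fisher:.4f}; Stouffer: P = {p_stouffer:.4f}")
```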

13e Synthesis methods

Describe any methods used to explore possible causes of heterogeneity among study results (such as subgroup analysis, meta-regression).

Explanation

If authors used methods to explore possible causes of variation of results across studies (that is, statistical heterogeneity) such as subgroup analysis or meta-regression (see box 5), they should provide sufficient details so that readers are able to assess the appropriateness of the selected methods and could reproduce the reported results (with access to the data). Such methods might be used to explore whether, for example, participant or intervention characteristics or risk of bias of the included studies explain variation in results.

Essential elements

If methods were used to explore possible causes of statistical heterogeneity, specify the method used (such as subgroup analysis, meta-regression).

If subgroup analysis or meta-regression was performed, specify for each:

• which factors were explored, levels of those factors, and which direction of effect modification was expected and why (where possible).
• whether analyses were conducted using study-level variables (where each study is included in one subgroup only), within-study contrasts (where data on subsets of participants within a study are available, allowing the study to be included in more than one subgroup), or some combination of the above.
• how subgroup effects were compared (such as a statistical test for interaction for subgroup analyses; a minimal sketch of one such test appears after these essential elements).

If other methods were used to explore heterogeneity because data were not amenable to meta-analysis of effect estimates, describe the methods used (such as structuring tables to examine variation in results across studies based on subpopulation, key intervention components, or contextual factors) along with the factors and levels.

If any analyses used to explore heterogeneity were not pre-specified, identify them as such.
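
Where a statistical test for interaction is reported (see the third bullet above), the simplest version for two independent subgroups compares their summary estimates directly. A minimal Python sketch with hypothetical subgroup results:

```python
import numpy as np
from scipy import stats

# Hypothetical subgroup summary estimates (log RR) and standard errors
est_a, se_a = -0.25, 0.10   # e.g., tailored interventions
est_b, se_b = -0.05, 0.12   # e.g., non-tailored interventions

# Z test for interaction: is the difference between the two
# independent subgroup estimates larger than expected by chance?
diff = est_a - est_b
se_diff = np.sqrt(se_a**2 + se_b**2)
z = diff / se_diff
p_interaction = 2 * stats.norm.sf(abs(z))
print(f"difference = {diff:.2f} (SE {se_diff:.2f}), "
      f"P for interaction = {p_interaction:.3f}")
```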

Example/s:

Example 1

Given a sufficient number of trials, we used unadjusted and adjusted mixed-effects meta-regression analyses to assess whether variation among studies in smoking cessation effect size was moderated by tailoring of the intervention for disadvantaged groups. The resulting regression coefficient indicates how the outcome variable (log risk ratio (RR) for smoking cessation) changes when interventions take a socioeconomic-position-tailored versus non-socioeconomic-tailored approach. A statistically significant (p<0·05) coefficient indicates that there is a linear association between the effect estimate for smoking cessation and the explanatory variable. More moderators (study-level variables) can be included in the model, which might account for part of the heterogeneity in the true effects. We pre-planned an adjusted model to include important study covariates related to the intensity and delivery of the intervention (number of sessions delivered (above median vs below median), whether interventions involved a trained smoking cessation specialist (yes vs no), and use of pharmacotherapy in the intervention group (yes vs no)). These covariates were included a priori as potential confounders given that programmes tailored to socioeconomic position might include more intervention sessions or components or be delivered by different professionals with varying experience. The regression coefficient estimates how the intervention effect in the socioeconomic-position-tailored subgroup differs from the reference group of non-socioeconomic-position-tailored interventions.
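
As a rough illustration of the kind of model this example describes, the sketch below regresses hypothetical study log risk ratios on a binary tailoring moderator using weighted least squares. This is a simplified fixed-effect version; dedicated meta-analysis software (such as the metafor package mentioned in an earlier example) additionally estimates the between-study variance and handles the error variance more carefully.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical study-level data: log risk ratios, their variances,
# and a binary moderator (1 = tailored intervention, 0 = not)
log_rr = np.array([-0.31, -0.05, -0.22, 0.02, -0.40, -0.12])
var = np.array([0.02, 0.05, 0.03, 0.04, 0.06, 0.03])
tailored = np.array([1, 0, 1, 0, 1, 0])

# Fixed-effect meta-regression as weighted least squares; a
# random-effects version would add a tau^2 estimate to `var`
X = sm.add_constant(tailored)
fit = sm.WLS(log_rr, X, weights=1 / var).fit()

# The coefficient on `tailored` estimates how the log RR differs
# between tailored and non-tailored interventions
print(fit.params)
print(fit.pvalues)
```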

Example 2

First, we assessed the association between absolute LDL cholesterol reduction (calculated as a difference in baseline minus last-measured achieved LDL cholesterol between the treatment groups) and the relative risk (RR) of major vascular events for statins, ezetimibe, and PCSK9 inhibitors. Second, we did analyses to establish the effect of a reduction of 1 mmol/L in LDL cholesterol on the RR of major vascular events, stratified into four groups with mean baseline LDL cholesterol concentrations of 2.60 mmol/L or less, 2.61–3.40 mmol/L, 3.41–4.10 mmol/L, and more than 4.10 mmol/L (the recommended LDL cholesterol thresholds for treatment initiation). Subgroups of trials that reported outcomes of patients with baseline LDL cholesterol less than 2.07 mmol/L (80 mg/dL) were also analysed to most closely approximate a mean baseline LDL cholesterol of 1.80 mmol/L (70 mg/dL; the LDL cholesterol threshold for treatment in high-risk patients in the ACC/AHA and CCS guidelines). Subgroups of trials that reported outcomes of patients by sex, presence or absence of diabetes, presence or absence of chronic kidney disease (defined as estimated glomerular filtration rate <60 mL/min per 1.73 m²), and presence or absence of heart failure were also meta-analysed. Meta-regression analyses were done with the following covariates: baseline LDL cholesterol, extent of LDL cholesterol reduction, mean age, 10-year risk of atherosclerotic cardiovascular disease, and median duration of follow-up. Non-standardised and standardised analyses were done for each 1 mmol/L reduction in LDL cholesterol. We used a multivariable model including the same covariates and drug class... Heterogeneity of RRs was assessed using I2 and Cochran's Q statistic was used to test for differences between subgroups.

Example 3

We anticipated that, in settings where intimate partner violence was sufficiently prevalent to be measured, female therapists might have been considered more culturally acceptable to female participants. We did post-hoc subgroup analyses to compare differences in standardised mean differences (SMDs) of trauma-focused interventions versus generic psychological interventions, female-delivered interventions versus mixed gender-delivered interventions, novel treatments for low and middle income countries (LMICs) versus those with an established evidence base in high-income countries, and those asking only about recent (within the past 12 months) intimate partner violence versus lifetime intimate partner violence.

13f Synthesis methods

Describe any sensitivity analyses conducted to assess robustness of the synthesised results.

Explanation

If authors performed sensitivity analyses to assess robustness of the synthesised results to decisions made during the review process, they should provide sufficient details so that readers are able to assess the appropriateness of the analyses and could reproduce the reported results (with access to the data). Ideally, sensitivity analyses should be pre-specified in the protocol, but unexpected issues may emerge during the review process that necessitate their use.

Essential elements

If sensitivity analyses were performed, provide details of each analysis (such as removal of studies at high risk of bias, use of an alternative meta-analysis model).

If any sensitivity analyses were not pre-specified, identify them as such.

Example/s:

Example 1

We conducted sensitivity meta-analyses restricted to trials with recent publication (2000 or later); overall low risk of bias (low risk of bias in all seven criteria); and enrolment of generally healthy women (rather than those with a specific clinical diagnosis). To incorporate trials with zero events in both intervention and control arms (which are automatically dropped from analyses of pooled relative risks), we also did sensitivity analyses for dichotomous outcomes in which we added a continuity correction of 0.5 to zero cells.
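
The 0.5 continuity correction described in this example can be applied as in the Python sketch below; the counts are hypothetical. Adding 0.5 to every cell of a trial's 2x2 table (so each arm's total increases by 1) makes the log risk ratio and its variance defined for trials with zero events, which would otherwise be dropped from relative-risk pooling.

```python
import numpy as np

# Hypothetical counts; the second trial has zero events in both arms
e1 = np.array([4, 0, 7]); n1 = np.array([120, 80, 150])
e0 = np.array([2, 0, 5]); n0 = np.array([118, 82, 151])

# Flag trials with any zero cell (zero events or zero non-events)
zero = (e1 == 0) | (e0 == 0) | (e1 == n1) | (e0 == n0)
cc = np.where(zero, 0.5, 0.0)

# Add 0.5 to events and non-events: totals increase by 1 per arm
e1c, e0c = e1 + cc, e0 + cc
n1c, n0c = n1 + 2 * cc, n0 + 2 * cc

# Log RR and variance are now defined for every trial
log_rr = np.log((e1c / n1c) / (e0c / n0c))
var = 1 / e1c - 1 / n1c + 1 / e0c - 1 / n0c
print(np.round(log_rr, 3), np.round(var, 3))
```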

Example 2

At the request of the funders, we did an additional sensitivity analysis with respect to compliance. Our protocol stated an intention to subgroup by “recent publications”; we changed this to run a sensitivity analysis including publications before 2010 combined with all publications from 2010 onwards with a trials registry entry (even if published retrospectively). As our funders were particularly interested in effects within trials of at least 12 months, we also ran an analysis limiting to trials of at least 52 weeks’ duration.

14. Reporting bias assessment

Describe any methods used to assess risk of bias due to missing results in a synthesis (arising from reporting biases).

Explanation

The validity of a synthesis may be threatened when the available results differ systematically from the missing results. This is known as “bias due to missing results” and arises from “reporting biases” such as selective non-publication and selective non-reporting of results. Direct methods for assessing the risk of bias due to missing results include comparing outcomes and analyses pre-specified in study registers, protocols, and statistical analysis plans with results that were available in study reports. Statistical and graphical methods exist to assess whether the observed data suggest potential for missing results (such as contour enhanced funnel plots, Egger’s test) and how robust the synthesis is to different assumptions about the nature of potentially missing results (such as selection models). Tools (such as checklists, scales, or domain-based tools) that prompt users to consider some or all of these approaches are available. Therefore, reporting methods (tools, graphical, statistical, or other) used to assess risk of bias due to missing results is recommended so that readers are able to assess how appropriate the methods were. The process by which assessments were conducted should also be reported to enable readers to assess the potential for errors and facilitate replicability.
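
As an illustration of one of the statistical methods mentioned above, the following Python sketch applies Egger's regression test to hypothetical study results: the standardised effect is regressed on precision, and an intercept far from zero suggests funnel plot asymmetry. It is a sketch under simple assumptions, not a substitute for a dedicated meta-analysis package, and `intercept_stderr` requires SciPy 1.7 or later.

```python
import numpy as np
from scipy import stats

# Hypothetical study effects (e.g., log odds ratios) and standard errors
effect = np.array([0.41, 0.30, 0.62, 0.15, 0.55, 0.28, 0.70, 0.09])
se = np.array([0.12, 0.20, 0.30, 0.10, 0.28, 0.15, 0.35, 0.08])

# Egger's regression: standardised normal deviate against precision
snd = effect / se
precision = 1 / se
res = stats.linregress(precision, snd)

# Two-sided P value for the intercept (t distribution, n-2 df)
t = res.intercept / res.intercept_stderr
p = 2 * stats.t.sf(abs(t), df=len(effect) - 2)
print(f"intercept = {res.intercept:.3f}, t = {t:.2f}, P = {p:.4f}")
```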

Essential elements

Specify the methods (tool, graphical, statistical, or other) used to assess the risk of bias due to missing results in a synthesis (arising from reporting biases).

If risk of bias due to missing results was assessed using an existing tool, specify the methodological components/domains/items of the tool, and the process used to reach a judgment of overall risk of bias.

If any adaptations to an existing tool to assess risk of bias due to missing results were made (such as omitting or modifying items), specify the adaptations.

If a new tool to assess risk of bias due to missing results was developed for use in the review, describe the content of the tool and make it publicly accessible. Report how many reviewers assessed risk of bias due to missing results in a synthesis, whether multiple reviewers worked independently, and any processes used to resolve disagreements between assessors.

Report any processes used to obtain or confirm relevant information from study investigators.

If an automation tool was used to assess risk of bias due to missing results, report how the tool was used, how the tool was trained, and details on the tool’s performance and internal validation.

Example/s:

Example 1

To assess small-study effects, we planned to generate funnel plots for meta-analyses including at least 10 trials of varying size. If asymmetry in the funnel plot was detected, we planned to review the characteristics of the trials to assess whether the asymmetry was likely due to publication bias or other factors such as methodological or clinical heterogeneity of the trials. To assess outcome reporting bias, we compared the outcomes specified in trial protocols with the outcomes reported in the corresponding trial publications; if trial protocols were unavailable, we compared the outcomes reported in the methods and results sections of the trial publications.

Example 2

To assess selective reporting bias, we compared the measurements and outcomes planned by the original investigators during the trial with those reported within the published paper by checking the trial protocols (when available) against the information in the final publication. Where published protocols were not available and the trial authors did not provide an unpublished protocol upon request, we compared the methods and the results sections of the published papers. We also used our knowledge of the clinical area to identify where trial investigators had not reported commonly used outcome measures.

Example 3

Small study effects owing to potential publication bias, poor methodological quality in smaller studies, artefactual associations, true heterogeneity, or chance were evaluated by using contour-enhanced funnel plots alongside visual examination and statistical tests for asymmetry (Debray’s test).

Example 4

Small-study effects (e.g. publication bias) were checked by contour-enhanced funnel plots and adjusted for by obtaining a precision-effect estimate with standard error. Although the precision-effect estimate with standard error tends to slightly underestimate the true association if the observed effects were generated by questionable research practices, simulations suggest that it provides the most precise estimates in the presence of residual effect heterogeneity and small-study effects.

15. Certainty assessment

Describe any methods used to assess certainty (or confidence) in the body of evidence for an outcome.

Explanation

Authors typically use some criteria to decide how certain (or confident) they are in the body of evidence for each important outcome. Common factors considered include precision of the effect estimate (or sample size), consistency of findings across studies, study design limitations and missing results (risk of bias), and how directly the studies address the question. Tools and frameworks can be used to provide a systematic, explicit approach to assessing these factors and provide a common approach and terminology for communicating certainty. For example, using the GRADE approach, authors will first apply criteria to assess each GRADE domain (imprecision, inconsistency, risk of bias, and so forth) and then make an overall judgment of whether the evidence supporting a result is of high, moderate, low, or very low certainty. Reporting the factors considered and the criteria used to assess each factor enables readers to determine which factors fed into reviewers’ assessment of certainty. Reporting the process by which assessments were conducted enables readers to assess the potential for errors and facilitates replication.

Essential elements

Specify the tool or system (and version) used to assess certainty in the body of evidence.

Report the factors considered (such as precision of the effect estimate, consistency of findings across studies) and the criteria used to assess each factor when assessing certainty in the body of evidence.

Describe the decision rules used to arrive at an overall judgment of the level of certainty (such as high, moderate, low, very low), together with the intended interpretation (or definition) of each level of certainty (125).

If applicable, report any review-specific considerations for assessing certainty, such as thresholds used to assess imprecision and ranges of magnitude of effect that might be considered trivial, moderate or large, and the rationale for these thresholds and ranges (item #12).

If any adaptations to an existing tool or system to assess certainty were made, specify the adaptations in sufficient detail that the approach is replicable.

Report how many reviewers assessed the certainty of evidence, whether multiple reviewers worked independently, and any processes used to resolve disagreements between assessors.

Report any processes used to obtain or confirm relevant information from investigators.

If an automation tool was used to support the assessment of certainty, report how the automation tool was used, how the tool was trained, and details on the tool’s performance and internal validation.

Describe methods for reporting the results of assessments of certainty, such as the use of Summary of Findings tables (see item #22).

If standard phrases that incorporate the certainty of evidence were used (such as “hip protectors probably reduce the risk of hip fracture slightly”), report the intended interpretation of each phrase and the reference for the source guidance.

Where a published system is adhered to, it may be sufficient to briefly describe the factors considered and the decision rules for reaching an overall judgment and reference the source guidance for full details of assessment criteria.

Example/s:

Example 1

Two people (AM, JS) independently assessed the certainty of the evidence. We used the five GRADE considerations (study limitations, consistency of effect, imprecision, indirectness, and publication bias) to assess the certainty of the body of evidence as it related to the studies that contributed data to the meta-analyses for the prespecified outcomes. We assessed the certainty of evidence as high, moderate, low, or very low. We considered the following criteria for upgrading the certainty of evidence, if appropriate: large effect, dose-response gradient, and plausible confounding effect. We used the methods and recommendations described in sections 8.5 and 8.7, and chapters 11 and 12, of the Cochrane Handbook for Systematic Reviews of Interventions. We used GRADEpro GDT software to prepare the 'Summary of findings' tables (GRADEpro GDT 2015). We justified all decisions to down- or up-grade the certainty of studies using footnotes, and we provided comments to aid the reader’s understanding of the results where necessary.

Example 2

We reported our findings using the language suggested by Glenton and colleagues, focusing on the size of the effect and its clinical significance in relation to the certainty of the evidence on which the result is based (including the precision of the effect).

Example 3

We classified the overall strength of evidence (SOE) for each outcome as high, moderate, low, or insufficient by using an established method that considers study quality, consistency of findings, directness of the comparisons, precision, and applicability (Berkman et al.). For findings with SOE greater than insufficient, we classified the direction of effect as “evidence of benefit,” “no benefit” (that is, no difference from placebo or mixed findings), or “favors placebo.”

Results

16a Study selection

Describe the results of the search and selection process, from the number of records identified in the search to the number of studies included in the review, ideally using a flow diagram (http://www.prisma-statement.org/PRISMAStatement/FlowDiagram).

Explanation

Review authors should report, ideally with a flow diagram, the results of the search and selection process so that readers can understand the flow of retrieved records through to inclusion in the review. Such information is useful for future systematic review teams seeking to estimate resource requirements and for information specialists in evaluating their searches (133, 134). Specifying the number of records yielded per database will make it easier for others to assess whether they have successfully replicated a search. The flow diagram in figure 1 provides a template of the flow of records through the review separated by source, although other layouts may be preferable depending on the information sources consulted (65).

Essential elements

Report, ideally using a flow diagram, the number of: records identified; records excluded before screening (for example, because they were duplicates or deemed ineligible by machine classifiers); records screened; records excluded after screening titles or titles and abstracts; reports retrieved for detailed evaluation; potentially eligible reports that were not retrievable; retrieved reports that did not meet inclusion criteria and the primary reasons for exclusion (such as ineligible study design, ineligible population); and the number of studies and reports included in the review. If applicable, authors should also report the number of ongoing studies and associated reports identified.

If the review is an update of a previous review, report results of the search and selection process for the current review and specify the number of studies included in the previous review. An additional box could be added to the flow diagram indicating the number of studies included in the previous review.

If applicable, indicate in the PRISMA flow diagram how many records were excluded by a human and how many by automation tools.

Example/s:

Example 1

We found 1,333 records through database searching. After duplicate removal, we screened 1,092 records, from which we reviewed 34 full-text documents, and finally included six papers [cited]. Later, we searched documents that cited any of the initially included studies as well as the references of the initially included studies. However, no extra articles that fulfilled inclusion criteria were found in these searches (Fig 1).

Example 2

Our search of the Cochrane Kidney and Transplant specialised register identified 869 records. We identified an additional 78 records using other sources (reference lists of review articles, relevant studies, and clinical practice guidelines) – therefore a total of 947 records (n=176 studies) were identified. We excluded 61 studies (n=252 records), either due to a population other than heart failure (n=38 studies), a non-pharmacological intervention (n=5), follow-up shorter than three months (n=16), or a study design other than a RCT (n=2) (see Characteristics of excluded studies). Overall, 115 studies were eligible. Of these, three are ongoing and awaiting publication of primary data (PARAGON-HF 2018; RELAXAHF-2 2017; TMAC 2007) and will be included in a future update of this review. As a result, 112 studies were included in this review (Figure 1).

Example 3

A total of 3191 articles resulted from searching the four databases during the initial search (21 March 2018). After authors removed duplicates, 2822 articles remained for title and abstract review, including 14 articles identified through manual search of references. Two authors (CM and HMB) reviewed the titles and abstracts of all 2822 articles. A third author (SK) resolved any discrepancies. Following this step, two authors (CM and HMB) reviewed the full text of all 114 articles eligible for full-text screening. A third author (SK) resolved any discrepancies. Eighty articles were excluded for the following reasons: they did not have data on the specified outcomes (n=27), used qualitative methodologies (n=27), focused on a tobacco product other than e-cigarettes (n=12), were only focused on menthol flavour (n=2), was a duplicate (n=1) or were not peer-reviewed, did not include original data, did not include full-text or included only a conference abstract (n=11). Articles that addressed e-cigarettes from the original systematic review (n=17) were then added to the 34 articles identified from this current review, combining for a total of 51 articles included in the final analysis. The study selection processes, which approximate but do not exactly follow the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology, are illustrated in figure 1. (58)

Example 4

See page 52 of Supplement B of the systematic review by Mayo-Wilson et al., available at https://ars.els-cdn.com/content/image/1-s2.0-S0895435617307217-mmc1.pdf.

16b Study selection

Cite studies that might appear to meet the inclusion criteria, but which were excluded, and explain why they were excluded.

Explanation

Identifying the excluded records allows readers to make an assessment of the validity and applicability of the systematic review. At a minimum, a list of studies that might appear to meet the inclusion criteria but which were excluded, with citation and a reason for exclusion, should be reported. This would include studies meeting most inclusion criteria (such as those with appropriate intervention and population but an ineligible control or study design). It is also useful to list studies that were potentially relevant but for which the full text or data essential to inform eligibility were not accessible. This information can be reported in the text or as a list/table in the report or in an online supplement. Potentially contentious exclusions should be clearly stated in the report.

Essential elements

Cite studies that might appear to meet the inclusion criteria, but which were excluded, and explain why they were excluded.

Example/s:

Example 1

We excluded seven studies from our review (Bosiers 2015; ConSeQuent; DEBATE-ISR; EXCITE ISR; NCT00481780; NCT02832024; RELINE), and we listed reasons for exclusion in the Characteristics of excluded studies tables. We excluded studies because they compared stenting in Bosiers 2015 and RELINE, laser atherectomy in EXCITE ISR, or cutting balloon angioplasty in NCT00481780 versus uncoated balloon angioplasty for in-stent restenosis. The ConSeQuent trial compared DEB versus uncoated balloon angioplasty for native vessel restenosis rather than in-stent restenosis. The DEBATE-ISR study compared a prospective cohort of patients receiving DEB therapy for in-stent restenosis against a historical cohort of diabetic patients. Finally, the NCT02832024 study compared stent deployment versus atherectomy versus uncoated balloon angioplasty alone for in-stent restenosis.

Example 2

Of the remaining 64 articles, 54 were excluded for a variety of other reasons (Fig. 1). Ultimately, this review included a total of ten studies. All included studies were present in the initial database search. Excluded articles are listed in appendix Table B including reasons for exclusion. Citations for the excluded articles are listed in Appendix E.

17. Study characteristics

Cite each included study and present its characteristics.

Explanation

Reporting the details of the included studies allows readers to understand the characteristics of studies that have addressed the review question(s) and is therefore important for understanding the applicability of the review. Characteristics of interest might include study design features, characteristics of participants, how outcomes were ascertained (such as smoking cessation self reported or biochemically validated, or specific harms systematically assessed or reported by participants as they emerged), funding source, and competing interests of study authors. Presenting the key characteristics of each study in a table or figure can facilitate comparison of characteristics across the studies (92). Citing each study enables retrieval of relevant reports if desired.

For systematic reviews of interventions, presenting an additional table that summarises the intervention details for each study (such as using the template based on the Template for Intervention Description and Replication (TIDieR)) has several benefits. An intervention summary table helps readers compare the characteristics of the interventions and consider those that may be feasible for implementation in their setting; highlights missing or unavailable details; shows which studies did not specify certain characteristics as part of the intervention; and highlights characteristics that have not been investigated in existing studies.

Essential elements

Cite each included study.

Present the key characteristics of each study in a table or figure (considering a format that will facilitate comparison of characteristics across the studies).

Additional elements

If the review examines the effects of interventions, consider presenting an additional table that summarises the intervention details for each study.

Example/s:

Example 1

Of the 12 unique studies, three were prospective cohort studies, (15 18 22) three were case-control studies, (20 25 26) and six were cross sectional studies (14 16 17 23 24 39) (table 1).

Example 2

Table 1 shows the characteristics of the included clinical trials.

Example 3

A summary of the main intervention components is described using the items from the Template for Intervention Description and Replication (TIDieR) checklist (see Table 1).

18. Risk of bias in studies

Present assessments of risk of bias for each included study.

Explanation

For readers to understand the internal validity of a systematic review’s results, they need to know the risk of bias in results of each included study. Reporting only summary data (such as “two of eight studies successfully blinded participants”) is inadequate because it fails to inform readers which studies had each particular methodological shortcoming. A more informative approach is to present tables or figures indicating for each study the risk of bias in each domain/component/item assessed (such as blinding of outcome assessors, missing outcome data), so that users can understand what factors led to the overall study-level risk of bias judgment.

Essential elements

Present tables or figures indicating for each study the risk of bias in each domain/component/item assessed and overall study-level risk of bias.

Present justification for each risk of bias judgment—for example, in the form of relevant quotations from reports of included studies.

Additional elements

If assessments of risk of bias were done for specific outcomes or results in each study, consider displaying risk of bias judgments on a forest plot, next to the study results, so that the limitations of studies contributing to a particular meta-analysis are evident (see Sterne et al (86) for an example forest plot).

Example/s:

Example 1

We used the RoB 2.0 tool to assess risk of bias for each of the included studies. A summary of these assessments is provided in Table 1. In terms of overall risk of bias, there were concerns about risk of bias for the majority of studies (20/24), with two of these assessed as at high risk of bias (Musher-Eizenman 2010; Wansink 2013a). A text summary is provided below for each of the six individual components of the 'Risk of bias' assessment. Justifications for assessments are available at the following (http://dx.doi.org/10.6084/m9.figshare.9159824)

Example 2

Fig 2. Forest plot (including the risk of bias assessment) demonstrating significant reduction in the risk of acute grade 2 or worse xerostomia with intensity modulated radiation therapy (IMRT) compared to conventional techniques. Note comparable benefit of IMRT over two-dimensional radiotherapy (2D-RT) and three-dimensional radiotherapy (3D-RT) on subgroup analyses. https://doi.org/10.1371/journal.pone.0200137.g002.

19. Results of individual studies

For all outcomes, present for each study (a) summary statistics for each group (where appropriate) and (b) an effect estimate and its precision (such as confidence/credible interval), ideally using structured tables or plots.

Explanation

Presenting data from individual studies facilitates understanding of each study’s contribution to the findings and reuse of the data by others seeking to perform additional analyses or perform an update of the review. There are different ways of presenting results of individual studies (such as table, forest plot). Visual display of results supports interpretation by readers, while tabulation of the results makes it easier for others to reuse the data.

Displaying summary statistics by group is helpful, because it allows an assessment of the severity of the problem in the studies (such as level of depression symptoms), which is not available from between-group results (that is, effect estimates). However, there are some scenarios where presentation of simple summary statistics for each group may be misleading. For example, in the case of cluster-randomised designs, the observed number of events and sample size in each group does not reflect the effective sample size (that is, the sample size adjusted for correlation among observations). However, providing the estimated proportion of events (or another summary statistic) per group will be helpful. The effect estimates from models that appropriately adjust for clustering (and other design features) should be reported and included in the meta-analysis in such instances.
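
For cluster-randomised designs, the effective sample size mentioned above is conventionally obtained by dividing the observed sample size by the design effect, 1 + (m − 1) × ICC, where m is the average cluster size and ICC is the intracluster correlation coefficient. A minimal Python sketch with hypothetical numbers:

```python
# Hypothetical trial arm: 20 clusters of 30 participants, ICC of 0.05
n, m, icc = 600, 30, 0.05

# Design effect inflates the variance; dividing the observed sample
# size by it gives the effective sample size
design_effect = 1 + (m - 1) * icc
n_effective = n / design_effect
print(f"design effect = {design_effect:.2f}, "
      f"effective n = {n_effective:.0f} (from {n} observed)")
```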

Essential elements

For all outcomes, irrespective of whether statistical synthesis was undertaken, present for each study summary statistics for each group (where appropriate). For dichotomous outcomes, report the number of participants with and without the events for each group; or the number with the event and the total for each group (such as 12/45). For continuous outcomes, report the mean, standard deviation, and sample size of each group.

For all outcomes, irrespective of whether statistical synthesis was undertaken, present for each study an effect estimate and its precision (such as standard error or 95% confidence/credible interval). For example, for time-to-event outcomes, present a hazard ratio and its confidence interval.

If study-level data are presented visually or reported in the text (or both), also present a tabular display of the results.

If results were obtained from multiple sources (such as journal article, study register entry, clinical study report, correspondence with authors), report the source of the data. This need not be overly burdensome. For example, a statement indicating that, unless otherwise specified, all data came from the primary reference for each included study would suffice. Alternatively, this could be achieved by, for example, presenting the origin of each data point in footnotes, in a column of the data table, or as a hyperlink to relevant text highlighted in reports (such as using SRDR Data Abstraction Assistant).

If applicable, indicate which results were not reported directly and had to be computed or estimated from other information (see item #13b).

Example/s:

Example 1

For an example of individual study results presented for a dichotomous outcome, see “Fig 2. Meta-analysis of comparative effects between quadruple and triple combination antiretroviral therapies (cART) as first line treatment for people with HIV, on undetectable HIV-1 RNA”

For an example of individual study results presented for a continuous outcome, see “Fig 3. Meta-analysis of comparative effects between quadruple and triple combination antiretroviral therapies (cART) as first line treatment for people with HIV, on increase in CD4 T cell count (cells/μL)”

Example 2

See meta-analysis in Figure 3. The footnote to this forest plot states: “CSPP100A2308 study: the SBP reduction in the treatment and placebo group are reported from the CSR page 61. CSPP100A2405 study: the SD for all treatment groups are calculated from SEM reported on page 7 in the CSR”.

20a Results of syntheses

For each synthesis, briefly summarise the characteristics and risk of bias among contributing studies.

Explanation

Many systematic review reports include narrative summaries of the characteristics and risk of bias across all included studies. However, such general summaries are not useful when the studies contributing to each synthesis vary, and particularly when there are many studies. For example, one meta-analysis might include three studies of participants aged 30 years on average, whereas another meta-analysis might include 10 studies of participants aged 60 years on average; in this case, knowing the mean age per synthesis is more meaningful than the overall mean age across all 13 studies. Providing a brief summary of the characteristics and risk of bias among studies contributing to each synthesis (meta-analysis or other) should help readers understand the applicability and risk of bias in the synthesised result. Furthermore, a summary at the level of the synthesis is more usable since it obviates the need for readers to refer to multiple sections of the review in order to interpret results.

Essential elements

Provide a brief summary of the characteristics and risk of bias among studies contributing to each synthesis (meta-analysis or other). The summary should focus only on study characteristics that help in interpreting the results (especially those that suggest the evidence addresses only a restricted part of the review question, or indirectly addresses the question). If the same set of studies contribute to more than one synthesis, or if the same risk of bias issues are relevant across studies for different syntheses, such a summary need be provided once only.

Indicate which studies were included in each synthesis (such as by listing each study in a forest plot or table or citing studies in the text).

Example/s:

Example 1

Twelve included studies (described in 13 publications) assessed the effectiveness of Baby-Friendly Hospital Initiative interventions. All focused on postpartum women enrolled from hospital wards or birth facilities soon after delivery. Studies were conducted in diverse country settings including the United States (two studies); Taiwan (two studies); and one each in the Republic of Belarus, Hong Kong, Czech Republic, Russia, Croatia, Brazil, United Kingdom (multiple regions), and Scotland. All studies focused on multiple hospitals (>4) or clusters of hospitals. The majority of studies focused on women giving birth between 2000 and 2009; two enrolled women in the late 1990s…One included study was an RCT, 10 were prospective cohort studies, and 1 was a single-group pre-post study…In terms of population characteristics, seven studies reported on maternal age and generally enrolled women in their 20s and 30s. Three studies (set in the United States and United Kingdom) reported on race; the percentage of non-white participants enrolled ranged from 3 to 47 percent. In the six studies reporting on the percentage of enrolled women who were primiparous, the range was 38 to 67 percent

Example 2

Overall, 61 studies described in 83 publications investigated our included tools for determining stroke risk in patients with nonvalvular AF and met the other inclusion criteria for Key Question 1. The included studies explored tools in studies of diverse quality, design, funding, and geographical location. Forty-three included studies were of good quality or rated as low risk of bias, 11 of fair quality or rated as medium risk of bias, and 7 were of poor quality or rated as high risk of bias. Studies with increased risk of bias had potential limitations related to handling of missing data, length of follow up between groups, blinding of outcomes assessors, whether confounders were assessed with reliable measures, and whether potential outcomes were prespecified. The studies covered broad geographical locations with 32 studies conducted in UK or continental Europe, 18 exclusively in the United States, 3 studies exclusively conducted in Canada, and 7 multinational trials. There was one study that did not report geographic location of enrolment. Ten studies were supported solely by industry, 8 studies received solely government support, 6 studies were supported by non-government, non-industry organizations, 15 studies received funding from multiple sources including government, industry, non-government and non-industry, and 22 studies did not report funding or it was unclear. We identified 52 studies using observational study design (prospective and retrospective cohorts) while 9 studies were identified as randomized controlled trials (RCTs).

Example 3

Nine randomized controlled trials (RCTs) directly compared delirium incidence between haloperidol and placebo groups. These RCTs enrolled 3,408 patients in both surgical and medical intensive care and non-intensive care unit settings and used a variety of validated delirium detection instruments. Five of the trials were low risk of bias, three had unclear risk of bias, and one had high risk of bias owing to lack of blinding and allocation concealment. Intravenous haloperidol was administered in all except two trials; in those two exceptions, oral doses were given. These nine trials were pooled, as they each identified new onset of delirium (incidence) within the week after exposure to prophylactic haloperidol or placebo.

Example 4

See Graphical Overview for Evidence Reviews visual summary.

20b Results of syntheses

Present results of all statistical syntheses conducted. If meta-analysis was done, present for each the summary estimate and its precision (such as confidence/credible interval) and measures of statistical heterogeneity. If comparing groups, describe the direction of the effect.

Explanation

Users of reviews rely on the reporting of all statistical syntheses conducted so that they have complete and unbiased evidence on which to base their decisions. Studies examining selective reporting of results in systematic reviews have found that 11% to 22% of reviews did not present results for at least one pre-specified outcome of the review.

Essential elements

Report results of all statistical syntheses described in the protocol and all syntheses conducted that were not pre-specified.

If meta-analysis was conducted, report for each:

• the summary estimate and its precision (such as standard error or 95% confidence/credible interval).
• measures of statistical heterogeneity (such as τ², I², prediction interval).

If other statistical synthesis methods were used (such as summarising effect estimates, combining P values), report the synthesised result and a measure of precision (or equivalent information, for example, the number of studies and total sample size).

If the statistical synthesis method does not yield an estimate of effect (such as when P values are combined), report the relevant statistics (such as P value from the statistical test), along with an interpretation of the result that is consistent with the question addressed by the synthesis method (for example, “There was strong evidence of benefit of the intervention in at least one study (P < 0.001, 10 studies)” when P values have been combined).

If comparing groups, describe the direction of effect (such as fewer events in the intervention group, or higher pain in the comparator group). If synthesising mean differences, specify for each synthesis, where applicable, the unit of measurement (such as kilograms or pounds for weight), the upper and lower limits of the measurement scale (for example, anchors range from 0 to 10), direction of benefit (for example, higher scores denote higher severity of pain), and the minimally important difference, if known. If synthesising standardised mean differences and the effect estimate is being re-expressed to a particular instrument, details of the instrument, as per the mean difference, should be reported.

Example/s:

Example 1

Twelve studies [each study cited], including a total of 159,086 patients, reported on the rate of major bleeding complications. Aspirin use was associated with a 46% relative risk increase of major bleeding complications (risk ratio 1.46; 95% CI, 1.30-1.64; p < 0.00001; I2 = 31%; absolute risk increase 0.077%; number needed to treat to harm 1295; Fig 1)

Example 2

Physical function (BASFI, 0 to 10 scale; lower score indicates higher function): Seven studies (312 participants) found a reduction in physical function score with exercise versus no intervention at the end of the intervention (mean difference (MD) -1.3, 95% confidence interval (CI) -1.7 to -0.9; absolute risk difference 13% (95% CI 9% to 17%); relative change 32% (95% CI 23% to 42%); Analysis 1.1). The statistical heterogeneity was not important (I² = 23%). There was no clinically meaningful benefit.

Example 3

Score-based measures of implementation were the most common continuous outcomes in studies comparing an implementation strategy with usual practice or minimal support control and were reported in 11 studies including nine randomised trials. Pooled analysis providing moderate-certainty evidence including all nine randomised trials with score-based measures of implementation [each study cited] reported an improvement (standardised mean difference 0.49; 95% confidence interval 0.19 to 0.79; I2 = 54%; P < 0.001; participants = 495 services; equivalent to a mean difference of 0.88 on the Environment and Policy Assessment and Observation (EPAO) scale) favouring groups receiving implementation support strategies (Analysis 1.1).

Example 4

Ten studies compared the effects of using a sit-stand desk with or without information and counselling to the effects of using a sit-desk [each study cited]. The pooled analysis showed that the sit-stand desk with or without information and counselling intervention reduced sitting time at work by on average 100 minutes per eight-hour workday (95% confidence interval -116 to -84, I² = 37%; Analysis 1.1)… Data presented by one study, Sandy 2016, did not allow for calculation of time spent in sitting time at work and therefore we did not include the study in the quantitative synthesis. The prediction interval for sitting time ranged from -146 to -54 minutes a day.

Example 5

The albatross plot for uptake of antenatal care is shown in Fig 2. Seven of the nine data points showed a positive association between conditional cash transfers and antenatal care uptake. Most of the studies were around the relative risk contour of 1.05, corresponding to about a 2-3 percentage point increase in uptake of antenatal care. However, the studies on Mi Familia Progresa, Programa de Asignación Familia and Program Keluarga Harapan all fall around the 1.05 relative risk contour, and have percentage point increases of 7.2-18.7, therefore a 10-percentage point increase might be a more reasonable estimate of the effect of conditional cash transfers on uptake of antenatal care; we considered this to be a large effect.

20c Results of syntheses

Present results of all investigations of possible causes of heterogeneity among study results.

Explanation

Presenting results from all investigations of possible causes of heterogeneity among study results is important for users of reviews and for future research. For users, understanding the factors that may, and equally, may not, explain variability in the effect estimates, may inform decision making. Similarly, presenting all results is important for designing future studies. For example, the results may help to generate hypotheses about potential modifying factors that can be tested in future studies or help identify “active” intervention ingredients that might be combined and tested in a future randomised trial. Selective reporting of the results leads to an incomplete representation of the evidence that risks misdirecting decision making and future research.

Essential elements

If investigations of possible causes of heterogeneity were conducted:

• present results regardless of the statistical significance, magnitude, or direction of effect modification.
• identify the studies contributing to each subgroup.
• report results with due consideration to the observational nature of the analysis and risk of confounding due to other factors.

If subgroup analysis was conducted, report for each analysis the exact P value for a test for interaction as well as, within each subgroup, the summary estimates, their precision (such as standard error or 95% confidence/credible interval) and measures of heterogeneity. Results from subgroup analyses might usefully be presented graphically (see Fisher et al).

If meta-regression was conducted, report for each analysis the exact P value for the regression coefficient and its precision.

If informal methods (that is, those that do not involve a formal statistical test) were used to investigate heterogeneity—which may arise particularly when the data are not amenable to meta-analysis—describe the results observed. For example, present a table that groups study results by dose or overall risk of bias and comment on any patterns observed.

Additional elements

If subgroup analysis was conducted, consider presenting the estimate for the difference between subgroups and its precision. If meta-regression was conducted, consider presenting a meta-regression scatterplot with the study effect estimates plotted against the potential effect modifier.

Example/s:

Example 1

Among the 4 trials that recruited critically ill patients who were and were not receiving invasive mechanical ventilation at randomization, the association between corticosteroids and lower mortality was less marked in patients receiving invasive mechanical ventilation (ratio of ORs, 4.34 [95% CI, 1.46-12.91]; P = .008 based on within-trial estimates combined across trials); however, only 401 patients (120 deaths) contributed to this comparison…All trials contributed data according to age group and sex. For the association between corticosteroids and mortality, the OR was 0.69 (95% CI, 0.51-0.93) among 880 patients older than 60 years, the OR was 0.67 (95% CI, 0.48-0.94) among 821 patients aged 60 years or younger (ratio of ORs, 1.02 [95% CI, 0.63-1.65], P = .94), the OR was 0.66 (95% CI, 0.51-0.84) among 1215 men, and the OR was 0.66 (95% CI, 0.43-0.99) among 488 women (ratio of ORs, 1.07 [95% CI, 0.58-1.98], P = .84). (74)

Example 2

Interventions using a case manager with a nursing background showed a greater positive effect on caregiver quality of life compared to those that used other professional backgrounds (standardised mean difference = 0.94 versus 0.03, respectively; p < 0.001). Interventions that did not provide case managers with supervision showed greater effectiveness for reducing the percentage of patients that are institutionalised compared to those that provided supervision (odds ratio = 0.27 versus 0.96 respectively; p = 0.02). There was weak evidence that interventions using a lower caseload for case managers had greater effectiveness for reducing the number of patients institutionalised compared to interventions using a higher caseload for case managers (odds ratio = 0.23 versus 1.20 respectively; p = 0.08). There was little evidence that the other intervention components modify treatment effects (see Table 3).

Example 3

The results of the five meta-regressions…are highlighted in Table 5. The training duration, frequency, total training dose and training-to-sustainability ratio showed no impact on the effect size of the primary outcome pain. The PEDro sum score was negatively associated with the effect size; a study with a score-decrease of 1 point shows an increase in the effect size of 0.24. Fig 9 illustrates this association.

Example 4

Meta-regression results revealed that, when controlling for other explanatory variables, drug administration conditions were linked with pain reduction among included studies, such that cannabinoids (whole-plant cannabis and whole-cannabis extracts) β = −0.43, 95% confidence interval (CI) (−0.62, −0.24), p < 0.05 (Figure 4), and synthetic cannabinoids (Dronabinol, Nabilone, and CT3) β = −0.39, 95% CI (−0.65, −0.14), p < 0.05 (Figure 4), performed better than placebo. Furthermore, meta-regression results showed that, when controlling for other explanatory variables, sample size was linked with pain reduction, β = 0.01, 95% CI (0.00, 0.01), p < 0.05, such that studies involving smaller samples tended to report greater pain reduction effects (Figure 4). There were no observed interactions between drug administration conditions and sample size. Finally, meta-regression results showed that, when controlling for other explanatory variables, sample sex composition was linked with a modest, however non-significant, effect, β = −0.64, 95% CI (−1.37, 0.09), p = 0.09, such that studies including more female participants tended to report greater pain reductions (Figure 5).

20d Results of syntheses

Present results of all sensitivity analyses conducted to assess the robustness of the synthesised results.

Explanation

Presenting results of sensitivity analyses conducted allows readers to assess how robust the synthesised results were to decisions made during the review process. Reporting results of all sensitivity analyses is important; presentation of a subset, based on the nature of the results, risks introducing bias due to selective reporting. Forest plots are a useful way to present results of sensitivity analyses; however, these may be best placed in an appendix, with the main forest plots presented in the main report, to not reduce readability. An exception may be when sensitivity analyses reveal the results are not robust to decisions made during the review process.

Essential elements

If any sensitivity analyses were conducted:

• report the results for each sensitivity analysis.

• comment on how robust the main analysis was given the results of all corresponding sensitivity analyses.

Additional elements

If any sensitivity analyses were conducted, consider:

• presenting results in tables that indicate: (i) the summary effect estimate, a measure of precision (and potentially other relevant statistics, for example, I2 statistic) and contributing studies for the original meta-analysis; (ii) the same information for the sensitivity analysis; and (iii) details of the original and sensitivity analysis assumptions.

• presenting results of sensitivity analyses visually using forest plots.

Example/s:

Example 1

The magnitude of the pooled effect remained relatively stable in sensitivity analyses (table S13 in appendix 10).

Example 2

Sensitivity analyses that removed studies with potential bias showed consistent results with the primary meta-analyses (risk ratio 1.00 for undetectable HIV-1 RNA, 1.00 for virological failure, 0.98 for severe adverse effects, and 1.02 for AIDS defining events; supplement 3E, 3F, 3H, and 3I, respectively). Such sensitivity analyses were not performed for other outcomes because none of the studies reporting them was at a high risk of bias. Sensitivity analysis that pooled the outcome data reported at 48 weeks, which also showed consistent results, was performed for undetectable HIV-1 RNA and increase in CD4 T cell count only (supplement 3J and 3K) and not for other outcomes owing to lack of relevant data. When the standard deviations for increase in CD4 T cell count were replaced by those estimated by different methods, the results of figure 3 either remained similar (that is, quadruple and triple arms not statistically different) or favoured triple therapies (supplement 2).

Example 3

Table 3 shows the results of the secondary sensitivity analyses. Re-rupture rate was reported in 17 (59%) high quality studies – 10 randomized controlled trials and seven observational studies. The overall pooled effect showed that operative treatment was associated with a significant reduction in re-rupture rate compared with nonoperative treatment (risk difference 5.1%; risk ratio 0.44, 0.30 to 0.64; P<0.001; I2=0%) (supplementary figure F). Re-rupture rate was reported in 14 studies (48%) with a study period after the year 2000 – six randomized controlled trials and eight observational studies. The overall pooled effect showed a significant reduction in re-rupture rate after operative treatment compared with nonoperative treatment (risk difference 0.9%; risk ratio 0.59, 0.42 to 0.83; P=0.002; I2 =10%) (supplementary figure G).

21. Risk of reporting biases in syntheses

Present assessments of risk of bias due to missing results (arising from reporting biases) for each synthesis assessed.

Explanation

Presenting assessments of the risk of bias due to missing results in syntheses allows readers to assess potential threats to the trustworthiness of a systematic review’s results. Providing the evidence used to support judgments of risk of bias allows readers to determine the validity of the assessments.

Essential elements

Present assessments of risk of bias due to missing results (arising from reporting biases) for each synthesis assessed. If a tool was used to assess risk of bias due to missing results in a synthesis, present responses to questions in the tool, judgments about risk of bias, and any information used to support such judgments to help readers understand why particular judgments were made.

If a funnel plot was generated to evaluate small-study effects (one cause of which is reporting biases), present the plot and specify the effect estimate and measure of precision used in the plot (presented typically on the horizontal axis and vertical axis respectively). If a contour-enhanced funnel plot was generated, specify the “milestones” of statistical significance that the plotted contour lines represent (P=0.01, 0.05, 0.1, etc).

If a test for funnel plot asymmetry was used, report the exact P value observed for the test and potentially other relevant statistics, such as the standardised normal deviate, from which the P value is derived.
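
As a concrete sketch only (the data are invented, and matplotlib and statsmodels are one choice of tools among many), the code below draws a basic funnel plot and runs the classic Egger regression test, in which the standardised normal deviate (the effect estimate divided by its standard error) is regressed on precision (the reciprocal of the standard error) and the intercept estimates asymmetry:

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Hypothetical per-study effect estimates (log odds ratios) and standard errors.
effects = np.array([0.31, 0.45, 0.12, 0.58, 0.22, 0.70, 0.05, 0.40])
ses = np.array([0.10, 0.18, 0.12, 0.25, 0.15, 0.30, 0.08, 0.20])

# Funnel plot: effect estimate on the horizontal axis, standard error on the
# vertical axis, inverted so that more precise studies sit at the top.
fig, ax = plt.subplots()
ax.scatter(effects, ses)
ax.invert_yaxis()
ax.set_xlabel("Log odds ratio")
ax.set_ylabel("Standard error")
fig.savefig("funnel_plot.png")

# Egger regression test: regress the standardised normal deviate on precision;
# the intercept is the bias coefficient used to judge funnel plot asymmetry.
snd = effects / ses
precision = 1.0 / ses
fit = sm.OLS(snd, sm.add_constant(precision)).fit()
print(f"Egger intercept = {fit.params[0]:.2f}, t = {fit.tvalues[0]:.2f}, "
      f"df = {int(fit.df_resid)}, P = {fit.pvalues[0]:.3f}")
```

A contour-enhanced version would overlay lines or shaded regions marking the significance milestones (P=0.01, 0.05, 0.1) on the same axes, as the essential elements above require when such a plot is presented.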

If any sensitivity analyses seeking to explore the potential impact of missing results on the synthesis were conducted, present results of each analysis (see item #20d), compare them with results of the primary analysis, and report results with due consideration of the limitations of the statistical method.

Additional elements

If studies were assessed for selective non-reporting of results by comparing outcomes and analyses pre-specified in study registers, protocols, and statistical analysis plans with results that were available in study reports, consider presenting a matrix (with studies as rows and syntheses as columns) indicating the availability of study results.
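
A minimal sketch of such an availability matrix, with invented study names and syntheses (pandas assumed available):

```python
import pandas as pd

# Rows are studies; columns are syntheses. "Yes" = results available;
# "No" = outcome pre-specified in the register or protocol but results missing.
availability = pd.DataFrame(
    {
        "Pain (12 months)":     ["Yes", "Yes", "No", "Yes"],
        "Function (12 months)": ["Yes", "No", "No", "Yes"],
        "Adverse events":       ["No", "Yes", "Yes", "No"],
    },
    index=["Doe 2019", "Roe 2021", "Poe 2020", "Loe 2022"],
)
print(availability)
```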

If an assessment of selective non-reporting of results reveals that some studies are missing from the synthesis, consider displaying the studies with missing results underneath a forest plot or including a table with the available study results (for example, see forest plot in Page et al).

Example/s:

Example 1

Clinical global impression of change was assessed in Doody 2008, NCT00912288, CONCERT and CONNECTION using the CIBIC-Plus. However, we were only able to extract results from Doody 2008 [because no results for CIBIC-Plus were reported in the other three studies]…The authors reported small but significant improvements on the CIBIC-Plus for 183 patients (89 on latrepirdine and 94 on placebo) favouring latrepirdine following the 26-week primary endpoint (MD -0.60, 95% CI -0.89 to -0.31, P < 0.001). Similar results were found at the additional 52-week follow-up (MD -0.70, 95% CI -1.01 to -0.39, P < 0.001). However, we considered this to be low quality evidence due to imprecision and reporting bias. Thus, we could not draw conclusions about the efficacy of latrepirdine in terms of changes in clinical impression.

Example 2

There is evidence of possible funnel plot asymmetry providing data on response to short-term medication treatment, both for the SSRIs and all medications combined. Inspection of the contour enhanced funnel plots for the SSRIs (Figure 4) and all of the trials (Figure 5) suggests that this asymmetry is due to publication bias, as trials with less precise treatment response outcomes are more likely than their higher precision counterparts to be missing from regions of the plot representing statistically nonsignificant treatment effects. Egger regression tests quantitatively confirmed this visual impression, providing evidence of possible publication bias for all of the medication trials (t = 2.8226, df = 49, P = 0.0069) and for the SSRIs (t = 2.6426, df = 22, P = 0.015).

Example 3

To examine small study and publication bias we created a contour-enhanced funnel plot of the 11 effect sizes plotted against their standard errors (see Figure 32). Visual inspection of the funnel plot reveals an absence of adverse intervention effects. Given the absence of negative effects in the regions of statistical significance and non-significance, the results from this contour enhanced funnel plot indicate a potential risk of publication bias. To further investigate the possibility of bias, we conducted an Egger test for funnel plot asymmetry. The results provided no significant evidence of small study effects (bias coefficient: 0.36; t: −0.60, p = .56)...With these collective findings, we therefore conclude that the meta-analysis results shown in Figure 31 are likely robust to any small study/publication bias.

22. Certainty of evidence

Present assessments of certainty (or confidence) in the body of evidence for each outcome assessed.

Explanation

An important feature of systems for assessing certainty, such as GRADE, is explicit reporting of both the level of certainty (or confidence) in the evidence and the basis for judgments. Evidence summary tables, such as GRADE Summary of Findings tables, are an effective and efficient way to report assessments of the certainty of evidence.

Essential elements

Report the overall level of certainty in the body of evidence (such as high, moderate, low, or very low) for each important outcome.

Provide an explanation of reasons for rating down (or rating up) the certainty of evidence (such as in footnotes to an evidence summary table). Explanations for each judgment should be concise, informative, relevant to the target audience, easy to understand, and accurate (that is, addressing criteria specified in the methods guidance).

Communicate certainty in the evidence wherever results are reported (that is, abstract, evidence summary tables, results, conclusions). Use a format appropriate for the section of the review. For example, in text, certainty might be reported explicitly in a sentence (such as “Moderate-certainty evidence (downgraded for bias) indicates that…”) or in brackets alongside an effect estimate (such as “[RR 1.17, 95% CI 0.81 to 1.68; 4 studies, 1781 participants; moderate certainty evidence]”). When interpreting results in “summary of findings” tables or conclusions, certainty might be communicated implicitly using standard phrases (such as “Hip protectors probably reduce the risk of hip fracture slightly”).

Additional elements

Consider including evidence summary tables, such as GRADE Summary of Findings tables.

Example/s:

Example 1

Compared with non-operative treatment, low-certainty evidence indicates surgery (repair with subacromial decompression) may have little or no effect on function at 12 months. The evidence was downgraded two steps, once for bias and once for imprecision – the 95% CIs overlap minimal important difference in favour of surgery at this time point.

The summary of findings table presents the same information as the text above, with footnotes explaining judgements.

Example 2

Polyunsaturated fatty acids (PUFAs) were superior compared to placebo in reducing anxiety in individuals with autism spectrum disorder (standardised mean difference -1.01, 95% CI -1.86 to -0.17; very low certainty of evidence)…Summary of findings for the comparisons PUFAs versus placebo and PUFAs versus healthy diet are presented in Table 2 and Table 3.

Discussion

23a Results in context

Provide a general interpretation of the results in the context of other evidence.

Explanation

Discussing how the results of the review relate to other relevant evidence should help readers interpret the findings. For example, authors might compare the current results to results of other similar systematic reviews (such as reviews that addressed the same question using different methods or that addressed slightly different questions) and explore possible reasons for discordant results. Similarly, authors might summarise additional information relevant to decision makers that was not explored in the review, such as findings of studies evaluating the cost-effectiveness of the intervention or surveys gauging the values and preferences of patients.

Essential elements

Provide a general interpretation of the results in the context of other evidence.

Example/s:

Example 1

Although we need to exercise caution in interpreting these findings because of the small number of studies, these findings nonetheless appear to be largely in line with the recent systematic review on what works to improve education outcomes in low- and middle-income countries of Snilstveit et al. (2012). They found that structured pedagogical interventions may be among the effective approaches to improve learning outcomes in low- and middle-income countries. This is consistent with our findings that teacher training is only effective in improving early grade literacy outcomes when it is combined with teacher coaching. The finding is also consistent with our result that technology in education programs may have at best no effects unless they are combined with a focus on pedagogical practices. In line with our study, Snilstveit et al. (2012) also do not find evidence for statistically significant effects of the one-laptop-per-child program. These results are consistent with the results of a meta-analysis showing that technology in education programs are not effective when not accompanied by parent or student training (McEwan, 2015). However, neither Snilstveit et al. (2012) nor McEwan (2015) find evidence for negative effects of the one-laptop-per-child program on early grade literacy outcomes.

Example 2

As outlined in the protocol, the authors were aware of only two previous systematic reviews prior to commencing this study (Carter Anand et al., 2012; Webber et al., 2014). In one sense, the eligibility criteria within the current study were broader and more inclusive; for example, Webber et al. limited their review to mental health users only. The need for a results refinement process further highlights the broad scope of the current review. In another sense, however, this review was more restrictive in terms of the quality of evidence. To this end, quantitative studies were excluded if they were not designed to robustly evaluate effectiveness or did not have a control group, while previous reviews included studies without control groups (for example). Therefore, the studies included in this review are very different, in some respects from those captured in the above reviews. At the same time, however, the findings from this review were consistent in many respects with the two reviews previously identified.

Example 3

The evidence contained in this review is similar to, and extends that of the prior Cochrane Review (Ferri 2006b), which this review updates and replaces, as well as of other narrative reviews which found overall positive effects for AA/TSF interventions (e.g. Kaskutas 2009a; Kelly 2003b). The results presented in this review are also supported by other published analyses. One study from Project MATCH (Longabaugh 1998), found that regardless of whether outpatients’ pre-treatment network was supportive or unsupportive of alcohol use at treatment intake, AA/TSF participants were more likely to be involved with AA, which in turn, subsequently explained the observed lower drinks per drinking day (DDD) and greater PDA advantages for TSF-treated participants observed at the 36-month follow-up. The prior Cochrane Review contained eight studies with 3417 participants (Ferri 2006b), and found that on the whole, AA/TSF interventions were as effective, but not more effective, than the interventions to which they were compared. This new review is based on 27 studies reported in 36 articles and has a total of 10,565 participants. It is considerably larger, comprises more rigorous studies, and found that, compared to other active psychosocial interventions for AUD, AA/TSF interventions often produce greater abstinence – notably continuous abstinence – as well as some reductions in drinking intensity, fewer alcohol-related consequences, and lower alcohol addiction severity. This review also included economic analyses, which augments prior reviews and adds important information regarding the cost-benefits of providing AA/TSF in clinical settings.

23b Limitations of included studies

Discuss any limitations of the evidence included in the review.

Explanation

Discussing the completeness, applicability, and uncertainties in the evidence included in the review should help readers interpret the findings appropriately. For example, authors might acknowledge that they identified few eligible studies or studies with a small number of participants, leading to imprecise estimates; have concerns about risk of bias in studies or missing results; or identified studies that only partially or indirectly address the review question, leading to concerns about their relevance and applicability to particular patients, settings, or other target audiences. The assessments of certainty (or confidence) in the body of evidence (item #22) can support the discussion of such limitations.

Essential elements

Discuss any limitations of the evidence included in the review.

Example/s:

Example 1

Study populations were young, and few studies measured longitudinal exposure. The included studies were often limited by selection bias, recall bias, small sample of marijuana-only smokers, reporting of outcomes on marijuana users and tobacco users combined, and inadequate follow-up for the development of cancer… Most studies poorly assessed exposure, and some studies did not report details on exposure, preventing meta-analysis for several outcomes.

Example 2

…despite the use of a comprehensive search strategy, almost all included studies were from middle-income countries, possibly reflecting the shortage of resources for such studies in low-income countries. This means that our findings cannot be generalized to low-income countries. Also, relatively fewer findings were available from Africa, Southern Europe, and Central, Southern, and Southeastern Asia, which made it challenging to generalize conclusions about low and middle income countries.

23c Limitations of the review methods

Discuss any limitations of the review processes used.

Explanation

Discussing limitations, avoidable or unavoidable, in the review process should help readers understand the trustworthiness of the review findings. For example, authors might acknowledge the decision to restrict eligibility to studies in English only, search only a small number of databases, have only one reviewer screen records or collect data, or not contact study authors to clarify unclear information. They might also acknowledge that they were unable to access all potentially eligible study reports or to carry out some of the planned analyses because of insufficient data. While some limitations may affect the validity of the review findings, others may not.

Essential elements

Discuss any limitations of the review processes used and comment on the potential impact of each limitation.

Example/s:

Example 1

Because of time constraints…we dually screened only 30% of the titles and abstracts; for the rest, we used single screening. A recent study showed that single abstract screening misses up to 13% of relevant studies (Gartlehner 2020). In addition, single review authors rated risk of bias, conducted data extraction and rated certainty of evidence. A second review author checked the plausibility of decisions and the correctness of data. Because these steps were not conducted dually and independently, we introduced some risk of error…Nevertheless, we are confident that none of these methodological limitations would change the overall conclusions of this review. Furthermore, we limited publications to English and Chinese languages. Because COVID-19 has become a rapidly evolving pandemic, we might have missed recent publications in languages of countries that have become heavily affected in the meantime (e.g. Italian or Spanish).

Example 2

We acknowledge several limitations…Although our network meta-analysis included all available randomized controlled trials, we could not conduct a subgroup analysis to identify a specific group of patients who could benefit from triple therapy more prominently…Because studies reporting information – such as eosinophil counts and chronic bronchitis – were fewer than expected, we could not generate a sufficient network for the sensitivity and meta-regression analyses. In addition, we did not evaluate the symptoms, use of rescue medication, quality of life, and lung function, which are other important outcomes.

Example 3

One of the primary limitations of our work is the heterogeneity of dietary patterns across studies. Although all patterns discriminated between participants with low and high intake of red and processed meat, other food and nutrient characteristics of dietary patterns and the quantity of red and processed meat consumed varied widely across studies. Moreover, the quantity of red and processed meat consumed differed across dietary patterns and studies. For example, one study compared 1.4 versus 3.5 servings of processed meat per week, whereas another compared 0.7 versus 4.9 servings per week. Such inconsistencies may have increased heterogeneity of meta-analyses and potentially reduced the magnitude of observed associations. Also, analyses of extreme categories of adherence may artificially inflate effect estimates and may not be indicative of effects observed at typical levels of adherence. Second, we were unable to analyze the data separately for red and processed meat because authors typically combined them or did not distinguish between them in primary studies.

23d Implications

Discuss implications of the results for practice, policy, and future research.

Explanation

There are many potential end users of a systematic review (such as patients, healthcare providers, researchers, insurers, and policy makers), each of whom will want to know what actions they should take given the review findings. Patients and healthcare providers may be primarily interested in the balance of benefits and harms, while policy makers and administrators may value data on organisational impact and resource utilisation. For reviews of interventions, authors might clarify trade-offs between benefits and harms and how the values attached to the most important outcomes of the review might lead different people to make different decisions. In addition, rather than making recommendations for practice or policy that apply universally, authors might discuss factors that are important in translating the evidence to different settings and factors that may modify the magnitude of effects.

Explicit recommendations for future research—as opposed to general statements such as “More research on this question is needed”—can better direct the questions future studies should address and the methods that should be used. For example, authors might consider describing the type of understudied participants who should be enrolled in future studies, the specific interventions that could be compared, suggested outcome measures to use, and ideal study design features to employ.

Essential elements

Discuss implications of the results for practice and policy.

Make explicit recommendations for future research.

Example/s:

Example 1

Implications for practice and policy: Findings from this review indicate that bystander programs have significant beneficial effects on bystander intervention behaviour. This provides important evidence of the effectiveness of mandated programs on college campuses. Additionally, the fact that our (preliminary) moderator analyses found program effects on bystander intervention to be similar for adolescents and college students suggests early implementation of bystander programs (i.e., in secondary schools with adolescents) may be warranted. Importantly, although we found that bystander programs had a significant beneficial effect on bystander intervention behaviour, we found no evidence that these programs had an effect on participants' sexual assault perpetration. Bystander programs may therefore be appropriate for targeting bystander behaviour, but may not be appropriate for targeting the behaviour of potential perpetrators. Additionally, effects of bystander programs on bystander intervention behaviour diminished by 6-month post-intervention. Thus, programs effects may be prolonged by the implementation of booster sessions conducted prior to 6 months post-intervention. Implications for research: Findings from this review suggest there is a fairly strong body of research assessing the effects of bystander programs on attitudes and behaviors. However, there are a couple of important questions worth further exploration. First, according to one prominent logical model, bystander programs promote bystander intervention by fostering prerequisite knowledge and attitudes (Burn, 2009). Our meta-analysis provides inconsistent evidence of the effects of bystander programs on knowledge and attitudes, but promising evidence of short-term effects on bystander intervention. This casts uncertainty around the proposed relationship between knowledge/attitudes and bystander behavior. Although we were unable to assess these issues in the current review, this will be an important direction for future research. Our understanding of the causal mechanisms of program effects on bystander behavior would benefit from further analysis (e.g., path analysis mapping relationships between specific knowledge/attitude effects and bystander intervention). Second, bystander programs exhibit a great deal of content variability, most notably in framing sexual assault as a gendered or gender-neutral problem. That is, bystander programs tend to adopt one of two main approaches to addressing sexual assault: (a) presenting sexual assault as a gendered problem (overwhelmingly affecting women) or (b) presenting sexual assault as a gender-neutral problem (affecting women and men alike). Differential effects of these two types of programs remain largely unexamined. Our analysis indicated that (a) the sex of victims/perpetrators (i.e., portrayed in programs as gender neutral or male perpetrator and female victim) and (b) whether programs were implemented in mixed- or single-sex settings were not significant moderators of program effects on bystander intervention. However, these findings are limited to a single outcome and they should be considered preliminary, as they are based on a small sample (n = 11). 
Our understanding of the differential effects of gendered versus gender neutral programs would benefit from the design and implementation of high-quality primary studies that make direct comparisons between these two types of programs (e.g., RCTs comparing the effects of two active treatment arms that differ in their gendered approach). Finally, our systematic review and meta-analysis demonstrate the lack of global evidence concerning bystander program effectiveness. Our understanding of bystander programs' generalizability to non-US contexts would be greatly enhanced by high quality research conducted across the world.

Example 2

From this review, it seems like the most prudent thing for school leaders, policymakers, and school mental health professionals to do would be proceed with caution in their embrace of a trauma-informed approach as an overarching framework and conduct rigorous evaluation of this approach. We simply do not have the evidence (yet) to know if this works, and indeed, we do not know if using a trauma-informed approach could actually have unintended negative consequences for traumatized youth and school communities. We also do not have evidence of other potential costs in implementing this approach in schools, whether they be financial, academic, or other opportunity costs, and whether benefits outweigh the costs of implementing and maintaining this approach in schools. That said, calling for caution in adopting trauma-informed care in schools does not preclude schools from continuing to implement evidence-informed programs that target trauma symptoms in youth, or that they should simply wait for the research to provide unequivocal answers. The benefit of the trauma-informed approach being made freely available by SAMHSA and other policymakers is that these components can form the basis for a school (or school district) to begin to adapt and apply this approach in schools.

Example 3

In order of priority, the assessment group suggests the following further research priorities:

  1. Clinical advice to the assessment group is that only radioactive iodine-refractory differentiated thyroid cancer patients experiencing symptoms, or those who have clinically significant progressive disease, are likely to be treated in routine clinical practice. Subgroup analyses suggest that the effects on progression-free survival are similar for patients treated with sorafenib regardless of whether they are symptomatic or asymptomatic. However, these findings are post hoc and include only a minority of symptomatic patients. It is unclear if other outcomes, such as overall survival, objective tumour response rate, adverse events and health-related quality of life, differ by symptomatic or asymptomatic disease. Future studies of patients should aim to include a greater proportion of patients with symptomatic disease and investigate possible differences. Consideration should be given to using the classification of patients as symptomatic or asymptomatic as a randomisation stratification factor.

  2. It would be useful to record, and report, health-related quality of life outcomes from any future clinical study of lenvatinib and sorafenib. In particular, data should be collected, using the EQ-5D questionnaire, throughout the whole trial period, not only from patients whose disease has not progressed. Further research on health-related quality of life from treating patients who have symptomatic disease compared with those who do not is also required.

  3. Currently, evidence does not allow a comparison of the effectiveness of treatment with lenvatinib with the effectiveness of treatment with sorafenib. A head-to-head trial considering these treatments and placebo would generate results that would be valuable to decision-makers.

  4. It would be useful to explore how lenvatinib, sorafenib and best supportive care should be positioned in the treatment pathway.

Other information

24a Registration and protocol

Provide registration information for the review, including register name and registration number, or state that the review was not registered.

Explanation

Stating where the systematic review was registered (such as PROSPERO, Open Science Framework) and the registration number or DOI for the register entry (see box 6) facilitates identification of the systematic review in the register. This allows readers to compare what was pre-specified with what was eventually reported in the review and decide if any deviations may have introduced bias. Reporting registration information also facilitates linking of publications related to the same systematic review (such as when a review is presented at a conference and published in a journal).

Essential elements

Provide registration information for the review, including register name and registration number, or state that the review was not registered.

Example/s:

Example 1

…this systematic review has been registered in the international prospective register of systematic reviews (PROSPERO) under the registration number: CRD42019128569

Example 2

The current protocol is registered with the Open Science Framework (url: https://osf.io/qe43p/)

Example 3

A protocol for this systematic review was developed before the research began; however, this review was not registered

Example 4

The present meta-analysis was not registered online while it was in the planning stage. This of course increases the probability of an unplanned duplication, and does not allow a verification that review methods were carried out as planned.

24b Registration and protocol

Indicate where the review protocol can be accessed, or state that a protocol was not prepared.

Explanation

The review protocol may contain information about the methods that is not provided in the final review report. Providing a citation, DOI, or link to the review protocol allows readers to locate the protocol more easily. Comparison of the methods pre-specified in the review protocol with what was eventually done allows readers to assess whether any deviations may have introduced bias. If the review protocol was not published or deposited in a public repository, or uploaded as a supplementary file to the review report, we recommend providing the contact details of the author responsible for sharing the protocol. If authors did not prepare a review protocol, or prepared one but are not willing to make it accessible, this should be stated to prevent users spending time trying to locate the document.

Essential elements

Indicate where the review protocol can be accessed (such as by providing a citation, DOI, or link) or state that a protocol was not prepared.

Example/s:

Example 1

The review protocol was registered with the International Prospective Register of Systematic Reviews (PROSPERO) database (registration number: CRD42014015219) and the protocol has been published [citation for protocol provided].

Example 2

…this systematic review and meta-analysis protocol has been published elsewhere [citation for the protocol provided].

24c Registration and protocol

Describe and explain any amendments to information provided at registration or in the protocol.

Explanation

Careful consideration of a review’s methodological and analytical approach early on is likely to lessen unnecessary changes after protocol development. However, it is difficult to anticipate all scenarios that will arise, necessitating some clarifications, modifications, and changes to the protocol (for example, the available data may not be amenable to the planned meta-analysis). For reasons of transparency, authors should report details of any amendments. Amendments could be recorded in various places, including the full text of the review, a supplementary file, or as amendments to the published protocol or registration record.

Essential elements

Report details of any amendments to information provided at registration or in the protocol, noting: (a) the amendment itself, (b) the reason for the amendment, and (c) the stage of the review process at which the amendment was implemented.

Example/s:

Example 1

Differences between protocol and review: In our protocol (Burry 2015), we planned the primary outcome to be duration of delirium, defined as the time from which it was first identified to when it was first resolved (i.e. screened negative as defined by study authors (e.g. first negative screen, two consecutive screenings)), measured in days, and our secondary outcome to be the total duration of delirium, measured in days. There was far more variability in the definition of the outcome used than we had anticipated. Only two trials reported on the duration of delirium's first episode, and the remaining trials reported days with delirium, time in delirium, or total duration of delirium; most did not report when delirium was identified or how trial authors defined resolution of delirium. We therefore chose to report the total duration of delirium as our primary outcome and to pool the variable definitions. We added the outcome number of days in coma, as this outcome was reported in four trials, and we believed it important to include it in this review, as it is a newer outcome that is likely to be included in subsequent studies.

Example 2

In a review examining the effects of pharmacologic therapies on patients with idiopathic sudden sensorineural hearing loss, the authors describe and explain several amendments to information provided in the protocol:

“We incurred no deviations from the a priori review protocol, with the exception of a minor modification of our modelling approach for: 1) continuous endpoints at baseline and at final follow-up with corresponding standard deviations (but without average changes and corresponding standard deviations per group) in certain studies; and 2) follow-up time due to variations in endpoints assessment time across studies.”

Example 3

Differences from protocol: We modified the lower limit for age in our eligibility criteria from 12 years of age to 10 years of age because the age of adolescence was reduced. We used the WHO measures for severe anaemia, defined by haemoglobin levels < 80 g/L instead of < 70 g/L as stated in the protocol. We decided to add adverse events to our list of primary outcomes (instead of secondary) and we changed reinfection rate to a secondary outcome.

Example 4

Differences between protocol and review:

• Title: The original title of the protocol was “Interventions for pruritus of unknown cause”; it was changed to “Interventions for chronic pruritus of unknown origin” as this is currently the most familiar and widely used term among clinicians. We also changed the name of the condition throughout the review.

• Types of participants: Because the only included study had a (minority) subset of patients with a different diagnosis, we added to our inclusion criteria the following: “When we found studies with a subset of patients with a diagnosis of CPUO, we included them if data are presented separately for these patients, or if the majority (> 50%) of the included participants met the inclusion criteria. If data were not available for this subset of participants, we tried to retrieve this information from the investigators before excluding the study.”

• Types of interventions: In the published version of the protocol, we defined “aprepitant” as a systemic intervention in representation of the pharmacological group “substance P and neurokinin 1 receptor (NK1R) antagonist”; therefore for the report of the review, we changed this in the inclusion criteria. We found no studies evaluating the prioritised comparisons: emollient creams, cooling lotions, topical corticosteroids, topical antidepressants, systemic antihistamines, systemic antidepressants, systemic anticonvulsants, and phototherapy. Therefore, instead we created three SoF tables for comparisons of the only study we included.

• Search methods: Due to the large number of excluded studies (n=67), we did not screen the bibliographies of excluded studies for further references to relevant reviews.

• Risk of bias assessment: We have updated the ‘risk of bias’ methods with the new tool ROB 2.0 in line with guidance from the new version of the Cochrane Handbook for Systematic Reviews of Interventions, and based on the protocol “Therapeutic interventions for alcohol dependence in non-inpatient settings: a systematic review and network meta-analysis” (Cheng 2017).

• Methods not implemented: Because this review included only one study, we could not perform any meta-analyses, and hence could not assess publication bias nor perform sensitivity analysis or subgroup analyses. We did not impute missing data because we considered missing data to be minimal (see Appendix 5, item 3.1).

25. Support

Describe sources of financial or non-financial support for the review, and the role of the funders or sponsors in the review.

Explanation

As with any research report, authors should be transparent about the sources of support received to conduct the review. For example, funders may provide salary to researchers to undertake the review, the services of an information specialist to conduct searches, or access to commercial databases that would otherwise not have been available. Authors may have also obtained support from a translation service to translate articles or in-kind use of software to manage or analyse the study data. In some reviews, the funder or sponsor (that is, the individual or organisation assuming responsibility for the initiation and management of the review) may have contributed to defining the review question, determining eligibility of studies, collecting data, analysing data, interpreting results, or approving the final review report. There is potential for bias in the review findings arising from such involvement, particularly when the funder or sponsor has an interest in obtaining a particular result.

Essential elements

Describe sources of financial or non-financial support for the review, specifying relevant grant ID numbers for each funder. If no specific financial or non-financial support was received, this should be stated.

Describe the role of the funders or sponsors (or both) in the review. If funders or sponsors had no role in the review, this should be declared—for example, by stating, “The funders had no role in the design of the review, data collection and analysis, decision to publish, or preparation of the manuscript.”

Example/s:

Example 1

Funding/Support: This research was funded under contract HHSA290201500009i, Task Order 7, from the Agency for Healthcare Research and Quality (AHRQ), US Department of Health and Human Services, under a contract to support the US Preventive Services Task Force (USPSTF). Role of the Funder/Sponsor: Investigators worked with USPSTF members and AHRQ staff to develop the scope, analytic framework, and key questions for this review. AHRQ had no role in study selection, quality assessment, or synthesis. AHRQ staff provided project oversight, reviewed the report to ensure that the analysis met methodological standards, and distributed the draft for peer review. Otherwise, AHRQ had no role in the conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript findings. The opinions expressed in this document are those of the authors and do not reflect the official position of AHRQ or the US Department of Health and Human Services.

Example 2

Funding for statistical analyses was provided by Ferring Pharmaceuticals. Dr. Toren is supported by a clinician-scientist award from Fonds de Recherche du Québec – Santé (#32774).

Example 3

The authors received no financial support for the research, authorship and/or publication of this article.

26. Competing interests

Declare any competing interests of review authors.

Explanation

Authors of a systematic review may have relationships with organisations or entities with an interest in the review findings (for example, an author may serve as a consultant for a company manufacturing the drug or device under review). Such relationships or activities are examples of a competing interest (or conflict of interest), which can negatively affect the integrity and credibility of systematic reviews. For example, evidence suggests that systematic reviews with financial competing interests more often have conclusions favourable to the experimental intervention than systematic reviews without financial competing interests. Information about authors’ relationships or activities that readers could consider pertinent or to have influenced the review should be disclosed using the format requested by the publishing entity (such as using the International Committee of Medical Journal Editors (ICMJE) disclosure form). Authors should report how competing interests were managed for particular review processes. For example, if a review author was an author of an included study, they may have been prevented from assessing the risk of bias in the study results.

Essential elements

Disclose any of the authors’ relationships or activities that readers could consider pertinent or to have influenced the review. If any authors had competing interests, report how they were managed for particular review processes.

Example/s:

Example 1

Declarations of interest: R Buchbinder was a principal investigator of Buchbinder 2009. D Kallmes was a principal investigator of Kallmes 2009 and Evans 2015. D Kallmes participated in IDE trial for Benvenue Medical spinal augmentation device. He is a stockholder, Marblehead Medical, LLC, Development of spine augmentation devices. He holds a spinal fusion patent license, unrelated to spinal augmentation/vertebroplasty. R Buchbinder and D Kallmes did not perform risk of bias assessments for their own or any other placebo-controlled trials included in the review.

Example 2

Competing interests: I have read the journal's policy and the authors of this manuscript have the following competing interests: NK chairs and contributes to a number of guidelines for self-harm and suicidal behaviour and sits on the main government advisory group for suicide prevention in England. NK and DK also advised the Sri Lankan Ministry of Health on their suicide prevention strategy. NK receives research funding from government and charity sources. NK does not receive industry funding or personal remuneration.

Example 3

Declaration of interests: NDS has sat on the paid advisory board of Highmark Interactive, received consulting or speaking fees from WorkSafeBC and Yukon WCB, the National Hockey League, and Major League Soccer, and has received fees for expert testimony in neuropsychology. WGH has received consulting fees or sat on paid advisory boards for the Canadian Agency for Drugs and Technology in Health, AlphaSights, Guidepoint, In Silico, Translational Life Sciences, Otsuka, Lundbeck, and Newron. WJP is the founder and chief executive officer of Translational Life Sciences, an early stage biotechnology company. He is also on the scientific advisory board of Medipure Pharmaceuticals and Vitality Biopharma, and in the past has been on the board of directors for Abattis Bioceuticals and on the advisory board for Vinergy Resources; these companies are early stage biotechnology enterprises with no relation to brain injury. All other authors declare no competing interests.

Example 4

Competing interest: The authors declare that they have no competing interests.

27. Availability of data, code, and other materials

Report which of the following are publicly available and where they can be found: template data collection forms; data extracted from included studies; data used for all analyses; analytic code; any other materials used in the review.

Explanation

Sharing of data, analytic code, and other materials enables others to reuse the data, check the data for errors, attempt to reproduce the findings, and understand more about the analysis than may be provided by descriptions of methods. Support for sharing of data, analytic code, and other materials is growing, including from patients and journal editors such as BMJ and PLOS Medicine.

Sharing of data, analytic code, and other materials relevant to a systematic review includes making various items publicly available, such as the template data collection forms; all data extracted from included studies; a file indicating necessary data conversions; the clean dataset(s) used for all analyses in a format ready for reuse (such as CSV file); metadata (such as complete descriptions of variable names, README files describing each file shared); analytic code used in software with a command-line interface or complete descriptions of the steps used in point-and-click software to run all analyses. Other materials might include more detailed information about the intervention delivered in the primary studies that are otherwise not available, such as a video of the specific cognitive behavioural therapy supplied by the study investigators to reviewers. Similarly, other material might include a list of all citations screened and any decisions about eligibility.
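
For illustration only (the file names, variable names, and descriptions below are assumptions for the example, not prescribed by PRISMA), a clean analysis dataset and a README-style metadata file might be written out for deposit like this:

```python
import csv

# Hypothetical extracted data: one row per study result contributing to a synthesis.
rows = [
    {"study_id": "Doe 2019", "outcome": "pain at 12 months",
     "log_rr": -0.22, "se": 0.12, "n_total": 312},
    {"study_id": "Roe 2021", "outcome": "pain at 12 months",
     "log_rr": -0.46, "se": 0.20, "n_total": 145},
]

# Clean dataset in a reuse-ready format (CSV), as recommended above.
with open("analysis_dataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)

# Minimal metadata: a README describing each shared file and variable.
readme = """analysis_dataset.csv
  study_id : first author and year of the included study
  outcome  : outcome and time point for the synthesis
  log_rr   : log risk ratio extracted or computed from the study report
  se       : standard error of log_rr
  n_total  : total number of randomised participants
"""
with open("README.txt", "w") as f:
    f.write(readme)
```

Both files could then be deposited, together with the analytic code, in a repository such as the Open Science Framework.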

Because sharing of data, analytic code, and other materials is not yet universal in health and medical research, even interested authors may not know how to make their materials publicly available. Data, analytic code, and other materials can be uploaded to one of several publicly accessible repositories (such as Open Science Framework, Dryad, figshare). The Systematic Review Data Repository (https://srdr.ahrq.gov/) is another example of a platform for sharing materials specific to the systematic review community. All of these open repositories should be given consideration, particularly if the completed review is to be considered for publication in a paywalled journal. The Findable, Accessible, Interoperable, Reusable (FAIR) data principles are also a useful resource for authors to consult, as they provide guidance on the best way to share information.

There are some situations where authors might not be able to share review materials, such as when the review team are custodians rather than owners of individual participant data, or when there are legal or licensing restrictions. For example, records exported directly from bibliographic databases (such as Ovid MEDLINE) typically include copyrighted material; authors should read the licensing terms of the databases they search to see what they can share and to consider the copyright legislation of their countries.

Essential elements

Report which of the following are publicly available: template data collection forms; data extracted from included studies; data used for all analyses; analytic code; any other materials used in the review.

If any of the above materials are publicly available, report where they can be found (such as provide a link to files deposited in a public repository).

If data, analytic code, or other materials will be made available upon request, provide the contact details of the author responsible for sharing the materials and describe the circumstances under which such materials will be shared.

Example/s:

Example 1

All meta-analytic data and all codebooks and analysis scripts (for Mplus and R) are publicly available at the study’s associated page on the Open Science Framework (https://osf.io/r8a24/)...The precise sources (table, section, or paragraph) for each estimate are described in notes in the master data spreadsheet, available on the Open Science Framework page for this study (https://osf.io/r8a24/)

Example 2

All data and code are stored on a repository of the Open Science Framework (doi: 10.17605/OSF.IO/DZJT7)

Example 3

The dataset and script to perform the analyses are available at https://osf.io/q7v2d/?view_only=c3cdaf346298411eab9ed15e863c9f21.

To acknowledge this checklist in your methods, please state "We used the PRISMA checklist when writing our report [citation]". Then cite this checklist as Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, Chou R, Glanville J, Grimshaw JM, Hróbjartsson A, Lalu MM, Li T, Loder EW, Mayo-Wilson E, McDonald S, McGuinness LA, Stewart LA, Thomas J, Tricco AC, Welch VA, Whiting P, Moher D. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021;372:n71. doi: 10.1136/bmj.n71.

The PRISMA checklist is distributed under the terms of the Creative Commons Attribution License CC-BY