The results of our search strategy are presented in Fig. 1. In brief, 2872 records were identified through searches of the bibliographic databases, of which 463 were duplicates. Six additional articles were identified from reviewing the reference lists of the eligible studies, and three more were obtained from searching government websites. After title-abstract screening, 285 sources underwent full-text screening and 29 were identified as eligible for inclusion. Of the 29 studies, 21 reported the number of under-reported (incomplete) or misclassified deaths and the total number of maternal deaths identified, and were thus suitable for meta-analysis. One of these studies was excluded because its deaths were included in the aggregate of another included study, leaving 20 studies in the meta-analysis. Two studies [10, 29] were excluded from subgroup analyses as they were considered to have a high risk of bias.
The characteristics of included studies are summarised in Table 1. All the studies had a cross-sectional design, apart from one nested case-control study, which allowed comparable data to be extracted and was therefore included in the meta-analysis. About two thirds of the studies were subnational (n = 19), and six studies were facility-based. More than half of the studies were considered to be of low risk of bias (n = 15), eight were medium and four were high. The most prevalent issue was a high percentage (> 10%), or non-reporting, of deaths for which a cause of death could not be established. Furthermore, only seven studies had complete coverage (national and investigating all deaths among women of reproductive age). Two manuscripts were in Spanish, and the remaining 27 were retrieved in English.
Fourteen studies came from high income countries, five from upper-middle-income countries, and the remaining 10 were from low and lower-middle-income countries.
In general, studies from higher income countries more frequently had national coverage and lower risk of bias, and tended to compare against a CRVS. Nine of the 19 high and upper-middle-income studies had national coverage, most investigated a CRVS (14/19), and only one had a high risk of bias score. In contrast, the ten low and lower-middle-income country studies were all subnational (n = 10), three had a high risk of bias score, and only one made a comparison against a CRVS system.
Of the 20 studies eligible for quantitative synthesis, 15 were of low risk of bias, three were medium and two were high risk (supplementary information 3). One study reported on pregnancy related deaths, and the rest reported on maternal deaths. The two high risk of bias studies – which included the one reporting pregnancy related deaths – were only included when calculating the total under-estimation prevalence, and when we disaggregated under-estimation by risk of bias score.
Methods used to assess the validity of maternal/pregnancy related death data
Thirteen studies reviewed medical/clinical records, sometimes in addition to other sources (forensic reports, death certificates, and criminal reports), to identify maternal deaths. Twelve studies linked or triangulated data from multiple sources to identify incompletely recorded or misclassified deaths. In most cases, the sources used were death certificates, birth and fetal death registers, and/or hospital records, often linked using a unique identification number.
Three studies, from Haiti, Indonesia and the Philippines, used the capture-mark-recapture methodology [10, 16, 32]. This method is used in public health to determine the size of populations that are difficult to identify, and requires four critical assumptions to be met: 1) the population is fixed; 2) individuals from the two sources can be linked; 3) capture in the second sample is independent of capture in the first sample; and 4) the probability of capture does not differ between individuals. The study from Indonesia used the District Health System, together with interviews with village informants and health volunteers, to capture all maternal deaths. In Haiti, the two sources were a register data capture form and a dossier data capture form. In the Philippines study, the first source was vital registration and the second was a Reproductive Age Mortality Survey.
The three studies concluded that no single data source was able to capture all deaths. In Indonesia and the Philippines, 49% and 44% of deaths, respectively, were missed by one of the two sources. In Haiti, where both sources were facility-based, only about a quarter of deaths were captured by each source. The study from Haiti, however, could not guarantee that the first, third, and fourth assumptions of the capture-mark-recapture method were met.
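To illustrate the logic of a two-source capture-recapture estimate, the following minimal Python sketch computes the Lincoln-Petersen estimate and Chapman's bias-corrected variant. Which estimator each included study used is not stated here, so the choice of estimators is an assumption made for illustration, and the counts (n1, n2, m) are hypothetical rather than taken from any included study.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Estimate the total number of deaths from two overlapping sources.

    n1: deaths captured by source 1
    n2: deaths captured by source 2
    m:  deaths matched in both sources
    Relies on the four assumptions listed above (closed population,
    linkable records, independent sources, equal capture probability).
    """
    if m <= 0:
        raise ValueError("at least one matched death is required")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's variant, which is less biased when counts are small."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts, for illustration only.
n1, n2, m = 60, 50, 30
estimated_total = lincoln_petersen(n1, n2, m)   # 60 * 50 / 30 = 100.0
missed_by_source_1 = 1 - n1 / estimated_total   # share not seen by source 1
missed_by_both = 1 - (n1 + n2 - m) / estimated_total  # seen by neither source
```

With these hypothetical counts, the two sources together observe 80 distinct deaths out of an estimated 100, so the estimate attributes 20% of deaths to neither source. If the sources are positively dependent (for example, both facility-based), the estimate will tend to be too low.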
A smaller subset of studies (n = 4) used active surveillance and notification to identify maternal deaths, and only one study followed all pregnant women to identify any deaths.
In total, 19 studies conducted their own independent review of cause of death to quantify misclassification. The studies mostly reviewed cause of death using either a verbal autopsy, review of medical records, or a combination of both (Table 2).
Incompleteness of maternal death recording
There was a wide range in the extent of incompleteness, from 0 to 85% across 16 studies, and a pooled proportion of 34% (95% CI: 28–48). We found high between-study heterogeneity across all pooled estimates of incompleteness (I^2 = 91.2%; P < 0.001) (Table 3).
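As a reminder of what the I^2 statistic expresses, the sketch below derives it from Cochran's Q for a set of study proportions. It is a simplified illustration under stated assumptions: it uses untransformed proportions with inverse-variance weights, whereas the model and transformation behind the pooled estimates above are not described in this section, and the event counts are hypothetical.

```python
def cochran_q_i_squared(events, totals):
    """Return (pooled proportion, Cochran's Q, I^2 in percent).

    Uses raw proportions with inverse-variance weights. Studies with a
    proportion of exactly 0 or 1 have zero variance and are not handled
    by this sketch (a transformation would be needed for them).
    """
    props = [e / n for e, n in zip(events, totals)]
    variances = [p * (1 - p) / n for p, n in zip(props, totals)]
    weights = [1 / v for v in variances]
    pooled = sum(w * p for w, p in zip(weights, props)) / sum(weights)
    q = sum(w * (p - pooled) ** 2 for w, p in zip(weights, props))
    df = len(props) - 1
    # I^2 is the share of total variation beyond what chance would explain.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical data: missed deaths / total deaths in three studies.
pooled, q, i2 = cochran_q_i_squared([10, 45, 80], [100, 100, 100])
```

An I^2 near 0% indicates that the observed spread is compatible with sampling error alone, while values above roughly 75% are conventionally read as high heterogeneity.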
Across the six studies that stratified by cause of death, incompleteness for indirect deaths was higher than for direct deaths (42% and 22% respectively), though confidence intervals were very wide and overlapped substantially (10–76% and 4–48% respectively). There was evidence of high between-study variability in both categories (I^2 = 96.1% and 87.9% respectively; P < 0.001).
Among three studies stratifying by place of death, incompleteness was higher for deaths that occurred at home: 75% versus 27%, albeit with a notable overlap in the confidence intervals (95% CI: 20–100 and 6–58 respectively). There was strong evidence of between-study variability (I^2 = 96.4% and 96.0% respectively) (Table 3).
Deaths occurring either during pregnancy or after 24 h postpartum had higher incompleteness (52%) than deaths occurring during delivery or within 24 h postpartum (25%), with some overlap in the confidence intervals. Notably, there was no evidence of between-study heterogeneity in the three categories (Table 3).
Only one study stratified unregistered deaths by maternal age. It found that deaths at the extremes of maternal age (under 20 and above 40) were more frequently missed: half of all maternal deaths among adolescents and more than half of all maternal deaths among women aged 40 and above were under-reported, compared with 28% in the 20–39 age group.
Misclassification of maternal deaths
Sensitivity ranged from 10 to 86% across four studies, and the pooled estimate of sensitivity was 61% (95% CI: 37–82) (Table 3). Only one study reported specificity, which was high (98.9%).
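For readers less familiar with these measures, the sketch below shows how sensitivity and specificity of maternal death classification are computed against a reference review. The function names and all counts are hypothetical; the counts were chosen only so that the results echo the magnitudes reported above and are not data from any included study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of reference-confirmed maternal deaths that the routine
    system also classified as maternal."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of reference-confirmed non-maternal deaths that the routine
    system also classified as non-maternal."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, for illustration only.
sens = sensitivity(true_pos=61, false_neg=39)    # 61 of 100 maternal deaths found
spec = specificity(true_neg=989, false_pos=11)   # 989 of 1000 non-maternal deaths
```

Low sensitivity corresponds to false negatives (maternal deaths coded to other causes), which is the component of misclassification that contributes to under-estimation.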
Characteristics reported as more prone to misclassification were an indirect cause of maternal death, extremes of maternal age, certification by a physician rather than a coroner or medical examiner, and the deceased belonging to a minority ethnic group (Lin et al., 2019b) [6, 11]. However, the number of deaths in these studies was too low to determine statistical significance.
Three studies (two from the USA and one from China, Taiwan Province of China) examined the impact of adding a pregnancy checkbox to the death certificate [11, 26]. They found that it led to the identification of more maternal deaths and therefore to an increase in the MMR (from 9 to 22 per 100,000 live births in the US states where it was implemented, and from 55 to 82 per 100,000 live births in China, Taiwan Province of China). However, they also found that the checkbox led to an increase in the number of “false positives” and hence may over-estimate maternal mortality if it is the sole reason for classifying a death as maternal.
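The effect of false positives on the MMR can be made concrete with a short sketch. The MMR is the number of maternal deaths per 100,000 live births; the function name and every count below are hypothetical and serve only to show the arithmetic, not to reproduce the figures from the checkbox studies.

```python
def maternal_mortality_ratio(maternal_deaths: int, live_births: int) -> float:
    """Maternal deaths per 100,000 live births."""
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    return maternal_deaths * 100_000 / live_births


# Hypothetical counts, for illustration only.
live_births = 100_000
flagged_by_checkbox = 30    # deaths classified as maternal via the checkbox
false_positives = 8         # flagged deaths not confirmed on review

mmr_unreviewed = maternal_mortality_ratio(flagged_by_checkbox, live_births)
mmr_reviewed = maternal_mortality_ratio(
    flagged_by_checkbox - false_positives, live_births
)
```

In this hypothetical case, relying on the checkbox alone would give an MMR of 30, against 22 once flagged deaths are reviewed, which is why the checkbox is better treated as a trigger for review than as a classification in itself.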
Overall under-estimation of maternal deaths
Across 20 studies, underestimation ranged from 0% in Iceland to 85% in Mozambique, with a pooled proportion of 37%, due to incompleteness, misclassification (false negatives), or both. Heterogeneity between studies was high (I^2 = 93.3%; P < 0.001) (see Fig. 2). We found some evidence (P = 0.05) that studies with a high risk of bias score reported lower underestimation (9%; 95% CI: 0–36) than medium and low risk of bias studies (28%; 95% CI: 12–47 and 42%; 95% CI: 33–52 respectively). When the two studies with a high risk of bias score were excluded, pooled underestimation rose to 39% (95% CI: 30–48).
Under-estimation was higher in studies investigating District or Health Information Management Systems than in those investigating a CRVS or an HDSS (49%, 39% and 32% respectively); however, there was no evidence of a between-group difference (p = 0.44), and the 95% CIs overlapped notably (Table 4). The under-estimation proportion was also higher among studies from low or lower-middle-income settings (45%) than among those from high/upper-middle-income countries (36%), again with no evidence of between-group heterogeneity (p = 0.36) (Table 4). Finally, under-estimation in studies where the mid-year of reporting was after 2010 was lower than in those with a mid-year before 2000 or between 2000 and 2010, again with no evidence of heterogeneity between groups and notable overlap in the confidence intervals.
Context specific challenges in classifying or registering maternal deaths
Broadly speaking, there were three groups of challenges to recording maternal deaths. The first was lack of documentation and/or inadequate storage of medical records. One study from Haiti reported that, of 373 deaths among women of reproductive age, there was insufficient information to determine the cause of death for 56.3%, owing to lack of documentation or because medical records were damaged in storage. In Switzerland, cause of death could not be determined for seven deaths (of 117 total) because of the paucity of information available to reviewers.
The second was challenges related to stigma, such as cultural beliefs about pregnancy and/or its termination. In Tanzania, induced abortion is illegal, and researchers identified this as a limitation to capturing resulting deaths.
Third, some studies identified issues arising from how the process of death notification/recording was organised. In one setting, there were two separate electronic systems for recording deaths that were not sufficiently synchronised, so that one system did not include any questions about pregnancy status while the second did. Indonesia relies on a village midwife covering an area, urban or rural, for measuring maternal deaths; because urban areas are more populated, incompleteness was higher in urban than in rural areas. Additionally, urban areas had more private clinics, which may have prevented midwives from capturing all deaths.