
Protocol for spatial prediction of soil transmitted helminth prevalence in the Western Pacific region using a meta-analytical approach | Systematic Reviews


Data sources and search strategy

This protocol follows the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) guidelines [19] (Additional file 1) [20]. If the protocol needs to be amended, the date and details of each amendment will be documented. The systematic review itself will be conducted in accordance with the PRISMA statement [21] (Additional file 2).

A comprehensive systematic search for epidemiological surveys conducted from 2000 onwards and published up to 31 October 2023 will be performed in five biomedical databases: PubMed, Scopus, ProQuest, Embase, and Web of Science. Grey literature and regional databases will also be searched, and the reference lists of relevant studies will be hand-searched. Forward and backward citation searching in Google Scholar will be used to identify related articles. Countries within the WPR will be defined using the WHO regional classification system [22]. For each of the 37 countries within the WPR [22], the following search terms will be applied: “soil transmitted helminth*” OR STH OR Ascaris OR Trichuris OR Necator OR Ancylostoma OR “Strongyloides stercoralis” OR “Strongyloides fuelleborni” OR hookworm* OR roundworm* OR whipworm* OR threadworm*.
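As a minimal sketch of how the search string could be assembled for each country, the Python snippet below joins the STH terms with OR and pairs the block with a country name using AND. The AND pairing, the quoting of the country name, and the function name `build_query` are assumptions for illustration; each database has its own field tags and syntax, and the 2000 to 31 October 2023 date window would be applied through each database's own filters.

```python
# Hypothetical helper: assembles one boolean query per WPR country.
# The term list mirrors the protocol; the AND-with-country pairing is an assumption.
STH_TERMS = [
    '"soil transmitted helminth*"',
    "STH",
    "Ascaris",
    "Trichuris",
    "Necator",
    "Ancylostoma",
    '"Strongyloides stercoralis"',
    '"Strongyloides fuelleborni"',
    "hookworm*",
    "roundworm*",
    "whipworm*",
    "threadworm*",
]


def build_query(country: str) -> str:
    """Return the STH term block (OR-joined) combined with one country name."""
    sth_block = " OR ".join(STH_TERMS)
    return f'({sth_block}) AND "{country}"'


if __name__ == "__main__":
    # Three of the 37 WPR countries and areas, used only as examples.
    for country in ["Cambodia", "Philippines", "Papua New Guinea"]:
        print(build_query(country))
```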

Study selection

Records identified by the systematic search will be imported into EndNote X9 (Clarivate Analytics) and duplicates removed. Titles and abstracts will be screened independently by two authors (BG and TT) in Rayyan QCRI [23], and the full texts of short-listed articles will then be assessed against the eligibility criteria. Disagreements will be resolved through discussion; if consensus cannot be reached, a third author (EG) will be consulted.
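The two-reviewer screening rule can be expressed as a small decision function. This is a hypothetical sketch (the function name and the "include"/"exclude" labels are assumptions); in practice the decisions are recorded in Rayyan rather than in code.

```python
from typing import Optional

VALID = {"include", "exclude"}


def screening_outcome(
    reviewer_a: str,
    reviewer_b: str,
    consensus: Optional[str] = None,
    third_reviewer: Optional[str] = None,
) -> str:
    """Resolve independent title/abstract decisions from two reviewers.

    Agreement is accepted directly; a disagreement is settled by a consensus
    decision reached through discussion, or failing that by a third reviewer.
    """
    for decision in (reviewer_a, reviewer_b, consensus, third_reviewer):
        if decision is not None and decision not in VALID:
            raise ValueError(f"unknown decision: {decision!r}")
    if reviewer_a == reviewer_b:
        return reviewer_a          # independent agreement
    if consensus is not None:
        return consensus           # resolved through discussion
    if third_reviewer is not None:
        return third_reviewer      # third author consulted
    return "unresolved"            # still awaiting discussion
```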

Studies are required to meet the following inclusion and exclusion criteria:

Inclusion criteria are as follows:

  • Studies of human infection with any of the following STH species: A. lumbricoides (roundworm); T. trichiura (whipworm); N. americanus, A. duodenale, A. ceylanicum, A. caninum, and A. braziliense (hookworms); and S. stercoralis and S. fuelleborni (threadworms).

  • Surveys using random sampling that report sufficient data to calculate STH prevalence.

  • Studies conducted within the WPR as defined by the WHO regional classification system [22].

  • Where studies conduct surveys before and after an intervention, only the pre-intervention (baseline) data will be recorded. The effect of earlier interventions will instead be captured by subsequent baseline surveys.

Exclusion criteria are as follows:

  • Case studies.

  • Case series with < 10 people.

  • Conference abstracts, posters, and scientific correspondence.

  • Literature or systematic reviews.

  • The geographic location of the survey is not provided at a higher resolution than regional level (i.e., country level reports will be excluded).

  • Surveys that do not represent the general population, preschool-aged children (PSAC), or school-aged children (SAC).

  • Transient populations that do not represent the geography in which they are surveyed, e.g., recent refugee arrivals.

  • Articles not published in English (owing to resource constraints).

Data extraction

Two authors (BG and TT) will independently extract data from the included studies into a Microsoft Excel (version 2016) spreadsheet. The data extraction spreadsheet will be piloted on five papers and refined if required. The proposed data extraction tool is provided in Additional file 3.

Where available, the following data will be extracted for each eligible study: first author; year of publication; year of study; study location, including the name of the administrative region and the longitude and latitude in decimal degrees (converted where required); study site (e.g., school, community); number of people screened for STH; number of people diagnosed with STH infection; STH species; infection intensity (eggs per gram or WHO intensity classification); diagnostic method; sample type; number of samples collected and analyzed per participant; demographic factors (age, sex); prevalence of co-infection; and name of the co-infecting agent. Study authors will be contacted if additional information is needed. Where duplicate surveys exist for the same location, the study with the most recent data and the greatest amount of data will be included in the analysis.
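A minimal sketch of one extraction record, the coordinate conversion to decimal degrees, and the duplicate-survey rule is shown below. Only a subset of the listed fields is included, the field names are hypothetical, and breaking ties by study year first and then by number screened is one possible reading of "most recent and greatest amount of data", not a rule stated by the protocol.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SurveyRecord:
    """Hypothetical subset of the extraction fields for one survey and species."""
    first_author: str
    publication_year: int
    study_year: int
    admin_region: str
    latitude: Optional[float]   # decimal degrees, negative = south
    longitude: Optional[float]  # decimal degrees, negative = west
    species: str
    n_screened: int
    n_positive: int

    @property
    def prevalence(self) -> float:
        return self.n_positive / self.n_screened


def dms_to_decimal(degrees: float, minutes: float = 0.0,
                   seconds: float = 0.0, hemisphere: str = "N") -> float:
    """Convert degrees-minutes-seconds to signed decimal degrees."""
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere.upper() in ("S", "W") else value


def select_among_duplicates(records: List[SurveyRecord]) -> SurveyRecord:
    """Keep the most recent survey; among equally recent ones, the largest sample."""
    return max(records, key=lambda r: (r.study_year, r.n_screened))
```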

Methodological quality and publication bias assessment

A modified version of the Newcastle-Ottawa Quality Assessment Scale [24] (Additional file 4) will be used by two authors (BG and TT) to assess the methodological quality of the included studies. To ensure consistency between the two reviewers, the quality assessment tool will be piloted on 10 randomly selected studies, and any differences in opinion will be resolved through discussion with a third author (EG). Quality assessment (QA) scores range from 0 to 9: scores of 0 to 4 will be classed as low quality, 5 to 7 as medium quality, and 8 to 9 as high quality. A sensitivity analysis will be used to evaluate the effect of methodological quality on the results of the review.
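The banding of quality scores can be written as a short function. This is a sketch, and grouping a score of 0 (which the stated 0 to 9 range permits) with low quality is an assumption. The final lines show one hypothetical way to form the subset used in a sensitivity analysis.

```python
def quality_category(score: int) -> str:
    """Map a modified Newcastle-Ottawa score (0-9) to a quality band."""
    if not 0 <= score <= 9:
        raise ValueError("QA score must be between 0 and 9")
    if score <= 4:
        return "low"     # 0 grouped with 1-4 (assumption)
    if score <= 7:
        return "medium"  # 5-7
    return "high"        # 8-9


# Hypothetical sensitivity analysis subset: drop low-quality studies.
scores = {"study_1": 3, "study_2": 6, "study_3": 9}
sensitivity_subset = [s for s, q in scores.items() if quality_category(q) != "low"]
```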

Potential publication bias and small-study effects will be assessed visually using funnel plots. Funnel plot asymmetry will be tested using Egger’s regression method, and publication bias will be considered significant when p < 0.05 [25].
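A minimal sketch of Egger's regression test, assuming NumPy and SciPy (version 1.6 or later, for `intercept_stderr`) are available: each study's effect is divided by its standard error and regressed on its precision (1/SE), and an intercept significantly different from zero indicates funnel plot asymmetry. Using the logit of prevalence as the study effect, with its delta-method standard error, is an assumption here; studies with zero cases or with all participants positive would need a continuity correction, which this sketch does not apply.

```python
import numpy as np
from scipy import stats


def logit_prevalence(n_positive, n_screened):
    """Logit prevalence and its delta-method standard error for each study."""
    k = np.asarray(n_positive, dtype=float)
    n = np.asarray(n_screened, dtype=float)
    logit = np.log(k / (n - k))
    se = np.sqrt(1.0 / k + 1.0 / (n - k))
    return logit, se


def eggers_test(effects, standard_errors):
    """Return Egger's intercept and its two-sided p-value."""
    y = np.asarray(effects, dtype=float)
    se = np.asarray(standard_errors, dtype=float)
    precision = 1.0 / se
    standardized = y / se  # standard normal deviate of each study
    fit = stats.linregress(precision, standardized)
    t_stat = fit.intercept / fit.intercept_stderr
    df = len(y) - 2
    p_value = 2.0 * stats.t.sf(abs(t_stat), df)
    return fit.intercept, p_value
```

Publication bias would be flagged when the returned p-value is below 0.05.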

Covariate data sources

Covariate data for the multivariable analysis will be obtained from publicly accessible sources. Population data will be obtained from WorldPop [26], and data on health care accessibility from the Malaria Atlas Project (MAP) [27]. Climatic variables, such as mean temperature, precipitation, and solar radiation, will be obtained from the Global Climate Database [28]. Altitude data will be obtained from the Shuttle Radar Topography Mission (SRTM) [29], and polygon shapefiles of the administrative boundaries of each country from the Data-Interpolating Variational Analysis Geographic Information System (DIVA-GIS) [30].

Geocoding

Extracted STH survey data will be geolocated to a specific latitude and longitude (in decimal degrees) where possible, or otherwise to the smallest available polygon (e.g., village or district). Village locations will be identified using Google Maps. Where STH prevalence survey data are reported only at district level (i.e., as a polygon), a centroid weighted by population density will be used for georeferencing. The survey locations for each study will be stored in a geographical information system, ArcGIS (ESRI, Redlands, CA, USA), and STH prevalence and covariate data will be linked by location in ArcGIS to produce a spatially referenced dataset for analysis.
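The population-weighted centroid of a district can be sketched as a weighted mean of grid-cell coordinates, with each cell's population (for example, from a WorldPop raster clipped to the district polygon) as its weight. This is an illustration under assumptions: the cell list format and function name are hypothetical, and a plain coordinate mean is only adequate for districts small enough that the Earth's curvature can be ignored. Because some WPR countries straddle the 180° meridian, the sketch shifts longitudes to a 0 to 360° range when a district spans it.

```python
from typing import Iterable, Tuple

Cell = Tuple[float, float, float]  # (longitude, latitude, population)


def population_weighted_centroid(cells: Iterable[Cell]) -> Tuple[float, float]:
    """Return the (longitude, latitude) centroid of cells weighted by population."""
    cells = list(cells)
    lons = [lon for lon, _, _ in cells]
    # Districts crossing the antimeridian (e.g. parts of Fiji) would otherwise
    # average to a point on the wrong side of the globe.
    crosses_antimeridian = max(lons) - min(lons) > 180.0
    total = lon_sum = lat_sum = 0.0
    for lon, lat, pop in cells:
        if crosses_antimeridian and lon < 0:
            lon += 360.0
        total += pop
        lon_sum += lon * pop
        lat_sum += lat * pop
    if total <= 0:
        raise ValueError("district has no population to weight by")
    lon_c = lon_sum / total
    if lon_c > 180.0:
        lon_c -= 360.0  # back to the -180..180 convention
    return lon_c, lat_sum / total
```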

Geospatial analysis

Bayesian model-based geostatistics (MBG) will be used to generate spatially continuous national estimates of the prevalence of each STH species, mapped at a resolution of 1 km². Within the MBG framework, a logistic regression model with fixed covariate effects and random spatial effects will be fitted to the prevalence data. Candidate covariates will be chosen based on evidence of association with STH infection in previous studies and on the availability of representative region-wide data. These will then be screened using a fixed-effects logistic regression model, excluding covariates with a Wald p > 0.2. Before model fitting, all covariates will be checked for multicollinearity using variance inflation factors (VIF), and variables with a VIF greater than 6 will be excluded from the final model.
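Variance inflation factors can be computed by regressing each covariate on all the others, giving VIF_j = 1 / (1 - R_j^2). The NumPy sketch below flags covariates above the threshold of 6. Dropping all flagged covariates at once is a simplification; a common alternative, not specified here, is to remove the covariate with the highest VIF and recompute until all values fall below the threshold.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of X (rows = survey locations, columns = covariates)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + other covariates
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        vifs.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return vifs


def flag_collinear(names, X, threshold=6.0):
    """Names of covariates whose VIF exceeds the threshold."""
    return [name for name, v in zip(names, variance_inflation_factors(X)) if v > threshold]
```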

Separate geospatial models will be constructed for each STH species. Here, we describe the model for the prevalence of a single species; the approach will be identical for the other species. A Bayesian geospatial model including covariates (fixed effects) and spatial effects will be fitted to the prevalence survey data [31]. The number of positive individuals at each surveyed location j will be the response variable and will be assumed to follow a binomial distribution: Yj ~ Binomial(nj, pj), where Yj is the number of individuals testing positive, nj is the number of individuals tested, and pj is the predicted prevalence at location j, with j = 1, …, n. The predicted prevalence will be related to a linear predictor via a logit link function, defined as follows:

$$\textrm{logit}\left(p_j\right)=\log \left(\frac{p_j}{1-p_j}\right)=\alpha +\sum_{z=1}^{Z}\beta_z X_{z,j}+\zeta_j,$$

where α is the intercept, βz is the coefficient for covariate z, Xz,j is the value of covariate z at location j (the Z covariates forming the matrix X), and ζj is the value at location j of a spatial random field modelled as a Gaussian process with mean 0 and a Matérn covariance function. The covariance function will be defined by two parameters: the range ρ, the distance beyond which spatial correlation becomes negligible, and the marginal standard deviation σ [32, 33]. Because the model is Bayesian, priors must be specified for all parameters (and hyperparameters). A non-informative prior will be used for α (uniform on (–∞, ∞)), and normal priors with mean 0 and precision (the inverse of the variance) of 1 × 10⁻⁴ will be set for each βz. Default priors will be used for the parameters of the spatial random field [34]. Parameters will be estimated using the Integrated Nested Laplace Approximation (INLA) approach in R (R-INLA) [32, 33]. A relatively large number of posterior samples (15,000) will be drawn to ensure that the posterior distributions of all parameters are adequately characterized.
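To make the range parameter concrete, the sketch below evaluates a Matérn covariance with marginal standard deviation σ and range ρ, assuming SciPy is available. It assumes smoothness ν = 1, the value implied by R-INLA's default SPDE model in two dimensions, and the parameterization κ = √(8ν)/ρ used there; under these assumptions the correlation at distance ρ is roughly 0.14 rather than exactly zero, which is what "negligible" means in practice. As a side note on the priors, a precision of 1 × 10⁻⁴ corresponds to a variance of 10,000, i.e. a standard deviation of 100 on the logit scale.

```python
import numpy as np
from scipy.special import gamma, kv


def matern_covariance(distance, rho, sigma, nu=1.0):
    """Matern covariance at the given distance(s).

    rho   -- range, with kappa = sqrt(8 * nu) / rho (INLA-style parameterization)
    sigma -- marginal standard deviation
    nu    -- smoothness (1.0 assumed here)
    """
    d = np.asarray(distance, dtype=float)
    kappa = np.sqrt(8.0 * nu) / rho
    scaled = kappa * d
    # K_nu is singular at 0, so evaluate it only at positive distances.
    safe = np.where(scaled > 0, scaled, 1.0)
    corr = (2.0 ** (1.0 - nu) / gamma(nu)) * safe**nu * kv(nu, safe)
    corr = np.where(scaled > 0, corr, 1.0)  # correlation is 1 at zero distance
    return sigma**2 * corr
```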

Prediction maps

Predictions of the prevalence of each infection at unsampled locations will be made at 1 km² resolution. At each prediction location, the interpolated spatial random effect will be added to the sum of the products of the covariate coefficients and the covariate values. The intercept will then be added, and the total back-transformed from the logit scale to the prevalence scale, producing prediction surfaces that show the estimated prevalence at all prediction locations.
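For a single 1 km² pixel, the prediction step amounts to evaluating the linear predictor and applying the inverse logit. The pure-Python sketch below does this for one set of parameter values and then, as an assumed way of using posterior samples, summarizes a pixel's predictions across draws by the median and an approximate 95% interval. The function names and the draw format are hypothetical; in practice the draws would come from the fitted R-INLA model rather than being supplied by hand.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def inverse_logit(x: float) -> float:
    """Numerically stable back-transformation from the logit scale."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_prevalence(alpha: float, betas: Sequence[float],
                       covariates: Sequence[float], spatial_effect: float) -> float:
    """alpha + sum(beta_z * X_z) + zeta, back-transformed to prevalence."""
    if len(betas) != len(covariates):
        raise ValueError("one coefficient is needed per covariate")
    eta = alpha + sum(b * x for b, x in zip(betas, covariates)) + spatial_effect
    return inverse_logit(eta)


Draw = Tuple[float, Sequence[float], float]  # (alpha, betas, spatial effect at pixel)


def summarize_pixel(draws: List[Draw], covariates: Sequence[float]):
    """Median and approximate 95% interval of predicted prevalence over draws."""
    preds = sorted(predict_prevalence(a, b, covariates, z) for a, b, z in draws)
    last = len(preds) - 1
    lower = preds[round(0.025 * last)]
    upper = preds[round(0.975 * last)]
    return statistics.median(preds), lower, upper
```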

Co-distribution

To obtain a co-endemicity map, the predicted prevalence surfaces for each STH species will be overlaid in the GIS software. This will identify areas where the predicted prevalence of two, three, or four species exceeds a selected threshold.
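The overlay can be sketched as counting, for each grid cell, how many species' predicted prevalence exceeds the threshold; cells with a count of two or more are co-endemic. This sketch assumes all species surfaces share the same grid and are flattened to lists, and both the prevalence values and the 0.2 threshold in the example are arbitrary, not values the protocol has selected.

```python
from typing import Dict, List, Sequence


def co_endemicity_counts(surfaces: Dict[str, Sequence[float]],
                         threshold: float) -> List[int]:
    """Number of species whose predicted prevalence exceeds threshold, per grid cell."""
    grids = list(surfaces.values())
    n_cells = len(grids[0])
    if any(len(g) != n_cells for g in grids):
        raise ValueError("all prevalence surfaces must share the same grid")
    return [sum(1 for g in grids if g[i] > threshold) for i in range(n_cells)]


# Example with three cells and three hypothetical species surfaces.
surfaces = {
    "A. lumbricoides": [0.10, 0.30, 0.50],
    "T. trichiura": [0.25, 0.10, 0.60],
    "hookworm": [0.05, 0.40, 0.30],
}
counts = co_endemicity_counts(surfaces, threshold=0.2)
co_endemic = [c >= 2 for c in counts]
```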


