Child stunting prevalence determination at sector level in Rwanda using small area estimation | BMC Nutrition


Study setting

This study was conducted in Rwanda. Rwanda, formally the Republic of Rwanda, is a landlocked country in Africa’s Great Rift Valley, where the African Great Lakes region and East Africa meet. It is bordered by Uganda, Tanzania, Burundi, and the Democratic Republic of the Congo, and is located a few degrees south of the Equator. Its landscape is dominated by mountains in the west and savanna in the east, with numerous lakes throughout the country; its hilly terrain has earned it the nickname “land of a thousand hills”. The climate ranges from temperate to sub-tropical, with two rainy and two dry seasons per year. Rwanda is the most densely populated mainland African country, with a population of over 12.6 million people living on \(26{,}338~\text{km}^2\) of territory. Kigali is the capital and largest city of Rwanda, with a population of more than one million people. The Rwandan population is primarily rural and young.

Research design and data description

This study used the 2019/2020 Rwanda Demographic and Health Survey (RDHS), a large and rich dataset on stunting and socio-demographic characteristics. The 2019/2020 RDHS used a two-stage sample design for a number of constrained indicators, allowing estimates of key indicators for the country as a whole, as well as for urban and rural areas, the five provinces, and each of Rwanda’s 30 districts. The first stage was to randomly select a sample of 500 clusters made up of enumeration areas (EAs) defined for the Fourth Rwanda Population and Housing Census (RPHC-2012). The second stage involved systematic sampling of households in every sampled cluster. From June to August 2019, a household listing operation was conducted in all selected EAs, and a total of 13,000 households were selected. The RDHS provides the variables, as well as the total population and the number of children under five years, only for the selected sectors, so small area estimation (SAE) techniques are used to produce sector-level estimates of stunted children in all Rwandan sectors for 2019/2020. The lack of totals of children under five years in the RDHS for all sectors is addressed by a projection based on the RPHC of 2012. The variables of interest considered in this study are:

  • The response variable is the number of stunted children in a sector, obtained from the RDHS-2019/2020. A stunted child is a child who is too short for his or her age. The World Health Organization (WHO) defines a stunted child as a child whose height-for-age Z-score is more than 2 standard deviations (SD) below the WHO Child Growth Standards median [13]. The Z-score is a statistical measure that indicates how far, in standard deviations, a child’s height for age deviates from the median of a reference population.

  • The sector-level covariates were selected from RPHC-2012: proportion of poverty headcount, average household size, number of under-five children living in urban settlements, number of male-headed households, number of female-headed households, number of heads of households who completed secondary education or above, number of heads of households with lower than secondary education, number of households with access to improved water, and number of households with access to an improved toilet.
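
As an illustration of the WHO cut-off described in the response-variable item above, the following minimal Python sketch flags stunting from a height-for-age Z-score and counts stunted children in a sector. The function names and the reference median and SD passed in are hypothetical placeholders, not values from the WHO tables or the RDHS, and the plain (height minus median) divided by SD form is a simplification for illustration only.

```python
def height_for_age_z(height_cm, ref_median_cm, ref_sd_cm):
    """Z-score: deviation of a child's height from the reference median, in SD units.

    ref_median_cm and ref_sd_cm are assumed to be the age- and sex-specific
    reference values; the numbers used in the example below are made up.
    """
    return (height_cm - ref_median_cm) / ref_sd_cm


def is_stunted(z_score, cutoff=-2.0):
    """A child is classified as stunted when the Z-score is strictly below the cut-off."""
    return z_score < cutoff


def count_stunted(z_scores):
    """Number of stunted children among the given Z-scores (the sample count in a sector)."""
    return sum(1 for z in z_scores if is_stunted(z))


# Hypothetical child: 80 cm tall, reference median 87 cm, reference SD 3 cm.
z = height_for_age_z(80.0, 87.0, 3.0)
print(is_stunted(z))
print(count_stunted([-2.5, -2.0, -1.0, -3.1]))
```

A Z-score of exactly -2 is not counted as stunted here, which matches the "below minus 2 SD" wording of the definition.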

Study population

The 2019/2020 RDHS was among the most accurate and legitimate sources of data available and accessible, and it was well suited to the study’s goal. All children born in the five years leading up to the 2019/2020 RDHS are included in the study population. The current study’s sample includes all children aged 0 to 59 months at the time of the survey interviews, as well as their mothers; the necessary anthropometric and demographic data (sex, height, and age) were obtained during the survey. The total number of children under the age of five eligible for this study in the 2019/2020 RDHS is 8,092. The methodology employed in the RDHS is thoroughly documented in the survey report [3].

Method of data analysis

Since the RDHS-2019/2020 was not designed to provide estimates for children under five at a level lower than the district, direct estimates of childhood stunting at sector level have high variability and are thus unreliable because of the small sample sizes within sectors. Therefore, SAE techniques were used to produce more reliable estimates of childhood stunting prevalence at sector level. This method consists of linking statistical models for the variables of interest with relevant covariates to produce model-based estimates at the domain level of interest. A generalized linear mixed model is considered.

Let \(N_i\) and \(n_i\) be the population and sample sizes in sector \(i\) \((i=1,\ldots ,d)\), respectively, where \(d=416\) is the number of sectors in the population. The population size \(N_i\) is the total number of children under 5 years in the i-th sector, while the sample size \(n_i\) is the number of children sampled from the i-th sector to participate in the RDHS-2019/2020. The total number of units in the population (children under 5) is \(N=N_1+N_2+\ldots +N_d\) and the total sample size is \(n=n_1+n_2+\ldots +n_d\). Moreover, the response variable \(y_{ij}\) for the j-th child in the i-th sector is a binary random variable that takes the value 1 if the child is stunted and 0 otherwise. In addition, the response vector \(\textbf{y}_{i}\) for the i-th sector is partitioned into sampled \(\textbf{y}_{i}^s=\left( y_{ij}^s\right),~j=1,\ldots ,n_i\) and non-sampled \(\textbf{y}_{i}^r=\left( y_{ij}^r\right),~j=n_{i}+1,\ldots ,N_i\) parts. It follows that the total number of stunted children in sector i is \(T_i=T_i^s+T_i^r\), with \(T_i^s=\sum _{j=1}^{n_i}y_{ij}^s\) and \(T_i^r=\sum _{j=n_i+1}^{N_i}y_{ij}^r\), where \(T_i^s\) and \(T_i^r\) are independent variables assumed to follow a Poisson distribution. Let \(\varvec{x}_{ij}\) be a vector of covariates. The linking model to covariates is a log-linear model of the form

$$\begin{aligned} \log (T_i)=\varvec{x}_{ij}^{\prime }\varvec{\beta }+u_i, \end{aligned}$$

(1)

where \(\varvec{\beta }\) is a k-vector of unknown parameters, and \(u_i\sim \mathcal {N}(0, \sigma ^2)\) is the sector-level random effect that accounts for between-sector variability not explained by the covariates included in the model. Estimates of the unknown parameters \(\varvec{\beta }\) and of the random effects \(u_i\) in model (1) can be obtained by maximum likelihood from the sample data, and the total number of stunted children in the i-th sector is then predicted as [14]:

$$\begin{aligned} \widehat{T}_i&=T_i^s+\widehat{T}_i^r\\ \widehat{T}_i&=T_i^s+(N_i-n_i)\ \textrm{exp}\left( \varvec{x}_{ij}^{\prime }\widehat{\varvec{\beta }}+\widehat{u}_i\right) , \end{aligned}$$

(2)

where \(T_i^s\) is calculated directly from the sample data. The proportion \(p_i\) of stunted children in the i-th sector is estimated as

$$\begin{aligned} \widehat{p}_i= \frac{\widehat{T}_i}{N_i}. \end{aligned}$$

(3)

Here, \(N_i\) is computed using the following formula

$$\begin{aligned} N_i=pN_{i,Tot}, \end{aligned}$$

where p is the proportion of children under five in the population and \(N_{i,Tot}\) is the total population of the i-th sector in 2019, both computed using the medium population projection [15].
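
To make the chain from the projected population to the estimated prevalence concrete, the sketch below evaluates \(N_i=pN_{i,Tot}\), Eq. (2) and Eq. (3) in plain Python. It assumes the fitted coefficients \(\widehat{\varvec{\beta }}\) and the predicted random effect \(\widehat{u}_i\) are already available from a model fit that is not shown; the function names and all numeric inputs are hypothetical and are not taken from the RDHS or the census.

```python
import math


def projected_under_five(p_under_five, total_population):
    """N_i = p * N_{i,Tot}: projected number of children under five in a sector."""
    return p_under_five * total_population


def predict_total(t_sampled, n_sampled, n_population, x, beta_hat, u_hat):
    """Eq. (2): observed sample count plus the predicted count for non-sampled children."""
    linear_predictor = sum(xk * bk for xk, bk in zip(x, beta_hat)) + u_hat
    return t_sampled + (n_population - n_sampled) * math.exp(linear_predictor)


def predict_proportion(t_hat, n_population):
    """Eq. (3): estimated proportion of stunted children in the sector."""
    return t_hat / n_population


# Hypothetical sector: 20,000 inhabitants, 15% of them under five,
# 20 sampled children of whom 7 are stunted.
N_i = projected_under_five(0.15, 20000)
T_hat = predict_total(7, 20, N_i, x=[1.0, 0.5], beta_hat=[-1.2, 0.4], u_hat=0.1)
print(N_i, T_hat, predict_proportion(T_hat, N_i))
```

Passing the fitted quantities in as arguments keeps the prediction step separate from the estimation step, so the same functions can be reused whichever software fits model (1).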

It should be noted that model (1) is based on an unweighted sample count, which assumes that sampling within areas is non-informative [16]. Eq. (2) therefore disregards the complex design of surveys such as the Demographic and Health Survey. To address this issue, several authors have suggested that, when analyzing area-level estimates as a binomial proportion, one should use the effective sample size, which incorporates the sampling weights, rather than the actual sample size in the model [17, 18]. Sampling weights are necessary in complex designs such as the stratified two-stage cluster design used for Demographic and Health Surveys in order to account for variations in selection probabilities and potential biases introduced by non-response or other factors.

Let \(N_{ih}\) and \(n_{ih}\) denote the population size and sample size in cluster h in area i, respectively, such that \(N_i=\sum _hN_{ih}\) and \(n_i=\sum _hn_{ih}\), with the associated sampling weights denoted by \(w_{ih}=\frac{N_{ih}}{n_{ih}}\). Define \(y_{ijh}\) to be the binary response for the characteristic of interest for unit j in cluster h in area i. The sample size \(n_i\) is replaced by the effective sample size \(n_{i(e)}\) in Eq. (2), as proposed by Liu et al. [18], where

$$\begin{aligned} n_{i(e)}=\frac{P_i(1-P_i)}{n_i\text {Var}(p_{iw})}Deff_i, \end{aligned}$$

for small area proportion \(P_i\)

$$\begin{aligned} P_i=\frac{\sum _h\sum _jy_{ijh}}{N_{i}}, \end{aligned}$$

whose direct survey estimator is given by

$$\begin{aligned} p_{iw}=\frac{\sum _h\sum _jw_{ih}y_{ijh}}{\sum _h\sum _jw_{ih}}, \end{aligned}$$

where

$$\begin{aligned} Deff_i=\frac{\sum _hW_{ih}^2P_{ih}(1-P_{ih})/n_{ih}}{P_i(1-P_i)/n_i} \end{aligned}$$

for \(W_{ih}=\frac{N_{ih}}{N_i}\), with \(P_{ih}\) the population proportion in cluster h in area i. Moreover, the sample count \(T_i^s\) in (2) is replaced by the effective count \(\widehat{T}_{i(e)}^s\), defined by

$$\begin{aligned} \widehat{T}_{i(e)}^s=n_{i(e)}p_{iw}. \end{aligned}$$

Therefore, the total number of stunted children in the i-th sector is computed as

$$\begin{aligned} \widehat{T}_i=\widehat{T}_{i(e)}^s+(N_i-n_{i(e)})\ \textrm{exp}\left( \varvec{x}_{ij}^{\prime }\widehat{\varvec{\beta }}+\widehat{u}_i\right) . \end{aligned}$$

(4)
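
The design-adjusted steps can be sketched in Python as follows: the direct weighted estimator \(p_{iw}\), the effective count \(n_{i(e)}p_{iw}\), and the predictor in Eq. (4). This is a minimal sketch in which the effective sample size is passed in as an already computed number rather than derived from the design effect; the data layout (a list of cluster weights paired with binary responses), the function names, and the example values are all assumptions made for illustration.

```python
import math


def weighted_proportion(clusters):
    """Direct estimator p_iw for one sector.

    clusters is a list of (w_ih, responses) pairs, where responses holds the
    binary values y_ijh observed in cluster h and w_ih is the cluster weight.
    """
    numerator = sum(w * sum(ys) for w, ys in clusters)
    denominator = sum(w * len(ys) for w, ys in clusters)
    return numerator / denominator


def effective_count(n_eff, p_w):
    """Effective count: effective sample size times the weighted proportion."""
    return n_eff * p_w


def predict_total_effective(n_eff, p_w, n_population, x, beta_hat, u_hat):
    """Eq. (4): Eq. (2) with n_i and the sample count replaced by their effective versions."""
    linear_predictor = sum(xk * bk for xk, bk in zip(x, beta_hat)) + u_hat
    return effective_count(n_eff, p_w) + (n_population - n_eff) * math.exp(linear_predictor)


# Hypothetical sector with two sampled clusters and an assumed effective sample size of 15.
clusters = [(10.0, [1, 0, 0, 1]), (30.0, [1, 1])]
p_iw = weighted_proportion(clusters)
print(p_iw)
print(predict_total_effective(15.0, p_iw, 3000, [1.0, 0.5], [-1.2, 0.4], 0.1))
```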

Demographic and Health Surveys (DHS) usually draw their samples using a stratified two-stage cluster design. In the first stage, enumeration areas (EAs) are typically selected from census files. In the second stage, a sample of households is drawn from an updated list of households in each selected EA. Given H strata, let \(y_{hi}\) be the response value for a characteristic of interest for the i-th sampling unit in the h-th stratum, \(w_{hi}\) the corresponding sampling weight, and \(n_{h}\) the sample size of the h-th stratum. The estimator of the population mean is the weighted mean given by

$$\begin{aligned} \bar{y}=\frac{\sum _{h=1}^H\sum _{i=1}^{n_h}w_{hi}y_{hi}}{\sum _{h=1}^H\sum _{i=1}^{n_h}w_{hi}}. \end{aligned}$$

The corresponding sampling variance estimator and standard error (SE) of the weighted mean can be expressed as

$$\begin{aligned} \widehat{\textrm{var}}(\bar{y})&=\frac{\sum _{h=1}^H\frac{n_h}{n_h-1}\sum _{i=1}^{n_h}\Big [w_{hi}(y_{hi}-\bar{y})-\frac{1}{n_h}\sum _{j=1}^{n_h}w_{hj}(y_{hj}-\bar{y})\Big ]^2}{\Big ( \sum _{h=1}^H\sum _{i=1}^{n_h}w_{hi}\Big )^2},\\ SE(\bar{y})&=\sqrt{\widehat{\textrm{var}}(\bar{y})}. \end{aligned}$$
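
The weighted mean and its variance estimator above translate almost term by term into code. The following Python sketch is one possible reading of the two formulas, assuming each stratum is given as a list of (weight, response) pairs with at least two sampling units; the function names and the toy data are hypothetical.

```python
import math


def weighted_mean(strata):
    """Weighted mean over all strata; each stratum is a list of (w_hi, y_hi) pairs."""
    numerator = sum(w * y for stratum in strata for w, y in stratum)
    denominator = sum(w for stratum in strata for w, _ in stratum)
    return numerator / denominator


def variance_of_weighted_mean(strata):
    """Stratified variance estimator of the weighted mean, as in the formula above."""
    y_bar = weighted_mean(strata)
    total_weight = sum(w for stratum in strata for w, _ in stratum)
    accumulated = 0.0
    for stratum in strata:
        n_h = len(stratum)
        if n_h < 2:
            raise ValueError("each stratum needs at least two sampling units")
        # Weighted deviations w_hi * (y_hi - y_bar) and their within-stratum mean.
        z = [w * (y - y_bar) for w, y in stratum]
        z_mean = sum(z) / n_h
        accumulated += n_h / (n_h - 1) * sum((zi - z_mean) ** 2 for zi in z)
    return accumulated / total_weight ** 2


def standard_error(strata):
    """SE of the weighted mean: square root of the estimated variance."""
    return math.sqrt(variance_of_weighted_mean(strata))


# Toy data: two strata, two units each, equal weights.
strata = [[(1.0, 1.0), (1.0, 0.0)], [(1.0, 1.0), (1.0, 0.0)]]
print(weighted_mean(strata), variance_of_weighted_mean(strata), standard_error(strata))
```

The check on \(n_h\) reflects the factor \(n_h/(n_h-1)\) in the formula, which is undefined for a stratum with a single sampling unit.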


