Ethnic variations in metabolic syndrome components and their associations with the gut microbiota: the HELIUS study | Genome Medicine

Study population

The HELIUS study is an ongoing prospective cohort study in Amsterdam, the Netherlands, which at baseline included residents aged 18–70 years. Participants were randomly recruited from the municipal registry after stratification by ethnic origin: Surinamese, Ghanaian, Turkish, Moroccan, or Dutch. A detailed description of the study population, study design, and rationale is provided elsewhere [16, 17]. The Academic Medical Center (AMC) Medical Ethics Committee approved the HELIUS study, and all participants provided written informed consent.

Of the 24,789 baseline participants, 22,165 participated in the physical examination, including collection of biological samples, and filled in the questionnaire as described in Snijder et al. [16]. Of these 22,165 participants, we excluded Javanese Surinamese (n = 233), other Surinamese (n = 267), and those of other/unknown ethnic origin (n = 48) because these groups were too small. We further excluded participants with missing data on the components of MetS, participants with diabetes (defined by the use of antidiabetic medication, fasting HbA1c levels ≥ 48 mmol/mol, or fasting glucose levels ≥ 7.0 mmol/L, or with missing values for those criteria), and all participants on antihypertensive or antilipidemic medication or with unknown medication usage, leaving 16,209 participants in the total dataset.

For the analysis of gut microbiota composition, we included the subset of participants from the total dataset for whom gut microbiota data were available after quality control of these data (see below) [18]. Participants who had used antibiotics in the past 3 months, or whose antibiotic use was unknown, were excluded. In total, 3443 participants were included in the gut microbiota dataset.

Baseline data collection

After a positive response, subjects received a letter confirming an appointment for a physical examination and a digital or paper version of the questionnaire (depending on the preference of the subject) to fill out at home. At the research locations, participants underwent a physical examination, during which blood pressure and anthropometric characteristics (e.g., weight, height, and waist circumference) were measured. Waist circumference, systolic blood pressure, and diastolic blood pressure were measured in duplicate and then averaged. Furthermore, participants were asked to bring their prescribed medications, which were coded according to the Anatomical Therapeutic Chemical (ATC) classification. Blood samples were drawn after an overnight fast and were analyzed by the main laboratory department of the Academic Medical Center in Amsterdam to determine glucose, lipid (total cholesterol, HDL-cholesterol, and triglyceride levels), and HbA1c profiles. More detailed information about the measurements is given elsewhere [19].


Ethnicity of the participant was defined according to his/her country of birth as well as that of his/her parents, which is currently the most widely accepted and most valid assessment of ethnicity in the Netherlands [20]. Specifically, a participant is considered to be of non-Dutch ethnic origin if he/she fulfills either of the following criteria: (1) he or she was born in another country and has at least one parent born in another country (first generation) or (2) he or she was born in the Netherlands but both his/her parents were born in another country (second generation). Of the Surinamese immigrants in the Netherlands, approximately 80% are of either African or South-Asian origin. After data collection, Surinamese subgroups were classified according to self-reported ethnic origin. Participants were considered to be of Dutch origin if the person and both parents were born in the Netherlands.
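The country-of-birth rules above can be sketched as a small function. This is only an illustration of the stated criteria, not the study's own code: the function name, the country codes, and the "unclassified" fallback for combinations the text does not cover are assumptions.

```python
def classify_origin(birth: str, mother: str, father: str, home: str = "NL") -> str:
    """Classify a participant from countries of birth (hypothetical helper).

    Returns 'first generation', 'second generation', 'Dutch', or
    'unclassified' (an assumed fallback for cases the text does not define).
    """
    parents_abroad = sum(p != home for p in (mother, father))
    # (1) born abroad with at least one parent born abroad
    if birth != home and parents_abroad >= 1:
        return "first generation"
    # (2) born in the Netherlands with both parents born abroad
    if birth == home and parents_abroad == 2:
        return "second generation"
    # Dutch origin: participant and both parents born in the Netherlands
    if birth == home and parents_abroad == 0:
        return "Dutch"
    return "unclassified"
```

For example, `classify_origin("NL", "TR", "TR")` gives "second generation", while a participant born in the Netherlands with exactly one parent born abroad falls outside the stated criteria and is returned as "unclassified" here.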

Gut microbiota profiling and processing

Stool samples were collected, sequenced, and processed as previously described in detail in another study [21]. In short, DNA was extracted from the home-collected stool samples (n = 6056) after which the V4 region of the 16S rRNA gene was sequenced on an Illumina MiSeq instrument. After merging paired-end reads and quality filtering the raw reads with USEARCH [22] (v11.0.667_i86linux64), an Amplicon Sequence Variant (ASV) table was obtained using the UNOISE3 algorithm from USEARCH. Taxonomy was assigned with “dada2” [23] (v1.12.1) on the SILVA reference database [24] (v.132), and a phylogenetic tree was obtained using MAFFT [25, 26] (v. 7.427) and FastTree [27] (v. 2.1.11). In the end, the ASV table was rarefied to 14,932 counts per sample. Out of the 6056 sequenced samples, 6032 samples remained after the total quality control and were used as starting point for the above-described inclusion in our gut microbiota cohort.
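Rarefying a sample to a fixed depth can be illustrated as random subsampling of reads without replacement. The sketch below is an assumption about the general technique, not the pipeline used in [21]; the function name, the seed handling, and the choice to return None for samples below the target depth are all hypothetical.

```python
import random
from typing import Dict, Optional

def rarefy(counts: Dict[str, int], depth: int, seed: int = 0) -> Optional[Dict[str, int]]:
    """Subsample one sample's ASV counts to exactly `depth` reads.

    Returns None when the sample has fewer than `depth` reads
    (assumed handling; the source does not describe this case).
    """
    if sum(counts.values()) < depth:
        return None
    # Expand counts into one entry per read, then draw without replacement.
    pool = [asv for asv, n in counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    rarefied: Dict[str, int] = {}
    for asv in picked:
        rarefied[asv] = rarefied.get(asv, 0) + 1
    return rarefied
```

In the study the target depth would be 14,932; after rarefaction every retained sample has the same total count, so richness and other diversity measures are compared at equal sequencing depth.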

MetS definition

MetS definition was based on the definition by Alberti et al. [2]. Participants were classified as having MetS if they fulfilled at least 3 of the following criteria:

  1. High blood pressure, defined by systolic blood pressure ≥ 130 mmHg and/or diastolic blood pressure ≥ 85 mmHg

  2. Central obesity, defined by waist circumference ≥ 80 cm (in females) or ≥ 90 cm (in males from South-Asian Surinamese descent) or ≥ 94 cm (in males not from South-Asian Surinamese descent)

  3. High triglycerides, defined by triglycerides ≥ 1.7 mmol/L

  4. High glucose, defined by glucose ≥ 5.6 mmol/L

  5. Low HDL, defined by HDL cholesterol < 1.29 mmol/L (in females) or < 1.03 mmol/L (in males)

The same criteria were used during the analysis on the individual components of MetS.
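As an illustration, the five criteria and the "at least 3" rule can be written as follows. The thresholds are those listed above; the function names, argument names, and the "female"/"male" coding are assumptions made for this sketch.

```python
from typing import Dict

def mets_components(sbp: float, dbp: float, waist: float, tg: float,
                    glucose: float, hdl: float, sex: str,
                    south_asian_surinamese: bool = False) -> Dict[str, bool]:
    """Binarize the five MetS components (units: mmHg, cm, mmol/L)."""
    if sex == "female":
        waist_cut, hdl_cut = 80.0, 1.29
    else:
        # Lower waist cut-off for males of South-Asian Surinamese descent
        waist_cut = 90.0 if south_asian_surinamese else 94.0
        hdl_cut = 1.03
    return {
        "high_bp": sbp >= 130 or dbp >= 85,
        "central_obesity": waist >= waist_cut,
        "high_tg": tg >= 1.7,
        "high_glucose": glucose >= 5.6,
        "low_hdl": hdl < hdl_cut,
    }

def has_mets(components: Dict[str, bool]) -> bool:
    """MetS is present when at least 3 of the 5 components are present."""
    return sum(components.values()) >= 3
```

A male with blood pressure 135/80 mmHg, waist 92 cm, triglycerides 1.8 mmol/L, glucose 5.0 mmol/L, and HDL 1.2 mmol/L meets three criteria if he is of South-Asian Surinamese descent (waist cut-off 90 cm) but only two otherwise (cut-off 94 cm).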


Apart from age and sex, we considered the following covariates obtained via the questionnaire: socioeconomic status (highest obtained educational level, occupational level and employment status), lifestyle (physical activity, smoking and alcohol use), and dietary habits (sugar intake and fruit intake). In gut microbiota analyses, we also took proton pump inhibitor (PPI) use into account, as this is a known confounder of the gut microbiota.

The highest educational level obtained in the Netherlands or in the country of origin was categorized as higher (higher vocational schooling or university), intermediate (intermediate vocational schooling or intermediate/higher secondary schooling), lower (lower vocational schooling or lower secondary schooling), or elementary (never been to school or elementary schooling only). Current employment status was recorded as working, not in work force, unemployed, or unable to work. Occupational status was categorized as academic, higher, intermediate, lower, or elementary. For the lifestyle-related variables, we used a binary indicator for physical activity (i.e., 30 min of moderate/intensive exercise for at least 5 days a week, which conforms to the Dutch Standard for Health exercise) and for alcohol use (used alcohol in the last 12 months). Smoking was categorized as yes, former, or never. Since we did not have the same Food Frequency Questionnaire for all ethnicities, we derived composite variables as proxies for dietary habits. We used regular fruit intake (yes/no) as a proxy for a healthy diet, defined as eating at least one piece of fruit on at least 5 days/week. As a proxy for an unhealthy diet, we used the daily consumption (yes/no) of sugar drinks. This variable was considered to be present if participants reported daily consumption of either fruit juice, tea with sugar, regular soft drink, sports drink, fruit syrup, fruit drink, malt beer, or coffee with sugar or when a participant consumed 7 of those drinks 1 to 6 days a week.

Statistical analysis

Clinical and anthropometric values are summarized as mean ± standard deviation or as median (interquartile range) for normally and non-normally distributed values, respectively. Categorical variables are presented with either counts or percentages.

Except for the analyses on combinations of components, all subsequent analyses were performed on the binarized outcomes of all MetS components and MetS itself, as well as on the continuous outcomes of the components.

Differences in MetS outcomes across ethnicities were assessed with generalized linear models (GLM) (family “binomial” for binarized outcomes, family “gaussian” for continuous outcomes). Models were run for the total dataset and adjusted for age and sex (male as reference). Statistical significance of the ethnicity variable (Dutch as reference) was assessed with the likelihood ratio test (LRT). In addition, potential sex-dependent ethnic differences in MetS outcomes were tested by including an interaction term between sex and ethnicity in the previous model, again using an LRT. To assess the potential influence of known confounders on the MetS outcomes, the same models were subsequently run with adjustment for socioeconomic factors, lifestyle, and dietary habits, in which higher educational level, academic occupational level, working employment status, never smoked, no alcohol use, no regular physical activity, no regular fruit intake, and no daily sugar drinks intake were set as reference.
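For reference, the comparison behind the LRT can be written in its standard textbook form; the notation below is ours and is not taken from the study. Let $\ell_0$ be the maximized log-likelihood of the model without the term of interest (for example, without ethnicity) and $\ell_1$ that of the model including it:

```latex
\Lambda = -2\,(\ell_0 - \ell_1), \qquad \Lambda \;\overset{\text{approx.}}{\sim}\; \chi^2_{k} \quad \text{under the null hypothesis}
```

Here $k$ is the number of additional parameters in the larger model. For a categorical term such as ethnicity, $k$ equals the number of groups minus one (the reference group); for a sex-by-ethnicity interaction, it equals the number of added interaction coefficients.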

Differences in the prevalence of all possible combinations of components across ethnicities were assessed with the chi-squared test, performed separately for males and females, both in the total dataset and in subjects with MetS only.

Analyses on the gut microbiota composition were only performed on samples from the gut microbiota dataset. The diversity of the gut microbiota per participant was quantified with several α-diversity indices calculated at the ASV level, including Shannon index (R package vegan 2.6-4 [28]; function “diversity”), richness (number of unique ASVs; R package vegan; function “specnumber”), and Faith’s PD (R package picante v.1.8.2 [29]; function “pd”). To assess the effect of α-diversity on MetS outcomes, logistic regression (GLM with binomial family; for binarized outcomes) and linear regression (GLM with gaussian family; for continuous outcomes) were performed for each diversity index separately (independent variable). Triglyceride levels were log transformed to account for their non-normal distribution. Models were adjusted for age, sex, ethnicity, and the interaction between sex and ethnicity (if this interaction was significant during analyses on the total cohort), assuming an ethnic-independent effect of α-diversity (i.e., ethnic-independent model). To test whether the effect of α-diversity on the outcomes differed across ethnicities, an interaction term between ethnicity and α-diversity was added to the ethnic-independent model and tested for significance with an LRT. Those models were considered as baseline models (model 1). In addition, additive adjustment for socioeconomic factors (model 2; i.e., model 1 + socioeconomic), lifestyle-related variables (model 3; i.e., model 2 + lifestyle), and dietary-related variables (model 4; i.e., model 3 + diet) was performed to assess the influence of known confounders on the MetS outcomes. We also adjusted for PPI use in models 2, 3, and 4, since this is a known confounder of the gut microbiome composition. If the interaction was significant, coefficients and standard errors for each ethnicity were derived from the model output, using the coefficients and the variance–covariance matrix.
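To make two of the α-diversity indices concrete, the sketch below computes the Shannon index and richness from one participant's ASV counts. It is a plain-Python illustration rather than the vegan code used in the study; the function names are ours, and the natural logarithm is assumed (the default base in vegan's "diversity"). Faith's PD is omitted because it requires the phylogenetic tree.

```python
import math
from typing import Sequence

def shannon(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over ASVs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def richness(counts: Sequence[int]) -> int:
    """Number of ASVs observed at least once in the sample."""
    return sum(1 for c in counts if c > 0)
```

A sample with four equally abundant ASVs has richness 4 and a Shannon index of ln 4 (about 1.39), whereas a sample dominated by a single ASV has a Shannon index close to 0.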

Similar to the α-diversity, we also assessed the effect of individual ASVs on MetS outcomes. To account for the differences in sample size between ethnicities, ASVs were included if they fulfilled the following criteria in at least one ethnicity, in either males or females: present in > 5% of the samples and a mean relative abundance > 0.02%. This resulted in the inclusion of 604 ASVs. ASVs were included in the models as arcsin square-root transformed relative abundance, to account for the non-normality of the distribution. The same ethnic-independent models (i.e., logistic or linear regression, adjusted for age, sex, ethnicity, and optionally sex:ethnicity as baseline models, and additionally adjusted for PPI use, socioeconomic, lifestyle, and diet variables) were run for all ASVs (independent variable). Per outcome, either binarized or continuous, correction for multiple testing was performed using the Benjamini–Hochberg correction (p.adjust) [30]. All ASVs were also tested for ethnic-specific effects by adding an interaction term between ethnicity and ASV to the ethnic-independent models, which was tested for significance with an LRT. Correction for multiple comparisons was performed in the same manner as described above.
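The ASV inclusion filter, the arcsin square-root transformation, and the Benjamini–Hochberg adjustment can be sketched in a few lines. This is an illustrative plain-Python version, not the study's R code: the function names are hypothetical, relative abundances are assumed to be proportions between 0 and 1, and the filter is applied to one ethnicity-by-sex stratum at a time.

```python
import math
from typing import List, Sequence

def passes_filter(rel_abundances: Sequence[float]) -> bool:
    """Inclusion check within one stratum: prevalence > 5% and mean > 0.02%."""
    n = len(rel_abundances)
    prevalence = sum(1 for a in rel_abundances if a > 0) / n
    mean_abundance = sum(rel_abundances) / n
    return prevalence > 0.05 and mean_abundance > 0.0002

def asin_sqrt(rel_abundance: float) -> float:
    """Arcsin square-root transform of a proportion in [0, 1]."""
    return math.asin(math.sqrt(rel_abundance))

def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # rank 1 = smallest p-value
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

An ASV would be retained if `passes_filter` is true in at least one stratum, and the adjusted p-values would then be compared with 0.05 per outcome.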

Subsequently, an analysis was performed on the ASVs that were significant for at least 3 components (combining binary and continuous outcomes and considering MetS itself as a component) in the ethnic-independent models. ASVs were clustered based on their Spearman’s correlation, using hierarchical clustering (Euclidean distance, average agglomeration method) with hclust. Abundances of ASVs belonging to the same cluster were summed, arcsin square-root transformed, and tested for effects on MetS outcomes in the same way as the α-diversity measures.

Statistical analyses were performed in R 4.0.3 [31] (using RStudio v 1.3.1093). p-values < 0.05 (either BH adjusted (ASVs) or unadjusted (other models); either for single terms or interaction terms) were considered to be statistically significant.
