Participants and devices
The research was approved by the Medical Ethics Committee of the Chinese PLA General Hospital for clinical research (No. S2022-341-01). Participants with good sleep quality [44] were randomly recruited offline. Good sleep quality was defined as no insomnia, a sleep-onset latency of less than 15 min, a total sleep time of approximately 8 h and fewer than 2 awakenings after sleep onset. Each participant signed an informed consent form. We also recorded clinical factors that might influence intestinal motility, including age, gender and BMI [45]. All participants were confirmed to be free of intestinal disease, so that abnormal changes in BSs caused by gastrointestinal dysfunction could be excluded. Participants were also instructed to eat little or no dinner and to avoid medications and foods that affect sleep. The study was conducted in the Sleep Monitoring Center of the First Medical Center of the Chinese PLA General Hospital. In total, 14 participants were recruited and completed one night of data collection, as shown in Table 9.
We used the BS recorder and the PSG device to acquire BS data and physiological signals simultaneously during sleep. The self-developed BS recorder had two channels, each based on a Knowles SiSonic MEMS microphone (SPU1410LR5H-QB) with a bottom port. The BS channel (BS-channel) collected the raw BSs through the microphone chip on the front side of the circuit board (port facing the body). The noise channel (NS-channel) collected external ambient noise through the microphone chip on the back side of the circuit board (port facing outward). The MEMS microphone had a tightly matched sensitivity of ±3 dB and an ultra-wideband flat frequency response of ±2 dB over 10 Hz–10 kHz. The sound signals were fed into the 12-bit analog-to-digital converter (ADC) of an STM32L151 microcontroller unit (MCU) from STMicroelectronics, at a sampling rate of 8000 Hz. After processing by the MCU, the data were stored on a micro-SD card. To keep the device attached to the abdominal surface, we also designed an adhesive splice and a buckle to fix it in place.
PSG is an objective approach that is considered the gold standard for sleep measurement, especially for diagnosing physiologically based sleep disorders. A standard PSG montage was applied following the International 10–20 System guidelines [46]. The PSG device (EMBLA N7000) included 16-channel electroencephalography leads (EEG; F3, F4, C3, C4, O1, O2, M1, M2, GRD/REF), electromyography (EMG; 3 submental leads) and electrocardiography (ECG; R/L arm). Participants also wore thoracic and abdominal respiratory belts to monitor respiration.
Before the test, the technician attached the PSG sensors according to the protocol and checked them to ensure that every channel worked normally. The BS recorder was attached to the surface of the right lower abdomen.
The two devices were synchronized by tapping their sensors against each other to create a shared synchronization signal. Specifically, the PSG leg EMG sensor, which produces a pulse when tapped, was selected, and the pickup port of the BS recorder was tapped against the leg EMG electrodes. The resulting pulse appeared in both independent recordings and was used to synchronize them.
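The tap-based synchronization can be sketched as locating the tap pulse in each recording and taking the difference of the two onset times as the clock offset. This is a minimal sketch under stated assumptions: the helper names `tap_onset_time` and `clock_offset`, the robust threshold of `k` times a MAD-based noise estimate, and the use of the first supra-threshold sample as the pulse onset are illustrative choices, not the authors' procedure.

```python
import numpy as np


def tap_onset_time(signal, fs, k=5.0):
    """Time (s) of the first sample whose magnitude exceeds k robust SDs.

    Hypothetical helper: the noise SD is estimated as MAD / 0.6745,
    an assumption made for this sketch.
    """
    x = np.asarray(signal, dtype=float)
    x = x - np.median(x)
    sigma = np.median(np.abs(x)) / 0.6745
    sigma = max(sigma, np.finfo(float).eps)  # guard against a silent channel
    above = np.flatnonzero(np.abs(x) > k * sigma)
    if above.size == 0:
        raise ValueError("no tap pulse found")
    return above[0] / fs


def clock_offset(bs_signal, bs_fs, psg_emg, psg_fs, k=5.0):
    """Seconds to add to BS-recorder time to obtain PSG time (hypothetical)."""
    return tap_onset_time(psg_emg, psg_fs, k) - tap_onset_time(bs_signal, bs_fs, k)
```

Because the BS recorder (8000 Hz) and the PSG channel may use different sampling rates, onsets are compared in seconds rather than in samples.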
Data annotation
In our study, data annotation comprised EBS annotation and sleep-stage annotation. EBS annotation was performed by two experienced clinicians in a double-blind manner. Thirty minutes of BS data from each participant's whole-night recording were selected for labelling. Selecting a 30-min segment reduced the manual annotation workload compared with labelling the full night. Annotating this local segment also allowed a separate HOS threshold to be set for each subject, which improved the generalization ability of the effective bowel sound detection algorithm. Individual thresholds are needed because, although HOS quantify the non-Gaussianity of a signal, the non-Gaussianity of BS signals is mainly influenced by inter-subject differences in abdominal anatomy, including the thickness of the intestinal wall, the distance between the intestine and the abdominal surface and the thickness of abdominal fat. In addition, the 30-min segment could represent the whole-night state because the test environment remained virtually unchanged throughout the night. Furthermore, to ensure consistency in the choice of labelled segments across subjects, we selected a period within each subject's stage N3. Because stage N3 generally lasts 20–40 min [47], the labelled segment length was set to 30 min.
The labelling rules were as follows: an EBS had to be longer than 10 ms, and two EBSs separated by less than 100 ms were regarded as one EBS. Only EBSs labelled by both clinicians were retained as final EBSs; all others were discarded.
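The consensus rule (keep only EBSs labelled by both clinicians) could be implemented as sketched below. The sketch assumes that each clinician's labels are lists of `(start_s, end_s)` intervals, that "labelled by both" means the two intervals overlap, and that the retained EBS is their intersection; the function name `consensus_ebs` and the intersection choice are hypothetical, since the text does not state how boundaries were reconciled.

```python
def consensus_ebs(labels_a, labels_b):
    """Intersections of overlapping EBS intervals from two raters.

    labels_a, labels_b: lists of (start_s, end_s) tuples, one list per clinician.
    Assumption: an EBS counts as 'labelled by both' when the two intervals
    overlap, and the retained EBS is the overlapping part.
    """
    final = []
    for a_start, a_end in labels_a:
        for b_start, b_end in labels_b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # keep only positive-length overlaps
                final.append((start, end))
    return sorted(final)
```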
The sleep stages were scored in 30-s epochs according to standard criteria [48] by an experienced technician who had passed the Registered Polysomnographic Technologist (RPSGT) examination. The stages comprised wakefulness (W), the transition stage from wakefulness to sleep (N1), light sleep (N2), deep sleep (N3) and rapid eye movement sleep (REM).
EBSs recognition algorithm
In our study, the EBS recognition algorithm first determined the optimal threshold for each participant from that participant's annotated 30-min BS audio. The algorithm then applied this optimal threshold to identify EBSs across the whole night. Figure 4 shows the flow chart of EBS recognition.
Data denoising
Noise in BS recordings during sleep mainly comprises heart sounds, background noise and noise caused by turning over. This section deals mainly with heart sounds and background noise; noise caused by turning over is removed after the subsequent recognition step. The denoising methods included bandpass filtering, adaptive filtering and wavelet threshold denoising.
The bandpass filter removed noise outside the BS frequency band. Previous studies reported that BS signals are mainly distributed in the ranges of 100–500 Hz [49, 50], 5–600 Hz [29] and 100–1000 Hz (mostly 100–800 Hz) [36]. To retain the BS frequency range while removing the main noise, we selected a 100–1000 Hz bandpass filter. More importantly, this bandpass filter removes the influence of the first and second heart sounds on the BSs [51].
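A 100–1000 Hz band-pass stage for 8000 Hz recordings might be sketched as follows. The text does not specify the filter design, so the 4th-order Butterworth prototype, the second-order-section form and zero-phase (forward-backward) filtering are assumptions chosen for illustration; `bandpass_bs` is a hypothetical name.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass_bs(x, fs=8000, low=100.0, high=1000.0, order=4):
    """Zero-phase band-pass filter for BS recordings (assumed Butterworth design)."""
    # Second-order sections are numerically safer than (b, a) coefficients.
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering removes phase distortion, which keeps EBS
    # onsets aligned with the raw recording.
    return sosfiltfilt(sos, np.asarray(x, dtype=float))
```

Zero-phase filtering is a convenience for offline analysis; a causal filter would be needed for on-device processing.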
The adaptive filter removed ambient noise using the two-channel configuration, as shown in Fig. 5. In this study, we used the normalized least-mean-squares (NLMS) algorithm [52] to implement the adaptive noise canceller (ANC) [53]. The NLMS parameters to be set were the filter order and the step size [54], and filter performance was mainly measured by the MSE, SNR and correlation coefficient [53, 55]. The filter order affects computation time: the higher the order, the longer the computation. For the step size, a smaller value lowers the MSE, whereas a larger value speeds up convergence. Balancing computation time, convergence speed and filtering effect, we set the filter length to 64 and the step size to 0.001 [53]. To evaluate the filtering effect, we mainly used the parametric correlation coefficient. Because all experiments were performed in the same monitoring environment, the ambient noise remained essentially constant. This ambient noise served as the reference input of the filter, the filter output was the estimate of the ambient noise contained in the raw BS signal, and the correlation coefficient was computed between the ambient noise and its estimate. We applied adaptive filtering to 10 randomly selected 10-s segments from each of the 14 subjects and confirmed that the correlation coefficient exceeded 0.9, which verified that the chosen parameters gave the filter good performance.
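The two-channel NLMS noise canceller can be sketched as below: the NS-channel is the reference input, the filter output estimates the ambient noise in the BS-channel, and the error signal is the cleaned BS. This is a textbook NLMS update, `w += mu * e * u / (eps + u·u)`, written for illustration; the function name `nlms_anc` and the regularizer `eps` are hypothetical, and whether the authors' step size of 0.001 is on exactly this normalized scale is an assumption.

```python
import numpy as np


def nlms_anc(primary, reference, order=64, mu=0.001, eps=1e-8):
    """NLMS adaptive noise canceller (illustrative sketch).

    primary   -- BS-channel samples (bowel sounds + ambient noise)
    reference -- NS-channel samples (ambient-noise reference), same length
    Returns (cleaned, noise_estimate).
    """
    d = np.asarray(primary, dtype=float)
    x = np.asarray(reference, dtype=float)
    n = len(d)
    w = np.zeros(order)
    xpad = np.concatenate([np.zeros(order - 1), x])  # zero history before t = 0
    y = np.zeros(n)
    e = np.zeros(n)
    for i in range(n):
        u = xpad[i:i + order][::-1]   # tap vector, most recent sample first
        y[i] = w @ u                  # estimate of the ambient noise
        e[i] = d[i] - y[i]            # error = cleaned BS sample
        w += (mu / (eps + u @ u)) * e[i] * u
    return e, y
```

The correlation check described above corresponds to `np.corrcoef(noise, y)[0, 1]`. A small step size converges slowly, so quick synthetic checks may use a larger value, as in the test.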
Wavelet threshold denoising (WTD), based on the Mallat pyramid algorithm, was used to suppress the noise concentrated in the detail coefficients of the wavelet decomposition. The WTD process consists of decomposition, thresholding and reconstruction, as shown in Fig. 6. Based on a comparison of denoising performance, the sym6 wavelet basis was selected and the number of decomposition levels was set to 6. The threshold was calculated using the Birgé–Massart strategy [56].
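The decompose–threshold–reconstruct pipeline with a six-level sym6 decomposition can be sketched with PyWavelets. Note one deliberate substitution: the sketch uses the universal (VisuShrink) soft threshold with a MAD noise estimate instead of the Birgé–Massart strategy used in the text, because the latter has no direct PyWavelets equivalent. The periodization mode and the name `wavelet_denoise` are likewise assumptions.

```python
import numpy as np
import pywt


def wavelet_denoise(x, wavelet="sym6", level=6):
    """Soft-threshold the detail coefficients of a multilevel DWT.

    NOTE: uses the universal threshold sigma * sqrt(2 ln N) as a stand-in
    for the Birge-Massart threshold described in the text.
    """
    x = np.asarray(x, dtype=float)
    coeffs = pywt.wavedec(x, wavelet, mode="periodization", level=level)
    # Noise level estimated from the finest detail coefficients (MAD / 0.6745).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))
    # Keep the approximation; shrink every detail level.
    denoised = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet, mode="periodization")[: len(x)]
```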
Determination of optimal thresholds based on the 30-min labeled BSs
After data denoising, EBS recognition was performed. In a previous study [29], a modified iterative kurtosis-based detector (IKD) separated EBSs using the kurtosis of the BSs within a sliding window and a histogram analysis of the resulting kurtosis time series (K). In that method, the percentage of the total histogram frequency of K used to set the threshold was fixed empirically at 90%, which may limit recognition accuracy. We therefore devised the m-HOS method, which exploits the annotated EBSs. Specifically, for the 30-min labelled BSs of each participant, a histogram of the HOS time series was computed, and the percentage was adjusted from 90 to 100% until the threshold giving the best detection performance against the labelled EBSs was found; this threshold was taken as the optimal threshold. Figure 7 shows the flow chart of the m-HOS algorithm.
Obtaining the HOS time series
The 30-min BSs were labelled manually. The sliding-window length was \(MM=0.003\times Fs\) samples, where \(Fs\) is the sampling rate and the constant 0.003 was set empirically; at \(Fs=8000\) Hz this gives a 24-sample (3 ms) window. The HOS time series (H) was calculated within the sliding window. In our study, the third-order cumulant of the BSs was calculated using the cum3est function [57] in MATLAB.
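A sliding-window HOS series can be sketched in NumPy as follows. At zero lag, the biased third-order cumulant estimate equals the third central moment of the window, so each H value here is `mean((w - mean(w))**3)`. Treating the zero-lag value as the quantity returned by cum3est, and sliding the window one sample at a time, are assumptions; `hos_series` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def hos_series(x, fs=8000, win_factor=0.003, step=1):
    """Zero-lag third-order cumulant of each sliding window (sketch).

    Window length MM = win_factor * fs samples (24 samples at 8000 Hz).
    """
    mm = int(round(win_factor * fs))
    windows = sliding_window_view(np.asarray(x, dtype=float), mm)[::step]
    dev = windows - windows.mean(axis=1, keepdims=True)
    return np.mean(dev ** 3, axis=1)  # biased third central moment
```

With `step=1`, H has `len(x) - MM + 1` values, and index `i` refers to the window starting at sample `i`.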
Determining the optimal threshold for each participant
A temporary threshold was used to recognize EBSs from the raw BSs. First, the histogram of H and its bin frequencies (freq) were calculated. The frequencies were then accumulated bin by bin until their sum reached setPortion of the total frequency, and the temporary threshold was set to twice the H value at the current bin index. The optimal threshold was found by varying setPortion from 0.900 to 0.999.
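Under one reading of this rule, the temporary threshold can be computed from the cumulative histogram as sketched below. The number of bins and the use of the bin's upper edge as "the H value at that index" are assumptions, as is the name `temporary_threshold`.

```python
import numpy as np


def temporary_threshold(h, set_portion, n_bins=1000):
    """Twice the H value at which the cumulative histogram reaches set_portion.

    Assumptions: n_bins equal-width bins, and the upper edge of the first bin
    whose cumulative count reaches set_portion * total is 'the H value'.
    """
    counts, edges = np.histogram(np.asarray(h, dtype=float), bins=n_bins)
    cum = np.cumsum(counts)
    # First bin index whose cumulative frequency reaches the target portion.
    idx = int(np.searchsorted(cum, set_portion * cum[-1]))
    return 2.0 * edges[idx + 1]
```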
Using the temporary threshold, the corresponding EBSs were recognized from H: candidate EBSs were the segments whose H values exceeded the temporary threshold. The recognition rules were the same as the labelling rules.
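Turning a thresholded H series into EBS candidates that obey the labelling rules (longer than 10 ms; segments less than 100 ms apart merged) might look like the sketch below. It assumes that merging is applied before the duration check and that H indices map directly to sample times (an offset of at most one window length is ignored); `detect_segments` is a hypothetical name.

```python
import numpy as np


def detect_segments(h, threshold, fs=8000, min_dur=0.010, max_gap=0.100):
    """(start, end) sample indices (end exclusive) of EBS candidates.

    Assumed order of rules: merge gaps shorter than max_gap first,
    then drop segments not longer than min_dur.
    """
    above = np.asarray(h) > threshold
    edges = np.diff(np.concatenate([[0], above.astype(int), [0]]))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    merged = []
    for s, e in zip(starts, ends):
        if merged and (s - merged[-1][1]) < max_gap * fs:
            merged[-1][1] = e  # gap below 100 ms: treat as one EBS
        else:
            merged.append([s, e])
    return [(int(s), int(e)) for s, e in merged if (e - s) > min_dur * fs]
```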
The EBSs detected with each temporary threshold were compared with the manually labelled EBSs to evaluate performance. Specifically, we used \(Accuracy\) to evaluate recognition performance for different temporary thresholds, and the temporary threshold that maximized \(Accuracy\) was taken as the participant's optimal threshold. \(Sensitivity\) and \(Specificity\) provided additional information on recognition performance.
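The per-participant search is a one-dimensional grid search over setPortion that keeps the value maximizing Accuracy. The sketch below is generic: `score_fn` is a hypothetical callable that thresholds H for a given setPortion, applies the labelling rules and returns `(accuracy, sensitivity, specificity)` against the clinicians' labels. Ties are resolved in favour of the smallest setPortion, which is an arbitrary choice.

```python
import numpy as np


def optimal_portion(score_fn, portions=None):
    """Grid search over setPortion; returns (best_portion, best_scores).

    score_fn(p) -> (accuracy, sensitivity, specificity); hypothetical callable.
    """
    if portions is None:
        # 0.900, 0.901, ..., 0.999 (100 values), rounded to avoid float drift.
        portions = np.round(0.900 + 0.001 * np.arange(100), 3)
    best_p, best_scores = None, None
    for p in portions:
        scores = score_fn(float(p))
        if best_scores is None or scores[0] > best_scores[0]:
            best_p, best_scores = float(p), scores
    return best_p, best_scores
```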
Figure 8 shows the definitions of the evaluation parameters. The red part was the manually labelled EBS, and the green part was the detected EBS. The parameters were defined as follows: TP (an annotated EBS segment correctly recognized as an EBS), FP (a non-annotated segment incorrectly recognized as an EBS), TN (a non-annotated segment correctly recognized as noise) and FN (an annotated EBS segment incorrectly recognized as noise). \(Accuracy\), \(Sensitivity\) and \(Specificity\) were defined as in Eqs. (1), (2) and (3):
$$Accuracy=\frac{{E}_{TP}+{E}_{TN}}{{E}_{TP}+{E}_{FP}+{E}_{FN}+{E}_{TN}}$$
(1)
$$Sensitivity=\frac{{E}_{TP}}{{E}_{TP}+{E}_{FN}}$$
(2)
$$Specificity=\frac{{E}_{TN}}{{E}_{FP}+{E}_{TN}}$$
(3)
where \({E}_{(.)}\) denoted the energy of the corresponding segment. Energy was used to compute the evaluation parameters because it better reflects the recognition of transient signals, such as EBSs, whose energy is concentrated in short bursts. On the one hand, it avoids the bias introduced by manually labelled boundaries; on the other hand, in long-term BS recognition, the few very low-energy EBSs that go unrecognized have little effect on the whole-night results.
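Eqs. (1)–(3) can be evaluated sample by sample, summing the squared amplitude of the samples that fall in each TP, FP, FN and TN category. The sketch assumes the labelled and detected EBSs are given as boolean masks aligned with the signal; `energy_metrics` is a hypothetical name.

```python
import numpy as np


def energy_metrics(x, labeled, detected):
    """Energy-weighted Accuracy, Sensitivity and Specificity, Eqs. (1)-(3).

    x        -- signal samples
    labeled  -- boolean mask, True inside manually labelled EBSs
    detected -- boolean mask, True inside detected EBSs
    """
    e = np.asarray(x, dtype=float) ** 2
    lab = np.asarray(labeled, dtype=bool)
    det = np.asarray(detected, dtype=bool)
    e_tp = e[lab & det].sum()
    e_fn = e[lab & ~det].sum()
    e_fp = e[~lab & det].sum()
    e_tn = e[~lab & ~det].sum()
    accuracy = (e_tp + e_tn) / (e_tp + e_fp + e_fn + e_tn)
    sensitivity = e_tp / (e_tp + e_fn)
    specificity = e_tn / (e_fp + e_tn)
    return accuracy, sensitivity, specificity
```

For example, with samples `[1, 2, 3, 4]`, labels `[T, T, F, F]` and detections `[T, F, T, F]`, the energies are TP = 1, FN = 4, FP = 9 and TN = 16, so Accuracy = 17/30.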
EBSs recognition and CVs extraction of the whole-night’s data
Each participant's optimal threshold was used to identify EBSs across the whole night. As with the labelled 30-min BSs, the HOS time series of the denoised whole-night BSs was computed. Each participant's optimal threshold, combined with the labelling rules, was then applied to recognize EBSs in the whole-night BSs.
The recognition and removal of turning-over segments
Before whole-night EBS recognition, we also recognized turning-over segments to further improve recognition performance. Specifically, turning-over segments were recognized from the NS-channel data. First, we calculated the HOS time series of the denoised NS-channel data, again with a sliding-window length of \(MM=0.003\times Fs\). Because turning-over segments were conspicuous and infrequent, the parameters were set empirically: the threshold was 50, the minimum length of a turning-over segment was 1 s, and the maximum interval for merging two adjacent turning-over segments was 1 s. Figure 9 shows an example of turning-over segment recognition. Once the turning-over segments had been recognized, EBSs occurring during them were discarded.
Fig. 9 Turning-over segment recognition. A BSs with manual EBS labels. B BSs with automatically recognized EBSs. C Noise sounds with an automatically recognized turning-over segment. Green lines: start points of the segments. Red lines: end points of the segments.
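The turning-over rule (threshold 50 on the NS-channel HOS series, segments at least 1 s long, gaps of up to 1 s merged) and the removal of overlapping EBSs can be sketched as follows. Interpreting "minimum length" as inclusive, discarding any EBS that overlaps a turning-over segment, and the names `turning_segments` and `discard_during_turning` are assumptions.

```python
import numpy as np


def turning_segments(h_ns, fs=8000, threshold=50.0, min_len=1.0, max_gap=1.0):
    """Turning-over segments (start_s, end_s) from the NS-channel HOS series."""
    above = np.asarray(h_ns) > threshold
    edges = np.diff(np.concatenate([[0], above.astype(int), [0]]))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    segs = []
    for s, e in zip(starts, ends):
        if segs and s - segs[-1][1] <= max_gap * fs:
            segs[-1][1] = e  # merge segments separated by at most max_gap
        else:
            segs.append([s, e])
    return [(s / fs, e / fs) for s, e in segs if e - s >= min_len * fs]


def discard_during_turning(ebs, turning):
    """Drop EBSs (start_s, end_s) that overlap any turning-over segment."""
    return [(s, e) for s, e in ebs
            if not any(s < te and ts < e for ts, te in turning)]
```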
CVs extraction of EBSs during different sleep stages
In this study, time-domain, frequency-domain and nonlinear-dynamics CVs were extracted to describe the intestinal state from different perspectives.
The time-domain CVs comprised \(cv\), \(E0\), \(duration\) and \(frequency\). \(cv\) was the coefficient of variation, indicating the fluctuation of the data; \(E0\) was the energy of each EBS; \(duration\) was the length of each EBS; and \(frequency\) was the number of EBSs during the corresponding sleep stage.
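The per-EBS time-domain CVs might be computed as sketched below. Because the text does not say which quantity \(cv\) is taken over, and raw audio has a mean close to zero, the sketch assumes \(cv\) is the population standard deviation divided by the mean of the absolute amplitude. \(E0\) is taken as the sum of squared samples. The per-stage \(frequency\) count depends on stage assignment and is not covered here; `time_domain_cvs` is a hypothetical name.

```python
import numpy as np


def time_domain_cvs(ebs, fs=8000):
    """cv, E0 and duration of one EBS (assumed definitions, see lead-in)."""
    x = np.asarray(ebs, dtype=float)
    amp = np.abs(x)
    mean_amp = amp.mean()
    cv = amp.std() / mean_amp if mean_amp > 0 else float("nan")
    return {
        "cv": float(cv),                # fluctuation of the absolute amplitude
        "E0": float(np.sum(x ** 2)),    # energy of the EBS
        "duration": len(x) / fs,        # length in seconds
    }
```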
The frequency-domain CVs comprised \(FC\), \({ER}_{5-300}\), \({ER}_{300-500}\) and \({ER}_{500-1000}\). \(FC\) was the centroid frequency, i.e., the power-weighted mean frequency of the power spectrum. \({ER}_{5-300}\), \({ER}_{300-500}\) and \({ER}_{500-1000}\) were the ratios of the power in the 5–300 Hz, 300–500 Hz and 500–1000 Hz bands, respectively, to the total power.
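A periodogram-based sketch of these frequency-domain CVs follows. The spectral estimator (a plain FFT periodogram without windowing) and half-open band edges `[lo, hi)` are assumptions, and `frequency_domain_cvs` is a hypothetical name.

```python
import numpy as np


def frequency_domain_cvs(ebs, fs=8000):
    """Spectral centroid FC and band energy ratios of one EBS (sketch)."""
    x = np.asarray(ebs, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    total = power.sum()

    def ratio(lo, hi):
        # Fraction of total power in the half-open band [lo, hi) Hz.
        return float(power[(freqs >= lo) & (freqs < hi)].sum() / total)

    return {
        "FC": float(np.sum(freqs * power) / total),  # power-weighted mean frequency
        "ER_5_300": ratio(5, 300),
        "ER_300_500": ratio(300, 500),
        "ER_500_1000": ratio(500, 1000),
    }
```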
The nonlinear CVs were the fractal dimension (\(FD\)) and sample entropy (\(SampEn\)), which are based on the concepts of fractals and entropy and have been applied to EEG signals during sleep [41]. \(FD\) was calculated by the Katz algorithm based on the box-counting dimension and measures the irregularity and complexity of a signal. \(FD\) was defined by the following equation:
$$FD=\frac{{\mathrm{log}}_{10}(L)}{{\mathrm{log}}_{10}(d)}$$
(4)
where \(L\) was the total length of the curve, i.e., the sum of the distances between successive points of the time series, and \(d\) was the diameter of the curve, taken as the distance between the first point of the series and the point farthest from it.
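Eq. (4) can be sketched directly. The sketch measures distances in the (time, amplitude) plane with the sample index as the time axis, which is an assumption since the text does not state the units, and it requires at least three samples so that \(d>1\). `katz_fd` is a hypothetical name.

```python
import numpy as np


def katz_fd(x):
    """Katz fractal dimension, FD = log10(L) / log10(d), as in Eq. (4).

    Assumption: points are (sample index, amplitude); needs >= 3 samples.
    """
    y = np.asarray(x, dtype=float)
    t = np.arange(len(y), dtype=float)
    curve_len = np.sum(np.hypot(np.diff(t), np.diff(y)))   # L: summed step lengths
    diameter = np.max(np.hypot(t - t[0], y - y[0]))        # d: farthest from first point
    return float(np.log10(curve_len) / np.log10(diameter))
```

A straight line gives FD = 1, and more jagged waveforms give larger values.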
Entropy reflects how often patterns in a signal repeat and thus measures the randomness and predictability of a stochastic process; in general, it increases with greater randomness [43]. In our study, sample entropy was chosen. Unlike approximate entropy, sample entropy excludes self-matches (the comparison of a template with itself) when counting similar patterns. Its estimation error is therefore smaller and it depends less on the data length, which makes it suitable for EBSs of different lengths. Sample entropy was computed with Eqs. (5)–(7):
$$SampEn\left(m,r,N\right)=-\mathrm{ln}[\frac{{U}^{m+1}(r)}{{U}^{m}(r)}]$$
(5)
$${U}^{m}\left(r\right)={[N-m\tau ]}^{-1}\sum_{i=1}^{N-m\tau }{C}_{i}^{m}(r)$$
(6)
where \(N\) was the length of the time series, \(m\) was the pattern length, \(r\) was the tolerance and \(\tau \) was the time delay. \({C}_{i}^{m}\left(r\right)\) was defined as:
$${C}_{i}^{m}\left(r\right)=\frac{{B}_{i}}{N-(m+1)\tau }$$
(7)
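where \({B}_{i}\) was the number of indices \(j\) (\(j\ne i\)) such that \(d\left[{X}_{i},{X}_{j}\right]\le r\), and \(d\left[{X}_{i},{X}_{j}\right]\) was the distance between the template vectors \({X}_{i}\) and \({X}_{j}\).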
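A common sample-entropy implementation (Richman–Moorman style) is sketched below. It counts template pairs `i < j`, which excludes self-matches, and uses the Chebyshev distance between delay vectors. The default tolerance `r = 0.2 * std(x)`, the Chebyshev distance and the exact template count are conventional assumptions and may differ slightly from the normalizations in Eqs. (6) and (7). `sample_entropy` is a hypothetical name.

```python
import numpy as np


def sample_entropy(x, m=2, r=None, tau=1):
    """SampEn = -ln(A / B), where A and B count matching template pairs.

    B: pairs of length-m templates within tolerance r (self-matches excluded)
    A: the same for length m+1, using the same number of templates.
    """
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)  # conventional default, an assumption here
    n_templates = len(x) - m * tau
    if n_templates < 2:
        raise ValueError("series too short for the chosen m and tau")

    def match_count(length):
        idx = np.arange(n_templates)[:, None] + tau * np.arange(length)[None, :]
        emb = x[idx]  # delay vectors, one per row
        count = 0
        for i in range(n_templates - 1):
            dist = np.max(np.abs(emb[i + 1:] - emb[i]), axis=1)  # Chebyshev distance
            count += int(np.sum(dist <= r))
        return count

    b = match_count(m)
    a = match_count(m + 1)
    if b == 0:
        return float("nan")  # undefined: no matches of length m
    if a == 0:
        return float("inf")
    return float(-np.log(a / b))
```

The pairwise loop is O(n²) in time but uses only O(n·m) memory, which keeps long EBSs tractable.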
After CV extraction for all EBSs, the CVs were assigned to sleep stages according to the time point at which each EBS occurred.
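Assigning each EBS to the 30-s epoch in which it occurs, and counting EBSs per stage (the \(frequency\) CV), can be sketched as follows. The sketch assumes both devices are already on a common clock with time zero at the first scored epoch, and it uses the EBS onset as its time point. EBSs outside the hypnogram are labelled `None`. The function names are hypothetical.

```python
from collections import Counter


def assign_stages(ebs_onsets, hypnogram, epoch_len=30.0):
    """Stage label for each EBS onset (s); hypnogram holds one label per epoch."""
    stages = []
    for t in ebs_onsets:
        k = int(t // epoch_len)  # index of the 30-s epoch containing the onset
        stages.append(hypnogram[k] if 0 <= k < len(hypnogram) else None)
    return stages


def ebs_frequency(stages):
    """Number of EBSs per sleep stage (ignores unassigned EBSs)."""
    return Counter(s for s in stages if s is not None)
```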
Statistical analysis
Statistical analyses of the CVs across the five sleep stages were performed using IBM SPSS Statistics 25. Before analysis, normality was tested with the Kolmogorov–Smirnov test. For normally distributed data, a test of homogeneity of variance was then performed; \(p<0.05\) was considered statistically significant, and the trends of the CVs across sleep stages were displayed with means plots. For data that were not normally distributed, nonparametric tests were performed; again, \(p<0.05\) indicated statistical significance, and the data were summarized as medians and quartiles.
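For readers working outside SPSS, the decision flow can be approximated in SciPy as sketched below. The text names neither the parametric nor the nonparametric comparison, so one-way ANOVA and the Kruskal–Wallis test are assumptions, as is Levene's test for homogeneity of variance. SciPy's `kstest` with estimated mean and SD is not Lilliefors-corrected, so its normality p-values may differ from SPSS output. `compare_stages` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def compare_stages(groups, alpha=0.05):
    """Compare one CV across sleep stages (groups: one 1-D array per stage)."""
    normal = all(
        stats.kstest(g, "norm", args=(np.mean(g), np.std(g, ddof=1))).pvalue > alpha
        for g in groups
    )
    if normal:
        levene_p = float(stats.levene(*groups).pvalue)  # homogeneity of variance
        p = float(stats.f_oneway(*groups).pvalue)       # assumed parametric test
        return {"test": "one-way ANOVA", "p": p, "levene_p": levene_p,
                "significant": bool(p < alpha)}
    p = float(stats.kruskal(*groups).pvalue)            # assumed nonparametric test
    return {"test": "Kruskal-Wallis", "p": p, "levene_p": None,
            "significant": bool(p < alpha)}
```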