
Instrumented timed up and go test and machine learning-based levodopa response evaluation: a pilot study | Journal of NeuroEngineering and Rehabilitation


Subjects

Patients with parkinsonism who were hospitalized in the Department of Neurology of Beijing Hospital between April 2022 and March 2023 were screened consecutively. They had been admitted to establish a diagnosis or to adjust treatment. The inclusion criteria were: (1) a diagnosis of parkinsonism according to the 2015 MDS Clinical Diagnostic Criteria for Parkinson’s Disease [3]; and (2) signed informed consent.

The exclusion criteria were: (1) inability to walk independently or to complete the evaluation with the wearable device (Hoehn-Yahr stage 4 or 5); (2) a history of ischemic or hemorrhagic stroke, head trauma, or other focal brain injury; (3) neuroimaging evidence of conditions that may affect gait, such as intracranial space-occupying lesions, hydrocephalus, or severe white matter lesions (defined as confluent white matter hyperintensities, Fazekas score 2 or 3, or irregular periventricular white matter hyperintensities extending into the deep white matter, Fazekas score 3 [34]); (4) a medical history of musculoskeletal disease or other neurological diseases that may affect gait and balance; (5) severe cognitive dysfunction, defined as a Mini-Mental State Examination (MMSE) score ≤ 17 [35]; (6) a medical history of contraindications to levodopa-benserazide.

Instrumented TUG test

The TUG test consists of standing up from a chair, walking 5 m in a straight line, turning 180 degrees, walking back to the chair in a straight line, turning another 180 degrees, and sitting down on the chair. The iTUG was performed with wearable inertial sensors: the GYENNO MATRIX (GYENNO SCIENCE CO., LTD., Shenzhen, China) was used to detect changes in speed and direction during motion. The MATRIX consists of 10 inertial sensors (i.e., 10 data recording channels) sampling at 100 Hz. Each inertial sensor contains (1) a tri-axial accelerometer (range ± 16 g, sensitivity 16,384 LSB/g) and (2) a tri-axial gyroscope (range ± 2000 dps, sensitivity 131 LSB/dps). Two wrist sensors were placed bilaterally on the dorsal side of the wrists, the chest sensor on the sternum, and the lumbar sensor over the fifth lumbar vertebra. Two thigh sensors were placed bilaterally 7 cm above the knee, two shank sensors bilaterally 7 cm below the knee, and two foot sensors on the instep (dorsal side of the metatarsus) of each foot. All sensors were fastened at their designated locations with straps; the exact attachment positions are described in previous work [36]. Signal data were stored on a computer for feature extraction.

Clinical assessment

For all patients, we recorded age, gender, disease duration, height, thigh length, calf length, Mini-Mental State Examination (MMSE) and Montreal Cognitive Assessment (MoCA) scores, and levodopa equivalent dose (LED).

The acute levodopa challenge test (ALCT) was performed in the morning after withdrawal of dopamine receptor agonists for 72 h and of other antiparkinsonian medications for 12 h, and after an overnight fast. The patients' state at this time was defined as the OFF-medication state. We then conducted the first MDS-UPDRS III assessment and the first iTUG.

After the first assessment, patients received levodopa. Drug-naive patients received the recommended dose of 250 mg (levodopa/benserazide 200/50 mg) [1]. Patients on chronic treatment received a levodopa dose 50% higher than their regular morning dose, to provide a suprathreshold challenge [1, 32].

Approximately 1 h later, patients were asked to describe how they felt after levodopa intake; the point at which they reported the best response was defined as the ON-medication state. We then conducted the second MDS-UPDRS III assessment and the second iTUG. Both MDS-UPDRS III assessments of each patient were scored independently by two neurologists, and the two scores were averaged. Both raters were qualified for MDS-UPDRS III scoring and had similar years of experience; the consistency between their scores was ICC = 0.86 (P < 0.05).

Based on the pharmacokinetics of levodopa, whose peak efficacy occurs 45–90 min after ingestion [1], 45 min was taken as the minimum time needed to reach the ON state.

Feature extraction

The GYENNO MATRIX consists of 10 inertial sensors, each containing a 3-axis accelerometer and a 3-axis gyroscope. Each sensor therefore yields 6 separate signals, one per accelerometer or gyroscope axis, which form the N × 6 matrix \(S_j\) in Eq. 1, where N is the number of samples. As Eq. 2 shows, 60 signals (10 sensors × 6 signals per sensor) were recorded for each participant in a single iTUG trial.

$$S_{j} = \left( {\begin{array}{*{20}c} {a_{x1} } & {a_{y1} } & {a_{z1} } & {g_{x1} } & {g_{y1} } & {g_{z1} } \\ {a_{x2} } & {a_{y2} } & {a_{z2} } & {g_{x2} } & {g_{y2} } & {g_{z2} } \\ {…} & {…} & {…} & {…} & {…} & {…} \\ {a_{xN} } & {a_{yN} } & {a_{zN} } & {g_{xN} } & {g_{yN} } & {g_{zN} } \\ \end{array} } \right)$$

(1)

$$W = \left( S_{\text{left wrist}}, S_{\text{right wrist}}, S_{\text{chest}}, S_{\text{lumbar}}, S_{\text{left shank}}, S_{\text{right shank}}, S_{\text{left thigh}}, S_{\text{right thigh}}, S_{\text{left foot}}, S_{\text{right foot}} \right)$$

(2)
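The layout in Eqs. 1 and 2 can be illustrated with a minimal sketch, assuming each sensor's recording is already synchronised with the others and stored as an N × 6 array ordered as in Eq. 1. The sensor labels, the `stack_trial` helper, and the synthetic data are hypothetical; they only show how the 60 channels of one iTUG trial line up.

```python
import numpy as np

# Sensor order follows Eq. 2; the labels are illustrative only.
SENSORS = [
    "left_wrist", "right_wrist", "chest", "lumbar",
    "left_shank", "right_shank", "left_thigh", "right_thigh",
    "left_foot", "right_foot",
]
# Column order within each sensor follows Eq. 1.
AXES = ["acc_x", "acc_y", "acc_z", "gyr_x", "gyr_y", "gyr_z"]
FS_HZ = 100  # sampling rate reported for the MATRIX system


def stack_trial(recordings: dict[str, np.ndarray]) -> tuple[np.ndarray, list[str]]:
    """Stack ten N x 6 sensor matrices S_j into one N x 60 array W.

    Assumes all sensors are synchronised and trimmed to the same length N.
    """
    blocks, names = [], []
    for sensor in SENSORS:
        s_j = np.asarray(recordings[sensor], dtype=float)
        if s_j.ndim != 2 or s_j.shape[1] != len(AXES):
            raise ValueError(f"{sensor}: expected an N x 6 array, got {s_j.shape}")
        blocks.append(s_j)
        names.extend(f"{sensor}.{axis}" for axis in AXES)
    return np.hstack(blocks), names


# Usage: a synthetic 12-second trial (1200 samples per channel).
rng = np.random.default_rng(0)
trial = {s: rng.normal(size=(12 * FS_HZ, 6)) for s in SENSORS}
W, channel_names = stack_trial(trial)
print(W.shape, len(channel_names))  # (1200, 60) 60
```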

The iTUG test was divided into four stages: standing up from the chair, straight walking, turning, and sitting down on the chair. Prebuilt algorithms were used to extract kinematic features for each stage. Standing up and sitting down were recognized from the bilateral thigh and shank sensors, and changes in the horizontal rotation angle of the lumbar sensor identified the start and end of the two turns. During straight walking, individual gait cycles were detected and 156 gait parameters were analyzed across the whole trial; during the turning, standing-up, and sitting-down stages, 12, 5, and 5 parameters were investigated, respectively. Additionally, two features represent the duration of each iTUG test. Thus, 178 kinematic parameters were synthesized for each iTUG trial through recognition of gait events (such as toe-off, heel-strike, and gait cycle). They describe iTUG trial duration; motion profiles of the arms, lumbar spine, trunk, feet, and shanks; motion asymmetry between bilateral limbs; kinematic variability (standard deviation of parameters); and task-related spatial/temporal characteristics (Table S1 in the supplementary file). To account for the effect of the dominant side, limb-related parameters were calculated as the mean, maximum, minimum, and absolute difference between the two sides of the body. Details of the feature construction for these 178 kinematic parameters are given in the Feature construction section of the supplementary file; after feature construction, a total of 170 kinematic features were included in the final analysis. These parameters are derived from a set of kinematic parameters disclosed in the supplementary files of our previous work [36, 37].
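The bilateral summary described above can be sketched as follows. The helper name and the example parameter (stride length in metres, with made-up values) are hypothetical; the point is that the four summaries stay the same whichever side is dominant.

```python
def bilateral_features(name: str, left: float, right: float) -> dict[str, float]:
    """Side-independent summaries of one limb parameter measured on both sides."""
    return {
        f"{name}_mean": (left + right) / 2.0,
        f"{name}_max": max(left, right),
        f"{name}_min": min(left, right),
        f"{name}_absdiff": abs(left - right),  # asymmetry between the two sides
    }


# Usage with made-up left/right stride lengths (m).
summary = bilateral_features("stride_length_m", 1.10, 0.98)
print(sorted(summary))
```

Swapping the `left` and `right` arguments returns identical values, which is what removes the dependence on the dominant side.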

In addition, we extracted signal features: 22 in the time domain and 45 in the frequency domain (Table S1 in the supplementary file). Each \(S_j\) was processed with a 1-s sliding window (\(W_i = 1\) s) with 0.5-s overlap. A 402-dimensional (67 × 6) feature vector was obtained for each window, and the average over all windows was used to represent the entire signal.

Thus, 4190 (67 × 6 × 10 + 170) time-domain, frequency-domain, and kinematic features were used to describe a single iTUG test (Table S1 in the supplementary file).
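A minimal sketch of the windowing scheme (1-s windows, 0.5-s overlap, averaging across windows), assuming the 100 Hz sampling rate stated earlier. The four per-window features used here (mean, standard deviation, RMS, dominant frequency) are illustrative stand-ins, not the 22 time-domain and 45 frequency-domain features of Table S1, and discarding trailing samples that do not fill a whole window is an assumption.

```python
import numpy as np

FS_HZ = 100              # sampling rate of the MATRIX sensors
WIN = int(1.0 * FS_HZ)   # W_i = 1 s -> 100 samples per window
STEP = int(0.5 * FS_HZ)  # 0.5 s overlap -> windows start every 50 samples


def window_features(x: np.ndarray) -> np.ndarray:
    """Illustrative features of one channel in one window (not the paper's 67)."""
    spectrum = np.abs(np.fft.rfft(x - x.mean())) ** 2  # power spectrum without DC
    freqs = np.fft.rfftfreq(x.size, d=1.0 / FS_HZ)
    return np.array([
        x.mean(),                    # time domain: mean
        x.std(),                     # time domain: standard deviation
        np.sqrt(np.mean(x ** 2)),    # time domain: root mean square
        freqs[np.argmax(spectrum)],  # frequency domain: dominant frequency in Hz
    ])


def sensor_features(s_j) -> np.ndarray:
    """Average the window-level features of an N x 6 sensor matrix over all windows.

    Output length is n_features x 6; with the paper's 67 features it would be 402.
    """
    s_j = np.asarray(s_j, dtype=float)
    n_samples, n_channels = s_j.shape
    if n_samples < WIN:
        raise ValueError("signal is shorter than one window")
    per_window = [
        np.concatenate([window_features(s_j[a:a + WIN, c]) for c in range(n_channels)])
        for a in range(0, n_samples - WIN + 1, STEP)
    ]
    return np.mean(per_window, axis=0)


# Usage: 5 s of synthetic data with a 2 Hz oscillation on the first axis.
t = np.arange(5 * FS_HZ) / FS_HZ
s = np.zeros((t.size, 6))
s[:, 0] = np.sin(2 * np.pi * 2.0 * t)
features = sensor_features(s)
print(features.shape)  # (24,)
```

Applying `sensor_features` to each of the ten \(S_j\) and concatenating the results gives the per-trial signal-feature vector (10 × 6 × number of features).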

To interpret motion patterns clearly, the parameters were grouped into 8 types: amplitude, asymmetry, axial, pace, variability, speed, frequency domain, and complexity. The amplitude, asymmetry, axial, pace, variability, and speed parameters were defined in our previous study [37]. Frequency-domain parameters are characteristics extracted after transforming a signal from the time domain to the frequency domain, typically with the Fourier transform, which reveals the composition of the signal at different frequencies (e.g., its frequency components, their distribution, and the power spectral density). Complexity parameters measure the complexity of the signal in the time or frequency domain and help characterize its structure, patterns, and dynamic changes; they include the impulse factor, waveform factor, clearance factor, skewness coefficient, autocorrelation coefficient, kurtosis coefficient, Euclidean amplitude fusion, and so forth.
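Several of the complexity features named above have standard textbook definitions in vibration and condition-monitoring analysis. The sketch below uses those common definitions; whether they match the exact formulas in Table S1 is an assumption, and the function name is hypothetical.

```python
import numpy as np


def waveform_indicators(x) -> dict[str, float]:
    """Common impulsiveness/shape indicators of a 1-D signal.

    Textbook definitions; assumes the signal is not identically zero.
    """
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    peak = abs_x.max()
    mean_abs = abs_x.mean()
    rms = np.sqrt(np.mean(x ** 2))
    centred = x - x.mean()
    std = centred.std()
    return {
        "impulse_factor": peak / mean_abs,
        "waveform_factor": rms / mean_abs,  # also called the shape factor
        "clearance_factor": peak / np.mean(np.sqrt(abs_x)) ** 2,
        "skewness": np.mean(centred ** 3) / std ** 3,
        "kurtosis": np.mean(centred ** 4) / std ** 4,  # non-excess (Pearson) kurtosis
    }


# Usage: a single spike is far more "impulsive" than a square wave.
square = waveform_indicators([1.0, -1.0, 1.0, -1.0])
spiky = waveform_indicators([0.0, 0.0, 0.0, 4.0])
print(float(square["impulse_factor"]), float(spiky["impulse_factor"]))  # 1.0 4.0
```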

Response

\(\% \Delta_{\text{MDS-UPDRSIII}}\), the measure of levodopa response (LR) in the classic ALCT, is calculated with Eq. 3 (Table 1). An improvement of more than 30% in the MDS-UPDRS III total score after oral drug administration indicates a good response to dopaminergic drugs [1]. Patients with \(\% \Delta_{\text{MDS-UPDRSIII}} \ge 30\%\) were classified as having a clear benefit from dopaminergic therapy (LR+); otherwise, they were classified as having no benefit (LR−).
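Eq. 3 itself appears in Table 1. The sketch below assumes the conventional ALCT definition, %Δ = (OFF − ON)/OFF × 100, which has the same form as the relative change in Eq. 14, together with the 30% cut-off stated above. The function names are hypothetical and the scores in the usage line are made up.

```python
def percent_improvement(score_off: float, score_on: float) -> float:
    """Relative improvement of the MDS-UPDRS III total score, in percent.

    Assumes the conventional ALCT form (OFF - ON) / OFF x 100.
    """
    if score_off <= 0:
        raise ValueError("the OFF-state score must be positive")
    return (score_off - score_on) * 100.0 / score_off


def levodopa_response_label(delta_percent: float, threshold: float = 30.0) -> int:
    """Return 1 (LR+) when the improvement is at least the threshold, else 0 (LR-)."""
    return int(delta_percent >= threshold)


# Usage with made-up scores: OFF = 40, ON = 26.
delta = percent_improvement(40, 26)
print(delta, levodopa_response_label(delta))  # 35.0 1
```

If Eq. 4 takes the same form with the MSE model's predicted OFF and ON scores, which is an assumption, the same function yields \(\% \Delta_{\text{MSE}}\).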

Table 1 Comparison between the LRR model and MSE model

We developed and compared two wearable-sensor-based algorithms: the levodopa response regression model (LRR model) and the motor symptom evaluation model (MSE model). For the LRR model, \(\% \Delta_{\text{LRR}}\) is the predicted LR. For the MSE model, subjects performed the iTUG test in both the ON- and OFF-medication states, their motor symptom severity scores were calculated with the MSE model for each iTUG (Table 1), and \(\% \Delta_{\text{MSE}}\) was then calculated with Eq. 4 (Table 1) to represent the predicted LR.

We examined the agreement between \(\% \Delta_{\text{MDS-UPDRSIII}}\) measured in the classic ALCT (i.e., the LR in the classic ALCT) and the LRs estimated by the LRR and MSE models, using the intraclass correlation coefficient (ICC(1,1)) [38], root mean squared error (RMSE), mean absolute error (MAE), and correlation coefficient (Rho) [39]. The measures between \(\% \Delta_{\text{LRR}}\) and \(\% \Delta_{\text{MDS-UPDRSIII}}\) were calculated as follows; those between \(\% \Delta_{\text{MSE}}\) and \(\% \Delta_{\text{MDS-UPDRSIII}}\) were calculated in the same way, with \(\% \Delta_{\text{LRR}}\) replaced by \(\% \Delta_{\text{MSE}}\):

$$MSE = \frac{1}{N}\sum_{i = 1}^{N} \left( \% \Delta_{\text{LRR},i} - \% \Delta_{\text{MDS-UPDRSIII},i} \right)^{2}$$

(5)

$$RMSE = \sqrt {MSE}$$

(6)

$$MAE = \frac{1}{N}\sum_{i = 1}^{N} \left| \% \Delta_{\text{LRR},i} - \% \Delta_{\text{MDS-UPDRSIII},i} \right|$$

(7)

$$Rho = \frac{\text{cov}\left( \% \Delta_{\text{LRR}}, \% \Delta_{\text{MDS-UPDRSIII}} \right)}{\sqrt{\text{var}\left( \% \Delta_{\text{LRR}} \right)\,\text{var}\left( \% \Delta_{\text{MDS-UPDRSIII}} \right)}}$$

(8)
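Eqs. 5–8 and the ICC can be computed as below. This is a sketch: the ICC uses the one-way random-effects, single-measure form in Shrout and Fleiss notation, ICC(1,1), and treating the ALCT value and the model estimate as two "raters" per patient is an assumption; Rho follows the Pearson form of Eq. 8. Function names and example values are hypothetical.

```python
import numpy as np


def icc_1_1(ratings: np.ndarray) -> float:
    """One-way random-effects, single-measure ICC; `ratings` is n_subjects x k."""
    n, k = ratings.shape
    row_means = ratings.mean(axis=1)
    grand_mean = ratings.mean()
    msb = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)          # between subjects
    msw = np.sum((ratings - row_means[:, None]) ** 2) / (n * (k - 1))  # within subjects
    return (msb - msw) / (msb + (k - 1) * msw)


def agreement_metrics(pred, ref) -> dict[str, float]:
    """Agreement between predicted LR (%) and ALCT reference LR (%)."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    err = pred - ref
    mse = np.mean(err ** 2)                          # Eq. 5
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),                        # Eq. 6
        "MAE": np.mean(np.abs(err)),                 # Eq. 7
        "Rho": np.corrcoef(pred, ref)[0, 1],         # Eq. 8 (Pearson form)
        "ICC(1,1)": icc_1_1(np.column_stack([pred, ref])),
    }


# Usage: predictions that are consistently 1 percentage point too high.
m = agreement_metrics([11.0, 21.0, 31.0], [10.0, 20.0, 30.0])
print(round(float(m["ICC(1,1)"]), 4))  # 0.995
```

The example shows why several measures are reported together: a constant offset leaves Rho at 1 while MAE, RMSE, and the absolute-agreement ICC register the bias.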

Recall, precision, accuracy, and specificity, calculated as follows, were used to measure how well the algorithms distinguished patients who were positive or negative for levodopa (1 = LR+, positive for levodopa; 0 = LR−, negative for levodopa).

True positive (TP) = the number of cases correctly classified as LR+;

False positive (FP) = the number of cases incorrectly classified as LR+;

True negative (TN) = the number of cases correctly classified as LR−;

False negative (FN) = the number of cases incorrectly classified as LR−.

$$Recall = \frac{TP}{{TP + FN}}$$

(9)

$$Precision = \frac{TP}{{TP + FP}}$$

(10)

$$Accuracy = \frac{TP + TN}{{TP + FP + TN + FN}}$$

(11)

$$Specificity = \frac{TN}{{TN + FP}}$$

(12)
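The classification metrics of Eqs. 9–12 can be sketched as follows, assuming binary labels are obtained by applying the 30% threshold from the Response section to both the reference and the predicted LR. The LR values are invented for illustration, and returning NaN when a denominator is zero is a design choice of this sketch.

```python
import math


def _ratio(num: int, den: int) -> float:
    return num / den if den else math.nan  # undefined when the denominator is 0


def classification_metrics(y_true, y_pred) -> dict[str, float]:
    """Recall, precision, accuracy and specificity for labels 1 = LR+, 0 = LR-."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "recall": _ratio(tp, tp + fn),                     # Eq. 9
        "precision": _ratio(tp, tp + fp),                  # Eq. 10
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),    # Eq. 11
        "specificity": _ratio(tn, tn + fp),                # Eq. 12
    }


# Usage with invented LR values (%), thresholded at 30%.
ref_lr = [45.0, 38.0, 31.0, 12.0, 20.0]   # e.g. ALCT reference
pred_lr = [40.0, 33.0, 25.0, 32.0, 5.0]   # e.g. model predictions
y_true = [int(v >= 30.0) for v in ref_lr]
y_pred = [int(v >= 30.0) for v in pred_lr]
metrics = classification_metrics(y_true, y_pred)
print(metrics["accuracy"])  # 0.6
```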

Motor symptom evaluation model

For the motor symptom evaluation model (MSE model), patients were evaluated in both the OFF- and ON-medication states, and the MDS-UPDRS III total score was the outcome to be predicted. Extreme gradient boosting (XGBoost) models [40] were used to map features to the scores, with the “objective” hyperparameter set to “reg:squarederror”, “eta” to 0.25, “min_child_weight” to 5, and “max_depth” to 4; all other hyperparameters were kept at their defaults.
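For reference, the stated hyperparameters map directly onto XGBoost's parameter names. The dictionary below records only the values given in the text. The paper does not say which XGBoost interface was used, so the commented Python call is an assumption, and the number of boosting rounds is not reported.

```python
# Hyperparameters stated for the MSE model; every parameter not listed
# here is left at the library default.
XGB_PARAMS = {
    "objective": "reg:squarederror",  # squared-error regression on MDS-UPDRS III totals
    "eta": 0.25,                      # learning rate (alias: learning_rate)
    "min_child_weight": 5,
    "max_depth": 4,
}

# Assumed usage with the Python package (X_train / y_train are placeholders):
#   import xgboost as xgb
#   booster = xgb.train(XGB_PARAMS, xgb.DMatrix(X_train, label=y_train),
#                       num_boost_round=n_rounds)  # n_rounds is not reported
```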

Feature selection was embedded in the training process. First, the importance score (gain) of every iTUG feature was obtained from a trained XGBoost predictive model. Second, the 50 features with the largest gain were selected for further analysis. Third, model performance was evaluated with tenfold cross-validation as training validation and leave-one-subject-out cross-validation (LOOCV) as testing validation, starting with the top 5 features by gain and adding 5 features at a time until all 50 had been included (5, 10, 15, …, 50 features), which yielded 10 candidate feature sets. For each candidate feature set, models were built 42 times, once per participant, as 42 participants were included in this study. Specifically, in each iteration one subject with 2 records (before and after levodopa) was left out as the test sample, the remaining 41 subjects were used to develop an XGBoost model with tenfold cross-validation evaluating training performance, and the total scores of the left-out subject were predicted with the developed model.

To eliminate collinear features, the Pearson correlation coefficient was used as the measure of feature correlation, and one feature of any pair with a correlation coefficient ≥ 0.6 was randomly removed. Thus, 42 models were constructed to predict 42 individuals, with training and testing validation performed by tenfold cross-validation and LOOCV, respectively. MAE, RMSE, and R-squared between the predicted and original MDS-UPDRS III total scores were calculated to assess performance, and the model structure with the highest R-squared, indicating the best fit to the data, was selected as the best MSE model. Hyperparameters were held constant during feature and model selection to keep the procedure feasible.

$$R\text{-}squared = 1 - \frac{\sum_{i = 1}^{n} \left( y_{origin\_i} - \widehat{y}_{predict\_i} \right)^{2}}{\sum_{i = 1}^{n} \left( y_{origin\_i} - \frac{1}{n}\sum_{j = 1}^{n} \widehat{y}_{predict\_j} \right)^{2}}$$

(13)
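The feature selection and validation loop described above relies on three building blocks, sketched here with NumPy under stated assumptions: nested top-k feature sets ranked by gain (in the Python XGBoost package, gain scores can be read with `Booster.get_score(importance_type="gain")`), a pairwise correlation filter, and leave-one-subject-out splits in which both records of a subject are held out together. Applying the 0.6 threshold to |r| rather than r is an assumption, the helper names are hypothetical, and the XGBoost fitting itself is omitted.

```python
import numpy as np


def incremental_feature_sets(gain, top: int = 50, step: int = 5) -> list[np.ndarray]:
    """Nested candidate sets of column indices: top-5, top-10, ..., top-50 by gain."""
    order = np.argsort(np.asarray(gain, dtype=float))[::-1][:top]
    return [order[:k] for k in range(step, top + 1, step)]


def correlation_filter(X, threshold: float = 0.6, seed: int = 0) -> list[int]:
    """Randomly drop one feature from every pair with |Pearson r| >= threshold."""
    rng = np.random.default_rng(seed)
    r = np.abs(np.corrcoef(np.asarray(X, dtype=float), rowvar=False))
    n = r.shape[0]
    dropped: set[int] = set()
    for i in range(n):
        for j in range(i + 1, n):
            if i in dropped or j in dropped:
                continue
            if r[i, j] >= threshold:
                dropped.add(int(rng.choice([i, j])))
    return [k for k in range(n) if k not in dropped]


def loso_splits(subject_ids):
    """Yield (subject, train_idx, test_idx); both records of a subject form the test fold."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        yield (subject,
               np.flatnonzero(subject_ids != subject),
               np.flatnonzero(subject_ids == subject))


# Usage: hypothetical gains for 60 features, and 42 subjects x 2 records (OFF, ON).
sets = incremental_feature_sets(np.arange(60.0))
print(len(sets), sets[0])  # 10 [59 58 57 56 55]
splits = list(loso_splits(np.repeat(np.arange(42), 2)))
print(len(splits), len(splits[0][1]), len(splits[0][2]))  # 42 82 2
```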

As described in the “Response” section, each participant’s symptom severity scores were calculated with the selected best MSE model for each iTUG (ON and OFF), and \(\% \Delta_{\text{MSE}}\) was then calculated with Eq. 4 (Table 1) to represent the predicted LR. Finally, ICC, RMSE, MAE, and Rho were calculated between the LR in the classic ALCT, \(\% \Delta_{\text{MDS-UPDRSIII}}\), and the predicted LR, \(\% \Delta_{\text{MSE}}\).

Levodopa response regression model

LR was defined as the change in the MDS-UPDRS III total score induced in the classic ALCT (\(\% \Delta_{\text{MDS-UPDRSIII}}\), calculated with Eq. 3). A total of 8380 (4190 × 2) motion features, calculated with Eqs. 14 and 15, were used to represent movement changes between medication states, where \(Feature_{OFF}\) denotes the 4190 features extracted from the OFF-medication iTUG test and \(Feature_{ON}\) the 4190 features extracted from the ON-medication iTUG test.

$$\% \Delta F_{relative} = \left( Feature_{OFF} - Feature_{ON} \right)/Feature_{OFF}$$

(14)

$$\% \Delta F_{absolute} = Feature_{OFF} - Feature_{ON}$$

(15)
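Eqs. 14 and 15 can be applied to whole feature vectors at once, as in this sketch. How the study handled features whose OFF value is zero is not stated; returning NaN there is an assumption (XGBoost treats NaN as a missing value by default). The function name is hypothetical.

```python
import numpy as np


def change_features(feature_off, feature_on) -> np.ndarray:
    """Concatenate relative (Eq. 14) and absolute (Eq. 15) OFF-to-ON changes.

    With 4190 features per iTUG this yields 8380 values. Relative change is
    set to NaN where Feature_OFF is 0 (an assumption of this sketch).
    """
    off = np.asarray(feature_off, dtype=float)
    on = np.asarray(feature_on, dtype=float)
    absolute = off - on                                     # Eq. 15
    relative = np.full_like(absolute, np.nan)
    np.divide(absolute, off, out=relative, where=off != 0)  # Eq. 14
    return np.concatenate([relative, absolute])


# Usage with three made-up features.
v = change_features([2.0, 4.0, 0.0], [1.0, 1.0, 3.0])
print(v.shape)  # (6,)
```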

The XGBoost algorithm was used to map these movement-change features to \(\% \Delta_{\text{MDS-UPDRSIII}}\). The levodopa response regression model (LRR model) used the same methods as the MSE model, including the feature selection procedure and validation method. ICC, RMSE, MAE, and Rho between the predicted LR, \(\% \Delta_{\text{LRR}}\), and the LR in the classic ALCT, \(\% \Delta_{\text{MDS-UPDRSIII}}\), were then calculated to assess performance. The model structure with the highest R-squared in tenfold cross-validation, indicating the best fit to the data, was selected as the best model. LOOCV served as testing validation, and high LOOCV performance was taken to indicate high generalizability.

Statistical analysis

Demographic and clinical characteristics were summarized as means and standard deviations or as frequencies and percentages, as appropriate. A two-sided P < 0.05 was considered statistically significant. The importance of each selected feature was measured by its correlation coefficient and by its gain in the XGBoost importance function. In addition, features were grouped into categories, and the importance of a category was measured as the sum of the gains of its features divided by the sum of the gains of all features. Statistical analyses were conducted using R version 4.1.0 (R Foundation for Statistical Computing, Vienna, Austria) with RStudio version 1.4.1717 (RStudio, PBC, Boston, MA).
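The category-level importance described above is a ratio of gain sums. A minimal sketch, with hypothetical feature names, gains, and category assignments:

```python
from collections import defaultdict


def category_importance(gain: dict[str, float], category: dict[str, str]) -> dict[str, float]:
    """Sum of gains per category divided by the sum of gains of all features."""
    total = sum(gain.values())
    per_category: dict[str, float] = defaultdict(float)
    for feature, g in gain.items():
        per_category[category[feature]] += g
    return {cat: g / total for cat, g in per_category.items()}


# Hypothetical gains for three selected features and their categories.
gains = {"gait_speed": 2.0, "cadence": 3.0, "arm_swing_absdiff": 5.0}
categories = {"gait_speed": "speed", "cadence": "pace", "arm_swing_absdiff": "asymmetry"}
print(category_importance(gains, categories))
# {'speed': 0.2, 'pace': 0.3, 'asymmetry': 0.5}
```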


