
Cultural translation of the ethical dimension: a study on the reliability and validity of the Chinese nurses’ professional ethical dilemma scale | BMC Nursing


The sample consisted of 448 registered nurses, the majority of whom (73.4%) were female. Half (50.4%) were aged 20-30 years, 38.4% were aged 31-40 years, and the remaining 11.2% were aged 41 years or above. Regarding education, 67.9% held an undergraduate degree, 28.8% a community college diploma, and 3.3% a master’s degree. In terms of work experience, 40.0% had 6-10 years of nursing experience, 28.8% had 1-5 years, 25.9% had more than 10 years, and 5.3% had less than 1 year. The median MD-APPS total score was 33 points (range 18-48). Detailed descriptive information is provided in Table 1.

Table 1 Demographic and clinical characteristics of the study population (N = 448)

Construct validity

Exploratory factor analysis

In this study, exploratory factor analysis (EFA) was used to assess the structural validity of the scale. Before the EFA, the normality of each questionnaire item was tested with the Shapiro-Wilk test, which is widely recognized for its power in detecting departures from normality, using R’s built-in function. The skewness and kurtosis of each item were also examined with the ‘psych’ package in R.

The Shapiro-Wilk test indicated that scores on every item deviated from a normal distribution (P < 0.0500 for all items). Such non-normality is common in social science data, particularly for Likert-scale items, and the Shapiro-Wilk test is highly sensitive to small departures in samples of this size. To gauge the magnitude of the departure, skewness and kurtosis were computed for all eight items. For large samples (n > 300), an absolute skewness value greater than 2 or an absolute kurtosis value greater than 7 indicates substantial non-normality [23]. None of the items exceeded these thresholds (item-level values are given in Supplementary Table 1). The data nevertheless showed mild non-normality: all items were negatively skewed, meaning that responses clustered toward the upper end of the scale with a longer tail toward lower scores. This mild non-normality should be taken into account when interpreting subsequent analyses and results.
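The item-level screening described above can be sketched in Python with SciPy standing in for the R functions named in the text. This is a minimal sketch under one stated assumption: the kurtosis cut-off of 7 is applied to excess (Fisher) kurtosis, which is 0 for a normal distribution. The function name `screen_item` and the example responses are hypothetical, not study data.

```python
import numpy as np
from scipy import stats

# Cut-offs quoted in the text for large samples (n > 300).
SKEW_LIMIT = 2.0
KURT_LIMIT = 7.0  # assumption: applied to excess (Fisher) kurtosis


def screen_item(scores):
    """Shapiro-Wilk test plus skewness/kurtosis screening for one item."""
    x = np.asarray(scores, dtype=float)
    w, p = stats.shapiro(x)                       # W statistic and p-value
    skewness = stats.skew(x)                      # negative => longer left tail
    excess_kurt = stats.kurtosis(x, fisher=True)  # 0 for a normal distribution
    severe = bool(abs(skewness) > SKEW_LIMIT or abs(excess_kurt) > KURT_LIMIT)
    return {"W": w, "p": p, "skew": skewness,
            "kurtosis": excess_kurt, "severe": severe}


# Hypothetical Likert responses for a single item (not study data).
result = screen_item([5, 5, 5, 5, 4, 4, 3, 1])
print(result)
```

With a sample as large as 448, the Shapiro-Wilk test will flag even small departures. The magnitude-based skewness and kurtosis cut-offs therefore usefully complement it.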

Given this observation, the EFA was based on a polychoric correlation matrix. This is appropriate for ordinal items such as Likert-type responses because it assumes that each item reflects an underlying continuous latent variable, rather than requiring the observed scores themselves to be normally distributed. To determine whether the sample was suitable for factor analysis, the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett’s test of sphericity were computed. The KMO coefficient was 0.7600, an acceptable level of sampling adequacy indicating that the items share a substantial proportion of common variance. Bartlett’s test of sphericity was highly significant (χ2 (28) = 1080.56, p < 0.0500), indicating that the correlation matrix differs significantly from an identity matrix; in other words, the items are meaningfully intercorrelated.

In general, the closer the KMO coefficient is to 1, the more suitable the data are for factor analysis, and Bartlett’s test should yield a p-value below 0.0500. Both criteria were met in this study, so the data from the Chinese version of the scale were judged suitable for factor analysis.
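Both adequacy checks can be sketched in Python for any item correlation matrix R. Estimating the polychoric matrix itself requires a dedicated routine (in R, for example, `psych::polychoric`), and the psych package also provides `KMO()` and `cortest.bartlett()`. This sketch therefore simply takes R as input. The function names are illustrative, and the example matrix is hypothetical.

```python
import numpy as np
from scipy import stats


def bartlett_sphericity(R, n):
    """Bartlett's test that the p x p correlation matrix R is an identity matrix."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)               # log|R|, numerically stable
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2                          # 8 items -> df = 28
    return chi2, df, stats.chi2.sf(chi2, df)


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)                # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)          # off-diagonal entries only
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)


# Hypothetical 4-item matrix with all inter-item correlations equal to 0.5.
R = 0.5 * np.ones((4, 4)) + 0.5 * np.eye(4)
print(kmo(R))                      # approximately 0.8 for this matrix
print(bartlett_sphericity(R, n=448))
```

The degrees of freedom depend only on the number of items: eight items give 8 x 7 / 2 = 28, matching the reported χ2 (28).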

The factor analysis identified a two-factor model for the Chinese version of the scale, accounting for 56.34% of the total variance (see Fig. 1) and supporting its structural validity. With one exception, the factor loadings ranged from 0.63 to 0.845 (see Table 2), indicating that these items were strongly related to their corresponding factors. The exception was item 1, which had a low loading of 0.21, possibly reflecting cultural differences [24]. Item 1 was therefore removed from the scale to improve the overall quality of the instrument.
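The quantities behind a scree plot and the item-removal decision can be sketched in Python. This is only an approximation under stated assumptions. First, it uses eigenvalues of the correlation matrix (a principal-components view), which will not exactly reproduce the common-factor variance reported for the EFA. Second, the 0.40 cut-off in the hypothetical `weak_items` helper is an assumed value, because the text does not state the criterion used.

```python
import numpy as np


def scree_summary(R):
    """Eigenvalues (largest first), variance shares and the Kaiser (>1) count."""
    eig = np.sort(np.linalg.eigvalsh(np.asarray(R, dtype=float)))[::-1]
    share = eig / eig.sum()                 # proportion of total variance
    return eig, share, np.cumsum(share), int(np.sum(eig > 1.0))


def weak_items(loadings, cutoff=0.40):
    """Indices of items whose largest absolute loading is below `cutoff` (assumed)."""
    L = np.abs(np.asarray(loadings, dtype=float))
    return [i for i, row in enumerate(L) if row.max() < cutoff]


# Hypothetical inputs (not study data).
R = 0.5 * np.ones((4, 4)) + 0.5 * np.eye(4)
eig, share, cum, n_kaiser = scree_summary(R)
print(eig, cum, n_kaiser)

loadings = [[0.21, 0.05],   # first item: weak on both factors
            [0.80, 0.10],
            [0.12, 0.70]]
print(weak_items(loadings))
```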

Fig. 1

Scree plot of exploratory factor analysis for the Chinese version of MD-APPS

Table 2 Factor analysis and total correlation of MD-APPS Chinese version (N = 448)

Non-response bias analysis (early versus late respondents)

Potential non-response bias was assessed by comparing early and late respondents. Gender distribution did not differ significantly between the two groups (χ2 = 0.4536, p = 0.5006). Similarly, satisfaction levels showed no statistically significant variation between them (χ2 = 1.7233, p = 0.1893).
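A categorical comparison of this kind can be sketched with SciPy’s `chi2_contingency`. The counts below are hypothetical. The text does not describe how respondents were split into early and late groups, so the grouping is also an assumption. For 2 x 2 tables, SciPy applies Yates’ continuity correction by default, as does R’s `chisq.test`.

```python
from scipy.stats import chi2_contingency


def compare_groups_categorical(table):
    """Chi-square test of independence for a response-timing x category table."""
    chi2, p, dof, _expected = chi2_contingency(table)  # Yates' correction for 2x2
    return chi2, p, dof


# Hypothetical counts: rows = early / late respondents, columns = female / male.
table = [[120, 40],
         [115, 50]]
chi2, p, dof = compare_groups_categorical(table)
print(f"chi2 = {chi2:.4f}, df = {dof}, p = {p:.4f}")
```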

However, significant differences were noted in other key variables. The age distribution exhibited a marked difference between early and late respondents (W = 4850.5, p = 0.0011). Additionally, the distribution of years of professional experience differed significantly between the two groups (W = 5251.5, p = 0.0270).
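The W statistics reported for age and professional experience come from a rank-based two-sample test. A minimal sketch with SciPy’s `mannwhitneyu` follows. Under the usual definitions, its U statistic for the first sample equals the W reported by R’s `wilcox.test`, although the text does not state which software produced the W values. The age-group coding and the data are hypothetical.

```python
from scipy.stats import mannwhitneyu


def compare_groups_ordinal(early, late):
    """Two-sided Mann-Whitney U (Wilcoxon rank-sum) test, early vs late respondents."""
    res = mannwhitneyu(early, late, alternative="two-sided")
    return res.statistic, res.pvalue


# Hypothetical age-group codes (1 = 20-30, 2 = 31-40, 3 = 41+), not study data.
early = [1, 1, 1, 2, 1, 2, 1, 3]
late = [2, 2, 3, 1, 2, 3, 2, 2]
u, p = compare_groups_ordinal(early, late)
print(f"U = {u}, p = {p:.4f}")
```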

The most pronounced disparity was observed in the total scores. Early and late respondents demonstrated a highly significant difference in this measure (t = -4.9396, p < 0.0010). Early respondents had a mean total score of 32.4 (SD = 5.08), whereas late respondents showed a higher mean score of 36.3 (SD = 6.52).
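The total-score comparison is a two-sample t-test. This section does not report the group sizes, so the sketch below plugs the reported means and standard deviations into SciPy’s `ttest_ind_from_stats` with hypothetical group sizes of 100 each. It therefore illustrates the calculation rather than reproducing the reported t = -4.9396. It also assumes Welch’s unequal-variance version, which the text does not specify.

```python
from scipy.stats import ttest_ind_from_stats

# Means and SDs are taken from the text; the group sizes are hypothetical.
N_EARLY = 100
N_LATE = 100

res = ttest_ind_from_stats(mean1=32.4, std1=5.08, nobs1=N_EARLY,
                           mean2=36.3, std2=6.52, nobs2=N_LATE,
                           equal_var=False)  # Welch's t-test
print(f"t = {res.statistic:.4f}, p = {res.pvalue:.2e}")
```

The negative sign simply reflects that the first group (early respondents) had the lower mean.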

These findings suggest that while gender and satisfaction levels were consistent across response timing, age, professional experience, and total scores differed significantly between early and late respondents. Because late respondents are commonly treated as a proxy for non-respondents, non-response bias cannot be ruled out for these characteristics.

Common method bias analysis

To assess potential common method bias, Harman’s single-factor test was conducted. A single factor accounted for 27.94% of the variance, well below the commonly used 50% threshold. This suggests that common method bias is unlikely to be a major concern in these data.
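Harman’s test is often approximated by the share of total variance captured by the first unrotated component of the item correlation matrix. The sketch below uses that principal-component approximation. This is an assumption, since the text does not say whether a principal-component or common-factor extraction was used. The matrix is hypothetical.

```python
import numpy as np


def harman_single_factor_share(R):
    """Share of total variance explained by the first unrotated component."""
    eig = np.linalg.eigvalsh(np.asarray(R, dtype=float))
    return eig.max() / eig.sum()   # eigenvalues of a correlation matrix sum to p


# Hypothetical 4-item matrix (not study data).
R = 0.5 * np.ones((4, 4)) + 0.5 * np.eye(4)
share = harman_single_factor_share(R)
print(f"first factor: {share:.1%}, concern: {share > 0.50}")
```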

Confirmatory factor analysis

A confirmatory factor analysis (CFA) was conducted on an independent sample of 225 participants using the maximum likelihood robust (MLR) estimation method in the lavaan package. MLR adjusts standard errors and fit statistics for non-normality and excess kurtosis in the data. This gives more accurate inference than standard maximum likelihood when the normality assumption is violated, supporting the robustness of the model evaluation. Factor loadings ranged from 0.59 to 0.84 [25]. All loadings exceeded the critical value of 0.7 except on the second factor (Fig. 2). The revised Chinese version of the MD-APPS showed good model fit, with a chi-square (χ2) value of 20.05 and 13 degrees of freedom (df). The robust root mean square error of approximation (Robust RMSEA) was 0.050, and the standardized root mean square residual (SRMR) was 0.0410 [26].
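For orientation, the conventional (non-robust) RMSEA can be computed from χ2, df and sample size. The sketch below is not lavaan’s robust RMSEA, which additionally corrects for the scaling of the test statistic. It also uses N - 1 in the denominator by default, whereas some programs use N. Small discrepancies from the reported value are therefore expected.

```python
import math


def rmsea(chi2, df, n, use_n_minus_1=True):
    """Point estimate of RMSEA from a chi-square statistic (non-robust formula)."""
    denom = df * ((n - 1) if use_n_minus_1 else n)
    return math.sqrt(max(chi2 - df, 0.0) / denom)


# Values reported in the text: chi2 = 20.05, df = 13, N = 225.
print(round(rmsea(20.05, 13, 225), 3))   # 0.049
```

This naive value is close to, but not the same quantity as, the reported Robust RMSEA of 0.050.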

Fig. 2

Path diagram of the confirmatory factor analysis for the Chinese version of MD-APPS

A chi-square to degrees of freedom ratio (χ2/df) below 2 is generally considered indicative of good model fit. The Chinese version of MD-APPS had a χ2/df of 1.5420, suggesting good fit. The robust Tucker-Lewis Index (Robust TLI = 0.9740) and robust Comparative Fit Index (Robust CFI = 0.9840) both exceeded conventional acceptance benchmarks. The p-value of the adjusted chi-square statistic was 0.0940, indicating that the discrepancy between the hypothesized model and the observed data was not statistically significant at the conventional 0.0500 level. While these results are encouraging, model fit indices should be interpreted cautiously and in conjunction with other validity evidence. Further validation in diverse samples would help establish the robustness of these preliminary findings.
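The ratio and p-value can be checked directly from the reported χ2 = 20.05 and df = 13. The incremental indices additionally require the baseline (independence) model’s χ2 and df, which are not reported here, so the baseline values in the example are hypothetical. The CFI and TLI functions use the standard non-robust formulas, not lavaan’s robust versions.

```python
from scipy.stats import chi2


def ratio_and_p(x2, df):
    """chi2/df ratio and the upper-tail p-value of the chi-square statistic."""
    return x2 / df, chi2.sf(x2, df)


def cfi(x2_m, df_m, x2_b, df_b):
    """Comparative Fit Index (model m vs baseline b)."""
    d_m = max(x2_m - df_m, 0.0)
    d_b = max(x2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def tli(x2_m, df_m, x2_b, df_b):
    """Tucker-Lewis Index (non-normed fit index)."""
    return (x2_b / df_b - x2_m / df_m) / (x2_b / df_b - 1.0)


ratio, p = ratio_and_p(20.05, 13)
print(f"chi2/df = {ratio:.4f}, p = {p:.4f}")

# Hypothetical baseline model for 7 items: df_b = 7 * 6 / 2 = 21.
print(cfi(20.05, 13, 500.0, 21), tli(20.05, 13, 500.0, 21))
```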

Reliability analysis

The internal consistency of the Chinese version of the MD-APPS was assessed with both Cronbach’s alpha and McDonald’s omega coefficients. The standardized Cronbach’s alpha for the overall scale was 0.74, indicating acceptable internal consistency. The standardized α coefficients for the first and second dimensions were 0.82 and 0.78, respectively, indicating good reliability for these subscales. The overall McDonald’s omega was 0.73, closely aligning with the Cronbach’s alpha result, and the omega coefficients for the first and second dimensions were 0.74 and 0.71, respectively. Omega usefully complements alpha because it does not assume tau-equivalence, that is, equal loadings of all items on the underlying factor. The item-total correlation coefficients ranged from 0.409 to 0.496, indicating moderate correlations between individual items and the total score.

Taken together, the agreement between the Cronbach’s alpha and McDonald’s omega results provides preliminary evidence for the internal consistency of the Chinese MD-APPS and supports its potential use in related research and practice. Additional studies with diverse samples would be valuable to further establish the scale’s reliability and validity across Chinese nursing contexts.
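The reliability coefficients can be sketched in Python. The omega function assumes a standardized single-factor solution for the items passed in, with uniquenesses taken as 1 minus the squared loadings. It would therefore be applied to each dimension separately, or to all items for an overall value. This is a simplification. The text does not state whether the reported item-total correlations were corrected (item removed from the total), so `corrected_item_total` shows the corrected variant as an assumption. The example inputs are hypothetical.

```python
import numpy as np


def cronbach_alpha(X):
    """Raw alpha from a respondents x items score matrix."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var / total_var)


def standardized_alpha(R):
    """Alpha based on the mean inter-item correlation of matrix R."""
    R = np.asarray(R, dtype=float)
    k = R.shape[0]
    r_bar = R[~np.eye(k, dtype=bool)].mean()
    return k * r_bar / (1.0 + (k - 1) * r_bar)


def mcdonald_omega(loadings):
    """Omega total for one factor with standardized loadings."""
    lam = np.asarray(loadings, dtype=float)
    uniq = 1.0 - lam ** 2           # residual variances under standardization
    return lam.sum() ** 2 / (lam.sum() ** 2 + uniq.sum())


def corrected_item_total(X):
    """Correlation of each item with the sum of the remaining items."""
    X = np.asarray(X, dtype=float)
    total = X.sum(axis=1)
    return [np.corrcoef(X[:, j], total - X[:, j])[0, 1] for j in range(X.shape[1])]


# Hypothetical inputs (not study data).
R = 0.5 * np.ones((4, 4)) + 0.5 * np.eye(4)
print(standardized_alpha(R))            # approximately 0.8
print(mcdonald_omega([0.7, 0.7, 0.7]))  # approximately 0.742
```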

Content validity

During cultural adaptation, eight specialists assessed the Chinese version of the MD-APPS. The item-level content validity index (I-CVI) ranged from 0.8 to 1.0, and the scale-level content validity index based on universal agreement (S-CVI/UA) was 0.9. These results indicate that the experts rated each item favorably and that, overall, the instrument demonstrated good content validity.
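The two content validity indices can be computed as sketched below. The sketch assumes the common convention of a 4-point relevance scale on which ratings of 3 or 4 count as relevant. This section does not state the rating scale used by the panel, and the example ratings are hypothetical.

```python
import numpy as np


def content_validity(ratings, relevant_min=3):
    """I-CVI per item and S-CVI/UA from an experts x items rating matrix."""
    relevant = np.asarray(ratings) >= relevant_min   # True if rated 3 or 4
    i_cvi = relevant.mean(axis=0)                     # share of experts per item
    s_cvi_ua = float(np.mean(i_cvi == 1.0))           # items with full agreement
    return i_cvi, s_cvi_ua


# Hypothetical ratings: 2 experts (rows) x 3 items (columns).
i_cvi, s_cvi_ua = content_validity([[4, 3, 2],
                                    [4, 4, 4]])
print(i_cvi, s_cvi_ua)
```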

Test–retest reliability

To evaluate the scale’s stability, thirty nursing professionals were chosen for a follow-up assessment after a two-week interval. This specific timeframe was selected based on recommendations in psychometric literature [27] suggesting that a period of 2-4 weeks is optimal for test-retest reliability in psychological measures. This interval is considered long enough to minimize memory effects while being short enough to avoid significant changes in the construct being measured. To address potential memory effects, participants were asked about their recall of previous responses at the time of the retest. The majority reported little to no recollection of their exact answers, suggesting minimal impact of memory on the retest results. Additionally, the order of items was randomized in the retest to further mitigate any potential memory effects.

Because the data deviated from normality, Spearman’s rank-order correlation was used to assess test-retest reliability. This nonparametric coefficient measures the strength and direction of the monotonic relationship between paired observations and does not require normally distributed data.

The item-level Spearman coefficients between the two sessions ranged from 0.9000 to 0.9920 (P < 0.0500), indicating very high test-retest agreement for every item. The mean of these item-level coefficients was 0.9640. This high mean coefficient supports the temporal stability of the instrument over a two-week interval.
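The item-level test-retest coefficients and their mean can be sketched with SciPy’s `spearmanr`. The simple mean matches the averaging described above. The Fisher z-averaged mean is shown only as a commonly used alternative and is not what the text reports. The data layout (respondents in rows, items in columns) and the example scores are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr


def retest_spearman(test, retest):
    """Per-item Spearman rho between two sessions, plus two ways of averaging."""
    T = np.asarray(test, dtype=float)
    U = np.asarray(retest, dtype=float)
    rhos = []
    for j in range(T.shape[1]):
        rho, _p = spearmanr(T[:, j], U[:, j])
        rhos.append(rho)
    rhos = np.array(rhos)
    simple_mean = rhos.mean()
    # Fisher z average; clip so that rho = +/-1 does not map to infinity.
    z = np.arctanh(np.clip(rhos, -0.9999, 0.9999))
    return rhos, simple_mean, np.tanh(z.mean())


# Hypothetical scores: 5 respondents x 2 items at time 1 and time 2.
t1 = [[1, 4], [2, 5], [3, 3], [4, 2], [5, 1]]
t2 = [[1, 5], [2, 4], [3, 3], [5, 2], [4, 1]]
print(retest_spearman(t1, t2))
```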


