Significant expansion in participants from 2022 to 2023
Initially, 173 institutions signed up for this national PT program, and valid results were ultimately collected from 169 institutions across 30 provinces/municipalities/autonomous regions in China (Fig. 1a). This represents an approximately 76% increase, from 96 participating institutions in 2022 to 169 in 2023. The geographic coverage of the PT scheme also expanded from 26 to 30 provinces/municipalities/autonomous regions. In particular, Guangdong province increased from 4 to 19 participating institutions, matching Jiangsu province, which also reached 19 participants in 2023 (Fig. 1b). Among the 169 institutions, 84% were general hospitals, 15% were specialized hospitals, and the remaining one (1%) was an independent pathology center (Fig. 1c). Furthermore, 93% of the participating institutions were tertiary grade A hospitals, which represent the highest level of healthcare facilities in China and offer comprehensive and specialized medical services together with advanced teaching and research capabilities (Fig. 1d). This broader engagement affords a more comprehensive assessment of pathologists’ diagnostic proficiency in China.
Statistic-based assigned values generation
In this round of the PT scheme, statistic-based assigned values were determined through a two-stage process, as outlined in the methodology section. The detailed consensus for each testing item, as assessed by 22 attending pathologists, is depicted in Fig. 2a and b. All five cases achieved a consensus of 70% or higher among the 22 attending pathologists for histologic type and grade. Regarding HER2 IHC scoring, cases A1 and A3 did not reach the 70% agreement threshold and were given an assigned value of HER2 2+/3+; both were subsequently confirmed as HER2 FISH positive. Case A4 also showed less than 70% agreement, resulting in an assigned value of HER2 1+/2+. For ER expression, case A2 had an assigned range of < 1%, whereas the remaining four cases had an assigned range of > 10%. Similarly, for PR expression, case A2 had an assigned range of < 1%, while the other four cases had an assigned range of ≥ 1% (Fig. 2a and b; Table 1). Representative images of A1 to A5 can be found in Additional file S2.
Among the fifteen HER2 IHC slides evaluated, two cases (P1 and P9) displayed 100% consensus on a HER2 3+ score. Five cases (P4 to P8) received a HER2 2+ score with at least 70% agreement and were subsequently confirmed as HER2 FISH negative. Another five cases (P2, P3, P11, P13, and P15) received a HER2 1+ score with at least 70% agreement. Two cases (P10 and P12) were assigned a HER2 0 score, also with at least 70% agreement. The remaining case (P14) did not reach the 70% agreement threshold, resulting in an equivocal assigned HER2 score of 0/1+ (Fig. 2c; Table 2). Representative HER2 images of P1 to P15 can be found in Additional file S3.
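The 70% agreement rule described above can be illustrated with a short Python sketch. It is not the procedure used in this study, which is the one given in the methodology section. The function name `assigned_value`, the vote counts, and the rule of joining the two most frequent scores when no score reaches the threshold are all assumptions made for illustration.

```python
from collections import Counter

ORDER = ["0", "1+", "2+", "3+"]  # ordinal HER2 IHC scores

def assigned_value(votes, threshold=0.70):
    """Return a single consensus score, or an equivocal 'x/y' value.

    Assumption (not from the source): when no score reaches the
    threshold, the two most frequent scores are joined in ordinal order.
    """
    counts = Counter(votes)
    # Most frequent first; ties broken by ordinal position.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], ORDER.index(kv[0])))
    top_score, top_n = ranked[0]
    if top_n / len(votes) >= threshold:
        return top_score
    pair = sorted((score for score, _ in ranked[:2]), key=ORDER.index)
    return "/".join(pair)

# Hypothetical votes from 22 pathologists (not the study data).
clear = ["1+"] * 17 + ["0"] * 5    # 17/22 is about 77%, above the threshold
split = ["0"] * 12 + ["1+"] * 10   # 12/22 is about 55%, below the threshold

print(assigned_value(clear))
print(assigned_value(split))
```

With 22 assessors, a 70% threshold corresponds to at least 16 matching votes, since 15/22 is about 68%.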
Concordant rates of six key items evaluated by participants and their overall performance scores
In the five cases (A1 to A5), the PT program demonstrated satisfactory overall concordant rates (OCR) for histologic type (84.9%), histologic grade (81.9%), ER (99.4%), PR (99.1%), HER2 (88.8%), and Ki67 (95.9%) (Fig. 3a). Among the individual cases, IMC (A1) and C-AD (A2) had the two lowest OCRs for histologic type, at 63.3% and 72.2%, respectively. The same two cases also had the two lowest OCRs for histologic grade (71.0% and 74.6%, respectively). For HER2 IHC scoring, A5 had the lowest OCR, at only 58.0% (Fig. 3a).
Among the fifteen HER2 IHC slides, the OCR was 80.9% (Fig. 3b). Within the subset categorized as HER2 0, 1+ and 2+/FISH-, the OCR was lower, at 78.1%. Among the individual HER2 slides, the lowest concordant rates were observed in cases P5 and P8, both with a HER2 2+ score, at 45.6% and 50.3%, respectively. The remaining thirteen cases achieved concordant rates exceeding 70% for their respective scores. As expected, the cases with a HER2 3+ score (P1 and P9) showed excellent concordant rates, exceeding 98% (Fig. 3b).
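As an illustration of how a concordant rate of this kind can be computed, the following Python sketch compares participant answers with an assigned value. The function names and the answer lists are hypothetical. Treating either component of an equivocal assigned value (such as 0/1+) as concordant, and pooling all answers to obtain the overall rate, are assumptions rather than rules stated here.

```python
def concordant_rate(answers, assigned):
    """Percentage of participant answers matching the assigned value.

    Assumption: an equivocal assigned value such as '0/1+' accepts
    either of its components as concordant.
    """
    accepted = set(assigned.split("/"))
    hits = sum(1 for answer in answers if answer in accepted)
    return 100.0 * hits / len(answers)

def overall_concordant_rate(cases):
    """Pooled rate: all concordant answers over all answers.

    `cases` is a list of (answers, assigned) pairs.
    """
    hits = 0
    total = 0
    for answers, assigned in cases:
        accepted = set(assigned.split("/"))
        hits += sum(1 for answer in answers if answer in accepted)
        total += len(answers)
    return 100.0 * hits / total

# Hypothetical answers from ten participants (not the study data).
case_x = (["2+"] * 5 + ["1+"] * 4 + ["3+"], "2+")
case_y = (["0"] * 6 + ["1+"] * 3 + ["2+"], "0/1+")

print(concordant_rate(*case_x))
print(concordant_rate(*case_y))
print(overall_concordant_rate([case_x, case_y]))
```

When every case is scored by the same number of participants, the pooled rate equals the mean of the per-case rates.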
Among the 169 participants, the median overall performance score was 90.0. The maximum and minimum scores were 99 and 72, respectively, and the scores showed a left-skewed distribution that fit well with a Weibull distribution model with a shape parameter of 19.89 and a scale parameter of 92.32 (Fig. 3c). All participants passed this round of the PT scheme. Notably, 57% of them showed excellent performance and 38% showed good performance (Fig. 3d). However, based on the Weibull distribution analysis, nine participants had overall performance scores below 79.5, which was statistically significant (p < 0.05) (Fig. 3c). This finding highlights the need to identify and address their specific areas of weakness in order to improve their diagnostic accuracy and competency.
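The derivation of the 79.5 cutoff is not spelled out here. One reading that is consistent with the reported parameters is that it marks the lower 5% tail of the fitted Weibull distribution. The Python sketch below evaluates that quantile under this assumption, using the standard two-parameter Weibull form; the function name `weibull_quantile` is illustrative.

```python
import math

def weibull_quantile(p, shape, scale):
    """Inverse CDF of a two-parameter Weibull distribution.

    CDF: F(x) = 1 - exp(-(x / scale) ** shape)
    so   x_p  = scale * (-ln(1 - p)) ** (1 / shape)
    """
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

# Shape and scale as reported for the overall performance scores.
cutoff = weibull_quantile(0.05, shape=19.89, scale=92.32)
print(round(cutoff, 2))
```

Under this assumption the quantile evaluates to roughly 79.5, in line with the cutoff reported above.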
Analysis of existing issues of HER2 0, 1+ and 2+/FISH- from the PT results
In the evaluation of HER2 0, 1+ and 2+/FISH- cases, some inaccuracies were observed among the 169 participants. For the HER2 0 cases, an average of 21.6% of participants erroneously classified them as HER2 1+. In the HER2 1+ category, an average of 3.8% of participants misclassified the cases as HER2 0, and 16.2% misclassified them as HER2 2+. For the HER2 2+ category, an average of 24.9% of participants incorrectly classified the cases as HER2 1+, and 2.5% misclassified them as HER2 3+ (Fig. 4a). The detailed results for each category can be found in Fig. 4b.
Across the sixteen cases of HER2 0, 1+ and 2+/FISH-, the median accuracy was 81.2% (13/16) (Fig. 4c). At the level of individual participants, two institutions gave correct answers for only 8 of the 16 cases, the lowest accuracy rate at 50% (Fig. 4c). Conversely, four institutions correctly answered all 16 cases, the highest accuracy rate at 100% (16/16) (Fig. 4c). Based on the Weibull distribution, with a shape parameter of 8.58 and a scale parameter of 83.21, accuracy levels below 59% were statistically significant (p < 0.05) and were therefore considered unsatisfactory (Fig. 4c). Ten participants were identified on this basis.
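Because accuracy over sixteen cases can only take the values k/16, the 59% threshold can be restated as a number of correct answers. The Python sketch below evaluates the fitted Weibull CDF at each attainable accuracy and lists those whose lower-tail probability is below 0.05. Interpreting the reported p-value as a lower-tail probability of the fitted distribution is an assumption, and the function name `weibull_cdf` is illustrative.

```python
import math

def weibull_cdf(x, shape, scale):
    """CDF of a two-parameter Weibull distribution."""
    return 1.0 - math.exp(-((x / scale) ** shape))

SHAPE, SCALE = 8.58, 83.21   # as reported for the accuracy distribution
N_CASES = 16

flagged = []
for correct in range(N_CASES + 1):
    accuracy = 100.0 * correct / N_CASES        # attainable accuracy in %
    lower_tail = weibull_cdf(accuracy, SHAPE, SCALE)
    if lower_tail < 0.05:                       # assumed significance rule
        flagged.append(correct)

print(max(flagged))
```

Under this reading, an accuracy below 59% corresponds to at most 9 of 16 correct answers (56.25%), since 10 of 16 already gives 62.5%.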
Interestingly, detailed analysis of these ten participants revealed systemic biases in the evaluation of HER2 0, 1+ and 2+/FISH- cases (Fig. 4d). Four participants (U8, U9, U10, and U11) tended to overestimate the HER2 score, frequently classifying HER2 1+ as HER2 2+ and HER2 0 as HER2 1+ (Fig. 4d). Conversely, two participants (U7 and U12) tended to classify HER2 2+ as HER2 1+ (Fig. 4d). These findings indicate the presence of systemic bias in the evaluation of HER2 0, 1+ and 2+/FISH- in a few institutions.
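A directional tendency of this kind can be summarized by treating the HER2 scores as ordinal ranks and comparing each answer with its assigned value. The Python sketch below is a minimal illustration using hypothetical answers. The function name `scoring_bias` and the restriction to single-valued assigned scores are assumptions, not the analysis performed in this study.

```python
RANK = {"0": 0, "1+": 1, "2+": 2, "3+": 3}  # ordinal HER2 IHC scores

def scoring_bias(answers, assigned):
    """Summarize over- and under-scoring relative to assigned values.

    Returns (n_over, n_under, mean_signed_difference), where the
    difference is the answer rank minus the assigned rank.
    Assumption: every assigned value is a single score (no 'x/y').
    """
    diffs = [RANK[a] - RANK[t] for a, t in zip(answers, assigned)]
    n_over = sum(d > 0 for d in diffs)
    n_under = sum(d < 0 for d in diffs)
    return n_over, n_under, sum(diffs) / len(diffs)

# Hypothetical assigned values and participant answers (not the study data).
assigned     = ["0",  "0",  "1+", "1+", "1+", "2+", "2+", "2+"]
over_caller  = ["1+", "1+", "2+", "2+", "1+", "2+", "2+", "2+"]
under_caller = ["0",  "0",  "1+", "1+", "1+", "1+", "1+", "2+"]

print(scoring_bias(over_caller, assigned))
print(scoring_bias(under_caller, assigned))
```

A positive mean signed difference indicates a tendency to overestimate the score, and a negative one a tendency to underestimate it.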
Our PT scheme also revealed that combining the HER2 1+ and 2+/FISH- categories resulted in a higher concordance rate. For the combined group (HER2 1+ & 2+/FISH-), an average of 96.8% of participants classified the cases correctly, whereas for the HER2 0 cases an average of 78.1% of participants did so (Fig. 5a). The detailed results can be found in Fig. 5b.
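The effect of merging categories can be illustrated by recomputing agreement after mapping HER2 1+ and 2+ to a single group. The Python sketch below uses hypothetical answers. The function names and the group label `1+/2+` are chosen for illustration only.

```python
def collapse(score):
    """Map a HER2 IHC score to a coarser grouping.

    Assumption: '1+' and '2+' (FISH-negative) are merged into one
    group; '0' and '3+' are left unchanged.
    """
    return "1+/2+" if score in ("1+", "2+") else score

def agreement(answers, assigned, mapper=lambda score: score):
    """Percentage of answers equal to the assigned value after mapping."""
    hits = sum(mapper(a) == mapper(t) for a, t in zip(answers, assigned))
    return 100.0 * hits / len(answers)

# Hypothetical assigned values and answers (not the study data).
assigned = ["1+", "1+", "2+", "2+", "2+"]
answers  = ["1+", "2+", "1+", "2+", "3+"]

print(agreement(answers, assigned))            # categories kept separate
print(agreement(answers, assigned, collapse))  # 1+ and 2+ merged
```

In this example, confusions between 1+ and 2+ no longer count as discordant once the categories are merged, so only the 3+ call remains an error.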
Analysis of existing issues of histologic type and grade from the PT results
There were several issues identified in the evaluation of histologic type and grade. Regarding histologic type, 31.6% of participants incorrectly diagnosed IMC as IBC-NST, and 24.0% misclassified C-AD as IBC-NST (Fig. 6a). Notably, one participant (U1) exhibited a low concordant rate of 20% for histologic type evaluation, diagnosing all cases as IBC-NST (Fig. 6b).
For histologic grading, an average of 7.1% of participants inaccurately classified grade 2 tumors as grade 1, while an average of 13.1% misclassified them as grade 3. The detailed misclassification percentages for the five cases are depicted in Fig. 6c. At the level of individual institutions, participant U5 did not grade ILC and assigned grade 3 to all the other three cases (Fig. 6d). Another participant (U6) graded all five cases as grade 3 (Fig. 6d). Conversely, participant U7 tended to assign grade 1 to all five cases, representing the opposite extreme in grading tendency (Fig. 6d). These findings indicate the presence of systemic bias in the diagnosis of histologic type and grade in a few institutions.