
Challenges and improvements in HER2 scoring and histologic evaluation: insights from a national proficiency testing scheme for breast cancer diagnosis in China | Breast Cancer Research


Significant expansion in participants from 2022 to 2023

Initially, 173 institutions signed up for this national PT program, and valid results were ultimately collected from 169 institutions across 30 provinces/municipalities/autonomous regions in China (Fig. 1a). This represents an increase of approximately 76%, from 96 participating institutions in 2022 to 169 in 2023. Moreover, the geographic coverage of the PT scheme expanded from 26 to 30 provinces/municipalities/autonomous regions. In particular, Guangdong province showed a marked increase from 4 to 19 participating institutions, matching Jiangsu province, which also rose to 19 participants in 2023 (Fig. 1b). Among the 169 institutions, 84% were general hospitals, 15% were specialized hospitals, and the remaining one (1%) was an independent pathology center (Fig. 1c). Furthermore, 93% of the participating institutions were tertiary grade A hospitals, which represent the highest level of healthcare facilities in China and offer comprehensive and specialized medical services, advanced teaching, and research capabilities (Fig. 1d). This broader engagement affords a more comprehensive assessment of pathologists’ diagnostic proficiency in China.

Fig. 1

Overall characteristics of participating institutions across China. a A noteworthy expansion in the number of participants is observed from 2022 to 2023. b The geographical distribution of the PT scheme has extended from 26 to 30 provinces/municipalities/autonomous regions across China. c The classification of institution types among the 169 participating entities is depicted in the donut charts. d The classification of institution levels among the 169 participants is presented in the donut charts

Statistic-based assigned values generation

In this round of the PT scheme, statistic-based assigned values were determined through a two-stage process, as outlined in the methodology section. The detailed consensus for each testing item, assessed by 22 attending pathologists, is depicted in Fig. 2a and b. All five cases reached a consensus of 70% or higher among the 22 attending pathologists for histologic type and grade. For HER2 IHC scoring, cases A1 and A3 did not reach the 70% agreement threshold and were therefore assigned a value of HER2 2+/3+; both were subsequently confirmed as HER2 FISH positive. Case A4 also showed less than 70% agreement, resulting in an assigned value of HER2 1+/2+. For ER expression, case A2 had an assigned range of < 1%, whereas the remaining four cases had an assigned range of > 10%. Similarly, for PR expression, case A2 had an assigned range of < 1%, while the other four cases had an assigned range of ≥ 1% (Fig. 2a and b; Table 1). Representative images of A1 to A5 are provided in Additional file S2.

Fig. 2

The detailed consensus of each testing item assessed by 22 attending pathologists. a The detailed consensus of histologic type, grade and HER2 scoring across five cases (A1 ~ A5). b The median values along with the minimum and maximum values of ER, PR, and Ki67 for the same five cases (A1 ~ A5). c The detailed consensus of HER2-IHC scoring among the fifteen cases of HER2 IHC slides

Among the fifteen HER2 IHC slides evaluated, two cases (P1 and P9) reached 100% consensus on a HER2 3+ score. Five cases (P4 ~ P8) received a HER2 2+ score with at least 70% agreement and were subsequently confirmed as HER2-FISH negative. Another five cases (P2, P3, P11, P13, and P15) received a HER2 1+ score with at least 70% agreement. Two cases (P10 and P12) were assigned a HER2 0 score, also with at least 70% agreement. The remaining case (P14) did not reach the 70% agreement threshold and was assigned an equivocal HER2 score of 0/1+ (Fig. 2c; Table 2). Representative HER2 images of P1 to P15 are provided in Additional file S3.
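The 70% agreement rule described above can be illustrated with a short sketch. It is not the authors' implementation: the function name `assign_value`, the rule for joining the two most frequent scores, and the vote counts in the usage lines are all assumptions made for illustration.

```python
from collections import Counter

# Ordinal order of HER2 IHC scores, used to write a joined value consistently.
HER2_ORDER = ["0", "1+", "2+", "3+"]


def assign_value(votes, threshold=0.70):
    """Return an assigned value for one case from expert votes.

    If a single score reaches the agreement threshold, that score is assigned.
    Otherwise the two most frequent scores are joined (for example "0/1+").
    The joining rule is an illustrative assumption, not taken from the paper.
    """
    ranked = Counter(votes).most_common()
    top_score, top_count = ranked[0]
    if top_count / len(votes) >= threshold:
        return top_score
    # Below the threshold: combine the two leading scores in ordinal order.
    top_two = sorted((score for score, _ in ranked[:2]), key=HER2_ORDER.index)
    return "/".join(top_two)


# Hypothetical votes from 22 pathologists (not data from the study).
unanimous = assign_value(["3+"] * 22)              # 100% agreement
split = assign_value(["0"] * 12 + ["1+"] * 10)     # 54.5% agreement
```

With these made-up votes, `unanimous` is `"3+"` and `split` is `"0/1+"`, mirroring how a case such as P14 ends up with an equivocal assigned value.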

Table 1 Assigned values of six key pathological items across five cases
Table 2 Assigned values of HER2-IHC scoring among fifteen cases of HER2 IHC slides

Concordant rates of six key items evaluated by participants and their overall performance scores

Across the five cases (A1 ~ A5), the PT program showed satisfactory overall concordant rates (OCR) for histologic type (84.9%), histologic grade (81.9%), ER (99.4%), PR (99.1%), HER2 (88.8%), and Ki67 (95.9%) (Fig. 3a). At the level of individual cases, IMC (A1) and C-AD (A2) had the two lowest OCRs for histologic type, at 63.3% and 72.2%, respectively. The same two cases also had the two lowest OCRs for histologic grade (71.0% and 74.6%, respectively). For HER2 IHC scoring, A5 showed the lowest OCR, at only 58.0% (Fig. 3a).

Fig. 3

Concordant rates of six key items evaluated by participants and their overall performance scores. a Overall concordant rates (OCR) for histologic type, histologic grade, ER, PR, HER2, and Ki67 across five cases (A1 ~ A5). b OCR for HER2 IHC scoring among the fifteen HER2 IHC slides. c The Weibull distribution of overall performance scores across 169 participants. d The overall performance scores are divided into three categories, delineating pass, good, and excellent levels

Among the fifteen HER2 IHC slides, the OCR was 80.9% (Fig. 3b). Within the subset categorized as HER2 0, 1+ and 2+/FISH-, the OCR was lower, at 78.1%. At the level of individual slides, the lowest concordant rates were observed in cases P5 and P8, both with a HER2 2+ score, at 45.6% and 50.3%, respectively. The remaining thirteen cases achieved a concordant rate exceeding 70% for their respective scores. As expected, the two cases with a HER2 3+ score (P1 and P9) showed excellent concordant rates, exceeding 98% (Fig. 3b).
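A minimal sketch of how a concordant rate of this kind could be computed is given below. The text does not spell out the exact formula, so two points are assumptions: an equivocal assigned value such as "0/1+" is taken to accept either score, and the overall rate is pooled over all answers rather than averaged per case. The function names and the sample answers are hypothetical.

```python
def is_concordant(answer, assigned):
    """An answer matches if it equals any score listed in the assigned value."""
    return answer in assigned.split("/")


def concordant_rate(answers, assigned):
    """Percentage of participant answers matching one case's assigned value."""
    hits = sum(is_concordant(a, assigned) for a in answers)
    return 100.0 * hits / len(answers)


def overall_concordant_rate(answers_by_case, assigned_by_case):
    """Pooled percentage of concordant answers across all cases."""
    hits = total = 0
    for case, answers in answers_by_case.items():
        hits += sum(is_concordant(a, assigned_by_case[case]) for a in answers)
        total += len(answers)
    return 100.0 * hits / total


# Hypothetical answers from four participants (not data from the study).
answers_by_case = {
    "P1": ["3+", "3+", "3+", "2+"],
    "P14": ["0", "1+", "2+", "1+"],
}
assigned_by_case = {"P1": "3+", "P14": "0/1+"}

rate_p14 = concordant_rate(answers_by_case["P14"], assigned_by_case["P14"])
ocr = overall_concordant_rate(answers_by_case, assigned_by_case)
```

In this toy input three of four answers match for each case, so both `rate_p14` and `ocr` come out at 75.0.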

Among the 169 participants, the median overall performance score was 90.0, with maximum and minimum scores of 99 and 72, respectively. The scores showed a left-skewed distribution that was well fitted by a Weibull model with a shape parameter of 19.89 and a scale parameter of 92.32 (Fig. 3c). All participants passed this round of the PT scheme; 57% showed excellent performance and 38% showed good performance (Fig. 3d). However, according to the Weibull distribution analysis, nine participants had overall performance scores below 79.5, a range that was statistically significant (p < 0.05) (Fig. 3c). This finding highlights the need to identify and address their specific areas of weakness in order to improve diagnostic accuracy and competency.
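The relation between the fitted Weibull parameters and the 79.5 cutoff can be checked with a short calculation. The sketch below assumes that the cutoff corresponds to the lower 5% quantile of the fitted two-parameter Weibull distribution; the text does not state this explicitly, but the reported numbers are consistent with it. The function name `weibull_quantile` is chosen here for illustration.

```python
import math


def weibull_quantile(p, shape, scale):
    """Inverse CDF of a two-parameter Weibull distribution.

    The CDF is F(x) = 1 - exp(-(x / scale) ** shape), which gives
    x_p = scale * (-ln(1 - p)) ** (1 / shape).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


# Shape and scale as reported for the overall performance scores.
cutoff = weibull_quantile(0.05, shape=19.89, scale=92.32)
print(round(cutoff, 1))  # 79.5
```

Applying the same function to the parameters reported for HER2 accuracy (shape 8.58, scale 83.21) gives about 58.9, in line with the 59% cutoff used in that analysis.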

Analysis of existing issues of HER2 0, 1+ and 2+/FISH- from the PT results

In the evaluation of HER2 0, 1+ and 2+/FISH- cases, some inaccuracies were observed among the 169 participants. For the HER2 0 cases, an average of 21.6% of participants erroneously classified them as HER2 1+. In the HER2 1+ category, an average of 3.8% of participants misclassified cases as HER2 0, and 16.2% misclassified them as HER2 2+. For the HER2 2+ category, an average of 24.9% of participants incorrectly classified cases as HER2 1+, and 2.5% misclassified them as HER2 3+ (Fig. 4a). The detailed results for each category are shown in Fig. 4b.

Fig. 4

Analysis of existing issues of HER2 0, 1+ and 2+. a Average percentages of the 169 participants' evaluations in the individual categories of HER2 0, 1+ and 2+. b The detailed misclassification percentages of HER2 0, 1+ and 2+ evaluation among the 169 participants. c The Weibull distribution of accuracy across sixteen cases with HER2 0, 1+ and 2+. d Detailed scoring results of sixteen cases evaluated by ten participants who showed accuracy rates below 59%

Across the sixteen cases of HER2 0, 1+ and 2+/FISH-, the median accuracy was 81.2% (13/16) (Fig. 4c). At the level of individual participants, two institutions answered only 8 of 16 cases correctly, the lowest accuracy rate (50%) (Fig. 4c), whereas four institutions answered all 16 cases correctly, reaching the highest accuracy rate of 100% (16/16) (Fig. 4c). Based on the fitted Weibull distribution (shape parameter 8.58, scale parameter 83.21), accuracy levels below 59% were considered unsatisfactory because they were statistically significant (p < 0.05) (Fig. 4c). Ten participants fell below this level.

Detailed analysis of these ten participants revealed systemic biases in the evaluation of HER2 0, 1+ and 2+/FISH- cases (Fig. 4d). Four participants (U8, U9, U10, and U11) tended to overestimate the HER2 score, frequently classifying HER2 1+ as HER2 2+ and HER2 0 as HER2 1+. Conversely, two participants (U7 and U12) tended to classify HER2 2+ as HER2 1+ (Fig. 4d). These findings indicate the presence of systemic bias in the evaluation of HER2 0, 1+ and 2+/FISH- in a few institutions.
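One simple way to express the direction of such a bias is a mean signed error on the ordinal HER2 scale. This is an illustrative metric and not the analysis used in the study: the name `mean_signed_error`, the rank mapping, and the sample answers are assumptions, and the sketch is limited to cases with a single (non-equivocal) assigned score.

```python
# Ordinal ranks for HER2 IHC scores.
HER2_RANK = {"0": 0, "1+": 1, "2+": 2, "3+": 3}


def mean_signed_error(answers, assigned):
    """Average of (participant rank - assigned rank) over paired cases.

    Positive values suggest overestimation, negative values underestimation.
    """
    diffs = [HER2_RANK[a] - HER2_RANK[t] for a, t in zip(answers, assigned)]
    return sum(diffs) / len(diffs)


# Hypothetical assigned scores and answers (not data from the study).
assigned = ["0", "1+", "1+", "2+"]
over_scorer = ["1+", "2+", "2+", "2+"]    # shifts three cases upward
under_scorer = ["0", "1+", "1+", "1+"]    # shifts one case downward

over_bias = mean_signed_error(over_scorer, assigned)
under_bias = mean_signed_error(under_scorer, assigned)
```

Here `over_bias` is 0.75 and `under_bias` is -0.25, so the sign separates the two tendencies described above while the magnitude indicates how consistent the shift is.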

Our PT scheme also revealed that combining the HER2 1+ and 2+/FISH- categories resulted in a higher concordance rate. For the combined group (HER2 1+ & 2+/FISH-), an average of 96.8% of participants classified cases correctly, whereas for the HER2 0 cases an average of 78.1% of participants did so (Fig. 5a). The detailed results are shown in Fig. 5b.
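The effect of merging categories on concordance can be sketched as follows. The grouping function and the sample answers are hypothetical, and the sketch assumes that every 2+ slide in the set is FISH negative, so that any "2+" answer falls in the combined group.

```python
def to_group(score):
    """Collapse 1+ and 2+ (assumed FISH negative) into one group."""
    return "combined" if score in ("1+", "2+") else score


def exact_concordance(answers, assigned):
    """Percentage of answers equal to the assigned score."""
    return 100.0 * sum(a == assigned for a in answers) / len(answers)


def grouped_concordance(answers, assigned):
    """Percentage of answers that fall in the same group as the assigned score."""
    hits = sum(to_group(a) == to_group(assigned) for a in answers)
    return 100.0 * hits / len(answers)


# Hypothetical answers for one slide with an assigned score of 2+.
answers = ["2+", "1+", "1+", "2+", "3+"]
exact = exact_concordance(answers, "2+")
grouped = grouped_concordance(answers, "2+")
```

For this toy slide, `exact` is 40.0 and `grouped` is 80.0: disagreements between 1+ and 2+ no longer count as errors once the two categories are merged, which is the mechanism behind the higher rate reported for the combined group.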

Fig. 5

Analysis of the concordant rate of cases in HER2 0 and combined (HER2 1+&2+/FISH-) groups. a Average percentages of evaluation by 169 participants in individual categories of HER2 0 and combined groups. b Detailed misclassification percentages of HER2 0 and combined groups among the 169 participants

Analysis of existing issues of histologic type and grade from the PT results

There were several issues identified in the evaluation of histologic type and grade. Regarding histologic type, 31.6% of participants incorrectly diagnosed IMC as IBC-NST, and 24.0% misclassified C-AD as IBC-NST (Fig. 6a). Notably, one participant (U1) exhibited a low concordant rate of 20% for histologic type evaluation, diagnosing all cases as IBC-NST (Fig. 6b).

Fig. 6

Analysis of existing issues of histologic type and grade. a The detailed misclassification percentages of histologic type among the 169 participants for each case (A1 ~ A5). b The accuracy categories of histologic type among the 169 participants, with detailed evaluation results for participants in the two lowest accuracy groups. c The detailed misclassification percentages of histologic grade among the 169 participants for each case (A1 ~ A5). d The accuracy categories of histologic grade among the 169 participants, with detailed evaluation results for participants in the two lowest accuracy groups

For histologic grading, an average of 7.1% of participants inaccurately classified grade 2 tumors as grade 1, while an average of 13.1% misclassified them as grade 3. The detailed misclassification percentages for the five cases are depicted in Fig. 6c. At the level of individual institutions, participant U5 did not grade ILC and assigned grade 3 to all the other three cases (Fig. 6d). Another participant (U6) graded all five cases as grade 3 (Fig. 6d). Conversely, participant U7 tended to assign grade 1 to all five cases, representing the opposite extreme in grading tendencies (Fig. 6d). These findings indicate the presence of systemic bias in the diagnosis of histologic type and grade in a few institutions.


