### Patient dataset

One hundred ten left-sided breast cancer patients who had undergone breast-conserving surgery (BCS) and were eligible for whole-breast irradiation (WBI) plus boost irradiation were enrolled in this study. The median age was 50 years (range, 44–59 years), and the pathological diagnosis was invasive ductal carcinoma (stage T1-T2N0M0) in all cases. No patient received oncoplastic surgery. All patients underwent lumpectomy with sentinel lymph node dissection, and tumor-negative margins were ensured during a single operation. At least five surgical clips were used to mark the boundaries of the lumpectomy cavity. All enrolled patients had either no seroma or a seroma clarity score of ≤ 3 in the lumpectomy cavity. This study was approved by the Institutional Ethics Committee of Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College. Consent was waived due to the retrospective nature of the study.

The patient dataset consisted of 110 pairs of preoperative and postoperative CTs, all acquired in the supine position. The preoperative CTs were acquired, on average, one week before surgery and were reconstructed with in-plane dimensions of 512 × 512, a slice thickness of 5.0 mm, and a pixel size of 0.68–0.94 mm. The postoperative CTs were acquired, on average, 10 weeks after surgery for radiotherapy treatment planning and were reconstructed with in-plane dimensions of 512 × 512, a slice thickness of 5.0 mm, and a pixel size of 1.18–1.37 mm. All CTs were pre-processed using 3D Slicer (RRID:SCR_005619) [13, 14]: they were first resampled to a uniform voxel size of 1 × 1 × 5 mm and then cropped to dimensions of 256 × 256 × 32 around the breast's centroid [15].
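The resample-and-crop step above can be sketched as follows. This is an illustrative reimplementation in Python, not the 3D Slicer pipeline used in the study; the array layout (rows, columns, slices), spacing values, and function names are assumptions.

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess_ct(volume, spacing, centroid, out_shape=(256, 256, 32),
                  target_spacing=(1.0, 1.0, 5.0)):
    # Resample so each voxel measures target_spacing (mm); order=1 = trilinear.
    factors = [s / t for s, t in zip(spacing, target_spacing)]
    resampled = zoom(volume, factors, order=1)
    # Scale the centroid into the resampled grid.
    c = [int(round(ci * f)) for ci, f in zip(centroid, factors)]
    # Crop a fixed-size block centred on the (scaled) breast centroid,
    # clamping the window to the volume bounds.
    starts = [min(max(ci - o // 2, 0), max(dim - o, 0))
              for ci, o, dim in zip(c, out_shape, resampled.shape)]
    slices = tuple(slice(s, s + o) for s, o in zip(starts, out_shape))
    return resampled[slices]

ct = np.random.rand(512, 512, 60).astype(np.float32)   # toy CT volume
patch = preprocess_ct(ct, spacing=(0.8, 0.8, 5.0), centroid=(256, 256, 30))
print(patch.shape)  # (256, 256, 32)
```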

### Contour delineation

The distribution of the regions of interest (ROIs) on the preoperative and postoperative CTs is illustrated in Fig. 1. Before surgery, each patient underwent a diagnostic CT scan, and the location of the primary tumor (PT) was manually delineated by a radiation oncologist. After surgery, the volume of the actually excised tissue (pathological volume, PV) was estimated from its maximum diameters in three dimensions, as provided in the pathological report. The excision volume (EV) on the preoperative CT was then estimated by adding a given margin to the PT, as shown in Fig. 1A. Margins from 1 to 3 cm were tested, and the PT with a 2 cm margin was found to be closest in volume to the PV. Thus, the PT plus a 2 cm margin on the preoperative CT was used to represent the EV.
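A minimal sketch of the isotropic margin expansion (PT + 2 cm → EV) on an anisotropic voxel grid, assuming the 1 × 1 × 5 mm spacing from the pre-processing step; the distance-transform approach, grid sizes, and names are illustrative choices, not the study's actual tool.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def expand_margin(mask, margin_mm=20.0, spacing=(1.0, 1.0, 5.0)):
    # Distance (mm) from every voxel to the nearest PT voxel, honouring the
    # anisotropic voxel size; voxels within margin_mm form the expanded ROI.
    dist = distance_transform_edt(~mask, sampling=spacing)
    return dist <= margin_mm

pt = np.zeros((64, 64, 16), dtype=bool)
pt[30:34, 30:34, 7:9] = True          # toy primary-tumor mask
ev = expand_margin(pt)
print(pt.sum(), ev.sum())             # EV is strictly larger than PT
```

With 5 mm slices, a 20 mm margin reaches 4 slices above and below the PT but 20 voxels in-plane, which is why the expansion must be computed in millimetres rather than voxels.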

A few weeks after surgery, each patient was CT scanned again and proceeded to radiotherapy. As shown in Fig. 1B, the clinical target volume of the tumor bed (CTV-TB) was generated by adding a 1 cm margin to the contour of the tumor bed (TB). This margin accounts for subclinical lesions and potentially invaded regions. In practice, the TB contour was manually delineated by a radiation oncologist according to the surgical marks and postoperative changes. Due to the poor clarity of the lumpectomy cavity and the relatively low soft-tissue contrast, TB contouring is challenging.

### Prior information

Because the TB occupies the same location as the preoperative EV, the TB contour on the postoperative CT is highly correlated with the EV contour on the preoperative CT. Accordingly, the TB contour plus a 1 cm margin (TB_{1cm}), i.e., the CTV-TB, on the postoperative CT is highly correlated with the EV contour plus a 1 cm margin (EV_{1cm}) on the preoperative CT. It is therefore reasonable to create a virtual EV_{1cm} on the postoperative CT and use it as prior location information when searching for the CTV-TB contour. To this end, deformable image registration (DIR) between the preoperative and postoperative CTs was performed with Elastix (RRID:SCR_009619) [16, 17]. The resulting deformation vector field (DVF) was used to generate the transformed EV_{1cm} (T-EV_{1cm}) on the postoperative CT from the EV_{1cm} on the preoperative CT.
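The last step, warping a preoperative mask onto the postoperative grid with a DVF, can be illustrated as below. This is not the Elastix pipeline itself; the DVF convention (per-voxel pull-back displacements in voxel units) and the toy shapes are assumptions for the sketch.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_mask(mask, dvf):
    # dvf has shape (3, *mask.shape): for each output voxel, the displacement
    # (in voxels) to its source location in the preoperative image.
    idx = np.indices(mask.shape, dtype=np.float32)
    src = idx + dvf                        # pull-back coordinates
    warped = map_coordinates(mask.astype(np.float32), src, order=1)
    return warped > 0.5                    # back to a binary mask

ev1cm = np.zeros((32, 32, 8), dtype=bool)
ev1cm[10:20, 10:20, 2:6] = True
dvf = np.zeros((3,) + ev1cm.shape, dtype=np.float32)
dvf[0] += 3.0                              # toy DVF: shift 3 voxels along axis 0
t_ev1cm = warp_mask(ev1cm, dvf)
print(t_ev1cm.sum() == ev1cm.sum())        # volume preserved under a pure shift
```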

To enhance the effect of the tumor contour on the CTs, the regions of EV_{1cm} and T-EV_{1cm} were processed with an image enhancement tool. In detail, the pixel values within these ROIs were multiplied by a large factor (e.g., 25), while the pixel values outside them were multiplied by a small factor (e.g., 0.1). The effect of image enhancement is shown in Fig. 2: the preoperative and postoperative CTs before enhancement are shown in Fig. 2A, B, and the same CTs after enhancement are shown in Fig. 2C, D. Clearly, the intensities within the tumor contours were significantly enhanced compared with those of the surrounding tissue.
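The ROI-based enhancement reduces to a masked multiplication. A minimal sketch, with the 25×/0.1× factors taken from the text and the toy array sizes as assumptions:

```python
import numpy as np

def enhance(ct, roi_mask, inside=25.0, outside=0.1):
    # Amplify voxels inside the ROI, suppress voxels outside it.
    return np.where(roi_mask, ct * inside, ct * outside)

ct = np.full((4, 4), 10.0)            # toy CT slice
roi = np.zeros((4, 4), dtype=bool)
roi[1:3, 1:3] = True                  # toy EV_1cm / T-EV_1cm mask
out = enhance(ct, roi)
print(out[1, 1], out[0, 0])           # 250.0 1.0
```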

### Deep-learning model

A 3D U-Net, which has been applied to many segmentation problems, was employed in this study [18,19,20]. The details of the network architecture and settings are described in Additional file 1. In brief, it has an encoder part that analyzes the whole image and a decoder part that produces a full-resolution segmentation. The 3D U-Net takes 3D volumes as input and applies 3D convolution, 3D max-pooling, and 3D up-convolution layers in an entirely 3D architecture. In this study, the model had two 3D input channels (the enhanced preoperative and postoperative CTs) and one 3D output channel (the predicted label). Five-fold cross-validation was applied to the 110-patient dataset: in each fold, one fold (22 patients) was used for testing and the remaining four folds (88 patients) were used for training.
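The five-fold partition of the 110 patients can be sketched as follows; the contiguous-block split and the index list are illustrative assumptions (the study does not specify how patients were assigned to folds).

```python
# Five-fold split: each fold holds out 22 patients for testing and
# trains on the remaining 88.
patients = list(range(110))            # hypothetical patient indices

def five_fold_splits(items, k=5):
    fold_size = len(items) // k
    for i in range(k):
        test = items[i * fold_size:(i + 1) * fold_size]
        train = items[:i * fold_size] + items[(i + 1) * fold_size:]
        yield train, test

for train, test in five_fold_splits(patients):
    print(len(train), len(test))       # 88 22 (printed five times)
```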

The weights of the convolution layers were initialized from a normal distribution according to published studies [18, 19]. The Dice similarity coefficient (DSC) was used as the loss function [19]. Adaptive moment estimation (Adam) with a batch size of 4 and a weight decay of 3e−5 was used for optimization [21]. The initial learning rate was set to 0.0005, the learning rate drop factor to 0.95, and the validation frequency to 20. The network was implemented in Matlab (version 2020a; MathWorks, Natick, MA) and trained for a maximum of 50 epochs. Testing was performed on a workstation equipped with one NVIDIA GeForce GTX 1080 Ti GPU.

### Auto-segmentation of CTV-TB

The overall workflow for segmenting the CTV-TB on the postoperative CT is shown in Fig. 3, with the main steps numbered. (1) The preoperative and postoperative CTs were registered by DIR, yielding the DVF. (2) The T-EV_{1cm} on the postoperative CT was generated by deforming the EV_{1cm} on the preoperative CT via the obtained DVF. (3) The EV_{1cm} and T-EV_{1cm} regions were processed with the image enhancement tool, and the resulting 3D images were fed into the deep-learning model. (4) The CTV-TB contour on the postoperative CT was predicted by the deep-learning model. (5) The similarity between the predicted and clinically approved CTV-TB contours was evaluated.

### Evaluations

The DSC and Hausdorff distance (HD) were used to evaluate the similarity between the predicted and clinically approved contours of CTV-TB on postoperative CT. The DSC is defined as follows [22]:

$${\text{DSC}}\left( {{\text{A}},{\text{B}}} \right) = \frac{{2\left| {{\text{A}} \cap {\text{B}}} \right|}}{{\left| {\text{A}} \right| + \left| {\text{B}} \right|}}$$

(1)

where *A* is the clinically approved CTV-TB contour manually delineated by the radiation oncologist and *B* is the CTV-TB contour predicted by the model. *A* ∩ *B* is the volume that *A* and *B* have in common. The DSC takes values between 0 and 1, where 0 represents no intersection and 1 reflects perfect overlap. The HD is defined as [23]:
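Eq. (1) can be computed directly on binary masks; the toy masks below are illustrative.

```python
import numpy as np

def dice(a, b):
    # DSC(A, B) = 2|A ∩ B| / (|A| + |B|), per Eq. (1).
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

a = np.zeros((8, 8), dtype=bool); a[2:6, 2:6] = True   # |A| = 16
b = np.zeros((8, 8), dtype=bool); b[4:8, 4:8] = True   # |B| = 16, overlap = 4
print(dice(a, b))   # 2*4 / 32 = 0.25
```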

$${\text{HD}}\left( {{\text{A}},{\text{B}}} \right) = {\text{max}}\left( {{\text{h}}\left( {{\text{A}},{\text{B}}} \right),{\text{h}}\left( {{\text{B}},{\text{A}}} \right)} \right)$$

(2)

where

$${\text{h}}\left( {{\text{A}},{\text{B}}} \right) = \mathop {\max }\limits_{{\text{a}} \in {\text{A}}} \left( {\mathop {\min }\limits_{{\text{b}} \in {\text{B}}} \left\| {{\text{a}} - {\text{b}}} \right\|} \right)$$

(3)

and \(\left\| \cdot \right\|\) is some underlying norm on the points of *A* and *B* (e.g., the L_{2} or Euclidean norm). \({\text{h}}\left( {{\text{A}},{\text{B}}} \right)\) identifies the point *a* \(\in\) *A* that is farthest from any point of *B* and measures the distance from *a* to its nearest neighbor in *B*. The Hausdorff distance \({\text{HD}}\left( {{\text{A}},{\text{B}}} \right)\) is the maximum of \({\text{h}}\left( {{\text{A}},{\text{B}}} \right)\) and \({\text{h}}\left( {{\text{B}},{\text{A}}} \right)\) and measures the largest degree of mismatch between *A* and *B*; a smaller \({\text{HD}}\left( {{\text{A}},{\text{B}}} \right)\) indicates better agreement between *A* and *B*.
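Eqs. (2)–(3) are available in SciPy: `directed_hausdorff` computes the directed distance h(A, B), and the HD is the maximum of the two directions. The 2-D point sets below are toy examples.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 0.0], [4.0, 0.0]])

h_ab = directed_hausdorff(A, B)[0]   # farthest A-point to its nearest B-point
h_ba = directed_hausdorff(B, A)[0]
hd = max(h_ab, h_ba)                 # Eq. (2)
print(h_ab, h_ba, hd)                # 1.0 3.0 3.0
```

Note that h is not symmetric (here h(A, B) = 1 but h(B, A) = 3), which is why Eq. (2) takes the maximum of both directions.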

The performance of the deep-learning model with prior information (using the enhanced CTs in Fig. 2C, D as inputs) was compared with that of the same 3D U-Net without prior information (using the unenhanced CTs in Fig. 2A, B as inputs) to investigate the effect of prior information on segmentation accuracy. Five-fold cross-validation was used to tune the hyperparameters, and the testing data were used to evaluate the performance of the final models. In addition, the performance of the traditional gray-level threshold method was investigated. This method partitions the gray levels in an image into two classes: those below a user-defined threshold and those above it. In our study, CT values above the threshold (40 HU) within the breast region were auto-segmented as the CTV-TB contour. For statistical analysis, the paired t-test was performed if the data were normally distributed; otherwise, the Wilcoxon signed-rank test for paired samples (a non-parametric test) was performed. A level of *P* < 0.05 was considered statistically significant. All statistical analyses were performed in R (RRID:SCR_001905, version 3.6.3).
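Two pieces of the paragraph above can be sketched in code: the gray-level threshold baseline, and the normality-driven choice of paired test (here in Python with SciPy rather than R; the synthetic DSC values and the Shapiro–Wilk normality check are illustrative assumptions).

```python
import numpy as np
from scipy import stats

def threshold_segment(ct_hu, breast_mask, threshold=40.0):
    # Gray-level threshold baseline: voxels above 40 HU inside the breast
    # region are labelled as CTV-TB.
    return (ct_hu > threshold) & breast_mask

def compare_paired(x, y, alpha=0.05):
    # Paired t-test if the paired differences look normal, else Wilcoxon.
    diff = np.asarray(x) - np.asarray(y)
    _, p_norm = stats.shapiro(diff)
    if p_norm > alpha:
        return stats.ttest_rel(x, y)[1]
    return stats.wilcoxon(x, y)[1]

ct = np.array([[100.0, 0.0], [50.0, 30.0]])            # toy HU values
seg = threshold_segment(ct, np.ones((2, 2), dtype=bool))
print(seg.sum())                                       # 2 voxels above 40 HU

rng = np.random.default_rng(0)
dsc_with_prior = rng.normal(0.85, 0.03, 22)            # synthetic fold scores
dsc_without = rng.normal(0.60, 0.05, 22)
p = compare_paired(dsc_with_prior, dsc_without)
print(p < 0.05)
```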
