Simulation design
We conducted comprehensive simulation studies to evaluate the performance of the proposed method. There were three covariates \({X}_{1},{X}_{2}, {X}_{3}\) distributed as standard normal with mean zero and unit variance, where \(\mathrm{corr}\left({X}_{1},{X}_{3}\right)=0.2\) and \({X}_{2}\) was independent of the \({X}_{1}\) and \({X}_{3}\). The treatment indicator \(Z\) was simulated from the Bernoulli distribution according to the following propensity score
$$\mathrm{logit}\left[\pi \left({\varvec{X}}\right)\right]= {(X}_{1}+{X}_{2}+{X}_{3})/3$$
The survival times were simulated from a CoxWeibull model with \({\widetilde{T}}_{1}={\left[\frac{\mathrm{log}\left(u\right)}{{\lambda }_{1}\mathrm{exp}\left(L\gamma \right)}\right]}^{\frac{1}{{v}_{1}}}\) for treatment group and \({\widetilde{T}}_{0}={\left[\frac{\mathrm{log}\left(u\right)}{{\lambda }_{0}\mathrm{exp}\left(L\right)}\right]}^{\frac{1}{{v}_{0}}}\) for control group, where \(L= 3+0.5*\left({X}_{1}+{X}_{2}\right)\), \(\gamma =1\), \(u\) was simulated from Uniform(0,1). And we defined the proportional hazard setting via setting the shape and scale parameters [(\({v}_{1}=0.005, {\lambda }_{1}=3\)), (\({v}_{0}=0.005, {\lambda }_{0}=3\))] in the treatment and control groups (Figure S1). An independent censoring time was simulated from an exponential distribution with rates, and we set the different rates [\(\mathrm{exp}\left(4\right)or\mathrm{exp}\left(3\right)\)] to obtain censoring rates (approximately 25% or 50%).
In this simulation, we specified two models
$${\mathbb{A}}=\left\{\begin{array}{c}logit\left[{\pi }^{1}\left({\varvec{X}};{{\varvec{\beta}}}^{1}\right)\right]=\left({X}_{1},{X}_{2},{X}_{3}\right){{\varvec{\beta}}}^{1}\\ logit\left[{\pi }^{2}\left({\varvec{X}};{{\varvec{\beta}}}^{2}\right)\right]=\left({X}_{1}^{2},{X}_{2}^{2},{X}_{3}^{2}\right){{\varvec{\beta}}}^{2}\end{array}\right\}$$
for propensity score, and two models
$${\mathbb{B}}=\left\{\begin{array}{c}{m}^{1}\left({\varvec{X}},{\varvec{Z}};{{\varvec{\gamma}}}^{1}\right)=\left(1,{X}_{1},{X}_{2},Z\right){{\varvec{\gamma}}}^{1}\\ {m}^{2}\left({\varvec{X}},{\varvec{Z}};{{\varvec{\gamma}}}^{2}\right)=\left(1,{X}_{1}^{2},{X}_{2}^{2},Z\right){{\varvec{\gamma}}}^{2}\end{array}\right\}$$
for outcome regression. According to the datagenerating process, \({\pi }^{1}\left({\varvec{X}};{{\varvec{\beta}}}^{1}\right)\) and \({m}^{1}\left({\varvec{X}},Z;{{\varvec{\gamma}}}^{1}\right)\) were correctly specified, while \({\pi }^{2}\left({\varvec{X}};{{\varvec{\beta}}}^{2}\right)\) and \({m}^{2}\left({\varvec{X}},Z;{{\varvec{\gamma}}}^{2}\right)\) were incorrectly specified.
We were interested in estimating the difference in survival functions \(\Delta \left({t}^{*}\right)\) at \({t}^{*}=10 and 20\). We performed IPW, direct confounding adjustment, and multiply robust (MR) methods to estimate the \(\Delta \left(t\right)\). To distinguish these estimation methods, two IPWbased estimators were denoted as “IPW.model1” and “IPW.model2”; two directconfoundingadjustmentbased estimators were denoted as “OR.model1” and “OR.model2”; for the MR estimators, each estimator is denoted as “MR0000”, where each digit of the four numbers, from left to right, indicates if \({\pi }^{1}\left({\varvec{X}};{{\varvec{\beta}}}^{1}\right)\), \({\pi }^{2}\left({\varvec{X}};{{\varvec{\beta}}}^{2}\right)\), \({m}^{1}\left({\varvec{X}},Z;{{\varvec{\gamma}}}^{1}\right)\) or \({m}^{2}\left({\varvec{X}},Z;{{\varvec{\gamma}}}^{2}\right)\) is included in the estimator (“1” means yes and “0” means no).
In addition, we defined the nonproportional hazard setting via setting the shape scale parameters \({[(v}_{1}=0.005, {\lambda }_{1}=3), {(v}_{0}=0.005, {\lambda }_{0}=3.2)]\) in the treatment and control groups, respectively (Figure S1); and we also specified two models incorrectly
$${\mathbb{A}}=\left\{\begin{array}{cc}\mathrm{logit}[{\pi }^{1}({\varvec{X}};{{\varvec{\beta}}}^{1})]= ({X}_{1}^{2}, {X}_{2}^{2}){{\varvec{\beta}}}^{1}\\ \mathrm{logit}[{\pi }^{2}({\varvec{X}};{{\varvec{\beta}}}^{2})]=({X}_{1}^{2}, {X}_{2}^{2}, {X}_{3}^{2}){{\varvec{\beta}}}^{1}\end{array}\right\}$$
for propensity score. In the study, we investigated a sample size of \(n=200\) and \(n=500\) with 1000 replications. We used three evaluation criteria: mean relative bias, root mean square error (RMSE), and coverage rate. The boxplots of mean relative bias, RMSE and coverage rate were shown in the main article (Figs. 12, 3 and 4) and the table of estimation results were shown in the supplementary material (Table S1– S4).
Simulation results
Figure 1 and Table S1 showed the simulation results of the difference between treatment and control groups in the proportional hazard setting with the different censoring rates (25% or 50%) and different sample sizes (200 or 500). We could draw a few conclusions from the simulation results.

1.
When specifying a propensity score model \(\pi \left({\varvec{X}};{\varvec{\beta}}\right)\), or an outcome regression model \(m\left({\varvec{X}},Z;{\varvec{\gamma}}\right)\): the biases of IPW and MR were negligible if \(\pi \left({\varvec{X}};{\varvec{\beta}}\right)\) was correctly specified (IPW.model1, MR1000). The biases of OR and MR were small if \(m\left({\varvec{X}},Z;{\varvec{\gamma}}\right)\) was correctly modeled (OR.model1, MR0010), and MR0010 had the smallest RMSE among all estimators. The biases and RMSEs were large, and the coverage rates were much less than 95% if \(\pi \left({\varvec{X}};{\varvec{\beta}}\right)\) or \(m\left({\varvec{X}},Z;{\varvec{\gamma}}\right)\) were incorrectly specified (IPW.model2, OR.model2, MR0100, MR0001).

2.
When specifying a propensity score model \(\pi \left({\varvec{X}};{\varvec{\beta}}\right)\), and an outcome regression model \(m\left({\varvec{X}},Z;{\varvec{\gamma}}\right)\): the biases MR were ignorable when either \(\pi \left({\varvec{X}};{\varvec{\beta}}\right)\) or \(m\left({\varvec{X}},Z;{\varvec{\gamma}}\right)\) was correctly specified (MR1010, MR1001, MR0110). The biases and RMSEs were severe large, and the coverage rates were well below 95% when the \(\pi \left({\varvec{X}};{\varvec{\beta}}\right)\) and \(m\left({\varvec{X}},Z;{\varvec{\gamma}}\right)\) were both incorrectly specified (MR0101). MR1010, MR1001 and MR0110 had almost the same RMSEs, which were smaller than IPW.model1 and OR.model1.

3.
When specifying multiply propensity score models \(\pi \left({\varvec{X}};{\varvec{\beta}}\right)\), and multiply outcome regression model models \(m\left({\varvec{X}},{\varvec{Z}};{\varvec{\gamma}}\right)\): the small biases of MR1100, MR0011, MR1110, MR1101, MR1011, MR0111, and MR1111 well suggested the multiple robustness of our proposed methods. Based on our simulation results, the RMSEs of those estimators were almost all smaller than IPW.model1 and OR.model1, suggesting that adding more models did not lead to a significant increase in RMSE.
In addition, Fig. 2 and Table S2 showed the estimation results in the nonproportional hazard setting. Figures 3 and 4 and Tables S3 and S4 showed the simulation results in proportional and nonproportional hazard settings with two PS models specified incorrectly. Similar patterns were observed in Figs. 2, 3 and 4 and Tables S2S4.
In conclusion, the simulation results showed that our MR method could provide more protection against model misspecification even when PS models are misspecified in the proportional or nonproportional hazard settings.
Empirical study
To illustrate the proposed method, we analyzed the realworld data on triplenegative breast cancer (TNBC) from Surveillance, Epidemiology, and End Results (SEER) [20]. TNBC, characterized by an absence of estrogen, progesterone receptors, and human epidermal growth factor receptor 2, has high invasiveness, high metastatic potential, proneness to relapse, and poor prognosis [21]. Chemotherapy remains the primary treatment for TNBC and improves the prognosis of TNBC patients [22]. The objective of the study was to estimate the effect of chemotherapy on TNBC. Baseline information included marital status, race, laterality, grade, The American Joint Committee on Cancer (AJCC) stage, distant metastasis, age, surgery, and chemotherapy. Inclusion criteria are: (1) patients diagnosed between 2010 and 2012, (2) patients aged between 20 and 79, (3) one primary tumor only, (4) complete baseline and survival information. A total of 10,613 patients were included in the analysis, with 79.9% opting for chemotherapy and a censoring rate of 80%. The baseline information was summarized in Table 1. We found that the chemotherapy group had a higher proportion of patients with III/IV stage, \(\ge 50\) years old, receiving radiotherapy and surgery compared to the nonchemotherapy group.
To identify the candidate models for MR method, we explored the association of chemotherapy with all covariates through the multivariate logistic regression, and the association of survival with all covariates through the multivariate Cox model. Two sets of covariates for each model were identified using the significant level at 0.05 and 0.1. Two sets of covariates in \([\{{\pi }^{1}\left({\varvec{X}};{{\varvec{\beta}}}^{1}\right),{\pi }^{2}\left({\varvec{X}};{{\varvec{\beta}}}^{2}\right)]\) are: (i) marital status, grade, stage, age, radiotherapy, and surgery; (ii) marital status, race, grade, stage, age, radiotherapy, and surgery. Two sets of covariates in \(\left[{m}^{1}\left({\varvec{X}},Z;{{\varvec{\gamma}}}^{1}\right), {m}^{2}\left({\varvec{X}},Z;{{\varvec{\gamma}}}^{2}\right)\right]\) are: (i) marital status, race, grade, stage, distant metastasis, age, radiotherapy, surgery, and chemotherapy; (ii) marital status, race, laterality, grade, stage, distant metastasis, age, radiotherapy, surgery, and chemotherapy. We applied the IPW, direct confounding adjustment, and MR methods to estimate the \(\Delta \left(t\right)\) at the time point \({t}^{*}=\mathrm{year\;}1, 2, 3, 4, 5\) and 6. The results were shown in Fig. 5 and Table S5.
From Fig. 5 and Table S5, all estimators indicated that chemotherapy could improve the prognosis of TNBC. There were some differences among the values of different estimators. For example, the estimates of MR1000, MR0100 and MR1100 are larger than the other MR estimates at \({t}^{*}=\mathrm{year }1, 2, 3, 4, 5, 6\); and the estimates of MR0010, MR0001 and MR0000 seem to be smaller than the other MR estimates at \({t}^{*}=\mathrm{year }1, 2, 3\) year. Yet, the proposed method could include multiple PS models and OR models, which were more robust to the model specification to a certain extent and might yield more accurate estimates. And the estimated chemotherapy effects using MR1111 are 0.0455(0.0278–0.0633), 0.0379(0.0173–0.0586), 0.0337(0.0116–0.0559), 0.0364(0.0127–0.0601), 0.0340(0.0061–0.0620) and 0.0423(0.0101–0.0745) at t^{*}= year 1, 2, 3, 4, 5, 6, respectively. The results suggests that the chemotherapy group had a survival rate of 4.55%, 3.79%, 3.37%, 3.64%, 3.40% and 4.23% higher than the controls at t^{*}= year 1, 2, 3, 4, 5, 6, respectively; and the effects were all statistically significant at the 95% level.
Add Comment