
Development of the multivariate administrative data cystectomy model and its impact on misclassification bias | BMC Medical Research Methodology


Study setting

The study took place at The Ottawa Hospital (TOH), a 1000-bed teaching hospital with two campuses that is the tertiary referral and trauma center for a region of approximately 1.3 million people. Annually, TOH has more than 175,000 emergency department visits, 40,000 non-psychiatric admissions, and 50,000 surgical cases. During the study period, more than 95% of cystectomies in the region were conducted at TOH; near the end of the study period, all such procedures were performed at TOH.

Ethics approval and consent to participate

The study was approved by the Ottawa Health Science Network Research Ethics Board (File: OHRI REB 20220112-01H). This study was a secondary data analysis; except for patient medical record review, all analyses involved deidentified data. The ethics board waived the need to directly approach patients and obtain consent for the study. The study period was 1 January 2009 to 1 June 2019. This period corresponded to the time that our hospital’s operation registry (used for case identification) existed.

Health administrative datasets used for study

This study used five health administrative datasets (Table 1). The Surgical Information Management System (SIMS) dataset is a registry of all primary surgical procedures conducted at TOH. Following all operations, the surgical team enters the procedure that was performed. Completion rates for these forms are essentially 100% since hospital remuneration is a function of these data and nurses cannot close cases without these data. Data from these forms are entered by hospital records staff into the operation registry with the primary procedure recorded by a 10-digit alpha-numeric code. The Discharge Abstract Database (DAD) is a population-based health administrative dataset that captures all Ontario hospitalizations, recording patient-based information (age, sex, pre-admission diagnostic codes [using the International Classification of Diseases, 10th Revision, ICD-10]) and hospitalization information (including admission service, procedural codes [using the Canadian Classification of Interventions, CCI] with dates, admission and discharge dates). The Ontario Health Insurance Plan (OHIP) database contains almost all surgeon service claims (by which surgeons are remunerated) that record the date and procedure type. The Ontario Cancer Registry (OCR) records the date and type of all index cancers diagnosed in Ontario. The Registered Persons Database (RPDB) records the death date of all Ontarians. DAD, OHIP, OCR, and RPDB are stored at ICES (formerly known as the Institute for Clinical Evaluative Sciences). ICES is an independent non-profit organization that houses population-based collections of health administrative datasets for the province of Ontario.

Table 1 Description of datasets used for study

Case identification

Our first goal was to identify all cystectomies performed at TOH during the study period. This was done by querying SIMS for all potential primary cystectomy and urinary diversion procedures using the codes listed in Appendix A. To ensure complete capture of cystectomies, procedure codes for surgeries under which true cystectomies might be misclassified – such as partial cystectomy without urinary diversion or nephroureterectomy – were also included in the query (Appendix A).

The true cystectomy-urinary diversion status of these potential cases was determined by manual chart review by a single reviewer (JR), who reviewed the operative note on the surgical date recorded in SIMS. If an operative note was incomplete, unclear, or missing, the associated progress notes and discharge summary were also reviewed. Patients who were classified with cystectomy were subclassified with either continent or incontinent urinary diversion. The few patients who underwent cystectomy alone without urinary diversion (because both kidneys were simultaneously removed or the patient was already on dialysis for renal failure and anuric) were classified with incontinent diversion. Any case that was unclear regarding its cystectomy-urinary diversion status was reviewed with a second expert reviewer (LL) to provide a final opinion; this occurred in two cases.

We excluded patients less than 18 years of age and cases where cystectomy-urinary diversion was a secondary procedure and part of a larger surgery (such as a total pelvic exenteration for locally invasive rectal cancer). The latter cases were very uncommon and were excluded since they represent a patient cohort that was distinct from those having primary cystectomy-urinary diversion. Finally, patients without valid Ontario health card numbers were also excluded since they could not be linked to Ontario health data for analysis.

These steps identified all primary cystectomy cases at TOH during the study period; therefore, any TOH hospitalization that was not included in this group did not have a primary cystectomy. For all cystectomy cases, we recorded: 1) the Ontario health card number, 2) the cystectomy date, and 3) the diversion type (incontinent vs. continent). This reference cystectomy dataset was transferred to ICES via an encrypted data portal where the health card number was encrypted to permit linkage with ICES datasets.

Creating the Administrative Data Cystectomy Model (ADCM)

We created our study’s analytical dataset by retrieving from the DAD all adult TOH hospitalizations during the study period. This dataset was linked to our reference cystectomy dataset via encrypted health card number and admission date to determine which TOH admissions truly had a cystectomy and, if so, its diversion type.
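The linkage step can be illustrated with a minimal sketch. The field names (`patient_key`, `admit_date`, `discharge_date`, `surgery_date`, `diversion`) are hypothetical, and matching a reference case to an admission when the surgery date falls within the admission window is an assumption made for illustration, not a description of the ICES linkage procedure.

```python
from datetime import date

def label_admissions(admissions, reference_cases):
    """Attach a true cystectomy-diversion status to each admission record."""
    # Index reference cases by the (hypothetical) encrypted patient key.
    by_patient = {}
    for case in reference_cases:
        by_patient.setdefault(case["patient_key"], []).append(case)

    labelled = []
    for adm in admissions:
        status = "no cystectomy"
        for case in by_patient.get(adm["patient_key"], []):
            # Assumed rule: the surgery date lies within the admission window.
            if adm["admit_date"] <= case["surgery_date"] <= adm["discharge_date"]:
                status = f"cystectomy-{case['diversion']} diversion"
                break
        labelled.append({**adm, "true_status": status})
    return labelled

# Toy records, invented for illustration only.
admissions = [
    {"patient_key": "A1", "admit_date": date(2015, 3, 1), "discharge_date": date(2015, 3, 12)},
    {"patient_key": "A1", "admit_date": date(2016, 7, 4), "discharge_date": date(2016, 7, 6)},
    {"patient_key": "B2", "admit_date": date(2015, 3, 2), "discharge_date": date(2015, 3, 5)},
]
reference_cases = [
    {"patient_key": "A1", "surgery_date": date(2015, 3, 2), "diversion": "incontinent"},
]
labelled = label_admissions(admissions, reference_cases)
for row in labelled:
    print(row["patient_key"], row["admit_date"], row["true_status"])
```

Any admission without a matching reference case is labelled as having no cystectomy, which mirrors the reasoning that the chart review captured every primary cystectomy at TOH.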

We reviewed CCI coding manuals to identify all CCI cystectomy codes (Appendix B). CCI codes used to identify cystectomy by urinary diversion within the DAD were reviewed with a Health Records expert at TOH to ensure completeness. Hospitalizations were classified as ‘coded with cystectomy by diversion type’ if they were assigned at least one CCI procedure code within the DAD (during the hospitalization) or at least one Ontario Health Insurance Plan (OHIP) billing code (with the service date being equal to the true operative date) for the cystectomy-diversion type.

We then reviewed the other covariables in the DAD, OHIP database, and the OCR to identify patient-, hospital-, and procedure-level factors that might be associated with true cystectomy status. These potential covariates were ranked independently by two surgeons (JR, LL) based on their potential ability to identify cystectomy using health administrative data. These ranks were averaged to produce the final covariate priority ranking (Appendix C).

To create the ADCM, multivariable multinomial logistic regression was used to model patient cystectomy-diversion type status. We used the methods proposed by Riley et al. [9] to determine the number of degrees of freedom that our model could contain. This calculation assumed that there would be 429,000 admissions to TOH during the study period with an estimated 250 cystectomies of each diversion type, giving a prevalence of each cystectomy-diversion type of 0.0583%. We then calculated the number of degrees of freedom (df) permitted in the regression model given four criteria: 10 outcomes per df (25 df permitted); mean error around individual predictions of ± 0.05% (42.6 df permitted); target shrinkage of 99.9% (32.1 df permitted); and target optimism in model fit of 0.0005 (106 df permitted). The final allowable degrees of freedom in the model for each cystectomy-diversion type was the minimum of these calculations (25 df).
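The arithmetic behind the prevalence and the final df limit can be checked with a short sketch. Only the first criterion is recomputed here; the other three values are copied from the text rather than derived, and the dictionary labels are illustrative.

```python
# Inputs quoted in the text.
n_admissions = 429_000
n_events = 250  # expected cystectomies of each diversion type

prevalence_pct = 100 * n_events / n_admissions

# Degrees of freedom permitted under each criterion.
df_permitted = {
    "10 outcomes per df": n_events / 10,     # recomputed from the event count
    "prediction error within 0.05%": 42.6,   # value reported in the text
    "target shrinkage of 99.9%": 32.1,       # value reported in the text
    "target optimism of 0.0005": 106,        # value reported in the text
}

# The binding constraint is the criterion that permits the fewest df.
limiting_criterion = min(df_permitted, key=df_permitted.get)
max_df = df_permitted[limiting_criterion]

print(f"prevalence = {prevalence_pct:.4f}%")
print(f"allowable df = {max_df:g} (limited by: {limiting_criterion})")
```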

The outcome for the ADCM was true cystectomy-diversion type status and it had three values: cystectomy-incontinent diversion, cystectomy-continent diversion, or no cystectomy. The ADCM was constructed by adding covariates in rank order of perceived importance for cystectomy identification (Appendix C). Variables were retained if the likelihood ratio test following variable addition was significant at a p-value ≤ 0.05. We used a SAS macro described by Sauerbrei et al. [10] to identify best single fractional polynomial transformations for continuous candidate variables (age, acute length of stay, and operative time). Model building ended when all candidate variables (Appendix C) had been offered to the model. Creation and performance of the ADCM were reported using methods suggested by the TRIPOD statement (Appendix E).
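The retention rule can be sketched as follows. This is an illustration rather than the authors' SAS code: it assumes that each added model term contributes one coefficient per non-reference outcome (two df per term in a three-category model), which makes the test df even and allows a closed-form chi-square tail probability. The log-likelihood values in the example are hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def retain_variable(loglik_without, loglik_with, n_new_terms, alpha=0.05):
    """Likelihood ratio test for adding a covariate to a 3-outcome model."""
    lr_stat = 2.0 * (loglik_with - loglik_without)
    df = 2 * n_new_terms          # assumed: one coefficient per non-reference outcome
    p_value = chi2_sf_even_df(max(lr_stat, 0.0), df)
    return p_value <= alpha, p_value

# Hypothetical log-likelihoods before and after offering one covariate.
keep, p = retain_variable(-1000.0, -995.0, n_new_terms=1)
print(f"retain = {keep}, p = {p:.4f}")
```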

Analysis

Model performance was internally validated using the optimism-corrected c-statistic (for discrimination) and the optimism-corrected integrated calibration index (ICI, for calibration) [11], following methods described by Steyerberg [12] with 1000 bootstrap samples.
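A generic bootstrap optimism correction can be sketched as below. This is a minimal illustration under stated assumptions, not the study's code: `fit` and `score` are hypothetical callables supplied by the user, and the c-statistic shown treats one outcome category against the rest as a binary problem.

```python
import random

def c_statistic(y_true, y_prob):
    """Share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("both outcome classes are required")
    concordant = sum((e > n) + 0.5 * (e == n) for e in events for n in non_events)
    return concordant / (len(events) * len(non_events))

def optimism_corrected(rows, fit, score, n_boot=1000, seed=0):
    """Apparent performance minus the mean bootstrap optimism."""
    rng = random.Random(seed)
    apparent = score(fit(rows), rows)
    optimism = 0.0
    for _ in range(n_boot):
        boot = [rng.choice(rows) for _ in rows]   # resample with replacement
        model = fit(boot)                         # refit on the bootstrap sample
        # Optimism: performance on the bootstrap sample minus on the original data.
        optimism += score(model, boot) - score(model, rows)
    return apparent - optimism / n_boot

print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Here `score` could wrap `c_statistic` or a calibration measure; with a rare outcome, a bootstrap sample may contain no events, so a real implementation would need to handle that case.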

To quantify misclassification bias, we first used the reference standard cystectomy-diversion type status to calculate true values of 30 statistics: cystectomy-diversion type prevalence in study cohort; the association of cystectomy with 3 continuous covariables [patient age, operation time, acute hospital length of stay] measured using linear regression; and the association of cystectomy with 27 binary covariables [sex, admission urgency, general anaesthetic status, transfusion status, discharge status, 28-day death or unplanned readmission status and 21 comorbidities from the Elixhauser morbidity scale] measured using logistic regression. With the exception of the Elixhauser comorbidities, these covariables did not require administrative database codes and are accurately measured in the DAD [13].

We then repeated the measurement of these 30 statistics after assigning cystectomy-diversion status using three methods:

  • 1. CCI / OHIP Code for Cystectomy: Patients with a CCI or OHIP procedure code for cystectomy (Appendix B) were classified with cystectomy-diversion by code.

  • 2. ADCM-Categorical: Youden’s method was used to determine the ADCM-based predicted probability of cystectomy by diversion type that optimized classification accuracy [14]. This threshold corresponds to the predicted cystectomy-diversion type probability that is closest to the top left-hand corner of the corresponding receiver operating characteristic (ROC) curve. Patients with expected cystectomy probabilities equal to or above this threshold were classified with cystectomy-diversion by ADCM-categorical.

  • 3. ADCM-Bootstrap Imputation: This method used the ADCM predicted cystectomy probability to impute cystectomy status using bootstrap imputation (BI) [6,7,8,15]. BI started by creating 1000 random bootstrap samples (with replacement) of the study cohort with each having a sample size identical to the original cohort. For each hospitalization within each bootstrap sample, a uniformly distributed number between 0 and 1 was randomly selected; cystectomy-diversion status was then imputed to be present if the random number was below the ADCM-based predicted cystectomy-diversion type probability for that patient. Within each bootstrap sample, we measured all 30 statistics; the mean value of all 1000 bootstrap samples was used as the final BI point estimate and the 2.5th and 97.5th percentiles as the confidence intervals.
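Methods 2 and 3 can be sketched for a single diversion type treated as a binary outcome. This is an illustration under stated assumptions: Youden's index is taken here as sensitivity + specificity − 1, the function names and the `statistic` callback are hypothetical, and the percentile limits use a simple order-statistic approximation.

```python
import random

def youden_threshold(y_true, y_prob):
    """Return the cutoff that maximises sensitivity + specificity - 1."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_j, best_cut = -1.0, None
    for cut in sorted(set(y_prob)):
        # Classified positive when the predicted probability is >= cut.
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= cut)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < cut)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_j, best_cut = j, cut
    return best_cut

def bootstrap_impute(y_prob, statistic, n_boot=1000, seed=0):
    """Mean and approximate 2.5th/97.5th percentiles of a statistic under BI."""
    rng = random.Random(seed)
    n = len(y_prob)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample of rows
        # Impute status as present when a uniform draw falls below the probability.
        imputed = [1 if rng.random() < y_prob[i] else 0 for i in idx]
        estimates.append(statistic(idx, imputed))
    estimates.sort()
    mean = sum(estimates) / n_boot
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return mean, lower, upper

def prevalence(idx, imputed):
    """Example statistic: proportion of sampled rows imputed as cases."""
    return sum(imputed) / len(imputed)

probs = [0.02, 0.90, 0.01, 0.75, 0.03, 0.01]   # toy predicted probabilities
print(youden_threshold([0, 1, 0, 1, 0, 0], probs))
print(bootstrap_impute(probs, prevalence, n_boot=200))
```

The `statistic` callback receives the sampled row indices together with the imputed statuses, so a regression-based association with a covariable could be computed in place of prevalence.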

We quantified misclassification bias for each of these three methods to assign cystectomy-diversion status using the standardized mean squared error (SMSE):

$$\mathrm{SMSE} = \frac{(\beta - \beta_{T})^{2}}{\beta_{T}}$$

where β is the parameter estimate of the covariable’s association with cystectomy-diversion determined by the cystectomy-diversion status assignment method (CCI/OHIP code for cystectomy, ADCM-categorical, or ADCM-BI), and \(\beta_{T}\) is the parameter estimate of the covariable’s true association with cystectomy-diversion. We compared misclassification bias between cystectomy-diversion status assignment methods using ANOVA on log-transformed SMSEs of all 30 variables. Differences between assignment methods were determined using Tukey’s studentized range test for ANOVA. All analyses were conducted using SAS 9.4.
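As a small worked example, the SMSE can be computed directly from the formula as written. The parameter estimates below are hypothetical and chosen only to show the arithmetic; they are not study results.

```python
import math

def smse(beta, beta_true):
    """Standardized mean squared error as defined in the text."""
    return (beta - beta_true) ** 2 / beta_true

# Hypothetical estimates of one covariable's association with cystectomy.
true_beta = 0.50
estimates = {
    "CCI/OHIP code": 0.35,
    "ADCM-categorical": 0.42,
    "ADCM-BI": 0.49,
}

for method, beta in estimates.items():
    value = smse(beta, true_beta)
    # The log transform is defined only when the SMSE is positive.
    print(f"{method}: SMSE = {value:.4f}, log SMSE = {math.log(value):.2f}")
```

With this formula as written, the log transform requires a positive true parameter and an estimate that differs from it.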


