Abstract
BACKGROUND: Many studies of pulmonary embolism (PE) rely on International Classification of Diseases, 10th Revision (ICD-10) codes to analyze electronic databases. The validity of ICD-10 codes for identifying PE remains uncertain.
OBJECTIVES: The objective of this study was to validate an algorithm that efficiently identifies PE using ICD-10 codes.
METHODS: Using a prespecified protocol, patients in the Mass General-Brigham hospitals (2016-2021) with ICD-10 principal discharge codes for PE, those with secondary codes for PE, and those without PE codes were identified (n = 578 from each group). Weighting was applied so that each group was represented in proportion to its true prevalence. The accuracy of ICD-10 codes for identifying PE was compared with adjudication by independent physicians. The F1 score, which incorporates sensitivity and positive predictive value (PPV), was assessed. Subset validation was performed at Yale-New Haven Health System.
RESULTS: A total of 1712 patients were included (age: 60.6 years; 52.3% female). ICD-10 PE codes in the principal discharge position had sensitivity and PPV of 58.3% and 92.1%, respectively. Adding secondary discharge codes to the principal discharge codes improved the sensitivity to 83.2%, but the PPV was reduced to 79.1%. Using a combination of ICD-10 PE principal discharge codes or secondary codes plus imaging codes for PE led to sensitivity and PPV of 81.6% and 84.7%, respectively, and the highest F1 score (83.1%; P < .001 compared with other methods). Validation yielded largely similar results.
CONCLUSION: Although the principal discharge codes for PE show excellent PPV, they miss more than 40% of acute PEs. A combination of principal discharge codes and secondary codes plus PE imaging codes led to improved sensitivity without a severe reduction in PPV.
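
The F1 score reported in RESULTS can be reproduced as the harmonic mean of sensitivity and PPV. The following is a minimal sketch, not the study's analysis code: the function name `f1_score` and the strategy labels are illustrative assumptions, and the inputs are the rounded percentages quoted in the abstract rather than the weighted patient-level data.

```python
def f1_score(sensitivity: float, ppv: float) -> float:
    """Harmonic mean of sensitivity (recall) and PPV (precision)."""
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# (sensitivity, PPV) pairs as reported in RESULTS; the labels are illustrative.
strategies = {
    "principal only": (0.583, 0.921),
    "principal or secondary": (0.832, 0.791),
    "principal, or secondary plus imaging": (0.816, 0.847),
}

f1_by_strategy = {
    name: f1_score(sens, ppv) for name, (sens, ppv) in strategies.items()
}

for name, f1 in f1_by_strategy.items():
    print(f"{name}: F1 = {f1:.1%}")
```

With these inputs, the combined strategy yields about 83.1%, matching the reported value. The F1 scores for the other two strategies are not stated in the abstract; the values this sketch produces for them are derived from rounded figures and may differ slightly from the study's weighted estimates.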