Comparison of AI software tools for automated detection, quantification and categorization of pulmonary nodules in the HANSE LCS trial
Authors
Rimma Kondrashova, Filip Klimeš, Till Frederik Kaireit, Katharina May, Jörg Barkhausen, Susanne Stiebeler, Jonathan Sperl, Sabine Dettmer, Frank Wacker, Jens Vogel-Claussen
Kondrashova et al. (2023) compared two AI software tools (S1 and S2) for detecting and classifying pulmonary nodules on CT scans within the HANSE lung cancer screening (LCS) trial. Using 946 baseline CT scans, they evaluated each tool’s sensitivity, volume measurement accuracy, and consistency of Lung-RADS classification. For clinically significant nodules (≥34 mm³), S1 achieved higher sensitivity than S2 (88% vs. 66%). S2, however, consistently reported larger nodule volumes than S1, leading to a 38% discrepancy in Lung-RADS scores between the tools. This difference could substantially affect patient management, because the Lung-RADS category determines follow-up intervals and the level of intervention. Limitations included potential bias toward S1, since it was used prospectively in the initial readings, and a low representation of part-solid nodules, which limited the subgroup sensitivity analysis. The findings underscore the need for standardized, high-accuracy AI tools in national screening programs to ensure reliable detection and consistent clinical outcomes across institutions.
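To illustrate how a systematic volume offset between two tools can translate into different Lung-RADS categories, the following is a minimal sketch rather than the study's actual pipeline. It assumes the baseline solid-nodule volume cut-offs of Lung-RADS v1.1 (113, 268 and 1767 mm³); the summary above does not state which categorization scheme the authors applied. The function names and all volumes are hypothetical toy values, not trial data.

```python
from bisect import bisect_right

# ASSUMPTION: baseline solid-nodule volume cut-offs (mm^3) in the style of
# Lung-RADS v1.1. The summarized study may have used a modified scheme.
THRESHOLDS_MM3 = [113.0, 268.0, 1767.0]
CATEGORIES = ["2", "3", "4A", "4B"]


def categorize(volume_mm3):
    """Map a nodule volume to a category; a volume equal to a cut-off
    falls into the higher category."""
    return CATEGORIES[bisect_right(THRESHOLDS_MM3, volume_mm3)]


def sensitivity(detected_flags):
    """Fraction of reference nodules that a tool detected."""
    if not detected_flags:
        raise ValueError("no reference nodules given")
    return sum(1 for d in detected_flags if d) / len(detected_flags)


def discordance_rate(volumes_a, volumes_b):
    """Fraction of paired nodules whose categories differ between tools."""
    if not volumes_a or len(volumes_a) != len(volumes_b):
        raise ValueError("need two non-empty lists of equal length")
    differing = sum(
        1 for a, b in zip(volumes_a, volumes_b)
        if categorize(a) != categorize(b)
    )
    return differing / len(volumes_a)


# Hypothetical toy volumes (mm^3) for the same four nodules; the second
# tool reports slightly larger volumes throughout.
tool_a = [100.0, 250.0, 300.0, 90.0]
tool_b = [120.0, 260.0, 310.0, 95.0]

print(discordance_rate(tool_a, tool_b))
print(sensitivity([True, True, False, True]))
```

In this toy example only the first nodule crosses a cut-off (100 vs. 120 mm³ around the assumed 113 mm³ boundary), which shows how even small systematic volume differences can change the assigned category for nodules near a threshold.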