A critical, often overlooked barrier to implementing automated analysis tools in the clinical setting is identifying the subset of DICOM objects acquired in a scanning session that are appropriate for automated analysis. Although the DICOM standard provides rich metadata with specific fields describing the collected sequence, the free-text description fields are often unreliable because they lack rigorous constraints. The growing use of image processing applications and CAD systems, together with the need for large multi-site datasets drawn from multiple source devices and manufacturers, demands better identification and selection processes for automating clinical and research applications. The medical imaging field urgently needs a tool for automated image-type classification. In this work, we developed a robust, easily extensible classification framework that extracts key features from well-characterized DICOM header fields to identify image modality and acquisition plane. Using classical machine learning paradigms and a heterogeneous dataset of over 250 thousand scan volumes collected over 50 sites, using 77 scanner models, we achieved 98.9% accuracy in K-fold cross-validation for classifying 12 image modalities and 99.96% accuracy for image acquisition plane classification. Furthermore, we demonstrated model generalizability by achieving 95.7% accuracy on out-of-sample animal data. Our proposed framework can be crucial in eliminating error-prone human interaction, enabling automation, and increasing the reliability and efficiency of imaging applications. The proposed framework has been released as an open-source project and is readily accessible as a Python pip package under the name dcm-classifier.