KEYWORDS: Mammography, Machine learning, Education and training, Diagnostics, Deep learning, Performance modeling, Data modeling, Cross validation, Medical imaging, Image segmentation
Purpose: Accurate interpretation of mammograms presents challenges. Tailoring mammography training to reader profiles holds promise as an effective strategy for reducing these errors. This proof-of-concept study investigated the feasibility of employing convolutional neural networks (CNNs) with transfer learning to categorize regions associated with false-positive (FP) errors within screening mammograms as having a “low” or “high” likelihood of being a false-positive detection for radiologists sharing similar geographic characteristics.
Approach: Mammography test sets assessed by two geographically distant cohorts of radiologists (cohorts A and B) were collected. FP patches within these mammograms were segmented and categorized as “difficult” or “easy” based on the number of readers committing FP errors: patches more than 1.5 times the interquartile range above the upper quartile were labeled as difficult, whereas the remaining patches were labeled as easy. Using transfer learning, a patch-wise CNN model for binary patch classification was developed, with ResNet as the feature extractor and modified fully connected layers for the target task. Model performance was assessed using 10-fold cross-validation.
Results: Compared with other architectures, the transferred ResNet-50 achieved the highest performance, obtaining receiver operating characteristic area-under-the-curve (AUC) values of 0.933 (±0.012) and 0.975 (±0.011) on the validation sets for cohorts A and B, respectively.
Conclusions: The findings highlight the feasibility of employing CNN-based transfer learning to predict the difficulty levels of local FP patches in screening mammograms for a specific radiologist cohort with similar geographic characteristics.
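The difficulty labeling described in the Approach is a standard Tukey upper-fence rule. Below is a minimal sketch of that rule, assuming `fp_counts` holds the number of readers who committed an FP error on each patch; the function and variable names are illustrative, not from the paper:

```python
import numpy as np

def label_patches(fp_counts):
    """Label FP patches as difficult (1) or easy (0) using the
    1.5 * IQR fence above the upper quartile described above."""
    counts = np.asarray(fp_counts, dtype=float)
    q1, q3 = np.percentile(counts, [25, 75])
    threshold = q3 + 1.5 * (q3 - q1)   # upper Tukey fence
    return (counts > threshold).astype(int)

# Example: patches flagged by unusually many readers become "difficult"
labels = label_patches([1, 2, 2, 3, 2, 1, 12, 2, 3, 11])
print(labels)  # -> [0 0 0 0 0 0 1 0 0 1]
```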
Early detection of breast cancer through screening mammography is crucial. However, the interpretation of mammograms is prone to high error rates, and radiologists often exhibit common errors specific to their practice regions. It is essential to identify prevalent errors and offer tailored mammography training to address these region-specific challenges. This study investigated the feasibility of leveraging convolutional neural networks (CNNs) with transfer learning to identify areas in screening mammograms that may contribute to a high proportion of false-positive diagnoses by radiologists from the same geographical region. We collected mammography test sets evaluated by a cohort of Australian radiologists and segmented error-related patches based on their assessments. Each patch was labeled as “easy” or “difficult”, and we then proposed a patch-wise ResNet model to predict the difficulty level of each patch. Specifically, we employed the pre-trained ResNet-18, ResNet-50, and ResNet-101 as feature extractors. During training, we modified and fine-tuned the fully connected layers for our target task while keeping the convolutional layers frozen. The model’s performance was evaluated using 10-fold cross-validation, and the transferred ResNet-50 obtained the highest performance, achieving a receiver operating characteristic area under the curve (AUC) of 0.975 (±0.011) on the validation sets. In conclusion, our study demonstrated the feasibility of employing CNN-based transfer learning to identify the prevalent errors in specific radiology communities. This approach shows promise for automating the customization of mammography training materials to mitigate errors among radiologists in a region.
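A minimal PyTorch sketch of the transfer-learning setup described above: a pre-trained ResNet-50 with its convolutional layers frozen and a replaced fully connected head. The head architecture and hyperparameters below are illustrative assumptions, as the abstract does not specify them:

```python
import torch
import torch.nn as nn
from torchvision import models

def build_patch_classifier(num_classes: int = 2) -> nn.Module:
    """ResNet-50 backbone with frozen convolutional layers and a new
    fully connected head for binary patch classification."""
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    for param in model.parameters():          # freeze the pretrained backbone
        param.requires_grad = False
    model.fc = nn.Sequential(                 # trainable replacement head
        nn.Linear(model.fc.in_features, 256), # 2048 -> 256 (illustrative sizes)
        nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(256, num_classes),
    )
    return model

model = build_patch_classifier()
# Only the new head is trained; the frozen backbone acts as a feature extractor.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```

Freezing the backbone keeps the small, reader-specific patch dataset from overwriting generic image features, which is the usual motivation for this style of transfer learning.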
The global radiomic signature extracted from mammograms can indicate whether malignant appearances are present within an image. This study focused on a set of 129 screen-detected breast malignancies that were also visible on the prior screening examinations (i.e., missed cancers based on the priors). All cancer signs on the prior examinations were actionable in the opinion of a panel of three experienced radiologists, who retrospectively interpreted the prior examinations (knowing that a later screening round had revealed a cancer). We investigated whether the global radiomic signature could differentiate between screening rounds: the round when the cancer was detected (“identified cancers”) versus the round immediately before (“missed cancers”). Both “identified cancers” and “missed cancers” were collected using a single vendor technology. A set of “normals”, matched based on mammography units, was also retrieved from a screening archive. We extracted a global radiomic signature containing first- and second-order statistical features. Three classification tasks were considered: (1) “identified cancers” vs “missed cancers”, (2) “identified cancers” vs “normals”, and (3) “missed cancers” vs “normals”. To train and validate the models, leave-one-case-out cross-validation was used. The classifier yielded an AUC of 0.66 (95% CI=0.60-0.73, P<0.05) for “missed cancers” vs “identified cancers” and an AUC of 0.65 (95% CI=0.60-0.69, P<0.05) for “normals” vs “identified cancers”. However, the AUC of the classifier for differentiating “normals” from “missed cancers” was at chance level (AUC=0.53, 95% CI=0.48-0.58, P=0.23). Therefore, eliminating some of these “missed” cancers in clinical practice would be very challenging, as the global signals of malignancy that could help with a diagnosis are, at best, weak.
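A minimal sketch of the evaluation protocol described above, assuming the radiomic features have already been extracted into a feature matrix (e.g., with a toolkit such as PyRadiomics) and that a logistic regression stands in for the unspecified classifier; `case_ids` groups all images from the same case so each fold holds out one whole case:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def leave_one_case_out_auc(X, y, case_ids):
    """Leave-one-case-out CV: every image belonging to one case is held
    out per fold; pooled out-of-fold scores give a single AUC."""
    X, y = np.asarray(X), np.asarray(y)
    scores = np.empty(len(y), dtype=float)
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=case_ids):
        clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        clf.fit(X[train_idx], y[train_idx])
        scores[test_idx] = clf.predict_proba(X[test_idx])[:, 1]
    return roc_auc_score(y, scores)
```

Grouping by case rather than by image prevents views of the same breast from leaking between training and test folds.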
Previous studies reported that the cancer subtypes radiologists struggled to detect in mammography interpretation varied across countries. However, little is known about whether such variation also exists in radiologists’ perception of local cancer-free areas. This study compared the cancer-free areas incorrectly flagged as cancer by radiologists from two populations when reading dense screening mammograms. We collected reading data from 20 Chinese and 16 Australian radiologists who had previously evaluated 60 dense screening cases. For each cohort, findings from all readers were pooled, and the local cancer-free areas classified as cancer were identified. In particular, the areas misclassified by readers from both cohorts were identified and displayed on the mammograms as overlaps. For each overlap, we computed the error rate, defined as the proportion of readers who failed to distinguish normality from abnormality, as a measure of the actual difficulty level for each reader cohort. Afterward, Spearman correlation was used to explore whether the calculated cohort-specific difficulty levels were correlated. A similar analysis was conducted on two geographically distant groups within China. Results showed that between Chinese and Australian radiologists, a correlation was found only in the cancer-free views of cancer cases (r=0.902, p=0.004). However, between the two groups within China, we found strong correlations in both cancer-containing (r=0.833, p=0.333) and cancer-free views (r=0.955, p=0.022) of cancer cases, despite an insignificant correlation in normal cases. In conclusion, radiologists from different populations display different error-making patterns when reading dense screening mammograms, while those with similar demographic characteristics share diagnostic patterns to a certain degree.
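A minimal sketch of the difficulty-level comparison described above, using hypothetical per-overlap FP counts; the counts below are invented for illustration, and only the cohort sizes come from the abstract:

```python
import numpy as np
from scipy.stats import spearmanr

def error_rates(fp_counts, n_readers):
    """Cohort-specific error rate per overlap: the fraction of a cohort's
    readers who flagged that cancer-free area as cancer."""
    return np.asarray(fp_counts, dtype=float) / n_readers

# Hypothetical FP counts over the same overlap regions, in the same order
rates_cn = error_rates([12, 5, 9, 3, 14, 7], n_readers=20)   # Chinese cohort
rates_au = error_rates([10, 4, 11, 2, 13, 6], n_readers=16)  # Australian cohort

rho, p = spearmanr(rates_cn, rates_au)
print(f"Spearman r = {rho:.3f}, p = {p:.3f}")
```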
This study investigated whether radiologists from different countries share the same sensitivity to certain mammographic features. Retrospective data were collected from Chinese and Australian radiologists reading a high-density test set that contained 40 normal and 20 cancerous mammographic cases. Sixteen Australian radiologists and 30 Chinese radiologists, including 18 from Nanchang and 12 from Hong Kong SAR/Shenzhen, were asked to read all images in this test set using the Royal Australian and New Zealand College of Radiologists (RANZCR) rating system and to annotate the suspicious lesion(s). For each case and each radiologist group, the percentage of radiologists making the correct diagnosis was calculated. For cancer cases, we also calculated the percentage of radiologists who located the lesion correctly. The Spearman correlation coefficient was used to explore the association between the two radiologist groups. The data demonstrated a high correlation between Chinese and Australian radiologists in identifying cancer cases (r=0.839, p<0.0001) and locating lesions (r=0.802, p<0.0001), but no statistically significant relationship in identifying normal cases (r=0.236, p=0.142). However, between radiologists from two geographic regions of China, strong correlations were found in detecting cancer cases (r=0.686, p=0.0008), marking lesions (r=0.803, p<0.0001), and recognizing normal cases (r=0.562, p=0.0002). In conclusion, although Chinese and Australian radiologists may share the same difficulty in diagnosing and locating cancers, they differed in the challenge of identifying normal cases. However, the performance of radiologists within China, although from different regions, remained consistent when reading high-density mammograms.
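A minimal sketch of the per-case agreement analysis above, assuming reader decisions are available as a readers-by-cases boolean matrix; the matrices below are synthetic placeholders, with only the cohort sizes taken from the abstract:

```python
import numpy as np
from scipy.stats import spearmanr

def percent_correct(decisions):
    """decisions: readers-by-cases boolean matrix, True where a reader
    made the correct call. Returns the per-case percentage correct."""
    return np.asarray(decisions, dtype=float).mean(axis=0) * 100

# Synthetic reader-by-case decision matrices, for illustration only
rng = np.random.default_rng(seed=0)
australian = rng.random((16, 20)) > 0.25   # 16 readers, 20 cancer cases
chinese = rng.random((30, 20)) > 0.25      # 30 readers, same 20 cases

rho, p = spearmanr(percent_correct(australian), percent_correct(chinese))
print(f"Spearman r = {rho:.3f}, p = {p:.3f}")
```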