Z-disks are complex structures that delineate repeating sarcomeres in striated muscle. They play significant roles in cardiomyocytes, such as providing mechanical stability for the contracting sarcomere, cell signalling and autophagy. Changes in Z-disk architecture have been associated with impaired cardiac function. Hence, there is a strong need for tools that segment Z-disks from microscopy images while overcoming traditional limitations such as variability in image brightness and staining technique. In this study, we apply deep learning-based segmentation models to extract Z-disks in images of striated muscle tissue. We leverage a novel Airyscan confocal dataset, which comprises high-resolution images of Z-disks of healthy heart tissue, stained with Affimers for specific Z-disk proteins. We employed an interactive labelling tool, Ilastik, to obtain ground truth segmentation masks and used the resulting dataset to train and evaluate the performance of several state-of-the-art segmentation networks. On the test set, UNet++ achieves the best segmentation performance for Z-disks in cardiomyocytes, with an average Dice score of 0.91, and outperforms other established segmentation methods including UNet, FPN, DeepLabv3+ and pix2pix. However, pix2pix demonstrates improved generalisation when tested on an additional dataset of cardiomyocytes with a titin mutation. This is the first study to demonstrate that automated machine learning-based segmentation approaches may be used effectively to segment Z-disks in confocal microscopy images. Automated segmentation approaches and predicted segmentation masks could be used to derive morphological features of Z-disks (e.g. width and orientation) and, subsequently, to quantify disease-related changes to cardiac microstructure.
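For illustration, a minimal sketch of the per-image Dice score used to report segmentation performance is given below; the function name and smoothing constant are assumptions for this example and are not taken from the study.

```python
import numpy as np

def dice_score(pred_mask: np.ndarray, gt_mask: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between a predicted and a ground-truth binary Z-disk mask."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    # 2*|A∩B| / (|A|+|B|); eps avoids division by zero for empty masks.
    return (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)
```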
Medical image interpretation is central to many clinical applications such as disease diagnosis, treatment planning, and prognostication. In clinical practice, radiologists examine medical images (e.g. chest x-rays, computed tomography images, etc.) and manually compile their findings into reports, which can be a time-consuming process. Automated approaches to radiology report generation, therefore, can reduce radiologist workload and improve efficiency in the clinical pathway. While recent deep-learning approaches for automated report generation from medical images have seen some success, most studies have relied on image-derived features alone, ignoring non-imaging patient data. Although a few studies have incorporated word-level context alongside the image, the use of patient demographics remains unexplored. Furthermore, prior approaches to this task commonly use encoder-decoder frameworks that consist of a convolutional vision model followed by a recurrent language model. Although recurrent text generators have achieved noteworthy results, they suffer from a limited reference window and attend to only one part of the image when generating the next word. This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information to synthesise patient-specific radiology reports. The proposed network uses a convolutional neural network (CNN) to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information, to synthesise full-text radiology reports. The designed network not only alleviates the limitations of recurrent models but also improves the encoding and generative processes by including more context in the network. Data from two public databases were used to train and evaluate the proposed approach. CXRs and reports were extracted from the MIMIC-CXR database and combined with corresponding patients' data (gender, age, and ethnicity) from MIMIC-IV. Based on the evaluation metrics used (BLEU 1-4 and BERTScore), including patient demographic information was found to improve the quality of reports generated using the proposed approach, relative to a baseline network trained using CXRs alone. The proposed approach shows potential for enhancing radiology report generation by leveraging rich patient metadata and combining the resulting semantic text embeddings with medical image-derived visual features.
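The sketch below illustrates one plausible way to fuse CNN visual features with demographic embeddings in a transformer encoder-decoder, in PyTorch; it is not the authors' implementation, and the class name, embedding sizes, demographic tokenisation, and choice of ResNet-50 backbone are assumptions. Positional encodings and the causal target mask needed for autoregressive training are omitted for brevity.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50  # assumes torchvision >= 0.13

class MultiModalReportGenerator(nn.Module):
    """Hypothetical sketch: fuse CXR visual features with demographic embeddings
    before a transformer encoder-decoder that generates report tokens."""
    def __init__(self, vocab_size: int, d_model: int = 512):
        super().__init__()
        cnn = resnet50(weights=None)
        self.backbone = nn.Sequential(*list(cnn.children())[:-2])  # spatial feature map
        self.visual_proj = nn.Linear(2048, d_model)
        self.demo_embed = nn.Embedding(64, d_model)   # tokenised gender / age bin / ethnicity (assumed vocab)
        self.transformer = nn.Transformer(d_model=d_model, batch_first=True)
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, cxr, demo_tokens, report_tokens):
        # cxr assumed to be a 3-channel tensor (B, 3, H, W)
        feats = self.backbone(cxr)                       # (B, 2048, h, w)
        feats = feats.flatten(2).transpose(1, 2)         # (B, h*w, 2048)
        visual = self.visual_proj(feats)                 # (B, h*w, d_model)
        demo = self.demo_embed(demo_tokens)              # (B, n_demo, d_model)
        src = torch.cat([visual, demo], dim=1)           # joint multi-modal context
        tgt = self.token_embed(report_tokens)
        hidden = self.transformer(src, tgt)
        return self.out(hidden)                          # next-token logits
```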
Weakly-supervised classification of histopathology slides is a computationally intensive task, with a typical whole slide image (WSI) containing billions of pixels to process. We propose Discriminative Region Active Sampling for Multiple Instance Learning (DRAS-MIL), a computationally efficient slide classification method using attention scores to focus sampling on highly discriminative regions. We apply this to the diagnosis of ovarian cancer histological subtypes, which is an essential part of the patient care pathway as different subtypes have different genetic and molecular profiles, treatment options, and patient outcomes. We use a dataset of 714 WSIs acquired from 147 epithelial ovarian cancer patients at Leeds Teaching Hospitals NHS Trust to distinguish the most common subtype, high-grade serous carcinoma, from the other four subtypes (low-grade serous, endometrioid, clear cell, and mucinous carcinomas) combined. We demonstrate that DRAS-MIL can achieve similar classification performance to exhaustive slide analysis, with a 3-fold cross-validated AUC of 0.8679 compared to 0.8781 with standard attention-based MIL classification. Our approach uses at most 18% as much memory as the standard approach, while taking 33% as long when evaluating on a GPU and only 14% as long when using a CPU alone. Reducing prediction time and memory requirements may benefit clinical deployment and the democratisation of AI, reducing the extent to which computational hardware limits end-user adoption.
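As a minimal sketch of the attention-based MIL pooling that produces the per-region attention scores mentioned above, the following PyTorch module weights patch features into a slide-level embedding; the class name, feature dimensions, and exact head design are assumptions and do not reproduce the DRAS-MIL sampling procedure itself.

```python
import torch
import torch.nn as nn

class AttentionMILHead(nn.Module):
    """Hypothetical sketch of attention-based MIL pooling: per-patch attention
    scores weight instance features into a slide-level representation; the same
    scores could be reused to rank regions for discriminative sampling."""
    def __init__(self, feat_dim: int = 1024, hidden: int = 256, n_classes: int = 2):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, patch_feats):                  # (n_patches, feat_dim)
        scores = self.attention(patch_feats)         # (n_patches, 1)
        weights = torch.softmax(scores, dim=0)       # attention over instances
        slide_feat = (weights * patch_feats).sum(0)  # weighted slide embedding
        return self.classifier(slide_feat), weights.squeeze(1)
```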
This study explores the use of the Dirichlet Variational Autoencoder (DirVAE) for learning disentangled latent representations of chest X-ray (CXR) images. Our working hypothesis is that distributional sparsity, as facilitated by the Dirichlet prior, will encourage disentangled feature learning for the complex task of multi-label classification of CXR images. The DirVAE is trained using CXR images from the CheXpert database, and the predictive capacity of multi-modal latent representations learned by DirVAE models is investigated through implementation of an auxiliary multi-label classification task, with a view to enforcing separation of latent factors according to class-specific features. The predictive performance and explainability of the latent space learned using the DirVAE were quantitatively and qualitatively assessed, respectively, and compared with a standard Gaussian-prior VAE (GVAE). We introduce a new approach for explainable multi-label classification in which we conduct gradient-guided latent traversals for each class of interest. Study findings indicate that the DirVAE is able to disentangle latent factors into class-specific visual features, a property not afforded by the GVAE, and achieves a marginal increase in predictive performance relative to the GVAE. We generate visual examples to show that our explainability method, when applied to the trained DirVAE, is able to highlight regions in CXR images that are clinically relevant to the class(es) of interest and, additionally, can identify cases where classification relies on spurious feature correlations.
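The following is a minimal sketch of what a gradient-guided latent traversal could look like: the latent code is moved along the gradient of a chosen class logit and decoded at each step, so that differences between reconstructions highlight class-relevant image regions. The function name, step size, and the assumption that the encoder returns a single deterministic latent code (e.g. the posterior mean) are illustrative assumptions, not the paper's implementation.

```python
import torch

def gradient_guided_traversal(encoder, decoder, classifier, image, class_idx,
                              step: float = 0.5, n_steps: int = 5):
    """Hypothetical sketch of a gradient-guided latent traversal for one class."""
    z = encoder(image.unsqueeze(0))           # latent code, e.g. posterior mean (assumed)
    z = z.detach().requires_grad_(True)
    logit = classifier(z)[0, class_idx]       # auxiliary multi-label classification head
    logit.backward()                          # gradient of the class logit w.r.t. z
    direction = z.grad / (z.grad.norm() + 1e-8)
    recons = []
    with torch.no_grad():
        for k in range(n_steps + 1):
            # Decode reconstructions along the gradient direction in latent space.
            recons.append(decoder(z + k * step * direction))
    return recons
```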
Mammographic breast density is an important risk marker in breast cancer screening. The ACR BI-RADS guidelines (5th ed.) define four breast density categories that can be dichotomized into the two super-classes "dense" and "not dense". Due to the qualitative description of the categories, density assessment by radiologists is characterized by a high inter-observer variability. To quantify this variability, we compute the overall percentage agreement (OPA) and Cohen's kappa of 32 radiologists relative to the panel majority vote based on the two super-classes. Further, we analyze the OPA between individual radiologists and compare the performances to an automated assessment via a convolutional neural network (CNN). The data used for evaluation contains 600 breast cancer screening examinations with four views each. The CNN was designed to take all views of an examination as input and trained on a dataset with 7186 cases to output one of the two super-classes. The highest agreement to the panel majority vote (PMV) achieved by a single radiologist is 99%, the lowest score is 71%, with a mean of 89%. The OPA of two individual radiologists ranges from a maximum of 97.5% to a minimum of 50.5%, with a mean of 83%. Cohen's kappa values of radiologists to the PMV range from 0.97 to 0.47, with a mean of 0.77. The presented algorithm reaches an OPA to all 32 radiologists of 88% and a kappa of 0.75. Our results show that inter-observer variability for breast density assessment is high even if the problem is reduced to two categories, and that our convolutional neural network can provide labelling comparable to an average radiologist. We also discuss how to deal with automated classification methods for subjective tasks.
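For reference, the two agreement measures reported above can be computed for binary (two super-class) labels as in the short sketch below; the function name is hypothetical and the inputs are assumed to be 0/1 label vectors for two raters (e.g. one radiologist versus the panel majority vote).

```python
import numpy as np

def agreement_metrics(rater_a: np.ndarray, rater_b: np.ndarray):
    """Overall percentage agreement (OPA) and Cohen's kappa for two binary label vectors."""
    opa = np.mean(rater_a == rater_b)
    # Chance agreement from each rater's marginal class frequencies (binary case).
    p_a1, p_b1 = np.mean(rater_a), np.mean(rater_b)
    p_chance = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    kappa = (opa - p_chance) / (1 - p_chance)
    return opa, kappa
```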
KEYWORDS: Rigid registration, 3D modeling, Image segmentation, Medical imaging, Magnetism, Magnetic resonance imaging, Statistical analysis, Statistical modeling, Probability theory, Detection and tracking algorithms, Data modeling, Expectation maximization algorithms, Image registration, Medicine
We present a probabilistic framework for robust, group-wise rigid alignment of point sets using a mixture of Student's t-distributions, designed in particular for point sets of varying lengths that are corrupted by an unknown degree of outliers or affected by missing data. Medical images (in particular magnetic resonance (MR) images), their segmentations and, consequently, point sets generated from these are highly susceptible to corruption by outliers. This poses a problem for robust correspondence estimation and accurate alignment of shapes, necessary for training statistical shape models (SSMs). To address these issues, this study proposes to use a t-mixture model (TMM) to approximate the underlying joint probability density of a group of similar shapes and align them to a common reference frame. The heavy-tailed nature of t-distributions provides a more robust registration framework in comparison to state-of-the-art algorithms. A significant reduction in alignment errors is achieved in the presence of outliers, using the proposed TMM-based group-wise rigid registration method, in comparison to its Gaussian mixture model (GMM) counterparts. The proposed TMM framework is compared with a group-wise variant of the well-known Coherent Point Drift (CPD) algorithm and two other group-wise methods using GMMs, using both synthetic and real datasets. Rigid alignment errors for groups of shapes are quantified using the Hausdorff distance (HD) and quadratic surface distance (QSD) metrics.
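To illustrate where the robustness of the t-distribution comes from, the sketch below computes the standard E-step scale weights for a single Student's t component: each point receives a weight (nu + d) / (nu + Mahalanobis^2), so outliers far from the component mean are down-weighted, unlike in a Gaussian component. The function name and its use outside a full EM registration loop are assumptions for this example.

```python
import numpy as np

def t_precision_weights(points: np.ndarray, mean: np.ndarray,
                        cov: np.ndarray, dof: float) -> np.ndarray:
    """Per-point E-step weights for a Student's t component: large Mahalanobis
    distances (likely outliers) yield small weights, giving robust estimates."""
    d = points.shape[1]
    diff = points - mean                               # (N, d)
    cov_inv = np.linalg.inv(cov)
    maha_sq = np.einsum('ni,ij,nj->n', diff, cov_inv, diff)
    return (dof + d) / (dof + maha_sq)                 # (N,) weights
```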
The use of biomechanics-based numerical simulations has attracted growing interest in recent years for computer-aided diagnosis and treatment planning. With this in mind, a method for automatic mesh generation of brain structures of interest, using statistical models of shape (SSM) and appearance (SAM), for personalised computational modelling is presented. SSMs are constructed as point distribution models (PDMs) while SAMs are trained using intensity profiles sampled from a training set of T1-weighted magnetic resonance images. The brain structures of interest are the cortical surface (cerebrum, cerebellum & brainstem), lateral ventricles and falx cerebri membrane. Two methods for establishing correspondences across the training set of shapes are investigated and compared (based on SSM quality): the Coherent Point Drift (CPD) point-set registration method and a B-spline mesh-to-mesh registration method. The MNI-305 (Montreal Neurological Institute) average brain atlas is used to generate the template mesh, which is deformed and registered to each training case to establish correspondence over the training set of shapes. T1-weighted MR images from 18 healthy patients form the training set used to generate the SSM and SAM. Both model training and model fitting are performed over multiple brain structures simultaneously. Compactness and generalisation errors of the BSpline-SSM and CPD-SSM are evaluated and used to quantitatively compare the SSMs. Leave-one-out cross-validation is used to evaluate SSM quality in terms of these measures. The mesh-based SSM is found to generalise better and is more compact, relative to the CPD-based SSM. The quality of the best-fit model instance from the trained SSMs, for each test case, is evaluated using the Hausdorff distance (HD) and mean absolute surface distance (MASD) metrics.
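As a minimal sketch of how a point distribution model is typically built once correspondences are established, the code below applies PCA to aligned, corresponding shape vectors to obtain a mean shape and principal modes of variation; function names, the number of retained modes, and the flattened (n_shapes, 3 * n_points) input layout are assumptions for this example.

```python
import numpy as np

def build_point_distribution_model(shapes: np.ndarray, n_modes: int = 10):
    """PDM sketch: PCA of corresponded shape vectors (n_shapes, 3 * n_points)
    gives a mean shape, principal modes, and per-mode variances."""
    mean_shape = shapes.mean(axis=0)
    centred = shapes - mean_shape
    # SVD of the centred data matrix yields the covariance eigenvectors.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    variances = singular_values ** 2 / (shapes.shape[0] - 1)
    return mean_shape, vt[:n_modes], variances[:n_modes]

def synthesise_shape(mean_shape, modes, b):
    """New shape instance: mean shape plus a weighted combination of modes b."""
    return mean_shape + b @ modes
```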