Positron emission tomography (PET) provides valuable functional information that is widely used in clinical domains such as oncology and neurology. However, the structural quality of PET images may not be sufficient to effectively evaluate small regions of interest. Image super-resolution techniques aim to recover a high-resolution image from an input low-resolution version. We study adaptations of deep convolutional neural network architectures for improving the spatial resolution of PET images. The proposed super-resolution model involves a deep architecture that uses convolutional blocks together with various residual connections for more effective and efficient training. We use the supervised setting where the downscaled versions of the original PET images are given as the low-resolution input to the deep networks and the original images are used as the high-resolution target data to be recovered. Experiments show that the proposed model performs better than a multi-scale convolutional architecture according to both quantitative performance metrics and visual qualitative evaluation.
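The abstract does not give the exact architecture, but the ingredients it names (convolutional blocks, residual connections, supervised training against the original image) can be illustrated with a minimal PyTorch sketch; the layer widths, block count, single-channel input, and pre-upsampled low-resolution input are all assumptions:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv -> ReLU -> Conv with an identity skip connection."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # Local residual connection: the block only learns a correction to x.
        return x + self.body(x)

class SRNet(nn.Module):
    """Toy super-resolution network: feature head, residual trunk, global skip."""
    def __init__(self, channels: int = 64, num_blocks: int = 8):
        super().__init__()
        self.head = nn.Conv2d(1, channels, kernel_size=3, padding=1)  # single-channel PET slices assumed
        self.trunk = nn.Sequential(*[ResidualBlock(channels) for _ in range(num_blocks)])
        self.tail = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, lr):
        # Assumes the low-resolution input was already interpolated (e.g., bicubic)
        # to the target grid, so the network only predicts a residual image.
        return lr + self.tail(self.trunk(self.head(lr)))

# Supervised setting from the abstract: downscaled slice in, original slice out.
model = SRNet()
lr, hr = torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64)
loss = nn.functional.mse_loss(model(lr), hr)
```

The local and global skip connections mean every layer only refines an estimate rather than reconstructing the image from scratch, which is the usual rationale for residual designs training more effectively at depth.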
Whole slide image (WSI) classification methods typically use fixed-size patches that are processed separately and aggregated for the final slide-level prediction. Image segmentation methods are designed to obtain a delineation of specific tissue types. These two tasks are usually studied independently. The aim of this work is to investigate the effect of region of interest (ROI) detection as a preliminary step for WSI classification. First, we process each WSI using a pixel-level classifier that provides a binary segmentation mask for potentially important ROIs. We evaluate both single-resolution models that process each magnification independently and multi-resolution models that simultaneously incorporate contextual information and local details. Then, we compare the WSI classification performance of patch-based models when the patches used for both training and testing are extracted from the whole image and when they are sampled only from within the detected ROIs. Experiments in a binary benign vs. malignant classification setting for breast histopathology slides show that the classifier that uses patches sampled from the whole image achieves an F1 score of 0.68, whereas the classifiers that use patches sampled from the ROI detection results produced by the single- and multi-resolution models obtain scores between 0.75 and 0.83.
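As a rough illustration of the sampling step being compared, the following sketch restricts patch extraction to the binary mask produced by the ROI detector; the patch size, coverage threshold, and function name are illustrative, not taken from the paper:

```python
import numpy as np

def sample_patches_in_roi(slide: np.ndarray, roi_mask: np.ndarray,
                          patch_size: int = 256, n_patches: int = 100,
                          min_coverage: float = 0.5, seed: int = 0):
    """Sample fixed-size patches whose footprint lies mostly inside the ROI mask.

    slide:    H x W x 3 slide image at one magnification.
    roi_mask: H x W binary mask from the pixel-level ROI classifier.
    A candidate patch is kept only if at least `min_coverage` of its pixels
    fall inside the detected ROI; sampling from the whole image corresponds
    to calling this with an all-ones mask.
    """
    rng = np.random.default_rng(seed)
    h, w = roi_mask.shape
    patches = []
    for _ in range(100 * n_patches):  # cap attempts so sparse masks cannot loop forever
        if len(patches) == n_patches:
            break
        y = int(rng.integers(0, h - patch_size))
        x = int(rng.integers(0, w - patch_size))
        if roi_mask[y:y + patch_size, x:x + patch_size].mean() >= min_coverage:
            patches.append(slide[y:y + patch_size, x:x + patch_size])
    return patches
```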
The common approach to histopathology image classification is to sample small patches from large whole slide images and make predictions based on aggregations of patch representations. Transformer models provide a promising alternative with their ability to capture long-range dependencies among patches and their potential to detect representative regions, thanks to their self-attention mechanism. However, as a sequence-based architecture, transformers cannot directly capture the two-dimensional nature of images. While it is possible to work around this problem by converting an image into a sequence of patches in raster scan order, the basic transformer architecture remains insensitive to the locations of the patches in the image. The aim of this work is to make the model aware of the spatial context of the patches, as neighboring patches are likely to be part of the same diagnostically relevant structure. We propose a transformer-based whole slide image classification framework that uses space-filling curves to generate patch sequences that are adaptive to the variations in the shapes of the tissue structures. The goal is to preserve the locality of the patches so that neighboring patches in the one-dimensional sequence remain close to each other in the two-dimensional slide. We use positional encodings to capture the spatial arrangements of the patches in these sequences. Experiments using a lung cancer dataset obtained from The Cancer Genome Atlas show that the proposed sequence generation approach that best preserves the locality of the patches achieves 87.6% accuracy, which is higher than baseline models that use raster scan ordering (86.7% accuracy), no ordering (86.3% accuracy), and a model that uses convolutions to relate the neighboring patches (81.7% accuracy).
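The abstract does not name the specific curve, but the Hilbert curve is the canonical locality-preserving choice and illustrates the idea; the sketch below orders patches by their Hilbert index using the standard bitwise conversion (the grid size must be a power of two, and the coordinates are hypothetical):

```python
def xy_to_hilbert(n: int, x: int, y: int) -> int:
    """Map grid cell (x, y) on an n x n grid (n a power of two) to its index
    along the Hilbert space-filling curve (standard iterative conversion)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                 # rotate/flip the quadrant to keep the curve continuous
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Hypothetical grid coordinates of tissue-bearing patches on a slide.
patch_coords = [(3, 7), (4, 7), (0, 0), (4, 6)]
order = sorted(range(len(patch_coords)),
               key=lambda i: xy_to_hilbert(8, *patch_coords[i]))
# Feed patches to the transformer in `order`: neighbors in the 1D sequence stay
# close on the 2D slide, and positional encodings index positions in this sequence.
```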
Deep learning-based approaches have shown highly successful performance in the categorization of digitized biopsy samples. The common setting in these approaches is to employ convolutional neural networks for the classification of data sets consisting of images that all have the same size. However, clinical practice in breast histopathology necessitates multi-class categorization of regions of interest (ROIs) in biopsy samples where these regions can have arbitrary shapes and sizes. The typical solution to this problem is to aggregate the classification results of fixed-size patches cropped from these images to obtain image-level classification scores. Another limitation of these approaches is the independent processing of individual patches, which leaves the rich contextual information in the complex tissue structures insufficiently exploited. We propose a generic methodology to incorporate local inter-patch context through a graph convolutional network (GCN) that admits a graph-based ROI representation. The proposed GCN model aims to propagate information over neighboring patches in a progressive manner towards classifying the whole ROI into a diagnostic class. Experiments using a challenging data set for a 4-class ROI-level classification task and comparisons with several baseline approaches show that the proposed model, which incorporates spatial context through graph convolutional layers, performs better than commonly used fusion rules.
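As a sketch of the propagation idea, the following implements one graph convolution over a patch graph using the common Kipf-and-Welling normalization; the adjacency construction, feature sizes, and normalization choice are assumptions rather than the paper's exact model:

```python
import torch

def gcn_layer(features: torch.Tensor, adj: torch.Tensor,
              weight: torch.Tensor) -> torch.Tensor:
    """One graph-convolution step propagating patch features over an ROI graph.

    features: N x F matrix of patch features (N patches in the ROI).
    adj:      N x N binary adjacency linking spatially neighboring patches.
    weight:   F x F_out learnable projection.
    Uses the symmetric normalization D^-1/2 (A + I) D^-1/2 of Kipf & Welling.
    """
    a = adj + torch.eye(adj.size(0))              # add self-loops
    d = a.sum(dim=1).pow(-0.5)                    # inverse square root of node degrees
    a_hat = d.unsqueeze(1) * a * d.unsqueeze(0)   # normalized adjacency
    return torch.relu(a_hat @ features @ weight)

# Hypothetical use: 16 patches with 512-d CNN features; random symmetric adjacency
# standing in for a grid of spatial neighbors.
feats = torch.randn(16, 512)
adj = (torch.rand(16, 16) > 0.8).float()
adj = ((adj + adj.T) > 0).float().fill_diagonal_(0)
out = gcn_layer(feats, adj, torch.randn(512, 128))  # 16 x 128 context-aware features
```

Stacking several such layers lets information travel progressively across the ROI, which is the intuition behind classifying the whole region rather than fusing independent patch scores.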
We propose a framework for learning feature representations for variable-sized regions of interest (ROIs) in breast histopathology images from convolutional network properties at the patch level. The proposed method involves fine-tuning a pre-trained convolutional neural network (CNN) using small fixed-size patches sampled from the ROIs. The CNN is then used to extract a convolutional feature vector for each patch. The softmax probabilities of a patch, also obtained from the CNN, are used as weights that are separately applied to the feature vector of the patch. The final feature representation of a patch is the concatenation of the class-probability weighted convolutional feature vectors. Finally, the feature representation of the ROI is computed by average pooling of the feature representations of its associated patches. The feature representation of the ROI contains local information from the feature representations of its patches while encoding cues from the class distribution of the patch classification outputs. Experiments show the discriminative power of this representation in a 4-class ROI-level classification task on breast histopathology slides, where our method achieved an accuracy of 66.8% on a data set containing 437 ROIs with different sizes.
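The representation itself is described precisely enough to sketch directly; in NumPy, with hypothetical feature and class dimensions:

```python
import numpy as np

def roi_representation(patch_feats: np.ndarray, patch_probs: np.ndarray) -> np.ndarray:
    """Build the ROI feature from per-patch CNN features and softmax outputs.

    patch_feats: N x F convolutional feature vectors (one row per patch).
    patch_probs: N x C softmax class probabilities for the same patches.
    Each patch becomes the concatenation of C copies of its feature vector,
    each copy weighted by one class probability; the ROI representation is
    the average of these over the ROI's patches.
    """
    n, f = patch_feats.shape
    _, c = patch_probs.shape
    # weighted[i] = concat(p_i1 * f_i, ..., p_iC * f_i), giving shape N x (C * F)
    weighted = (patch_probs[:, :, None] * patch_feats[:, None, :]).reshape(n, c * f)
    return weighted.mean(axis=0)  # average pooling over the patches

# Hypothetical sizes: 50 patches, 1024-d features, 4 diagnostic classes.
roi_vec = roi_representation(np.random.rand(50, 1024), np.random.rand(50, 4))
```

The resulting vector has a fixed length of C times F regardless of how many patches the ROI contains, which is what makes variable-sized ROIs comparable.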
Digitization of full biopsy slides using whole slide imaging technology has provided new opportunities for understanding the diagnostic process of pathologists and developing more accurate computer-aided diagnosis systems. However, whole slide images also pose two new challenges to image analysis algorithms. The first is the need for simultaneous localization and classification of malignant areas in these large images, as different parts of the image may have different levels of diagnostic relevance. The second is the uncertainty regarding the correspondence between particular image areas and the diagnostic labels typically provided by pathologists at the slide level. In this paper, we exploit a data set consisting of the recorded actions of pathologists as they interpreted whole slide images of breast biopsies to address these challenges. First, we extract candidate regions of interest (ROIs) from the logs of the pathologists' image screenings based on actions corresponding to zoom events, panning motions, and fixations. Then, we model these ROIs using color and texture features. Next, we represent each slide as a bag of instances corresponding to the collection of candidate ROIs, together with a set of slide-level labels extracted from the forms that the pathologists filled out according to what they saw during their screenings. Finally, we build classifiers using five different multi-instance multi-label learning algorithms and evaluate their performance under different learning and validation scenarios involving various combinations of data from three expert pathologists. Experiments comparing the slide-level predictions of the classifiers with the reference data showed average precision values up to 62% when the training and validation data came from the same individual pathologist's viewing logs, and an average precision of 64% when the candidate ROIs and the labels from all pathologists were combined for each slide.
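To make the bag-of-instances setup concrete, a minimal sketch of the data layout follows; the feature dimension, label names, and the max-pooling bag assumption are illustrative stand-ins, not details of the five MIML algorithms used in the paper:

```python
import numpy as np

# One bag per slide: instances are feature vectors of candidate ROIs extracted
# from the viewing logs; labels come from the pathologist's slide-level form.
# All identifiers, sizes, and label names below are illustrative.
slide_bags = [
    {
        "slide_id": "slide_001",
        "instances": np.random.rand(12, 96),  # 12 candidate ROIs, 96-d color/texture features
        "labels": {"label_a", "label_b"},     # slide-level diagnostic labels
    },
    # ... one entry per slide
]

def bag_scores(instance_scores: np.ndarray) -> np.ndarray:
    """Standard multi-instance assumption (a simple stand-in): a bag is
    positive for a label if at least one of its instances is, so the bag
    score per label is the max over instance scores (N x C -> C)."""
    return instance_scores.max(axis=0)
```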
High spectral and high spatial resolution images acquired from new generation satellites have enabled new applications. However, the increasing amount of detail in these images also necessitates new algorithms for automatic analysis. This paper describes a new approach to discover compound structures such as different types of residential, commercial, and industrial areas that are composed of spatial arrangements of primitive objects such as buildings, roads, and trees. The proposed approach uses a robust Gaussian mixture model (GMM) where each Gaussian component models the spectral and shape content of a group of pixels corresponding to a primitive object. The algorithm can also incorporate spatial constraints on the layout of the primitive objects in terms of their relative positions. Given example structures of interest, a new learning algorithm fits a GMM to the image data, and this model can be used to detect other similar structures by grouping pixels that have high likelihoods of belonging to the Gaussian object models while satisfying the spatial layout constraints, without any requirement for region segmentation. Experiments using WorldView-2 data show that the proposed method can detect high-level structures that cannot be modeled using traditional techniques.
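A plain scikit-learn mixture can stand in for the pixel-grouping step (the paper's robust GMM estimation and spatial-layout constraints are its contributions and are omitted here); feature dimensions and thresholds are illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Per-pixel feature vectors (e.g., multispectral bands plus local shape cues);
# the 8-d features and pixel count are illustrative, not WorldView-2 specifics.
pixel_features = np.random.rand(10000, 8)

# One Gaussian component per primitive object type (building, road, tree, ...).
gmm = GaussianMixture(n_components=3, covariance_type="full").fit(pixel_features)

# Pixels with high likelihood under the mixture are candidate object pixels;
# the paper additionally enforces relative-position constraints between the
# detected object groups, which this sketch does not attempt.
log_lik = gmm.score_samples(pixel_features)
candidates = log_lik > np.percentile(log_lik, 90)
component = gmm.predict(pixel_features)  # which object model explains each pixel
```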
We describe a system for interactive classification and retrieval of microscopic tissue images. Our system models tissues at the pixel, region, and image levels. Pixel-level features are generated using unsupervised clustering of color and texture values. Region-level features include shape information and statistics of pixel-level feature values. Image-level features include statistics and spatial relationships of regions. To reduce the gap between low-level features and high-level expert knowledge, we define the concept of prototype regions. The system learns the prototype regions in an image collection using model-based clustering and density estimation. Different tissue types are modeled using spatial relationships of these regions. Spatial relationships are represented by fuzzy membership functions. The system automatically selects significant relationships from training data and builds models that can also be updated using user relevance feedback. A Bayesian framework is used to classify tissues based on these models. Preliminary experiments show that the spatial relationship models we developed provide a flexible and powerful framework for classification and retrieval of tissue images.
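As an illustration of the fuzzy spatial relationships, the sketch below scores a "right of" relation between two region centroids with a cos² falloff; the system's actual membership functions and the relations it selects are not specified here:

```python
import numpy as np

def fuzzy_right_of(dx: float, dy: float) -> float:
    """Illustrative fuzzy membership for a 'right of' spatial relation.

    (dx, dy) is the vector from a reference region's centroid to the other
    region's centroid. Membership is 1 when the second region lies due right
    and decays with angular deviation (cos^2 falloff, clipped at zero).
    """
    angle = np.arctan2(dy, dx)          # 0 rad = due right in image coordinates
    return float(max(0.0, np.cos(angle)) ** 2)

# A tissue model as a fuzzy conjunction (min t-norm) of pairwise relations.
score = min(fuzzy_right_of(10.0, 2.0), fuzzy_right_of(8.0, -1.0))
```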
We describe a system for interactive training of models for semantic labeling of land cover. The models are built from three levels of features: 1) pixel-level, 2) region-level, and 3) scene-level features. We developed a Bayesian algorithm and a decision tree algorithm for interactive training. The Bayesian algorithm enables training based on pixel features. Scene-level summaries of the pixel features are used for fast retrieval of scenes with high or low feature content and of scenes classified with low confidence. The decision tree algorithm is based on region-level features that are extracted from 1) spectral and textural characteristics of the image, 2) shape descriptors of regions created through a segmentation process, and 3) auxiliary information such as elevation data. The initial model can be created from a database of ground truth and then refined based on feedback supplied by a data analyst who interactively trains the model using the system output and/or additional scenes. The combination of supervised and unsupervised methods provides a more complete exploration of the model space. A user may detect the inadequacy of the model space and add additional features to the model. The graphical tools for the exploration of decision trees allow insight into the interaction of features used in the construction of models. Preliminary experiments show that accurate models can be built in a short time for a variety of land covers. The scalable classification techniques allow fast searches for a specific label over a large area.
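A compact stand-in for the decision tree part of the loop, using scikit-learn on synthetic region-level features; the low-confidence query step mimics the analyst feedback cycle described above, with all names, sizes, and thresholds assumed:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic region-level training data: spectral/texture/shape features plus
# auxiliary elevation; labels would come from ground truth or analyst feedback.
X = np.random.rand(500, 10)             # 500 regions, 10 region-level features
y = np.random.randint(0, 4, size=500)   # 4 hypothetical land-cover labels

tree = DecisionTreeClassifier(max_depth=5).fit(X, y)

# Interactive refinement: surface the regions the tree is least sure about,
# let the analyst correct them, and refit on the augmented training set
# (a simple stand-in for the system's feedback loop).
proba = tree.predict_proba(X)
low_conf = proba.max(axis=1) < 0.6      # candidate regions to show the analyst
```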