Clinically acquired, multimodal and multi-site MRI datasets are widely used for neuro-oncology research. However, manual preprocessing of such data is extremely tedious and error prone due to high intrinsic heterogeneity. Automatic standardization of such datasets is therefore important for data-hungry applications like deep learning. Despite rapid advances in MRI data acquisition and processing algorithms, only limited effort was dedicated to automatic methodologies for standardization of such data. To address this challenge, we augment our previously developed Multimodal Glioma Analysis (MGA) pipeline with automation tools to achieve processing scale suitable for big data applications. This new pipeline implements a natural language processing (NLP) based scan-type classifier, with features constructed from DICOM metadata based on bag-ofwords model. The classifier automatically assigns one of 18 pre-defined scan types to all scans in MRI study. Using the described data model, we trained three types of classifiers: logistic regression, linear SVM, and multi-layer artificial neural network (ANN) on the same dataset. Their performance was validated on four datasets from multiple sources. ANN implementation achieved the highest performance, yielding an average classification accuracy of over 99%. We also built a Jupyter notebook based graphical user interface (GUI) which is used to run MGA in semi-automatic mode for progress tracking purposes and quality control to ensure reproducibility of the analyses based thereof. MGA has been implemented as a Docker container image to ensure portability and easy deployment. The application can run in a single or batch study mode, using either local DICOM data or XNAT cloud storage.
|