Presentation + Paper
Predicting classifier performance using distributional separation measures
6 June 2022
Samuel T. Borden, Katie Rainey
Abstract
In real-world applications, machine learning classifiers are only as good as their performance on real-world data. In practice, this data comes without truth labels; if it did not, there would be no need for a classifier. In this paper we consider the scenario where we have a trained classifier and a set of unlabeled test data that we wish to run through it. If the unlabeled data is not within the distribution of the classifier's training data, we should not expect the classifier to work well on the test set. We explore the Henze-Penrose divergence, a measure of separation between two multivariate distributions, as a way to predict the performance of a classifier on a dataset and to detect distributional shift. We find that by computing Henze-Penrose scores between the training and test sets, first in the input space and then in the feature space of the classifier, we can obtain an indication that the test data is out of distribution and that classification accuracy will be unreliable.
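The abstract treats the Henze-Penrose divergence as a separation score between the training samples and the unlabeled test samples. As a point of reference, one standard way to estimate this divergence from finite samples is the Friedman-Rafsky minimum-spanning-tree statistic; the sketch below implements that estimator. The function name and the use of SciPy are illustrative assumptions, not the authors' code, and the paper may use a different estimator.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist


def henze_penrose_divergence(X, Y):
    """Friedman-Rafsky MST estimate of the Henze-Penrose divergence.

    X: (n, d) samples from the first distribution (e.g. training data or features).
    Y: (m, d) samples from the second distribution (e.g. unlabeled test data or features).
    Returns a score in [0, 1]: near 0 when the two samples are well mixed,
    near 1 when they are well separated (suggesting distributional shift).
    Assumes the pooled points are distinct so all pairwise distances are positive.
    """
    Z = np.vstack([X, Y])
    labels = np.concatenate([np.zeros(len(X)), np.ones(len(Y))])

    # Build the Euclidean minimum spanning tree over the pooled samples.
    dists = cdist(Z, Z)
    mst = minimum_spanning_tree(dists).tocoo()

    # Count MST edges joining a point from X to a point from Y ("cross" edges).
    cross = np.sum(labels[mst.row] != labels[mst.col])

    n, m = len(X), len(Y)
    # Friedman-Rafsky estimator: 1 - (n + m) * R / (2 n m), clamped at zero.
    return max(0.0, 1.0 - cross * (n + m) / (2.0 * n * m))
```

Following the procedure the abstract describes, the score would be computed twice: once on the raw inputs (e.g. flattened images) and once on the classifier's learned features (e.g. penultimate-layer activations of the network), giving two complementary indicators of whether the test set is out of distribution.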
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Samuel T. Borden and Katie Rainey "Predicting classifier performance using distributional separation measures", Proc. SPIE 12113, Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications IV, 1211312 (6 June 2022); https://doi.org/10.1117/12.2622159
KEYWORDS
Neural networks, Feature extraction, Image classification, Machine learning, Satellite imaging, Sensors