Ground truth (GT) data are essential for training and/or assessing classification algorithms. A biased or simplified GT attached to the remote sensing image to be partitioned does not allow a rigorous explanation of the physical phenomena reflected in such images. Unfortunately, this scientific problem is not always treated carefully and is generally neglected in the relevant literature. Furthermore, classification results obtained from such a GT have a negative impact on decision-making. This is inconsistent with the investments made in both the development of sophisticated sensors and the design of objective classification algorithms. Any GT must be validated according to a rigorous protocol before use, which is unfortunately not always the case. We provide evidence of this problem using two popular hyperspectral images (Indian Pines and Pavia University) that the remote sensing community frequently uses without care, even though the associated GTs are not accurate. The heterogeneity of the spectral signatures of some GT classes was demonstrated using a semisupervised and an unsupervised classification method. Through this critical analysis, we propose a general framework for a minimum objective assessment and validation of GT accuracy before a GT is exploited in a classification method.
The proposed reduction approach is deterministic and iterative. It includes a connectivity criterion between bands based on the Manhattan distance. This criterion allows the M spectral bands to be partitioned automatically into classes and identifies the most relevant bands to keep for the subsequent pixel classification. Moreover, using this criterion avoids classes that contain only one band. The spectral band selected to represent a given class is the one closest to all the other bands of that class with respect to this metric.
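The selection of a representative band can be illustrated with a minimal sketch. It assumes that the partition of bands into classes is already available (the connectivity criterion itself is not reproduced here) and that "closest to all the other bands" means the smallest summed Manhattan distance to the other bands of the class. The function names, the toy band values, and the `groups` partition are hypothetical.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two equally long band vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def representative_band(bands, group):
    """Return the band index in `group` with the smallest summed L1
    distance to the other bands of the group (assumed reading of
    "closest to all the other bands")."""
    def total_distance(i):
        return sum(manhattan(bands[i], bands[j]) for j in group if j != i)
    return min(group, key=total_distance)


# Toy data: 6 bands, each flattened to 4 pixel values (hypothetical numbers).
bands = [
    [0.10, 0.20, 0.30, 0.40],
    [0.12, 0.21, 0.33, 0.41],
    [0.15, 0.25, 0.35, 0.45],
    [0.80, 0.70, 0.90, 0.60],
    [0.82, 0.69, 0.88, 0.65],
    [0.85, 0.72, 0.86, 0.70],
]
# Hypothetical partition of band indices into two classes of bands.
groups = [[0, 1, 2], [3, 4, 5]]

selected = [representative_band(bands, g) for g in groups]
print(selected)
```

In this toy case, the middle band of each class is retained, since it lies between the two others and therefore has the smallest total distance.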
The proposed spectral band reduction has been evaluated and validated with our unsupervised descending hierarchical classification (UDHC) pixel method, with the addition of a regularization step. A real hyperspectral image composed of 100 spectral bands was used for the experimental study.
The persistent use of data sets derived from a biased ground truth does not allow objective comparisons between classification methods and does not help explain the physical phenomena that the images are supposed to reflect.
In this communication, we present a detailed and complete analysis of the spectral signatures of the pixels within each class for the two ground truth data sets mentioned above. The metrics used reveal incoherence and inaccuracy in these data, which wrongly serve as references in several comparative classification studies.
The developed classification approach provides i) a successive partitioning of the data into several levels, or partitions, in which the main classes are identified first; ii) an automatic estimation of the number of classes at each level, without any end-user help; iii) a nonsystematic subdivision of the classes of a partition Pj to form a partition Pj+1; and iv) a partitioning result for a given data set that is stable from one run of the method to another.
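The four properties above can be made concrete with a toy sketch on one-dimensional data. This is not the authors' method: the split rule (cut at the largest gap) and the `max_spread` stopping threshold are assumptions chosen only to show a deterministic, level-by-level partitioning in which a class is subdivided only when it is judged heterogeneous, which is how "nonsystematic" is read here.

```python
def spread(values):
    """Range of a class; used here as a stand-in heterogeneity measure."""
    return max(values) - min(values)


def split_at_largest_gap(ordered):
    """Split a sorted class in two at its largest gap (assumed rule)."""
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    _, cut = max(gaps)
    return ordered[:cut + 1], ordered[cut + 1:]


def descending_partitions(values, max_spread):
    """Return the partitions P0, P1, ... as lists of classes.

    A class of partition Pj is subdivided in Pj+1 only if its spread
    exceeds `max_spread` (hypothetical criterion); other classes are
    carried over unchanged.
    """
    partitions = [[sorted(values)]]
    while True:
        next_partition = []
        changed = False
        for cls in partitions[-1]:
            if len(cls) > 1 and spread(cls) > max_spread:
                next_partition.extend(split_at_largest_gap(cls))
                changed = True
            else:
                next_partition.append(cls)
        if not changed:
            return partitions
        partitions.append(next_partition)


# Hypothetical 1-D "pixel" values forming three groups.
values = [1.0, 1.2, 1.1, 5.0, 5.3, 9.0, 9.1, 9.2, 5.1]
partitions = descending_partitions(values, max_spread=1.0)
class_counts = [len(p) for p in partitions]
print(class_counts)
```

Because the sketch contains no random initialization, repeated runs on the same data return the same partitions, and the number of classes at each level results from the split criterion rather than from a user-supplied value.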
The proposed approach was validated on synthetic and real hyperspectral images for the identification of several marine algae species. In addition to giving highly accurate and consistent results (correct classification rate over 99%), this approach is completely unsupervised. At each level, it estimates the optimal number of classes and the final partition without any end-user intervention.
The efficiency of this optimized version of LBG is shown through experimental results on synthetic and real aerial hyperspectral data. More precisely, we tested the proposed classification approach with regard to three aspects: first its stability, second its correct classification rate, and third its correct estimation of the number of classes.
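The three evaluation aspects could be computed along the following lines. This is a sketch under assumptions, not the evaluation protocol of the paper: the correct classification rate is taken as the fraction of pixels that agree under the best one-to-one matching of labels, found by brute force (only practical for a handful of classes), and stability is measured as the same agreement between two runs. The label vectors are hypothetical.

```python
from collections import Counter
from itertools import permutations


def correct_classification_rate(reference, predicted):
    """Fraction of pixels that agree under the best one-to-one matching
    between reference and predicted labels (brute-force search)."""
    counts = Counter(zip(reference, predicted))
    ref_labels = sorted(set(reference))
    pred_labels = sorted(set(predicted))
    best = 0
    if len(ref_labels) <= len(pred_labels):
        # Try every assignment of a distinct predicted label to each reference label.
        for perm in permutations(pred_labels, len(ref_labels)):
            best = max(best, sum(counts[(r, p)] for r, p in zip(ref_labels, perm)))
    else:
        # Fewer predicted labels: assign a distinct reference label to each one.
        for perm in permutations(ref_labels, len(pred_labels)):
            best = max(best, sum(counts[(r, p)] for p, r in zip(pred_labels, perm)))
    return best / len(reference)


# Hypothetical labels for 8 pixels: a reference and two runs of a classifier.
reference = [0, 0, 0, 1, 1, 1, 2, 2]
run_a = [2, 2, 2, 0, 0, 1, 1, 1]
run_b = [1, 1, 1, 2, 2, 0, 0, 0]  # same partition as run_a, labels renamed

rate = correct_classification_rate(reference, run_a)   # correct classification rate
stability = correct_classification_rate(run_a, run_b)  # run-to-run agreement
estimated_classes = len(set(run_a))                    # estimated number of classes
print(rate, stability, estimated_classes)
```

Matching labels before counting agreements matters because an unsupervised method numbers its classes arbitrarily, so two identical partitions may carry different labels.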