In many camera-based systems, person detection and localization is an important step for safety and security applications
such as search and rescue, reconnaissance, surveillance, or driver assistance. Long-wave infrared (LWIR) imagery promises
to simplify this task because it is less affected by background clutter or illumination changes. In contrast to much related
work, we make no assumptions about the movement of persons or the camera, i.e., persons may stand still, the camera
may move, or any combination thereof. Furthermore, persons may appear at arbitrary distances from the camera,
so persons far away are imaged at low resolution. To address this task, we propose a two-stage system consisting of a proposal
generation method and a classifier that verifies whether the detected proposals really are persons. Instead of using all possible
proposals as in sliding-window approaches, we apply Maximally Stable Extremal Regions (MSER) and classify the
detected proposals afterwards with a Convolutional Neural Network (CNN). The MSER algorithm acts as a hot spot
detector when applied to LWIR imagery. Because the body temperature of persons is usually higher than the background,
they appear as hot spots in the image. However, the MSER algorithm is unable to distinguish between different kinds of hot
spots. Thus, all other LWIR sources such as windows, animals, or vehicles are detected, too. Still, applying MSER
reduces the number of proposals significantly compared to a sliding-window approach, which allows us to employ the
strong discriminative capabilities of deep neural network classifiers that were recently demonstrated in several applications such
as face recognition or image content classification. We suggest using a CNN as the classifier for the detected hot spots and
train it to discriminate between person hot spots and all other hot spots. We specifically design a CNN that is suitable for
the low-resolution person hot spots that are common with LWIR imagery applications and is capable of fast classification.
Evaluation on several different LWIR person detection datasets shows an error rate reduction of up to 80 percent compared
to previous approaches consisting of MSER, local image descriptors and a standard classifier such as an SVM or boosted
decision trees. Runtime measurements further show that the proposed processing chain is capable of real-time person detection
in LWIR camera streams.
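
The two-stage structure described above (hot spot proposals, then verification) can be illustrated with a minimal sketch. This is not the proposed implementation: thresholded connected components serve here as a simplified stand-in for MSER, and the verification step is a hypothetical placeholder (`taller_than_wide`) where the trained CNN would go. All function names, the threshold value, and the toy frame are assumptions made for illustration only.

```python
from collections import deque


def hot_spot_proposals(image, threshold):
    """Return bounding boxes (top, left, bottom, right) of 4-connected
    regions whose pixel values are >= threshold.

    Simplified stand-in for MSER: a single fixed threshold instead of a
    stability analysis over many thresholds.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # Breadth-first flood fill over one hot region.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def detect_persons(image, threshold, is_person):
    """Stage 1 proposes hot spots; stage 2 keeps those the classifier accepts."""
    return [box for box in hot_spot_proposals(image, threshold)
            if is_person(image, box)]


def taller_than_wide(image, box):
    """Hypothetical placeholder for the CNN: accept upright regions only."""
    top, left, bottom, right = box
    return (bottom - top) > (right - left)


# Toy "LWIR frame": one upright hot spot and one flat hot spot.
frame = [
    [10, 10, 10, 10, 10, 10],
    [10, 90, 10, 10, 10, 10],
    [10, 90, 10, 80, 80, 80],
    [10, 90, 10, 10, 10, 10],
]

proposals = hot_spot_proposals(frame, threshold=50)
persons = detect_persons(frame, threshold=50, is_person=taller_than_wide)
```

In this toy frame, both hot spots become proposals, but only the upright one passes verification. The classifier runs on two candidate boxes rather than on every window position, which is the reduction in proposals that motivates the two-stage design.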