Accurate building footprint extraction from optical remote sensing images remains challenging due to the diverse appearance of buildings and the complexity of imaging scenes. Although recent deep learning-based methods have greatly improved the accuracy of building footprint extraction, vanilla deep networks still produce ambiguous predictions for edge pixels. Building edges carry abundant location and shape information, which is important for downstream applications such as building positioning and area measurement, so the problem of inaccurate edge prediction urgently needs to be resolved. To this end, we propose a novel edge-guided network (EGNet) that makes ample use of the edge prior in an end-to-end manner. First, an edge extraction module (EEM) is proposed to extract the building edge map. Then, an edge-guidance module (EGM) is designed to use this edge map to guide each encoder block in extracting edge-related features. Furthermore, a multi-scale context aggregation module (MCAM) is built to enhance the feature representation by aggregating contextual semantics from different receptive fields. EGNet effectively mines edge semantics and guides the representation learning of boundaries, achieving 75.21% and 91.16% IoU on the Massachusetts and WHU datasets, respectively. Experimental results demonstrate that the proposed EGNet outperforms current state-of-the-art (SOTA) methods in both accuracy and efficiency.
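To make the multi-scale context aggregation idea concrete, the following is a minimal PyTorch sketch of a module that aggregates contextual semantics from different receptive fields via parallel dilated convolutions. The class name `MCAM`, the dilation rates, and the fusion scheme are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch of a multi-scale context aggregation module (MCAM):
# parallel 3x3 convolutions with different dilation rates see different
# receptive fields; their outputs are concatenated and fused by a 1x1
# convolution back to the input channel count.
import torch
import torch.nn as nn


class MCAM(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        # One branch per dilation rate; padding = dilation keeps spatial size.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        # 1x1 convolution fuses the concatenated multi-scale features.
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


feats = torch.randn(1, 64, 32, 32)  # dummy encoder feature map
out = MCAM(64)(feats)
print(out.shape)  # torch.Size([1, 64, 32, 32])
```

Because each dilated branch preserves the spatial resolution, the module can be dropped into an encoder-decoder segmentation network without altering feature-map sizes.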