Multispectral Airborne Laser Scanning Point-Clouds for Land Cover Classification Using Convolutional Neural Networks
1Department of Geography and Environmental Management, University of Waterloo, Waterloo, ON N2L 3G1, Canada; 2Department of Systems Design Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada; 3Fujian Key Laboratory of Sensing and Computing for Smart Cities, School of Information Science and Engineering, Xiamen University, Xiamen, Fujian 361005, China; 4Department of Civil Engineering, Ryerson University, Toronto, ON M5B 2K3, Canada
This paper presents an automated workflow for pixel-wise land cover (LC) classification from multispectral airborne laser scanning (ALS) data using deep learning. The workflow comprises three steps: data pre-processing, land cover classification, and accuracy assessment. First, nine raster images carrying different information were generated from the pre-processed point clouds and assembled into six input data combinations; meanwhile, a labelled dataset was created using orthophotos as the ground truth, and three deep learning networks were established. Each input data combination was then used to train and validate each network, yielding eighteen LC classification models with different parameters for predicting the LC type of each pixel. Finally, accuracy assessments and comparisons were performed on the eighteen classification results to determine an optimal scheme. The proposed method was tested on the six input datasets with three deep learning classification networks (i.e., 1D CNN, 2D CNN, and 3D CNN). The highest overall accuracy (OA) of 97.2% was achieved by the proposed 3D CNN. The OA of the 2D and 3D CNNs was, on average, 8.4% higher than that of the 1D CNN. Although the OA of the 2D CNN was at most 0.3% lower than that of the 3D CNN, the runtime of the 3D CNN was five times longer than that of the 2D CNN. The 2D CNN was therefore the best choice for multispectral ALS LC classification when efficiency is taken into account. The results demonstrate that the proposed methods can successfully classify land cover from multispectral ALS data.
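The three network types in the abstract above differ mainly in the shape of the per-pixel input they consume. The sketch below, with assumed tile and patch sizes (the abstract does not fix them), shows how a stack of nine derived rasters can be cut into a spectral vector for a 1D CNN, a multi-band patch for a 2D CNN, and a single-channel volume for a 3D CNN; all array names are illustrative.

```python
import numpy as np

# Hypothetical stand-in for the nine rasters derived from the point clouds
# (e.g., per-channel intensities, height, return counts); values are random.
H, W, BANDS = 64, 64, 9
rasters = np.random.rand(H, W, BANDS).astype(np.float32)

PATCH = 9          # assumed patch size, not specified in the abstract
r = PATCH // 2

def pixel_samples(img, row, col):
    """Build the three input shapes for one labelled pixel."""
    patch = img[row - r:row + r + 1, col - r:col + r + 1, :]
    x1d = img[row, col, :]                # 1D CNN: spectral vector (bands,)
    x2d = patch.transpose(2, 0, 1)        # 2D CNN: (bands, h, w) patch
    x3d = patch.transpose(2, 0, 1)[None]  # 3D CNN: (1, bands, h, w) volume
    return x1d, x2d, x3d

x1d, x2d, x3d = pixel_samples(rasters, 32, 32)
print(x1d.shape, x2d.shape, x3d.shape)   # (9,) (9, 9, 9) (1, 9, 9, 9)
```

The 2D and 3D variants see the same neighbourhood; the 3D CNN merely convolves across the band axis as well, which is consistent with its marginal accuracy gain at a much higher runtime.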
Submanifold Sparse Convolutional Networks for Semantic Segmentation of Large-Scale ALS Point Clouds
Universität Stuttgart, Germany
Semantic segmentation of point clouds is usually one of the main steps in the automated processing of Airborne Laser Scanning (ALS) data. Established methods typically require the expensive calculation of handcrafted, point-wise features. In contrast, Convolutional Neural Networks (CNNs) have been established as powerful classifiers that learn a set of features by themselves. However, their application to ALS data is not trivial: pure 3D CNNs require large amounts of memory and computing time; therefore, most approaches project point clouds into two-dimensional images.
Submanifold Sparse Convolutional Networks (SSCNs) address this issue by exploiting the sparsity often inherent in 3D data. In this work, we propose the application of SSCNs for efficient semantic segmentation of ALS voxel clouds in an end-to-end encoder-decoder architecture. We evaluate this method on the ISPRS Vaihingen 3D Semantic Labeling benchmark and achieve a state-of-the-art overall accuracy of 85.0%.
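The defining property of a submanifold sparse convolution is that outputs are computed only at the input's active (non-empty) voxels, so sparsity is preserved through the network instead of dilating layer by layer. The following naive NumPy sketch illustrates that rule on a toy voxel cloud; it is a conceptual illustration under assumed shapes, not the optimized implementation used in the paper.

```python
import numpy as np

# Toy sparse voxel cloud: coordinates of active voxels and per-voxel features.
coords = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 1], [5, 5, 5]])  # (N, 3)
feats = np.ones((len(coords), 1), dtype=np.float32)              # (N, C_in)

C_in, C_out, K = 1, 2, 3                                         # 3x3x3 kernel
rng = np.random.default_rng(0)
weights = rng.standard_normal((K, K, K, C_in, C_out)).astype(np.float32)

def submanifold_conv3d(coords, feats, weights):
    """Naive submanifold sparse convolution: outputs exist only at the
    input's active sites, so the active set (and sparsity) is preserved."""
    index = {tuple(c): i for i, c in enumerate(coords)}
    k = weights.shape[0] // 2
    out = np.zeros((len(coords), weights.shape[-1]), dtype=np.float32)
    for i, c in enumerate(coords):
        for dz in range(-k, k + 1):
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    j = index.get((c[0] + dz, c[1] + dy, c[2] + dx))
                    if j is not None:          # empty voxels are skipped entirely
                        out[i] += feats[j] @ weights[dz + k, dy + k, dx + k]
    return out

out = submanifold_conv3d(coords, feats, weights)
print(out.shape)   # (4, 2): one output row per active voxel, never more
```

Because the isolated voxel at (5, 5, 5) has no active neighbours, only the kernel's centre weight contributes to its output, which is exactly why SSCNs avoid the memory blow-up of dense 3D convolutions on mostly-empty grids.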
Furthermore, we demonstrate its capabilities on large-scale ALS data by classifying a 2.5 km² subset containing 41 million points from the Actueel Hoogtebestand Nederland (AHN3) with 95% overall accuracy in only 48 s of inference time, or with 96% in 108 s.
Towards Better Classification of Land Cover and Land Use Based on Convolutional Neural Networks
Leibniz University of Hanover, Germany
Land use and land cover are two important variables in remote sensing. Commonly, land use information is stored in geospatial databases. In order to update such databases, we present a new approach to determine land cover and to classify land use objects using convolutional neural networks (CNNs). High-resolution aerial images and derived data such as digital surface models serve as input. An encoder-decoder based CNN is used for land cover classification; we found a composite including the infrared band and height data to outperform RGB images for this task. We also propose a CNN-based methodology for predicting the land use labels of objects from geospatial databases, where masks representing object shape, the RGB images, and the pixel-wise class scores of the land cover serve as input. For this task, we developed a two-branch network in which the first branch considers the whole area of an image, while the second branch focuses on a smaller relevant area. We evaluated our methods on two test sites and achieved an overall accuracy of up to 89.6% for land cover and 81.8% for land use. We also tested our land cover classification on the Vaihingen dataset of the ISPRS 2D semantic labelling challenge and achieved an overall accuracy of 90.7%.
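The land use inputs described above can be assembled straightforwardly: the object mask, RGB bands, and pixel-wise land cover scores are stacked as channels, and the focused branch receives a crop around the object. The sketch below illustrates this with assumed tile size, class count, and margin (none of which are specified in the abstract); all data here is synthetic.

```python
import numpy as np

# Hypothetical tile: RGB image, land cover class scores, and a binary object mask.
H, W = 128, 128
rgb = np.random.rand(H, W, 3).astype(np.float32)
lc_scores = np.random.rand(H, W, 5).astype(np.float32)   # assumed 5 LC classes
mask = np.zeros((H, W), np.float32)
mask[40:90, 50:110] = 1.0                                # the land use object

# Branch 1 input: mask, RGB, and land cover scores stacked as channels.
x_full = np.concatenate([mask[..., None], rgb, lc_scores], axis=-1)

def focused_crop(x, mask, margin=8):
    """Branch 2 input: crop to the object's bounding box plus a small margin."""
    rows, cols = np.where(mask > 0)
    r0 = max(rows.min() - margin, 0)
    r1 = min(rows.max() + margin + 1, x.shape[0])
    c0 = max(cols.min() - margin, 0)
    c1 = min(cols.max() + margin + 1, x.shape[1])
    return x[r0:r1, c0:c1]

x_crop = focused_crop(x_full, mask)
print(x_full.shape, x_crop.shape)   # (128, 128, 9) (66, 76, 9)
```

The whole-image branch preserves context around the object, while the cropped branch gives the network a higher effective resolution on the object itself.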
Building Segmentation from Aerial VHR Images using Mask R-CNN
1Department of Geoscience and Remote Sensing, Delft University of Technology, the Netherlands; 2Department of Computational Science and Engineering, Delft University of Technology, the Netherlands
Up-to-date 3D building models are important for many applications. Airborne very high resolution (VHR) images, often acquired annually, offer an opportunity to create up-to-date 3D models. Building segmentation is often the first and most important step. Convolutional neural networks (CNNs) have drawn much attention for interpreting VHR images, as they can learn very effective features for complex scenes. This paper employs Mask R-CNN to address two problems in building segmentation: detecting buildings at different scales and producing accurately delineated building edges. Mask R-CNN starts from a feature pyramid network (FPN) that creates semantically rich features at different scales. The FPN is integrated with a region proposal network (RPN) to generate objects at various scales, each with the corresponding optimal scale of features. Features carrying high- and low-level information are further used for better classification of small objects and for mask prediction at edges. The method is tested on the ISPRS benchmark dataset by comparing its results with those of fully convolutional networks (FCNs), which merge high- and low-level features through a skip layer into a single feature map for semantic segmentation. The results show that Mask R-CNN outperforms the FCN by around 15% in object detection, especially for small objects. Moreover, Mask R-CNN performs much better in edge regions than the FCN. The results also show that the range of anchor scales in Mask R-CNN is a critical factor in segmenting objects at different scales. This paper provides insight into how a suitable anchor scale range should be chosen for a given dataset.
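One simple, data-driven way to pick the anchor scale range the abstract highlights is to look at the size distribution of the ground-truth building boxes and spread the anchor scales across its percentiles. The sketch below does this on synthetic boxes; the percentile choice and the box statistics are assumptions for illustration, not the procedure from the paper.

```python
import numpy as np

# Hypothetical ground-truth building boxes as (x0, y0, x1, y1) in pixels;
# widths/heights drawn from a lognormal to mimic a skewed size distribution.
rng = np.random.default_rng(1)
wh = rng.lognormal(mean=3.5, sigma=0.6, size=(200, 2))
boxes = np.concatenate([np.zeros((200, 2)), wh], axis=1)

def suggest_anchor_scales(boxes, n_scales=5):
    """Pick anchor scales from percentiles of sqrt(box area), so the
    smallest anchor still matches the small buildings in the dataset."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    sizes = np.sqrt(w * h)
    qs = np.linspace(5, 95, n_scales)      # cover small to large objects
    return np.percentile(sizes, qs)

scales = suggest_anchor_scales(boxes)
print(np.round(scales, 1))   # five increasing anchor scales in pixels
```

If the smallest suggested scale is well above the RPN's smallest anchor, small buildings will receive no well-matched proposals, which is consistent with the paper's finding that the anchor scale range drives small-object performance.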