
Coarse automated registration of visual imagery with three-dimensional light detection and ranging models

Author Affiliations
Prakash Duraisamy

Old Dominion University, Norfolk, Virginia 23509

Kamesh Namuduri

University of North Texas, Department of Electrical Engineering, Denton, Texas 76203

Stephen Jackson

University of North Texas, Department of Mathematics, Denton, Texas 76203

Bill Buckles

University of North Texas, Department of Computer Science and Engineering, Denton, Texas 76203

J. Electron. Imaging. 22(3), 033035 (Sep 26, 2013). doi:10.1117/1.JEI.22.3.033035
History: Received November 29, 2012; Revised August 22, 2013; Accepted September 5, 2013

Open Access

Abstract.  An automated, fast, and robust registration of visual images with three-dimensional data generated from a light detection and ranging (LiDAR) sensor is described. The method is intended to provide a coarse registration of the images, suitable for input into other algorithms or for situations where only a rough registration is required. Our approach consists of two steps. In the first step, two-dimensional lines are extracted as features from both the visual and the LiDAR data. In the second step, a novel and efficient search is performed to recover the external camera parameters, and thus the camera matrix (as a calibrated camera is assumed). An error metric based on line matching is used to guide the search. We demonstrate the performance of our algorithm on both synthetic and real-world data.


Three-dimensional (3-D) models play an essential role in various applications including urban planning, cartography, and computer gaming. The fast and automated 3-D model reconstruction problem has attracted many researchers due to its vast potential and significance in many real-world applications. The data acquisition for 3-D models is complicated and time consuming. Existing large-scale systems take several months to create 3-D models and require substantial manual intervention.1 In addition, this process is computationally expensive and not suitable for applications like 3-D rendering. The traditional method to reconstruct surfaces is photogrammetry. Building 3-D models using photogrammetry requires two or more overlapping images with some common features such as lines or points. The correspondence between two images is established through a process called stereopsis.2 However, the success of 3-D model reconstruction using visual images has been marginal. Despite much research on this topic, there is no widely accepted method for reconstructing complex scenes, such as large-scale urban environments, with the precision needed for applications such as disaster management. Further, human intervention is still needed for control and editing. 3-D model reconstruction using visual images also suffers from stereo-matching problems and a limited degree of automation. At the same time, it has some advantages such as very rich scene information and high horizontal and vertical accuracy.

On the other hand, light detection and ranging (LiDAR) has emerged as a promising method to capture digital surface data in a precise manner. A LiDAR system measures light reflected from the surface of the earth at irregularly spaced points. We call the set of measured points a point cloud in this text. LiDAR has many advantages over traditional photogrammetric devices such as dense sampling, vertical accuracy, fast data collection and processing, and the ability to collect data for a wide range of applications. In addition, unlike aerial photography, it is not restricted by illumination changes due to fluctuations in daylight or clouds. At the same time, it has some drawbacks such as limited scene information, lower horizontal accuracy, a lack of redundant information, and high cost. Both LiDAR and photogrammetry have their own advantages and disadvantages in the context of reconstructing surfaces; a summary comparison of the relative advantages and disadvantages of the two methods is shown in Table 1. Many researchers have attempted to combine these two methods into one for surface reconstruction. By combining LiDAR and photogrammetry, the advantages of both methods may be exploited.

Table 1: Advantages of LiDAR and aerial images.

From the above comparison, it is interesting to see that many of the disadvantages of one method are offset by the other method's advantages. This is a strong motivation for fusing the two methods.

Main Contributions

In this paper, we develop a strategy to register 3-D LiDAR data with two-dimensional (2-D) visual images without the use of any external camera data [such as global positioning system (GPS) coordinates]. The strategy is to perform an efficient search (described in detail below) over the external camera parameter space. To grade a particular set of camera parameters, we use a set of best line segments extracted from the LiDAR image together with the LiDAR altitude data (which gives a 3-D representation of the extracted line segments), and a set of best line segments extracted from the aerial image. An error metric, discussed below, grades how well this particular camera matrix matches the two sets of line segments. The 3-D nature of the LiDAR data thus plays an important role in the method.

The novelty of the proposed method includes the following. (1) It does not require the use of GPS or inertial navigation system (INS) unlike some of the existing methods in the current literature. (2) It does not require any prior knowledge about extrinsic camera parameters. (3) The method yields reasonable results with only a small number of strong corresponding line segments present in both the LiDAR and visual images. (4) The novel search algorithm used (described in detail below) is efficient, and so the method is reasonably quick.

Our method assumes that we have a LiDAR image and a conventional aerial image of a region with significant overlap in scene content between the two imaged areas. Thus, we are implicitly assuming that the aerial images are taken of a scene with approximately known coordinates. By avoiding GPS or INS, we mean that we do not need any explicit information about the aerial camera position or orientation; we need merely the general coordinates of the object being photographed. We feel that in practice this may be a commonly realized situation. For example, in our real-world experiment, we took aerial photos from a helicopter fly-over of downtown New Orleans. The fact that we were photographing downtown New Orleans is sufficient to extract a corresponding LiDAR image from a database of LiDAR data (in our case from the U.S. Geological Survey); however, we are free to take aerial images as we like without the need for GPS or INS data from the helicopter. Aside from 3-D model reconstruction and map updating, we feel that this method could prove useful in other contexts such as extraterrestrial terrain reconstruction, where a global LiDAR survey may be available but GPS or INS data may be limited.

We apply our method to a synthetic model of an urban environment and then to real-world LiDAR and aerial data from a portion of the city of New Orleans. The data acquisition methods are described below in Sec. 3.

3-D models of urban areas can be built in many ways. In earlier days, 3-D models were developed from two or more visual images, in particular, as a collection of polygon patches, in which each patch is a compact area with uniform photometric and chromatic properties and adjacent patches meet along common boundaries. However, this method lacks accuracy, resolution, and the ability to be fully automated.1

3-D Reconstruction Using Digital Surface Model

In recent years, better resolution and accuracy of airborne lasers have enabled the improvement of 3-D model building using a digital surface model (DSM). In addition, the advancement of efficient software systems has reduced human interaction in developing 3-D models. Of particular note are automatic approaches that integrate the DSM with 2-D ground plane information in building the model. Although such methods are fast, the resulting resolution is limited.2 For example, the work presented in Ref. 2 focuses on building the 3-D model using ground-level elevations, but it is limited to a few buildings.

3-D Reconstruction Using Visual Images

An image-based method3 was later introduced that overcomes limited resolution problems. In this approach, the scene is recorded by video camera from different viewpoints. Image-based approaches have advantages and challenges.3,4 Primarily, there is no need for calibration of the scene and there is no range limitation in this approach, which can cover the whole scene.

In addition, the data acquisition is easier since recording can be done using any photo recording device like camcorders or digital cameras. This flexibility leads to less cost in building the model. In contrast, however, it has several drawbacks including illumination effects, complexity of outdoor scenes, particularly glass surfaces, bushes, and trees, which cause problems for vision-based approaches.

Another approach based on half-spherical images is proposed in Ref. 5. This model uses corners as features to detect edges in 2-D images, which are used to estimate the relative rotation of multiple cameras. A hybrid model consisting of expectation maximization and the Hough transform uses these edges along with intrinsic camera calibration to estimate the 3-D orientation of vanishing points (VP) of the scene. Finally, the VP information is matched across the cameras and their correspondences are used for optical alignment. This approach has several advantages, such as scalability, global optimality, robustness, and sophisticated pose computation, but it has a few limitations, such as the data acquisition being performed in a stop-and-go fashion and the 3-D reconstruction being purely image based, which has its own deficiencies.

3-D Reconstruction Using Ground-Based 2-D and 3-D Scanners

A more recent method for building 3-D models makes use of mobile robots with 2-D and 3-D scanners.4,6 In this model, accurate geometry and photometry of the scene are generated by integrating range and image measurements. Mobile robots are used for data acquisition, starting with multiple unregistered range scans and visual images from different viewpoints. 3-D features are extracted from the overlapping part of the 2-D planar surface, registered with the 2-D features, and finally translated into a solid 3-D model. However, this model suffers from several limitations, such as increased data acquisition time and the limited reliability of robots in outdoor environments.

Another sophisticated model was introduced based on vertical laser scanners mounted in cars.7 In this approach, ground-based laser data are fused with charge-coupled device (CCD) images to build a textured 3-D urban object. Here, both the laser range image and the CCD image are captured from the same platform. The registration between these two images is done using planar surfaces that are extracted from the laser range images. This approach has the advantage of data acquisition in reasonable time, but it also has a limitation: it uses GPS for localization. GPS usually fails in dense environments (canyons), gives erroneous results in multipath environments, and is highly expensive for ground-based modeling.

Later, a fast and improved ground-based acquisition of 3-D models was introduced.8 In this model, data are collected at ground level using two inexpensive scanners, one oriented vertically and another horizontally. Successive scans are matched with each other to estimate the vehicle motion, and the corresponding movements are concatenated to form an initial path. This initial path is corrected by a Monte Carlo localization (MCL) technique. An aerial photograph is used as a global map to match the horizontal laser scans. This model is improved compared to previous models, but still faces some problems. A first problem is in pose computation, because pose correction depends on laser scans, and the accuracy of the global map is very limited in dense areas. Second, during data acquisition, the truck was not able to scan facades on the backsides of the buildings. Third, it faces some vision-based problems: glass surfaces and foreground objects appear cluttered and visually unconvincing. It also remains an open problem how to use this model for complex buildings, especially since the model provides no information about rooftops, which are not visible from ground level.

3-D Reconstruction Using LiDAR and Visual Images

An enhanced version of Ref. 8 was introduced in Ref. 9, in which a detailed facade model is registered and fused with a complementary airborne model using MCL techniques. Finally, these two models are merged at different resolutions to build a 3-D model. This model is more advanced than previous models and visually acceptable, but still has one limitation: it faces problems with foreground objects like cars and trees. Therefore, in dense areas, the resulting image has visual artifacts. A fast and automated camera pose algorithm for building 3-D models by fusing aerial images with untextured LiDAR data was introduced in Ref. 10. In this approach, vanishing points are used to develop a feature-matching technique that matches 2-D corners in aerial images with corresponding 3-D orthogonal corners in DSM images obtained from LiDAR data. The focal length is fixed in this model for the entire data acquisition process, and a GPS unit (NAV 420 CA) was used for estimating a coarse camera position. However, this model sometimes fails to produce detailed information for dense buildings.

Several attempts have been made to create 3-D models by fusing aerial and LiDAR data. A 3-D model generation method based on fusing aerial and LiDAR data using multiple geometric features is introduced in Ref. 11. In this model, the 2-D images are registered with 3-D LiDAR data by hybrid block adjustment using lines and points as features. While this registration method is robust, it requires manual selection of points for determining the exterior orientation.

A new approach was introduced to detect the boundaries of complex buildings from LiDAR and photogrammetric images. This method consists of three steps: (1) generating directional histograms, (2) splitting and merging line segments, and (3) matching the line segments. This approach performs substantially better than methods using LiDAR data only; however, its limitation is that it works mainly for planar scene structures.12 Another approach introduced in Ref. 13 integrates LiDAR point clouds and large-scale vector maps to model buildings. This approach involves three stages: (a) preprocessing of LiDAR point clouds and vector maps, (b) roof analysis, and (c) building reconstruction. However, it requires sufficient point-cloud density to generate reasonable results.

In contrast to the above existing strategies, in our research, we developed a robust and fast registration technique that estimates the extrinsic camera parameters without requiring any prior knowledge or the need for sensors such as GPS or INS.

The goal in our approach is to register a 2-D image (the aerial image) with a 3-D model obtained from LiDAR data. We assume that we have no knowledge about the extrinsic parameters of the camera model, and we do not use GPS, INS, or other electronic sensors. However, we assume that we know the intrinsic parameters of the camera (i.e., we have a calibrated camera). Our solution to the registration problem is to determine 3-D lines from the 3-D model, using line extraction from the LiDAR image along with the LiDAR z-coordinate data, and match them with the corresponding lines in the 2-D image using estimated extrinsic camera parameters. The algorithm employs a novel search method to search the space of external camera parameters. The degree of matching of the two sets of line segments is given by a line-matching metric, which we describe in detail below. We ran the algorithm on both a synthetic model and real-world data from the city of New Orleans.

Data Acquisition

For both the synthetic model and the real-world data, we use as inputs to our method a LiDAR data set and an ordinary photographic image, which we refer to as the aerial image. For the synthetic model, the LiDAR data were actually acquired with a Nextengine 3-D laser scanner, which was mounted on a table and which scanned the model to generate output as x, y, z coordinates with z being the altitude. The scanner resolution is 0.5 cm. This array is considered to be the LiDAR data, which is input to our algorithm. The scanner and model are shown in Fig. 1.

Fig. 1: Nextengine three-dimensional laser scanner and synthetic model.

For the aerial image of the synthetic model, we used a common digital camera with manual focus, which we calibrated after capturing the image. The distance from the camera to the model was 1.55 m. The LiDAR image from the scanner and its corresponding aerial image are shown in Figs. 2(a) and 2(b). The image resolution is 0.1 cm.

Fig. 2: (a) Digital surface model (DSM) obtained from LiDAR data corresponding to the synthetic model. (b) Visual image of the same synthetic model from a different perspective. (c) DSM obtained from LiDAR data corresponding to an urban area in Louisiana. (d) Aerial image of the same urban area from a different perspective.

For the real-world LiDAR data, we used a portion of a LiDAR map of the Louisiana area produced by the US Geological Survey (USGS). The raw data from this survey were in the form of a list of x, y, z values similar to the data for the synthetic model described above. Again, the z values refer to the vertical altitude. The resolution/accuracy of the raw LiDAR data is in the neighborhood of 1 to 2 m. Aerial images of the same location were captured with a digital camera during a fly-over of the area in an R22 Beta II helicopter at an altitude of 1200 ft (as given by the helicopter altimeter). The camera resolution is 0.5 to 1.0 m at this distance, and the images contain much richer scene information than the LiDAR data. The images were taken at several different angles, including significant oblique perspectives. The camera was again calibrated after the image captures.

DSM Generation

The raw scan data consist of a set of triples (x,y,z) with z representing the altitude. In order to refine the scan, it is better to resample the scan points onto a regularly spaced row-column structure. However, this step reduces the spatial resolution according to the grid size. The DSM is created by assigning to each grid cell the highest z value among its member points, and empty cells are filled by nearest-neighbor interpolation. The raw x, y, z values were converted using the MATLAB griddata command to an array of z values corresponding to regularly spaced x and y coordinates. Ultimately, the purpose of registration is extraction of properties available through the use of both modalities. Gridding the LiDAR provides a common intermodal data structure that facilitates measurements across instruments.
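For concreteness, a minimal MATLAB sketch of this gridding step is given below. The variable names (pts, cell_size, dsm_gray), the grid spacing, and the use of accumarray with a nearest-neighbor fill are our own illustrative assumptions, not the authors' exact code, which the text describes only in terms of griddata.

```matlab
% Hedged sketch of the DSM gridding step (assumed variable names, not the authors' code).
% pts is an N-by-3 array of raw (x, y, z) LiDAR returns; cell_size is the grid spacing.
cell_size = 1.0;                                         % assumed spacing in data units
col = floor((pts(:,1) - min(pts(:,1))) / cell_size) + 1;
row = floor((pts(:,2) - min(pts(:,2))) / cell_size) + 1;
dsm = accumarray([row, col], pts(:,3), [], @max, NaN);   % keep the highest z per grid cell
% Fill empty cells by nearest-neighbor interpolation from the occupied cells.
empty = isnan(dsm);
[fr, fc] = find(~empty);
[er, ec] = find(empty);
if any(empty(:))
    dsm(empty) = griddata(fc, fr, dsm(~empty), ec, er, 'nearest');
end
dsm_gray = mat2gray(dsm);                                % grayscale DSM used for edge extraction
```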

For both the synthetic and the real-world versions, the LiDAR data array is converted into a grayscale DSM. This DSM is then used to extract line segments, which are used in our registration algorithm. We describe the method of line segment extraction in the next subsection.

Note that there are several sources of error entering into the DSM. For example, the DSM data are affected by small objects such as shrubs, trees, cars, antennas, etc. It is very difficult to reconstruct all these small objects at DSM resolution. In addition, the scan points below the rooftops cause inconsistent altitude values, which result in jittery edges. For better reconstruction, we implemented additional processing steps such as adjusting the image contrast to aid in edge extraction. We used the imadjust MATLAB function to adjust the intensity values in the images. Intensity adjustment is an image enhancement technique that maps image intensity values to a new range.14 The generated DSM is shown in Fig. 2(a) for the synthetic model and in Fig. 2(c) for the urban area of New Orleans.

Finding Line Segments in DSM

To extract line segment information from the grayscale DSM, we used a standard Canny edge detector to produce a binary (0, 1 valued) image marking the pixels determined to lie on edges. In a grayscale DSM, the dynamic range of elevation is mapped linearly into the intensity range of the gridded image. In the DSM, edges are identified based on pixel values: if the depth value of one pixel differs from any of its neighbors by an amount higher than a threshold, then that pixel is marked as an edge pixel. The Canny output was then processed to generate a list of 3-D line segments. For the synthetic model, we used a two-step process employing line segment detection and joining algorithms from Peter Kovesi's toolbox.15 Basically, these algorithms subdivide the contour, and when the maximum deviation of the contour exceeds a threshold value, an edge segment is formed. Line segments are then fitted to these lists of edge segments, and small line segments are discarded. Specifically, the MATLAB command edgelink from the toolbox was used to generate an edge list from the Canny output, and the command lineseg was used to fit line segments to the edge list data. The line segment data are stored as a list of pairs of coordinates for the endpoints of the line segments. Since the DSM includes z-coordinate values, these endpoint coordinates are points in R3. For the real-world data, we found it more appropriate to use the Hough transform to extract the lines from the DSM. We used the hough, houghpeaks, and houghlines MATLAB functions to implement the Hough-transform line extraction. The results of the line-segment extraction are shown in Fig. 3.
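A hedged MATLAB sketch of the Hough-based variant used for the real-world DSM follows; the peak count, fill gap, and minimum-length values are placeholders that would need tuning and are not the values used in the paper.

```matlab
% Hedged sketch of the DSM line extraction (Canny followed by the Hough transform).
bw = edge(imadjust(dsm_gray), 'canny');                  % binary edge map of the DSM
[H, theta, rho] = hough(bw);
peaks = houghpeaks(H, 20, 'Threshold', 0.3*max(H(:)));   % strongest peaks (count assumed)
segs  = houghlines(bw, theta, rho, peaks, 'FillGap', 5, 'MinLength', 40);
% Rank segments by length and keep only the longest ("best") ones; short segments
% are discarded as described in the text.
len = arrayfun(@(s) norm(s.point2 - s.point1), segs);
[~, order] = sort(len, 'descend');
best = segs(order(1:min(10, numel(segs))));
```

The endpoints in best(k).point1 and best(k).point2 are image (column, row) coordinates; the corresponding z values would be read back from the DSM to obtain the 3-D endpoints used by the registration step.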

Fig. 3: (a) Canny output from the synthetic image. (b) Edges extracted from the synthetic image. (c) Canny output from the DSM. (d) Hough lines extracted from the DSM.

Finding Line Segments in Aerial Images

We again used the Canny edge detector to determine the 2-D edges in the aerial images for both the synthetic and real-world data sets. For the synthetic model, we applied Peter Kovesi's toolbox to extract line segments from the Canny output. For the real-world data, we found it more appropriate to use the Hough transform for the line extraction from the Canny output. Again, we used the hough, houghpeaks, and houghlines MATLAB functions to implement the Hough-transform line extraction. It was necessary to adjust the parameters of these functions to obtain suitable edges. The results of the line detection and extraction are shown in Fig. 4.

Fig. 4: (a) Canny output from the synthetic visual image. (b) Lines extracted from the synthetic visual image. (c) Canny output from the real-world scene. (d) Hough lines extracted from the real-world scene.

For both the synthetic and real-world models, the extracted line segments from the aerial images are stored as 2-D line segments (i.e., a pair of 2-D points). Smaller line segments are again removed in order to improve the ratio between true and false edges. However, there still will be plenty of false edges due to small objects like windows, sidewalks, vehicles, and trees.

Searching for Extrinsic Parameters

The complete camera matrix P consists of 11 parameters (P is a homogeneous 3×4 matrix), of which 5 are intrinsic parameters and the remaining 6 are extrinsic parameters.16 The intrinsic parameters correspond to the camera calibration matrix K, which in our model is assumed known. The six extrinsic parameters correspond to the pose of the camera, that is, to the position of the camera center (relative to the world origin) and the camera orientation. We assume fixed (but arbitrary) world coordinates and let the camera-centered coordinate axes be as shown in Fig. 5.

Fig. 5: Illustration of extrinsic camera parameters.

In Fig. 5, C=(C1,C2,C3)T are the coordinates of the camera center in the world coordinate system. In the standard decomposition P = K[R | t], R represents the rotation matrix that rotates the world coordinate axes into the camera coordinate axes, and t = −RC can be thought of as a translation vector. In our method, we use a somewhat different approach to computing P, which greatly improves the performance of the algorithm. The idea is that in most situations, the camera will point roughly in the direction of −C, that is, it will approximately be pointing back toward a common position, which we take as the origin of the world coordinate system. This has the effect of reducing two of the three angles involved in the rotation matrix to small ranges and essentially removes two degrees of freedom from the search space. To be specific, let x, y, z denote the world coordinate axes, let x′, y′, z′ denote the camera-centered coordinate axes as shown in Fig. 5, and let x̂, ŷ, ẑ denote the coordinate axes defined as follows. Let ẑ be the unit vector pointing from the camera center to the origin of the world coordinate system, that is, ẑ = −C/‖C‖. The remaining axes x̂ and ŷ are arbitrary, subject to forming a right-handed orthogonal coordinate system together with ẑ. In our case, we took x̂ to be a unit vector orthogonal to ẑ with third coordinate equal to 0, and then ŷ = ẑ × x̂. We let R be the rotation matrix that rotates the (x̂,ŷ,ẑ) coordinate system to the (x′,y′,z′) coordinate system. The matrix R can be described as follows. Let αx, αy, αz denote rotation angles about the x, y, z axes with respect to the camera center. Then

$$R = R_z \cdot R_y \cdot R_x,$$

where

$$R_x = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(\alpha_x) & -\sin(\alpha_x) \\ 0 & \sin(\alpha_x) & \cos(\alpha_x) \end{pmatrix}, \quad
R_y = \begin{pmatrix} \cos(\alpha_y) & 0 & \sin(\alpha_y) \\ 0 & 1 & 0 \\ -\sin(\alpha_y) & 0 & \cos(\alpha_y) \end{pmatrix}, \quad
R_z = \begin{pmatrix} \cos(\alpha_z) & -\sin(\alpha_z) & 0 \\ \sin(\alpha_z) & \cos(\alpha_z) & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Again, the idea is that αx and αy should be small. Let M be the matrix whose columns are the unit vectors x̂, ŷ, ẑ expressed in world coordinates. An easy computation shows that

$$M = \begin{pmatrix}
\dfrac{C_2}{\sqrt{C_1^2+C_2^2}} & \dfrac{-C_1 C_3}{\sqrt{C_1^2+C_2^2}\,\sqrt{C_1^2+C_2^2+C_3^2}} & \dfrac{-C_1}{\sqrt{C_1^2+C_2^2+C_3^2}} \\[2ex]
\dfrac{-C_1}{\sqrt{C_1^2+C_2^2}} & \dfrac{-C_2 C_3}{\sqrt{C_1^2+C_2^2}\,\sqrt{C_1^2+C_2^2+C_3^2}} & \dfrac{-C_2}{\sqrt{C_1^2+C_2^2+C_3^2}} \\[2ex]
0 & \dfrac{\sqrt{C_1^2+C_2^2}}{\sqrt{C_1^2+C_2^2+C_3^2}} & \dfrac{-C_3}{\sqrt{C_1^2+C_2^2+C_3^2}}
\end{pmatrix}.$$

Note that M is an orthogonal matrix, so M⁻¹ = Mᵀ. The matrix M⁻¹ will change (x,y,z) coordinates into (x̂,ŷ,ẑ) coordinates, and so RM⁻¹ will change (x,y,z) coordinates into (x′,y′,z′) coordinates. Thus, our formula for the camera matrix becomes

$$P = \left( K \cdot R \cdot M^{-1} \;\big|\; -K \cdot R \cdot M^{-1} \cdot C \right), \tag{1}$$

which is a function of C=(C1,C2,C3) and the rotation angles (αx,αy,αz).
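As an illustration, Eq. (1) can be assembled in a few lines of MATLAB. The function name and argument layout below are our own; the sign conventions for x̂ and ŷ follow the derivation above, and this is a sketch rather than the authors' implementation.

```matlab
% Hedged sketch of Eq. (1): build the camera matrix P from the camera center C
% (3x1, world coordinates) and the small rotation angles ax, ay, az (radians).
function P = camera_matrix(K, C, ax, ay, az)
    Rx = [1 0 0; 0 cos(ax) -sin(ax); 0 sin(ax) cos(ax)];
    Ry = [cos(ay) 0 sin(ay); 0 1 0; -sin(ay) 0 cos(ay)];
    Rz = [cos(az) -sin(az) 0; sin(az) cos(az) 0; 0 0 1];
    R  = Rz * Ry * Rx;                          % R = Rz*Ry*Rx as in the text
    zhat = -C / norm(C);                        % unit vector from camera center to world origin
    xhat = [C(2); -C(1); 0] / norm(C(1:2));     % orthogonal to zhat, zero third coordinate
    yhat = cross(zhat, xhat);                   % completes the right-handed frame
    M  = [xhat, yhat, zhat];                    % columns are the hatted axes in world coordinates
    P  = K * R * M' * [eye(3), -C];             % M is orthogonal, so inv(M) = M'
end
```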

We parametrize the camera center C using spherical coordinates with respect to the world coordinate system. That is, we let

$$C_1 = r\sin(\varphi)\cos(\theta), \qquad C_2 = r\sin(\varphi)\sin(\theta), \qquad C_3 = r\cos(\varphi).$$

In our model, we assume that the distance r from the world origin to the camera center is known. The angles θ, φ are assumed to be arbitrary (and unknown). Thus, our search space consists of the arbitrary angles θ, φ, αz, and the small angles αx, αy. In practice, even the angles θ and φ would in many cases be approximately known. However, we did not assume any a priori knowledge of these parameters.

For given values of s=(θ,φ,αz,αx,αy) in the search space, the matrix P(s) is computed. The matrix P(s) gives a projective transformation from the lines in the 3-D LiDAR image into the 2-D camera image plane. We search over the angles θ, φ in increments of 5 deg and αx, αy, αz in increments of 0.5 deg. For each value of s, we compute an error metric for this line matching, which we use to rate this particular value of s. This error metric for the line matching is described next.

Line Matching Process

For a given value s = (θ, φ, αz, αx, αy) in the search space, and the corresponding camera matrix P(s) described above, the next step is to grade this set of parameters with a suitable error metric that describes how well P(s) actually registers the two images. Recall we have extracted a set of 3-D line segments from the LiDAR image, which we denote L1, …, Lm, and a set of 2-D line segments from the aerial image, which we denote L′1, …, L′n. For given s, we compute the images of the 3-D line segments under the assumed camera matrix P(s), which gives P(s)L1, …, P(s)Lm. We compute P(s)Li by multiplying the coordinates of the endpoints of Li (expressed in projective coordinates) by P(s) and converting back to Euclidean coordinates. Note that the transformed line segment P(s)Li is given as an unordered set of two 2-D endpoint coordinates. The error metric Err(s) is then computed as follows. For each i ≤ m, we find the j ≤ n that minimizes the distance between the unordered sets P(s)Ei and E′j, where P(s)Ei denotes the transformed endpoints of Li and E′j denotes the endpoints of L′j. In mathematical terms, we are using the Hausdorff metric on two-point subsets of R2, extending the usual Euclidean metric d on R2. Let d̂ denote this extended Hausdorff metric on the two-point sets. We choose the value of j that minimizes the d̂ distance and then sum over all i to obtain our final error metric. Thus, we define the following error metric to guide the search process. Note that the error is given in units of length corresponding to unit pixel distance in the aerial image (p.d.a.i.):

$$\mathrm{Err}(s) = \sum_{i=1}^{m} \min_{1 \le j \le n} \hat{d}\left[ P(s)E_i,\; E'_j \right]. \tag{2}$$

While the above error metric is sufficient to produce reasonably accurate registration on some of the image pairs, there are other cases where it does not perform as expected. The problem seems to be that when there are too many LiDAR edges that do not correspond to any of the extracted aerial line segments, the corresponding terms in the above formula behave as noise and can be sufficiently large to override the non-noise terms. To adjust for this, we modified the above metric by replacing the term min_{1≤j≤n} d̂[P(s)Ei, E′j] in the formula with the truncation

$$\min\left\{ \min_{1 \le j \le n} \hat{d}\left[ P(s)E_i,\; E'_j \right],\; M_E \right\} \tag{3}$$

to ensure that no individual term exceeds a certain threshold value M_E. This controlled the contribution from the noise terms and resulted in improved performance.
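The following MATLAB sketch evaluates the truncated error of Eqs. (2) and (3) for one candidate camera matrix. The cell-array data layout, the function name, and the explicit Hausdorff computation are illustrative assumptions rather than the authors' code.

```matlab
% Hedged sketch of the truncated line-matching error, Eqs. (2) and (3).
% lidar_segs{i} is a 2x3 array holding the two 3-D endpoints of L_i;
% aerial_segs{j} is a 2x2 array holding the two 2-D endpoints of L'_j;
% P is a candidate 3x4 camera matrix and ME is the truncation threshold.
function err = line_match_error(P, lidar_segs, aerial_segs, ME)
    err = 0;
    for i = 1:numel(lidar_segs)
        hp   = P * [lidar_segs{i}, ones(2,1)]';        % project the endpoints with P(s)
        proj = (hp(1:2,:) ./ hp([3 3],:))';            % back to Euclidean image coordinates (2x2)
        best = inf;
        for j = 1:numel(aerial_segs)
            Ep = aerial_segs{j};
            D = zeros(2,2);                            % pairwise endpoint distances
            for a = 1:2
                for b = 1:2
                    D(a,b) = norm(proj(a,:) - Ep(b,:));
                end
            end
            dH = max([min(D,[],2); min(D,[],1)']);     % Hausdorff distance of the two endpoint sets
            best = min(best, dH);
        end
        err = err + min(best, ME);                     % truncation of Eq. (3)
    end
end
```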

In implementing the above error metric, we used only a relatively small set of best line segments from both the LiDAR and aerial images. For the synthetic model, we used the 10 best line segments from each image. In our case, the notion of best was determined by the length of the line segment (in Euclidean coordinates). In practice, the line detection algorithms generate many line segments in both images, and the smaller line segments tend to be less reliable. Also, smaller line segments may correspond to transient features such as cars, which vary in the images.

To recover the external camera parameters, we simply select the s that minimizes Err(s), which by Eq. (1) gives the camera matrix P. As we noted previously, an important point is that the angles αx, αy in the search space are confined to small values, and so the parameter search is mainly a search over the remaining parameters φ, θ, αz. This makes the resulting search practical.
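Putting the pieces together, a brute-force version of the search could look like the sketch below, reusing the camera_matrix and line_match_error sketches given earlier; the loop bounds follow the step sizes stated in the text, and r, K, lidar_segs, aerial_segs, and ME are assumed to be defined. The restriction of αx and αy to a small range is what keeps this enumeration practical.

```matlab
% Hedged sketch of the pose search: enumerate s = (theta, phi, az, ax, ay) on the
% grid described in the text and keep the pose with the smallest truncated error.
best_err = inf;  best_s = [];
for theta = 0:5:355                                  % azimuth, 5-deg steps
  for phi = 0:5:90                                   % polar angle, 5-deg steps
    % (phi = 0, the straight-down view, makes xhat degenerate and would need special handling)
    C = r * [sind(phi)*cosd(theta); sind(phi)*sind(theta); cosd(phi)];  % camera center
    for az = 0:0.5:359.5                             % in-plane rotation, 0.5-deg steps
      for ax = -5:0.5:5                              % small tilt angles, 0.5-deg steps
        for ay = -5:0.5:5
          P = camera_matrix(K, C, ax*pi/180, ay*pi/180, az*pi/180);
          e = line_match_error(P, lidar_segs, aerial_segs, ME);
          if e < best_err
            best_err = e;  best_s = [theta, phi, az, ax, ay];
          end
        end
      end
    end
  end
end
```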

We ran the method described above on two different models. For the first, we used a synthetic model of a city, which was the object shown in Fig. 1. The object, which is intended to approximate an urban environment, is composed mainly of straight line segments. For the second application, we used real-world data obtained from a helicopter fly-over of part of the city of New Orleans as well as LiDAR data of the same area obtained from the USGS. We describe each of these experiments in more detail.

Synthetic Model

The wooden model of Fig. 1 measures 11 cm by 18 cm. The synthetic LiDAR data were obtained by using a 3-D light scanner that outputs the data in a file consisting of triples (x,y,z), where the z-coordinate gives the altitude above the level plane of the scanner. Thus, the synthetic LiDAR values are in a form consistent with the real-world LiDAR values used in the real-world model. For the aerial view, we used a digital camera with a random orientation at a distance of 1.5 m from the object. The camera was pointed roughly in the direction of the model being photographed, which is the main assumption of our method.

As described above, we first turned the raw scanner data into a grayscale image and then ran the Canny edge detector. Line segment extraction and joining were then done using Kovesi's toolbox, and from the resulting line segments, the five best were selected. The results are shown in Figs. 3(a) and 3(b).

At this point, we have extracted the best line segments from both the LiDAR and aerial images, and these data are then passed to the main MATLAB program that implements the algorithm described in Sec. 3. The algorithm was run on a Core i5-2400 3.1 GHz PC running MATLAB version R2010b. We used a camera distance of 1.55 m. The search range for αx and αy was restricted to −5 deg to 5 deg. The angles θ, φ, αz were unrestricted (so 0 ≤ φ ≤ 90 deg and 0 ≤ θ, αz ≤ 360 deg). After completion of the algorithm, we have recovered the camera matrix P that corresponds to the best registration of the LiDAR line segments to the aerial line segments according to the error metric described in Sec. 3.6. For this value of P, the transformed LiDAR line segments are shown in Fig. 6. The values of the camera pose parameters for the optimal pose were determined to be αx = 4 deg, αy = 2 deg, αz = 1 deg, θ = 75 deg, and φ = 35 deg. The final output of the algorithm is shown in Fig. 6.

Fig. 6: Results of the proposed automated registration approach on the synthetic model.

In this figure, the green lines are the best aerial edges, where we have removed the bounding edges of the object for visual clarity. The blue edges are the best LiDAR edges transformed by the camera matrix P computed by the algorithm.

Test for Robustness of Algorithm

To test the accuracy and robustness of the algorithm, we performed a sequence of tests using the synthetic model above for a variety of controlled values of θ and φ (corresponding to different camera positions). The actual camera positions were measured and recorded so that the true values of these angles were known. We then ran the algorithm and compared the recovered parameters with the known values. We did this for six camera positions, corresponding to two values of φ and three values of θ for each of these. The results are shown in Table 2 (the angles are shown in degrees).

Table 2: Results of different runs for the robustness check.

The mean square error for the recovered value of φ is 8.4 deg, and for the recovered value of θ it is 5.4 deg. Since, for the model used, the variation in the vertical pixel positions is less significant than that along the x and y axes, we expect the sensitivity to θ to be greater than that to φ, and the results above seem to agree with this. In any event, the recovered values are in approximate agreement with the true (measured) values.

Real-World Model

For the second application of our method, we used real-world LiDAR and visual data from a section of downtown New Orleans. The LiDAR data were taken from a USGS survey,17 from which the appropriate part of the data set was extracted. Visual images of the same area were obtained from a helicopter fly-over of the area. Since we are assuming a calibrated camera model, we did not alter the camera parameters between the capture of the images and the calibration of the camera.

A grayscale rendering of the LiDAR data is shown in Fig. 2(c). The contrast in the LiDAR image was enhanced in MATLAB, and then edge detection was done using the Canny edge detector. Due to the rather coarse nature of the LiDAR data, including some rather jagged edges, we found it advantageous to employ the Hough transform for the line extraction step. This required configuring some of the parameters of the hough and houghlines MATLAB functions that were called. The results of the edge detection and extraction are shown in Figs. 3(c) and 3(d).

The first aerial image we used is shown in Fig. 2(d). We again used the Canny edge detector followed by the Hough transform to extract the best aerial edges, which are shown in Figs. 4(c) and 4(d). Again, the algorithm was run on a Core i5-2400 3.1 GHz PC with MATLAB version R2010b. After running our algorithm, the transformed LiDAR edges (in blue) are shown superimposed on the aerial edges (in green) in Fig. 7. The algorithm recovered the camera pose parameters, as described in Sec. 3.5, to be αx = 0, αy = 1 deg, αz = 1 deg, θ = 150 deg, and φ = 50 deg.

A second aerial image of the same area from a different perspective is shown in Fig. 8(a). The results of the edge extraction and the output of the algorithm are shown in Figs. 8(b) and 8(c). The algorithm determined the camera pose parameters to be αx = 1 deg, αy = 2 deg, αz = 3 deg, θ = 120 deg, and φ = 45 deg. Table 3 shows the final values of the error metric for the recovered poses, along with the computation time, for the synthetic model and the two real-world data sets.

Table 3: Error and computational times.

In this paper, we have investigated a method for 2-D to 3-D registration of aerial camera images and LiDAR data without the use of any external orientation data. In particular, no use is made of GPS or other external coordinate data. Because of the relatively coarse nature of the LiDAR data, it is not expected that the camera matrix recovery will be perfect. Rather, it is expected that the method may be useful as an initial camera pose recovery method, which, for example, may be suitable to pass to other registration methods. In our runs of the method, we found that decreasing the angle step size for θ and φ in the search algorithm from 5 to 2.5 deg generally did not result in improved performance. This is most likely due to the nature of the LiDAR data as well as inherent limitations in the method itself (e.g., imperfections in the line matching algorithm). Nevertheless, in both the synthetic and real-world runs of the algorithm, the method was able to recover the camera pose to within about 5 deg. This is confirmed by visual inspection of the LiDAR and aerial images. To help confirm this, in the real-world results shown in Figs. 7 and 8(c), we manually added vertical edges to the output of the LiDAR line extraction for one of the buildings (these edges do not appear in the aerial edge list and so do not affect the algorithm). Visual comparison of these figures with the aerial images confirms the accuracy of the recovered pose.

Fig. 7: Results of registering the DSM with the visual image: blue lines are projected from the DSM and green lines are from the visual image.

Fig. 8: (a) Second aerial image. (b) Edge extraction. (c) Final registration.

Future Work

There are several limitations to the current implementation of the algorithm. First, we assume a known value r for the distance from the camera to the object (origin of world coordinate system). In theory, we could add this as a parameter to our search space, but this adds new difficulties. Unlike the other search parameters, r does not range over a bounded interval, so some a priori knowledge is still necessary. It would be reasonable, however, to assume coarse bounds on the values of r and search over the resulting interval. This, however, would slow the algorithm considerably, and in practice, we feel it is a reasonable assumption to say that r is known. For example, if the visual images are acquired by aircraft, altitude data are probably available, and from this, a reasonable estimate of r can be obtained. Note that in this case, even if a GPS were available for the aircraft, if the LiDAR image was taken without GPS coordinates, then the aircraft GPS coordinates alone would not be of assistance in determining the external pose parameters. This situation may arise, for example, when the LiDAR and aerial images were taken at different times. Alternatively, if one assumes knowledge of a matched pair of line segments in the LiDAR and visual images, then the algorithm could be readily modified to compute the appropriate value of r at each step. Indeed, the error metric used in the algorithm can be easily modified to account for a certain number of fixed matched pairs of line segments. This might also improve the accuracy of the algorithm, but we have not as yet carried out this investigation. Our goal is to move toward a completely automated registration procedure, not using any camera pose information. Another limitation is that the method fails to work if the visual image is taken from an angle nearly orthogonal to that of the LiDAR image. This seems to be an inherent limitation of the method, but in many applications would not present a major obstacle (e.g., if all the images were taken by an aircraft). On the other hand, registering an overhead LiDAR view with a horizontal ground view would not be feasible.

There are several ways in which the algorithm might be improved. First, for the aerial images, it was necessary to manually adjust some of the parameters in the Hough line extraction in order to obtain reasonable line sets (although the same parameter set worked for both aerial images). More work needs to be done on automating/improving this part of the method. Also, various factors such as small objects (trees, cars, etc.) and other occlusions affect the quality of the edge extraction. These remain challenging problems. There is also much room for further investigation into the error metric that is used in the camera-matrix scoring routine. For example, giving different weights to different line segments based on a reasonable image-specific criterion might improve the method.

This research is partially supported by the Texas Norman Hackerman Advanced Research Program under Grant No. 0003594-00160-2009.

References

1. Frere D. et al., "Automatic modeling and 3D reconstruction of urban buildings from aerial imagery," in IEEE Int. Geoscience and Remote Sensing Symp. Proc., pp. 2593–2596 (1998).
2. Schenk T., Csathó B., "Fusion of LiDAR data and aerial imagery for a more complete surface description," in ISPRS, Vol. 34, pp. 310–317 (2002).
3. Koch R., Pollefeys M., Van Gool L., "Realistic surface reconstruction of 3D scenes from uncalibrated image sequences," J. Visual. Comput. Animation 11(3), 115–127 (2000).
4. Stamos I., Allen P. K., "3-D model construction using range and image data," in CVPR, pp. 531–536, IEEE (2000).
5. Antone M., Teller S., "Automatic recovery of relative camera rotations for urban scenes," in CVPR, pp. 282–289, IEEE (2000).
6. Hahnel D., Burgard W., Thrun S., "Learning compact 3D models of indoor and outdoor environments with a mobile robot," in 4th European Workshop on Advanced Mobile Robots, Sweden, Vol. 44, pp. 15–27, Elsevier (2001).
7. Zhao H., Shibasaki R., "A system for reconstructing urban 3-D objects using ground-based range and CCD images," in Proc. of IAPR Workshop on Machine Vision Applications, Japan (1998).
8. Fruh C., Zakhor A., "An automated method for large-scale, ground-based city model acquisition," Int. J. Comput. Vis. 60, 5–24 (2004).
9. Fruh C., Zakhor A., "Constructing 3-D city models by merging ground-based and airborne views," in CVPR, pp. 562–569, IEEE (2003).
10. Ding M., Zakhor A., "Automatic registration of aerial imagery with untextured 3D LiDAR models," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8 (2008).
11. Fei D., Huminjie B., Haiyan C. G., "Automatic registration between LiDAR and digital images," in ISPRS, pp. 487–490 (2008).
12. Lee D. H., Lee K. M., Lee S. U., "Fusion of lidar and imagery for reliable building extraction," Photogramm. Eng. Remote Sens. 74(2), 215–225 (2008).
13. Chen L. C. et al., "Shaping polyhedral buildings by the fusion of vector maps and lidar point clouds," Photogramm. Eng. Remote Sens. 74(9), 1147–1157 (2008).
14. The MathWorks, "MATLAB program," http://www.mathworks.com/ (2010).
15. Kovesi P., "MATLAB functions for computer vision and image processing," http://www.csse.uwa.edu.au/~pk/research/matlabfns/ (2011).
16. Hartley R., Zisserman A., Multiple View Geometry in Computer Vision, 2nd ed., Cambridge University Press (2004).
17. Atlas: The Louisiana Statewide GIS, LSU CADGIS Research Laboratory, Baton Rouge, LA, http://atlas.lsu.edu (2009).


Prakash Duraisamy received his BE degree in electronics and communication engineering from the Bharathiar University, Coimbatore, India, in 2002 and his master’s degree in electrical engineering from the University of South Alabama, Mobile, Alabama, in 2008. He completed his PhD degree in computer science and engineering at University of North Texas, Denton, Texas, in December 2012. Since January 2013, he has worked as a visiting scientist at Massachusetts Institute of Technology, and since May 2013, he has worked as postdoctoral fellow at Old Dominion University, Virginia. His areas of interest are image processing, computer vision, pattern recognition, bio-medical, LiDAR, remote sensing, and multiple-view geometry.


Kamesh Namuduri received his BS degree in electronics and communication engineering from Osmania University, India, in 1984, MS degree in computer science from University of Hyderabad in 1986, and PhD degree in computer science and engineering from University of South Florida in 1992. He worked in C-DoT, a telecommunication firm in India from 1984 to 1986, where he participated in the development of the first indigenous digital exchange in India. He also worked in GTE Telecommunication Services Inc., USA, (now Verizon) from 1993 to 1997, where he participated in the development of a mobile telephone fraud detection system. From 1998 to 2002, he worked as a research scientist in the Center for Theoretical Studies of Physical Systems at Clark Atlanta University in Atlanta. From 2002 to 2008, he was a faculty member in the Department of Electrical Engineering and Computer Science at Wichita State University. Currently, he is with the Electrical Engineering Department at University of North Texas as an associate professor. His areas of research interest include information security, image/video processing and communications, and ad hoc sensor networks.


Stephen Jackson received his PhD from the University of California, Los Angeles, in 1983 under the supervision of Donald Martin. Upon graduation, he was a Van Vleck assistant professor of mathematics at the University of Wisconsin, Madison, for three years, then a two-year IBM research fellow at the California Institute of Technology, and then spent three years as a National Science Foundation research fellow at the California Institute of Technology. In 1984, he was awarded the Victoria Delfino Prize for his calculation of δ¹₅. Presently, he is Regents professor of mathematics at the University of North Texas.


Bill Buckles received his graduate degrees, including his PhD, from the University of Alabama in Huntsville in computer science and in operations research. He is presently a professor of computer science in the Computer Science and Engineering Department at University of North Texas. He has published almost 150 papers in national and international journals and has authored a book. His research has been supported by NASA, NSF, the states of Louisiana and Texas, and the Missile Defense Agency. Twice he has been honored with national technical achievement awards from NASA. He has been a visiting professor at the Technische Hochschule in Aachen, Germany, the GMD (Germany's counterpart to the NSF), the Free University of Brussels, and National Central University of Taiwan. His research focuses on image understanding, video analytics, and related problems in search, optimization, and pattern recognition. Nearly all of the graduate students he has advised have embarked upon academic careers leading often to distinguished service or research. He has been an associate editor of the IEEE Transactions on Parallel and Distributed Systems, chair of the IEEE Computer Society Technical Committee on Distributed Processing, and general chair of the IEEE International Conference on Distributed Computing Systems. He belongs to several professional societies that set high technical standards.

© The Authors. Published by SPIE under a Creative Commons Attribution 3.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
