Review Papers

Acquisition of omnidirectional stereoscopic images and videos of dynamic scenes: a review

Author Affiliations
Luis E. Gurrieri

University of Ottawa, School of Electrical Engineering and Computer Science, Ottawa, Ontario K1N 6N5, Canada

Eric Dubois

University of Ottawa, School of Electrical Engineering and Computer Science, Ottawa, Ontario K1N 6N5, Canada

J. Electron. Imaging. 22(3), 030902 (Jul 08, 2013). doi:10.1117/1.JEI.22.3.030902
History: Received February 13, 2013; Accepted May 24, 2013

Open Access

Abstract.  Different camera configurations to capture panoramic images and videos are commercially available today. However, capturing omnistereoscopic snapshots and videos of dynamic scenes is still an open problem. Several methods to produce stereoscopic panoramas have been proposed in the last decade, some of which were conceived in the realm of robot navigation and three-dimensional (3-D) structure acquisition. Even though some of these methods can estimate omnidirectional depth in real time, they were not conceived to render panoramic images for binocular human viewing. Alternatively, sequential acquisition methods, such as rotating image sensors, can produce remarkable stereoscopic panoramas, but they are unable to capture real-time events. Hence, there is a need for a panoramic camera to enable the consistent and correct stereoscopic rendering of the scene in every direction. Potential uses for a stereo panoramic camera with such characteristics are free-viewpoint 3-D TV and image-based stereoscopic telepresence, among others. A comparative study of the different cameras and methods to create stereoscopic panoramas of a scene, highlighting those that can be used for the real-time acquisition of imagery and video, is presented.

In recent years, the availability of single-snapshot panoramic cameras has enabled a variety of immersive applications. The improved realism attained by using real-world omnidirectional pictures instead of synthetic three-dimensional (3-D) models is evident. However, a camera capable of capturing stereoscopic panoramas of dynamic scenes in a single snapshot remains an open problem, since most omnistereoscopic acquisition strategies are constrained to static scenes. A special distinction has to be made between dynamic and static scenes. Most practical scenarios are intrinsically dynamic, and hence a practical omnistereoscopic camera should provide the means to render (in real time or off-line) two views of the scene with horizontal parallax, in any arbitrary gazing direction with respect to the capture viewpoint. These two views must correspond to the views from the left and right eyes of a human viewer, since they must stimulate the mechanism of human binocular vision, reproducing a credible and consistent perception of depth. A few cameras can capture omnistereoscopic visual information in a single snapshot, but some of them cannot produce views adequate for human binocular vision, while the capabilities of other, potentially suitable, cameras have not been formally demonstrated.

In order to satisfy the constraints of the problem as defined, we need a panoramic camera capable of acquiring all of the scene’s visual information necessary to reconstruct stereoscopic views in arbitrary directions. The camera must sample this information omnidirectionally from a chosen reference viewpoint in space, and it must do so in a single snapshot to account for scene dynamics. Consequently, sequential acquisition strategies for creating stereoscopic panoramas are inadequate for this problem. Nevertheless, some sequential strategies have inspired multicamera configurations that may be suitable for the task. However promising, the capabilities of these multicamera techniques have not been properly justified by theoretical models of omnistereoscopic image formation. Furthermore, there is a need for a model to represent the binocular and omnidirectional viewing of the scene. Finally, a formal analysis evaluating the performance of an omnistereoscopic camera against a model of human binocular vision is still lacking.

In 2001, Zhu1 presented an extensive classification of the different technologies to create omnidirectional stereoscopic imagery. This was an excellent survey of omnistereoscopic methods up to its publication date, which presented a taxonomical classification of camera configurations and methods, comparing their capabilities to produce viewable stereoscopic imagery in any azimuthal gazing direction around the viewer. However, the real-time acquisition of dynamic scenes, which is relevant for today’s multimedia applications, was not taken into account in that work.

In this paper, we review and classify different panoramic cameras and acquisition strategies available to date to produce realistic stereoscopic renditions of real-world scenes in arbitrary gazing directions.

Panoramic images can be represented in any of the omnidirectional image formats, e.g., cylindrical, cubic, spherical, etc. In some cases, the representation can be truly omnidirectional; in other words, the visual information acquired by the camera is projected on a 3-D surface covering 360 deg in azimuth and 180 deg in elevation. These panoramic representations are spherical, or projections of the scene on topological equivalents of a sphere, e.g., cubic or dodecahedral projections, to name a few. These complete omnidirectional representations are common for monoscopic panoramas, where the scene is acquired from a single viewpoint or, at least, an approximation to a single viewpoint, with images acquired from close but different viewpoints.

In the case of stereoscopic panoramas, the scene is generally acquired from two distinct viewpoints with horizontal parallax, for every possible gazing direction in azimuth and for a limited range of gazing directions in elevation. This viewing model corresponds to the human binocular visual system, in which the eyes are horizontally displaced from each other and located on a plane parallel to the reference floor. This idea is illustrated in Fig. 1. In this binocular viewing model, the scene is acquired by rotating the head in azimuth θ around a viewing point r and gazing up and down in a limited range of elevation angles (ϕmin < ϕ < ϕmax), always maintaining the geometric constraints of the model. This model can be represented in a cylindrical panoramic format, or a surface equivalent to a cylinder, where the elevation angles are limited to a certain range. Note that when trying to apply the binocular model to a full spherical representation, there are intrinsic difficulties in acquiring and rendering stereoscopic views with horizontal parallax for elevation angles close to the poles. For this reason, the methods for omnistereoscopic image acquisition are mainly restricted to cylindrical topologies.
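The cylindrical format with a limited elevation range can be made concrete with a short sketch. The Python function below maps a gazing direction (θ, ϕ) to pixel coordinates in such a panorama; it is an illustrative parametrization, not taken from the paper (the function name and the use of tan ϕ as the vertical coordinate on a unit-radius cylinder are our assumptions):

```python
import math

def cyl_pixel(theta, phi, width, height, phi_min, phi_max):
    """Map a gazing direction to pixel coordinates in a cylindrical panorama.

    theta: azimuth in [0, 2*pi); phi: elevation, restricted to the limited
    range [phi_min, phi_max] of the binocular viewing model. On a unit-radius
    cylinder, the vertical coordinate of a ray at elevation phi is tan(phi),
    so rows are spaced linearly in tan(phi) (an assumed convention).
    """
    u = (theta / (2.0 * math.pi)) * width
    span = math.tan(phi_max) - math.tan(phi_min)
    v = (math.tan(phi_max) - math.tan(phi)) / span * height  # v = 0 at phi_max
    return u, v
```

For example, with a ±45 deg elevation range, ϕ = 0 lands exactly on the middle row, and ϕ = ϕmax on the top row.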

Fig. 1: Binocular viewing model for omnistereoscopic image acquisition, where the region of zero parallax (eye convergence points) is limited to a spherical section, topologically equivalent to a cylinder.

In this review, we focus on methods to acquire stereoscopic panoramas of dynamic scenes to be represented in a cylindrical format. These omnistereoscopic images can be mapped onto cubic, cylindrical, or spherical-section surfaces, which can be projected inside a cave automatic virtual environment (CAVE) or in a dome-shaped display to create immersive shared experiences. Alternatively, the omnistereoscopic methods reviewed can provide wide-angle stereoscopic renditions of the scene in desired viewing directions, created from the information acquired by the different cameras. The latter application is found in head-mounted devices for the visualization of stereoscopic virtual environments.

Acquisition Models for Monoscopic Panoramas

There are two main models for the acquisition of monoscopic panoramas: a singular viewpoint (SVP) model and a nonsingular viewpoint (non-SVP) model, also known as polycentric panoramic model. Any camera or acquisition technique available to produce monoscopic omnidirectional imagery can be classified into one of these two models.

In the SVP acquisition model, for any gazing direction in azimuth (camera’s panning direction), there is a unique projection center that marks a single convergence point for all incident light rays. This model groups the catadioptric cameras used to acquire the whole scene using, for example, a single photosensitive sensor array and a curved mirror. Panoramas created by a rotating camera around its nodal point, or its projection center assuming a pinhole camera model, also satisfy the SVP model. These panoramas are created by acquiring planar images to be mosaicked or by scanning the scene column-wise, i.e., using line-sensor cameras and turning platforms. Examples of SVP acquisition are illustrated in Fig. 2(a) and 2(b).

Fig. 2: Examples of omnidirectional image acquisition: (a) catadioptric cameras based on parabolic or hyperbolic mirrors produce SVP panoramas,2 (b) rotating a camera about its nodal point to acquire multiple perspective projections with a common projection center also produces SVP panoramas, while (c) rotating an off-centered camera to acquire image patches, around a point different than its nodal point, produces non-SVP panoramas, as well as (d) multi-sensor cameras, such as the Ladybug2 panoramic camera,3 which also produce non-SVP panoramas.

In the case of a non-SVP model, the panoramic image is rendered using a centrally symmetric set of projection centers which are not spatially collocated. Cameras based on the non-SVP paradigm are more common than those based on an SVP model because the physical dimension of multiple camera configurations prevents sampling the scene from a single viewpoint. A way around this problem is using planar mirrors to reposition the projection centers closer to each other, approximating an SVP configuration. In the context of the problem studied in this paper, stereoscopic panoramas are by definition non-SVP panoramas since the scene is imaged from two distinct viewpoints (left- and right-eye viewpoints) for any possible gazing direction. Examples of non-SVP cameras are shown in Fig. 2(c) and 2(d).

Omnistereoscopic Acquisition Models

The different strategies to acquire the necessary visual information to produce stereoscopic panoramas (in a cylindrical format) can be summarized into a limited number of acquisition models. We propose to reduce the classification to four models constrained to acquire stereoscopic panoramic imagery for human viewing. Hence, these models are conceived to represent the acquisition of two images of the same scene from two distinct viewpoints with horizontal parallax. Each of these models represents the stereoscopic acquisition of image pairs for multiple gazing directions in azimuth, and for a limited field of view (FOV) in elevation. All the cameras and acquisition techniques reviewed in this paper can be modeled by one of these four models.

The proposed models are suitable to describe the sequential acquisition of visual information toward the rendering of stereoscopic panoramas. A few of the proposed acquisition models are limited to sequential sampling, since inherent self-occlusion problems prevent them from being implemented using multiple sensors. However, some of the proposed cases can also model simultaneous scene acquisition, i.e., using multiple-sensor configurations or other omnidirectional camera systems. The simultaneous acquisition case is of particular interest in the context of the problem studied in this paper.

The first stereoscopic acquisition model is the central stereoscopic rig, which is illustrated in Fig. 3(a). In this case, two coplanar cameras separated by a baseline b determine a viewing circle concentric with the geometric center of the camera arrangement. The viewing circle is the virtual circle traced by both cameras while panning all azimuthal angles (0 deg < θ ≤ 360 deg) around O. This model for omnistereoscopic rendering has been widely used in the literature over the last decade to represent a stereoscopic rig panning the scene in azimuth.4 It is suitable to represent the sequential acquisition of partially overlapped stereoscopic image pairs with respect to a common center.5 In all four acquisition models, a single- or dual-camera system samples the scene for different θ on a plane parallel to the reference floor XZ, as illustrated in the binocular viewing model of Fig. 1. Finally, this model can also represent a widely used technique based on extracting two columns, corresponding to the left- and right-eye perspectives, from the sequence of planar images acquired by a single camera rotated off-center.6,7 However, due to self-occlusion between cameras, this model cannot be applied in a parallel acquisition configuration.
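The column-extraction technique just mentioned can be sketched in a few lines of Python with NumPy. This is a minimal illustration only, assuming a stack of frames taken at uniform azimuth steps; the function name and the fixed column offset are hypothetical choices, not the cited authors' implementation:

```python
import numpy as np

def stereo_panoramas_from_rotation(frames, column_offset):
    """Build left/right cylindrical panoramas from an off-center rotating camera.

    frames: (N, H, W) stack of grayscale images captured at N uniform azimuth
    steps. Two columns symmetric about the image center are mosaicked into the
    two eye views, one column per rotation step (slit-based rendering sketch).
    """
    n, h, w = frames.shape
    c = w // 2
    left = np.stack([f[:, c + column_offset] for f in frames], axis=1)
    right = np.stack([f[:, c - column_offset] for f in frames], axis=1)
    return left, right  # each is (H, N): panorama width equals the step count
```

The panorama width equals the number of rotation steps, which is why this scheme needs a dense sequential scan and cannot capture a dynamic scene in one snapshot.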

Fig. 3: Omnistereoscopic acquisition models using multiple-camera configurations: (a) central stereoscopic rig, (b) lateral stereoscopic rig, (c) lateral-radial stereoscopic rig, and (d) off-centered stereoscopic rig.

The lateral stereoscopic rig model is shown in Fig. 3(b). This model represents a viewing circle centered at the projection center O of one of the two cameras. The off-centered camera (stereoscopic counterpart) describes a circle of radius equal to the stereo baseline rc = b while rotating around the center O. In this model, one camera is used to produce an SVP panorama centered at O (the nodal point of the central camera), while the second camera is used to estimate the scene’s depth by acquiring stereoscopic counterparts of the images acquired by the central camera.8–11 This method enables horizontal disparity estimation and the extraction of occlusion information to be used in the rendering. As with the previous model, this acquisition model cannot be used in a parallel acquisition scheme due to self-occlusion between cameras.

The lateral-radial stereoscopic rig model, which is shown in Fig. 3(c), can be derived from the lateral stereoscopic rig model presented above, by adding a radial distance rc between the symmetry center O and the nodal point of one of the cameras (central camera in the previous model). This is a more general model where the nodal points in a multiple-sensor arrangement cannot be concentric due to the physical dimension of each camera.12 The lateral-radial stereoscopic rig model can also represent a stereoscopic rig rotated off-center, where one camera is radially aligned with the center O, while the second camera is horizontally displaced b to capture another snapshot with horizontal parallax. This model can represent a parallel acquisition scheme, i.e., a multiple-sensor arrangement.

The off-centered stereoscopic rig models a stereoscopic rig located at a radial distance rc from the geometrical center O, as depicted in Fig. 3(d). This model suits camera configurations where multiple cameras, usually a large number of them, are radially located with respect to a center O. These cameras, taken in pairs, define a series of radially distributed stereoscopic rigs. The partially overlapping FOV between even (or odd) cameras can be used to mosaic vertical slices of the scene, rendering a stereoscopic pair of panoramas.13 Multicamera configurations have been proposed14 using N (N ≥ 5) radially distributed stereoscopic rigs. These configurations are based on acquiring a number of partially overlapped stereoscopic images of the scene. This acquisition model can also represent a parallel acquisition of multiple stereoscopic snapshots of the scene.
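The camera placement implied by these rig models can be summarized in a small geometric sketch. The Python function below (an assumed parametrization for illustration, not the paper's notation) returns the two nodal points of an off-centered rig [Fig. 3(d)] for panning angle θ; setting rc = 0 recovers the central stereoscopic rig of Fig. 3(a):

```python
import math

def rig_nodal_points(theta, b, rc):
    """Nodal points of the two cameras in the off-centered rig model.

    The rig center sits at radius rc from the origin O in the direction theta,
    and the two cameras are displaced +/- b/2 along the tangent to that circle.
    With rc = 0 both cameras lie on the viewing circle of diameter b centered
    at O, i.e., the central stereoscopic rig (a geometric sketch only).
    """
    radial = (math.cos(theta), math.sin(theta))
    tangent = (-math.sin(theta), math.cos(theta))
    cx, cy = rc * radial[0], rc * radial[1]
    half = b / 2.0
    cam_l = (cx - half * tangent[0], cy - half * tangent[1])
    cam_r = (cx + half * tangent[0], cy + half * tangent[1])
    return cam_l, cam_r
```

Sweeping θ over 0 to 360 deg traces the circular trajectories that each model uses to cover all azimuthal gazing directions.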

Comparing Different Camera Configurations

Several omnistereoscopic acquisition and rendering techniques have been proposed over the last decade. Most of them are not suitable for acquiring dynamic scenes omnistereoscopically, but some configurations satisfy this constraint. Unfortunately, the pros and cons of the panoramic cameras suitable for the task are still open to research. In order to understand the limitations of the different camera configurations, we simulated some basic characteristics of the four configurations presented in the previous section.

One fundamental aspect to consider is the continuity of the horizontal disparity between partially overlapped stereoscopic snapshots. This is particularly important when the rendering is based on mosaicking. In Fig. 4, we compared the relative variation in the minimum distance needed to maintain continuity in the horizontal disparity between mosaics. The idea is to find the minimum distance to the scene for which the variation in horizontal disparity between adjacent image samples is subpixel. Our simulations were based on an APS-C sensor (22 mm × 14.8 mm) of 10.1 megapixels. The case presented here, shown as an example only, corresponds to one particular combination of baseline b = 65 mm and a lens FOV of 45 deg for the four camera models. The simulation result shows the reduction in the minimum distance for stereoscopic rendering achievable for all the acquisition models, compared against Model 1, as a function of the blending position in each image. In this particular example, for camera Models 2 and 3, the relative minimum distance increased more than in the other camera models when the blending threshold is above 12% of the image width Wh, measured from the edge of each image to mosaic.
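The role of the sensor and lens parameters in this kind of analysis can be illustrated with the textbook stereo disparity relation d = f·b/(p·Z), where p is the pixel pitch and Z the distance to the scene. The script below is only a rough sketch of the computation involved, not the authors' simulation; the 3888-pixel horizontal resolution is an assumption consistent with a 10.1-megapixel APS-C sensor:

```python
import math

# Illustrative numbers matching the setup described above: APS-C sensor,
# ~10.1 megapixels, baseline b = 65 mm, 45 deg horizontal FOV.
sensor_w_mm = 22.0
h_pixels = 3888                        # assumed horizontal pixel count
pitch_mm = sensor_w_mm / h_pixels      # pixel pitch
fov = math.radians(45.0)
f_mm = (sensor_w_mm / 2.0) / math.tan(fov / 2.0)  # focal length from FOV
b_mm = 65.0

# Disparity in pixels of a point at distance Z is d = f*b / (pitch*Z), so
# the distance beyond which the disparity drops below one pixel is:
z_min_mm = f_mm * b_mm / pitch_mm
print(f"disparity < 1 pixel beyond ~{z_min_mm / 1000.0:.0f} m")
```

Even this simplified relation shows why the baseline, FOV, and pixel pitch jointly set the usable stereoscopic range of a mosaicking camera.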

Fig. 4: The acquisition models are contrasted against the central stereoscopic rig (Model 1), showing the relative variation of the minimum distance to the scene needed to achieve horizontal disparity continuity among neighboring images: the compared configurations are the lateral stereoscopic rig (Model 2), the lateral-radial stereoscopic rig for rc = b (Model 3), and the off-centered stereoscopic rig for rc = b (Model 4).

Also important in the camera design is the minimum distance to the scene at which an object is imaged by adjacent stereoscopic sensor pairs; in other words, how far the stereoscopic FOV is located with respect to the panoramic camera. This distance depends on the geometric configuration of the multiple cameras, the FOV of each camera, and the stereoscopic baseline. In another example, we contrasted the minimum distance for stereoscopic rendering using the same baseline and changing only the lens FOV. The results presented in Fig. 5 correspond to camera Model 1 for a fixed baseline length b = 35 mm and three FOV cases. The simulation results show the minimum acceptable distance to maintain stereoscopic continuity between mosaics as a function of the blending position in each image. As in the previous case, the blending position is expressed as a horizontal length, measured from the edge of the image as a percentage of the image width.

Fig. 5: Minimum distance to the scene for the central stereoscopic rig (Model 1) for b = 35 mm and different lens FOVs.

Problem

The main problem in omnistereoscopic image acquisition for human viewing is how to sample the complete visual field at once, from two viewpoints with horizontal parallax. Furthermore, if multiple cameras are used to do this, the problem becomes how to avoid self-occlusion between cameras and how to minimize the problems introduced by sampling the scene from close but different viewpoints.

The self-occlusion problem is common to all the conceptualizations of panoramic photography, which must be considered when a single or multiple cameras are used to sample the scene omnidirectionally. If the image sampling is sequential, self-occlusion can be avoided. However, the acquisition of dynamic scenes exacerbates the restrictions since all the information to produce omnistereoscopic images has to be acquired at once. The parallax arising from sampling the scene from different viewpoints is another problem common in panoramic photography. The problem gets more complicated when the simultaneous acquisition of stereoscopic images from different viewpoints enters into the equation.

One possible solution to the problem is to acquire multiple stereoscopic snapshots of the scene simultaneously. In this case, the geometric configuration of the multisensor device must be carefully designed to avoid self-occlusion that occurs when one camera lies in the FOV of another. Alternatively, another possible solution is using diffractive optics to obtain two views of the scene with horizontal parallax, and doing so omnidirectionally. In this case, the image formation for this type of diffractive lens has to be modeled and the capabilities of such a camera have to be assessed.

A camera under these constraints should be able to acquire an omnidirectional binocular snapshot of the whole scene. The information captured by this camera should be sufficient to render two non-SVP panoramic views corresponding to the left and right eyes or, more generally, for stereoscopic renditions of the scene in any arbitrary gazing direction.

Panoramic Acquisition: Cameras and Methods

The omnistereoscopic technologies reviewed in this paper were classified into four families based on their image acquisition strategies and/or their constructive characteristics.

  • Omnistereo based on catadioptric cameras
  • Sequential techniques to produce stereoscopic panoramas
  • Omnistereo based on panoramic snapshots
  • Omnistereo based on multiple cameras

This classification into families is independent of the four models of omnistereoscopic image acquisition presented in Sec. 2.2. Nevertheless, each technology representative of these four families can be modeled using one of the four acquisition models introduced above. The catadioptric cameras based on vertical parallax are the only exception to this rule, since all the presented acquisition models are based on horizontal stereo.

In the following section, the pros and cons of each family are studied individually, distinguishing those cameras whose characteristics can be adapted to an omnistereoscopic configuration suitable for acquiring dynamic scenes.

A catadioptric panoramic camera captures a complete 360 deg in azimuth by combining the use of mirrors (catoptric system) to reflect the light arriving from every direction toward a lens system (dioptric system), which focuses the light over a planar photosensitive sensor. In the case of a parabolic profile mirror, light rays emanating from the mirror’s focal point are reflected outward as parallel rays. Conversely, by applying the principle of reversibility of the optical path, ray paths intersecting the mirror’s inner focal point are reflected parallel to the mirror’s symmetry axis. A dioptric system coaxially located at a certain distance from the mirror surface focuses the light over a planar photosensitive sensor. The principle is analogous to a parabolic dish antenna, which collects electromagnetic radiation at its focal point by concentrating incident wavefronts arriving from a source relatively far from the antenna.

One of the first panoramic cameras exploiting this idea is attributed to Rees,15 who proposed, in 1967, to combine a hyperbolic mirror and a TV camera to provide an omnidirectional view of the scene to the operator of a military-armored vehicle.

In Fig. 6, a simplified model of the catadioptric principle is illustrated, showing how a ray of light emanating from the scene point p is reflected by the catoptric system and focused by the dioptric system onto the point p′ on the image plane. The acquired image is an orthographic projection of the scene, which can then be projected onto a canonical representation used in panoramic imaging, e.g., cylindrical, cubical, or spherical, among others, or used to extract a partial FOV of the scene in any direction in azimuth.

Fig. 6: SVP catadioptric camera principle using parabolic mirrors.

In a real-world scenario, a parabolic mirror profile reflects light in a quasiparallel fashion, affecting the quality of the orthographic projection. In the case of using a hyperbolic mirror profile, light rays directed toward a focal point (located inside the convex mirror’s surface) are reflected toward the other focal point of the hyperbola, where the dioptric system is located.
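The SVP property of the parabolic profile admits a compact closed form. For a paraboloid z = (x² + y² − h²)/(2h) with its focus at the origin, a ray leaving the focus in unit direction (dx, dy, dz) is reflected parallel to the z axis and lands, under orthographic projection, at (u, v) = h·(dx, dy)/(1 − dz). The Python function below sketches this standard projection; the sign conventions and function name are our assumptions, not the paper's notation:

```python
def parabolic_image_point(d, h):
    """Orthographic image point of a ray for an SVP parabolic catadioptric camera.

    d: unit ray direction (dx, dy, dz) leaving the mirror's focal point;
    h: mirror parameter of the paraboloid z = (x^2 + y^2 - h^2) / (2h),
    whose focus is at the origin. The mirror point at distance h/(1 - dz)
    along d reflects the ray parallel to the z axis, so it projects to
    (u, v) = h*(dx, dy) / (1 - dz). Singular at dz = 1 (ray along the axis).
    """
    dx, dy, dz = d
    s = h / (1.0 - dz)
    return s * dx, s * dy
```

For instance, a horizontal ray (dz = 0) maps to radius h on the image plane, which corresponds to the mirror's rim at the height of the focal point.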

Panoramic cameras based on the catadioptric configuration, where light is focused on a single projection point, correspond to the SVP model. Following the SVP principle, a full spherical panorama can be approximated by using two coaxial catadioptric cameras back-to-back and mosaicking the semi-hemispherical images originating from each camera. This idea was proposed by Nayar16 in 1997. At about the same time, Baker and Nayar proposed a model for catadioptric image formation,17 from which they concluded that only parabolic and hyperbolic mirror profiles satisfy the SVP criteria.

Other configurations of catadioptric cameras based on different mirror profiles, e.g., semi-spherical or multifaceted pyramidal mirrors, exhibit multiple focal points, which requires the use of multiple cameras. An example of these non-SVP cameras is illustrated in Fig. 7, where the catoptric system uses planar mirrors. These configurations are used in commercial panoramic cameras to produce monoscopic panoramas.18,19

Fig. 7: Catadioptric camera using planar mirrors instead of hyperbolic or parabolic profile mirrors.

The camera configurations described so far can only produce monoscopic panoramas when used as single-snapshot cameras. However, catadioptric cameras can be used to produce omnistereoscopic images when used in clusters.20–22 The case of omnistereoscopic images based on a number of monoscopic panoramas is studied in Sec. 5. Along with the development of monoscopic catadioptric cameras, there has been a parallel development of catadioptric omnistereoscopic cameras; this family is described next.

Catadioptric Omnistereoscopic Cameras

The development of omnistereoscopic catadioptric cameras has paralleled the development of general (monoscopic) catadioptric sensors. It should be mentioned that omnistereoscopic catadioptric cameras were originally intended for the real-time estimation of depth maps. In other words, these omnistereoscopic approaches were not intended to produce omnistereoscopic imagery for human viewing, but they were motivated by applications such as robot navigation and 3-D scene reconstruction. One important remark is necessary here: the omnistereoscopic cameras based on a catadioptric configuration with vertical parallax presented in this section are not modeled by any of the acquisition models presented in Sec. 2.2 since the omnistereoscopic acquisition classification has been constrained to human-viewable omnistereoscopic imagery, i.e., binocular stereo with horizontal parallax.

One of the earliest examples of this technology was an SVP catadioptric camera proposed by Southwell et al.23 in 1996. This omnistereoscopic catadioptric camera is based on a coaxial, dual-lobe parabolic mirror, and its main application was generating omnidirectional depth maps of the terrain. A depiction of this camera appears in Fig. 8(a).

Fig. 8: Omnistereoscopic catadioptric examples: (a) the camera proposed by Southwell et al.23 in 1996 using a dual-lobe mirror and (b) an early SVP catadioptric omnistereoscopic camera proposed by Gluckman24 in 1998: this configuration uses two coaxial catadioptric panoramic cameras with a large vertical baseline b to acquire two panoramic views of the scene with vertical parallax; the 3-D scene structure is estimated from the vertical disparity arising between matched feature points in each panorama.

Another configuration was proposed by Gluckman et al.24 in 1998. It is based on two coaxial catadioptric cameras whose vertical baseline enables the acquisition of an omnistereoscopic pair of images. This camera, illustrated in Fig. 8(b), enables the estimation of the 3-D location of a scene point P in space by matching pairs of feature points (p, p′) in the panoramic views from each camera. The larger the vertical baseline b used, the better the accuracy of the depth estimation. A theoretical model of the image formation in a dual-mirror, axially symmetrical catadioptric sensor was proposed by Stürzl et al.25
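The triangulation underlying such coaxial designs reduces to elementary geometry: a point at horizontal range R, seen at elevation ϕ_bottom from the lower viewpoint and ϕ_top from the upper one with vertical baseline b, satisfies R = b/(tan ϕ_bottom − tan ϕ_top). The following Python function is a minimal, textbook sketch of this principle only; the actual cameras measure these elevation angles through their mirror optics:

```python
import math

def range_from_vertical_stereo(phi_top, phi_bottom, b):
    """Horizontal range of a point triangulated from a vertical-baseline pair.

    phi_top / phi_bottom: elevation angles of the same scene point as seen
    from the upper and lower coaxial viewpoints, separated vertically by b.
    With the point at height z above the lower viewpoint: tan(phi_bottom)
    = z / R and tan(phi_top) = (z - b) / R, which gives the range below.
    """
    return b / (math.tan(phi_bottom) - math.tan(phi_top))
```

The formula also makes the accuracy remark above concrete: for a fixed angular measurement error, a larger b produces a larger tangent difference and hence a smaller relative range error.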

The coaxial catadioptric camera has been a popular method to estimate omnidirectional depth over the past two decades due to its simplicity and hardware economy. Unfortunately, it is not suitable to produce a satisfactory omnistereoscopic rendition of the scene capable of stimulating human stereopsis. The binocular visual process is based on fusing two views of the scene obtained from horizontally displaced viewpoints (horizontal parallax). This camera provides two views of the scene in every direction, but based on vertical parallax. Although the omnidirectional depth can be estimated from this information, it cannot be used to produce satisfactory horizontal stereoscopic views. For instance, when using a two-dimensional (2-D) to 3-D conversion, the gaps in the image arising from occluded areas need to be filled, e.g., using texture synthesis.26 One of the problems of the vertically coaxial method is that the visual information for those occluded areas is never acquired.

A similar omnistereoscopic camera based on the catadioptric principle was proposed by Kawanishi et al.27 in 1998. Their camera consists of two non-SVP catadioptric cameras in a vertical coaxial configuration, as shown in Fig. 9(a). Each catadioptric camera consists of six planar mirrors, each of which reflects a partial view of the scene onto a video camera. This configuration produces 12 video streams covering 360 deg in azimuth. Each camera in the top of the arrangement is paired with the camera located directly below it, i.e., the cameras n and n′ form a stereoscopic camera pair whose vertical baseline is b, as illustrated in Fig. 9(b). Similar to Gluckman’s camera,24 the vertical parallax b between camera pairs enables the panoramic estimation of the scene’s depth but does not provide the means to render viewable stereoscopic images. In a follow-up, Shimamura et al.28 built a working prototype of this design capable of producing panoramic depth maps.

Fig. 9: Omnistereoscopic camera configuration based on coaxial planar mirrors: (a) configuration based on Kawanishi et al.’s idea27 and (b) virtual location of each camera’s projection center and vertical baseline b.

Spacek29 relaxed the non-SVP condition using two conic mirrors, instead of pyramidal mirrors, coaxially aligned with cameras. This configuration was conceived to estimate distances based on vertical disparities. The author reported benefits over other profiles in using conical mirrors in terms of the uniformity of the resolution density. However, this type of profile introduces out-of-focus blurring in some regions of the orthographic image because the optical focus is not uniformly located as in the case of hyperbolic and parabolic mirrors.

An interesting recent development is due to researchers at the Fraunhofer Heinrich Hertz Institute,30 who are currently working on a prototype of an omnistereoscopic high-definition television (HDTV) camera based on a catadioptric design. Conceived for omnistereoscopic 3-D TV broadcasting, this setup uses six stereoscopic camera rigs, each of which is associated with a planar mirror. Each mirror reflects a partial view of the scene onto a camera pair, for a total of 12 HDTV cameras.31,32 These video streams can be mosaicked into a full omnistereoscopic video, or into a free-panning 3-D TV signal. The concept of this camera is presented in Fig. 10. The creators of this camera have reported difficulties when mosaicking partially overlapping stereoscopic frames due to the large parallax between adjacent projection centers.33 Part of the problem resides, as in other star-like configurations, in the excessive parallax introduced by the stereoscopic rigs, where both cameras are laterally displaced with respect to each other. The minimum distance to objects in the scene for correct rendering is affected by the large intercamera parallax, reducing the stereoscopic usability in foreground regions of the scene. This camera configuration can be represented by the off-centered stereoscopic rig acquisition model [Fig. 3(d)], where six stereoscopic camera rigs, equally distributed at a distance rc from the geometric center O, simultaneously capture six partially overlapped stereoscopic video signals of the scene.

Graphic Jump LocationF10 :

Omnistereoscopic video camera developed at the Fraunhofer Heinrich-Hertz Institute: (a) each planar mirror face is associated with a stereoscopic pair of cameras, (b) locations of the cameras as seen reflected on the planar mirrors.

Peleg et al.34 proposed a different catadioptric camera configuration capable of acquiring horizontal binocular stereo omnidirectionally and in real time. This catadioptric system uses a complex spiral lenticular lens and an optical prism to acquire a complete omnistereoscopic image in real time. This configuration deflects incoming light rays as if the scene were acquired simultaneously from multiple perspective points located on a virtual viewing circle (Sec. 2.2). This camera can be modeled by the central stereoscopic rig acquisition model [Fig. 3(a)], where a large number of stereoscopic image vertical stripes (central columns of left and right images) are simultaneously sampled and mosaicked to create complete left and right cylindrical panoramas in real time. The proposed lenticular arrangement is shown in Fig. 11(a), which illustrates the acquisition of a single-eye (left or right) panoramic view. The idea of using a Fresnel-like diffractive system with thousands of lenses to capture both omnistereoscopic views simultaneously could, in theory, produce an omnistereoscopic video in real time. The lenticular arrangement can be built around an SVP panoramic camera as shown in Fig. 11(b). The authors have proposed using a beam splitter and the described lenticular system to acquire both viewpoints of a stereo projection simultaneously, as illustrated in Fig. 11(c).

Graphic Jump LocationF11 :

Peleg et al.’s34 proposal for a real-time omnistereoscopic camera based on a catadioptric principle: (a) a Fresnel-like lenticular lens arrangement diffracts the light over a viewing circle, (b) a catadioptric scheme with a cylindrical diffractive material composed of vertical stripes of the proposed Fresnel lens to capture one (left- or right-view) panorama, and (c) using an optical beam splitter, e.g., a prism, and combining diffraction lenses for left and right view in the same cylindrical surface, both (left- and right-eye) views can be captured simultaneously.

This camera could be a solution to the problem of omnistereoscopic image acquisition of dynamic scenes. Furthermore, Peleg, Ben-Ezra, and Pritch were granted a patent35 for this camera in 2004, but to the best of our knowledge no prototype has yet been built or licensed. This commercialization lag must not be taken as proof of the inadequacy of the idea; e.g., more than 70,000 patents were granted annually in the United States by the turn of the century and only a few of them were commercially developed.36 As a matter of fact, a variation of the lenticular lens, although not in an omnistereoscopic application, has been licensed for the production of 3-D billboards.37 Peleg et al. proposed a geometrical model of their elliptic lens;38 however, there are still aspects of the image formation for this camera that have not been extensively studied. More importantly, the capability of such an optical approach to produce high-quality omnistereoscopic imagery has not been demonstrated.

It is important to remark that a cylindrical stereoscopic panorama produces a correct binocular experience only at the center of each rectangular view extracted from the left and right panoramas. The peripheral image region, outside the image center, produces a distorted binocular perception. This effect was mentioned by Bourke39,40 while addressing the problem of displaying semi-spherical omnistereo on dome surfaces. Although this is a noticeable effect, the user tends to focus on a region of interest (ROI) at the center of the image, where the binocular perception is correct, reducing the likelihood of uncomfortable effects that could lead to eye strain. Furthermore, if a cylindrical omnistereoscopic image created with this acquisition method were projected on a cylindrical surface around the user (located at the center), correct binocular depth would be experienced by looking in any direction around the user, as long as the zero-parallax distance (eye vergence) coincides with the distance to the cylindrical screen.41

Other configurations based on catadioptric cameras that appeared in the last decade mainly focused on the problem of omnidirectional stereo reconstruction42,43 following the idea of coaxial catadioptric stereo, which makes them inadequate to produce omnistereoscopic imagery suitable for binocular human viewing.

Pros and Cons of the Catadioptric Omnistereoscopic Cameras

The most evident advantage of an SVP catadioptric camera is its simplicity: a single camera and dioptric system can sample the scene’s visual field, in addition to estimating the panoramic depth, in a single snapshot. The SVP approach avoids the stitching problems and imperfections that arise from parallax between multiple projection centers in non-SVP cameras. However, focusing the light uniformly on a planar image sensor after reflection in a nonplanar mirror is problematic. Not all rays are parallel after reflection from a parabolic surface, nor are they perfectly reflected toward the convex focal point in a hyperbolic mirror. In practice, image blurring as a function of the radial distance from the center of the image17 is difficult to avoid. This problem can be reduced, although not completely eliminated, by a careful design of the catoptric system (mirror) and its dioptric (focusing) counterpart. The high-resolution CCD sensors available nowadays can help to reduce problems when resampling the acquired orthographic projection of the scene into a canonical panoramic surface. Furthermore, a catadioptric camera has advantages over mirror-less systems in terms of reducing the chromatic aberrations present in most aspheric lenses, i.e., fisheye lenses. Offsetting these advantages, there are still inherent problems in using this type of camera to render an omnistereoscopic scene for binocular viewing.

Although catadioptric omnistereoscopic configurations using vertically coaxial mirrors are undoubtedly an elegant method for acquiring omnidirectional depth maps in real time, these camera configurations are unable to handle occlusions in the scene. A binocularly correct stereoscopic view can be synthesized from one panoramic image plus dense horizontal disparity information for the scene. The disparity information can be used to generate a pixel-wise horizontal displacement map. Applying this map to horizontally displace pixels (or regions) in the image can produce a correct illusion of depth. The necessary information to produce this omnistereoscopic image can be acquired using an omnistereoscopic catadioptric configuration: a correct panorama of the scene and an omnidirectional depth map. Unfortunately, the information to fill image gaps (occluded areas) cannot be obtained directly using a camera configuration based only on vertical parallax. However, there are suboptimal solutions from the field of 2-D to 3-D conversion that can be applied. For instance, pixels can be copied from the regions adjacent to the missing areas to fill these gaps. A much better alternative would be to simultaneously acquire a different view, e.g., a horizontal stereoscopic pair for each gazing direction. The parallax view information can then be used to fill in the occluded areas using a texture inpainting technique.44
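The displacement-map idea described above can be sketched in a few lines. The following is our own minimal illustration, not code from the cited works: the function name and interface are hypothetical, and nearest-neighbor gap filling stands in for the inpainting techniques mentioned.

```python
import numpy as np

def shift_view(row, disparity):
    """Synthesize one row of a second-eye view by horizontally
    displacing each pixel by its disparity value. Gaps left by
    dis-occlusions are filled with the nearest left neighbor,
    a crude stand-in for texture inpainting."""
    w = row.shape[0]
    out = np.zeros_like(row)
    filled = np.zeros(w, dtype=bool)
    for x in range(w):
        xt = x + int(disparity[x])  # displaced column in the new view
        if 0 <= xt < w:
            out[xt] = row[x]
            filled[xt] = True
    for x in range(1, w):           # naive gap filling
        if not filled[x]:
            out[x] = out[x - 1]
    return out
```

A real converter would also resolve depth-ordering conflicts (nearer pixels overwriting farther ones), which this sketch omits for brevity.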

A good candidate to produce directly viewable omnistereoscopic imagery, capable of satisfying the real-time acquisition of dynamic events as well, is the optical diffractive approach proposed by Peleg et al.,35,45 as illustrated in Fig. 11. However, the design and implementation of such a lenticular arrangement is challenging. Similar to Peleg’s approach, there has been at least one other recent proposal, also based on lenticular arrays to multiplex two horizontal binocular views of the scene, but using non-SVP configurations,46 which will be presented in Sec. 6. Beyond the lack of commercial interest in an optics-based approach, an alternative solution based on off-the-shelf lenses and camera sensors can provide a satisfactory answer to the problem with reduced hardware complexity.

Another good candidate is the panoramic 3-D TV camera developed at the Fraunhofer Heinrich-Hertz Institute. This omnistereoscopic camera, illustrated in Fig. 10, uses multiple off-the-shelf HDTV cameras and mirrors to produce real-time stereoscopic videos with broadcast quality. The main drawback of this camera is its size,31 which makes the mosaicking of the individual video streams difficult. One possible solution to the parallax problem would be to reduce the size of the cameras, e.g., by using custom-made HDTV cameras. Additional improvement may be achieved by using a different camera distribution to enable registering stereoscopic images of scene elements closer to the camera.

A literature review of omnistereoscopic methods and configurations cannot be complete without mentioning the family of sequential acquisition methods. It is necessary to point out that sequential methods are intrinsically inadequate to acquire dynamic scenes omnistereoscopically since they require the scene to be static for correct rendering. However, many multiple-camera configurations that will be presented in Sec. 6 can be directly traced back to parallelized (simultaneous acquisition) versions of sequential methods presented in this section. Therefore, these sequential techniques deserve a closer look.

The sequential acquisition of images has been widely used to produce high-quality panoramas. The idea is quite simple: using a single camera, it is possible to capture partially overlapped snapshots of the scene, or image columns, which can be mosaicked to produce panoramas. The simplest approach is a single camera or line sensor rotated around a center O, which preferably should be the camera’s nodal point, and taking snapshots of the scene during its trajectory. One example of this method to produce monoscopic panoramas was the Roundshot VR22047 film camera by the Swiss company Seitz Phototechnik AG, which currently offers a line of rotating heads for panoramic acquisition.48

Rotating a single camera around its nodal point can produce SVP monoscopic panoramas only; however, by rotating an off-centered camera, omnistereoscopic images can be created.7 In terms of sequential omnistereoscopic acquisition, the rotating stereoscopic rig models, which are depicted in Fig. 2(a) and 2(b), produce non-SVP panoramas, but they cannot be adapted for a simultaneous acquisition due to self-occlusion between cameras. The acquisition models presented in Fig. 2(c) and 2(d) can be used for sequential or simultaneous acquisition.

To produce omnistereoscopic imagery, one of the first sequential approaches has been to rotate a stereoscopic camera rig around the midpoint between nodal points. This corresponds to the central stereoscopic rig model [Fig. 3(a)]. A valid rendering strategy in this case is mosaicking vertical image stripes from the center of each image pair to create left- and right-eye panoramic views. The number of stereoscopic images to be sampled determines the incremental panning angle and the strip width. As was mentioned in Sec. 3.1, the stereoscopic panoramas are correct only at the center of each view, becoming distorted outside the central region. Note that each camera’s projection center (nodal point) defines a viewing circle of diameter b=2·rc during the scene panning. Alternatively, rotating the stereoscopic rig about one camera’s nodal point corresponds to the lateral stereoscopic rig acquisition model [Fig. 3(b)]. Unfortunately, the latter strategy cannot be implemented using multiple cameras for a simultaneous (parallel) acquisition.
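The trade-off between the number of snapshots, the panning increment, and the strip width can be made concrete with a back-of-the-envelope calculation. This is our own sketch under an ideal pinhole-camera assumption (the function name and parameters are hypothetical; lens distortion and cylindrical reprojection are ignored):

```python
import math

def stripe_plan(n_images, h_fov_deg, image_width_px):
    """For a rotating central stereoscopic rig taking n_images
    snapshots per turn, return the panning increment (deg) and
    the width (px) of the central vertical strip each image must
    contribute so that the strips tile a full 360-deg panorama.
    Assumes an ideal pinhole camera with horizontal FOV h_fov_deg."""
    pan_step = 360.0 / n_images                      # deg between shots
    # focal length in pixels from the horizontal field of view
    f_px = (image_width_px / 2) / math.tan(math.radians(h_fov_deg / 2))
    # central strip subtending pan_step degrees on the sensor
    strip_px = 2 * f_px * math.tan(math.radians(pan_step / 2))
    return pan_step, strip_px
```

For example, 36 snapshots per rotation give a 10-deg panning increment; fewer snapshots force wider strips, pushing the mosaicked content away from the image center where, as noted above, the stereoscopy is correct.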

The other sequential acquisition strategy corresponds to the off-centered stereoscopic rig acquisition model [Fig. 3(d)]. In this case, the scene is sampled by acquiring a sequence of stereoscopic snapshots, rotating the stereoscopic rig off-center with a radius rc. A modified version of this sequential method consists of radially aligning the nodal point of one of the cameras with the center O, which corresponds to the lateral-radial stereoscopic rig acquisition model [Fig. 3(c)]. Both sequential variants can be parallelized for simultaneous acquisition using multiple cameras.

The last sequential strategy to acquire omnistereoscopic imagery is rotating a single camera off-center, at a radial distance rc from the rotation center O. This strategy corresponds to the first acquisition model [Fig. 3(a)], where left and right views are acquired during the circular trajectory of the camera by back-projecting vertical image stripes. The image columns’ position relative to the image center can be located by tracing the rays connecting the camera projection center O with the points (Ol,Or) on the viewing circle. There have been attempts to parallelize this sequential acquisition strategy using multiple cameras in a circular radial configuration.

These sequential techniques, their variations, and other alternatives are individually detailed in the next section.

Omnistereo Based on Sequential Acquisition Models

Perhaps one of the most illustrious applications of omnistereoscopic sequential acquisition was integrated into the Mars Pathfinder rover4 in the late 1990s. The camera can be modeled by the central stereoscopic rig method illustrated in Fig. 3(a). It was designed to provide a variety of telemetric measurements beyond producing omnistereoscopic imagery; in fact, producing stereoscopic images suitable for human stereopsis was not the primary goal of this camera. Two cameras were mounted on a retractable mast and were rotated around the midpoint between each camera’s nodal point. The pair of cameras, whose resolution was modest (512×256), as expected for an interplanetary probe of that era, were toed in by 0.03 rad, defining a fixation point (zero-parallax distance) at 2.5 m from the rotation center. The stereoscopic panoramas produced by this camera received much attention in the news.49–51 Although these omnistereoscopic images are not impressive in terms of the quality of the binocular stereo, they constitute an important precedent for the marketing value of realistic immersive imagery to promote a planetary mission to the layperson.52

Other authors reported variations on the rotating stereoscopic rig method that are worth mentioning. For instance, Romain et al.53 proposed a rotating platform with two degrees of freedom, which can rotate in azimuth and elevation to produce spherical omnistereoscopic images. More recently, Ainsworth et al.54 revisited the method of a rotating stereoscopic rig to minimize (not to eliminate, as the authors stated) stitching problems. They reported a method to create stereoscopic panoramas55 based on mosaicking partially overlapped images of the scene, which they tested using an off-the-shelf digital camera (Panasonic Lumix) and a commercial rotating platform (GigaPan EPIC Pro Robotic Controller).

Huang et al.8 proposed a rotating camera to acquire stereoscopic images of scenes, which were aligned and mosaicked to render omnistereoscopic images. Their idea, originally published in 1996, is illustrated in Fig. 12(a), which shows a central camera that is rotated around its nodal point and a second camera that is rotated off-axis, providing a parallax view of the scene. Their method corresponds to the lateral stereoscopic rig model [Fig. 3(b)], which produces non-SVP stereoscopic panoramas, where the stereo budget can be selected by choosing the baseline rc=b of the rotating stereo rig. The appeal of this idea is that one camera (the central one) is in the optimum position to minimize stitching errors (the nodal point), while the second camera captures a second view of the scene with horizontal parallax. The central camera produces an SVP panorama, while the second camera can be used to render a binocular polycentric panorama. Because of camera self-occlusion, this idea is difficult to implement using multiple cameras, although a multicamera configuration using planar mirrors could approximate an SVP for a central camera and acquire a horizontal parallax view simultaneously. To the best of our knowledge, no proposals based on this catadioptric design have been made.

Graphic Jump LocationF12 :

Omnistereoscopic methods based on sequential acquisition of partially overlapped images: (a) method proposed by Huang et al. in 1996 to generate a correct panorama (central camera) and accessory information to estimate a horizontally parallax view (lateral camera) and (b) the acquisition strategy proposed by Yamada et al., which is similar to Huang’s method, is based on acquiring images to produce an SVP panorama (central camera) and, in this case, estimating the panoramic depth map based on a large-baseline stereoscopic pair of images (left and right cameras).

Using a similar approach, Yamada et al.9–11 proposed the triple camera system shown in Fig. 12(b). Similar to Huang et al.’s method, the central camera produces an SVP panorama while the two satellite cameras help to estimate a panoramic disparity map using a large baseline. This sequential acquisition technique can be seen as a modification of the central stereoscopic rig [Fig. 3(a)], used only to estimate depth by exploiting the larger baseline (b=2·rc) between the satellite cameras, with a central camera added to produce an SVP panorama as in the lateral stereoscopic rig model. Again, self-occlusion between cameras makes it difficult to parallelize Yamada’s approach in a simultaneous acquisition scheme. Nevertheless, a catadioptric scheme to acquire an SVP panorama combined with a multisensor scheme to simultaneously acquire a number of partially overlapped stereoscopic snapshots of the scene is an interesting design challenge and a suitable approach for omnistereoscopic acquisition of dynamic scenes. To the best of our knowledge, no camera following this approach has been proposed.

A variation of the idea of rotating sensors was proposed by Chan and Clarke56 who devised a camera where a mirror is rotated while a single sensor sequentially captures binocular stereoscopic images of the scene. Their idea was inspired from endoscopic applications where a single probe has to be inserted in a biological cavity or in an archeological site. The principle behind the idea is similar to other proposals in terms of a sequential acquisition. For instance, carefully selecting the planar mirror and camera locations, the central stereoscopic rig acquisition scheme of Fig. 3(a) can be implemented with this technique.

Ishiguro et al.6 proposed in 1992 a method to create omnistereoscopic imagery based on a single rotating camera, but for robot navigation rather than human visualization. This method corresponds to the central stereoscopic rig acquisition model illustrated in Fig. 3(a). Peleg and Ben-Ezra7 rediscovered this method in the late 1990s, but tailored the idea with human visualization in mind. This method has become one of the most popular sequential techniques to create high-quality omnistereoscopic imagery given its hardware economy and simplicity. The principle is based on a single camera rotating around a point behind its projection center, as depicted in Fig. 13(a). In one complete rotation, the camera captures a number of images that are used to extract left (imL) and right (imR) columns. These two image columns correspond to the back-projection of the image’s vertical stripes defined by intersecting the rays passing through the camera projection center O and the points OL and OR. The ray-tracing concept is depicted in Fig. 13(b). These columns are then mosaicked to produce left- and right-eye panoramic views. The end result is equivalent to the central stereoscopic rig model [Fig. 3(a)], used to acquire left and right views column-wise. There have been proposals for omnistereoscopic cameras that can be directly traced to parallelizations of a single rotating camera.57–59

Graphic Jump LocationF13 :

Rotating method to produce omnistereoscopic imagery based on a single rotating camera: (a) a single camera is rotated from an off-centered location and (b) two projections corresponding to left- and right-eye projections can be defined intersecting the rays passing through the camera’s projection center O and the points OL and OR defined over a virtual viewing circle.

The method has the advantage of defining a virtual baseline (b=2·rc) that can be varied according to the stereoscopic budget desired for the scene by changing the relative distance between left- and right-eye vertical stripes extracted from the successively acquired images.60 Changing the distance has the effect of changing the viewing circle diameter. More recently, Wang et al. proposed an interactive method to adjust the disparity in panoramas created using the single off-centered rotating camera61 based on the interactive selection of objects in the scene.
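The dependence of the virtual baseline on the stripe separation can be made explicit under a pinhole-camera approximation. This is our own sketch, not a formula from the cited works: a stripe extracted Δx pixels from the image center defines a ray at angle β = atan(Δx/f) from the optical axis; taking that ray as tangent to the viewing circle gives a circle radius of rc·sin β and a virtual baseline b = 2·rc·sin β.

```python
import math

def virtual_baseline(delta_x_px, focal_px, r_c):
    """Approximate virtual stereo baseline for the off-centered
    rotating-camera method. A stripe delta_x_px from the image
    center defines a ray at angle beta from the optical axis;
    assuming that ray is tangent to the viewing circle, the
    baseline is b = 2 * r_c * sin(beta). Pinhole-camera sketch."""
    beta = math.atan(delta_x_px / focal_px)
    return 2.0 * r_c * math.sin(beta)
```

Widening the stripe separation thus increases the baseline toward its limit b = 2·rc, consistent with the viewing-circle diameter given in the text.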

Examples of 3-D (not stereoscopic) images created using the central stereoscopic rig method for monoscopic panoramas and omnistereoscopic images can be visited online.62,63 Additionally, several patents have been granted to Peleg et al. for acquiring, disparity adjusting, and displaying stereoscopic panoramas using the single-camera method,6466 one of which has been licensed to HumanEyes,37 not to create omnistereoscopic imagery but to create multiview 3-D billboards.

A different approach was proposed by Hongfei et al.5 who used a single digital single-lens reflex camera to successively acquire 12 images of a static scene. First, three stereoscopic image pairs (six images in total) are acquired by placing the camera in the positions labeled iL and iR [for i=(1,2,3)] as indicated in Fig. 14(a). After that, the camera is rotated 180 deg and six new stereoscopic images are successively acquired following the pattern illustrated in Fig. 14(b). This acquisition scheme corresponds to the central stereoscopic rig model illustrated in Fig. 3(a). Although Hongfei et al.’s method is not suitable for dynamic scenes, a parallelized version of the same idea was already proposed by Baker et al.14 in a patent application in 2003.

Graphic Jump LocationF14 :

Method based on rotating a single camera to capture six stereoscopic images by positioning the camera in 12 different locations and orientations: (a) first, the stereoscopic pairs [(1L,1R),(2L,2R),(3L,3R)] are acquired one by one, rotating and positioning the camera at the corresponding locations. (b) Finally, the camera is rotated 180 deg around its nodal point and the pairs [(4L,4R),(5L,5R),(6L,6R)] are sequentially acquired.

Among the panoramic sequential methods, those based on using line-sensor cameras deserve a particular mention. Cylindrical omnistereoscopic imagery obtained by this method produces geometrically correct binocular views of the scene at the center of the image, while the depth perception is distorted in the periphery of the image.67 This viewing paradigm is valid for cylindrical panoramas projected in a cylindrical display, in both monoscopic and stereoscopic panoramas, in part because it does not have to deal with parallax ghosting while blending images. Sequential line scanning is based on mosaicking image columns a single pixel wide and therefore it produces high-quality stereoscopic panoramas. However, this virtue is offset by their lengthy acquisition time, which is common to all sequential-scanning methods. The line-scanning methods to acquire omnistereoscopic imagery can be modeled by the central stereoscopic rig acquisition method [Fig. 3(a)].

High-quality omnistereoscopic images, in cylindrical format, can be produced using line-scanning sensors. This sequential method was studied over the first decade of the 2000s,68 thanks, in part, to the commercial availability of line-scanning cameras.69 As their name indicates, these cameras acquire the scene column-wise. An omnistereoscopic view of the scene can be acquired by rotating the line-scanning sensor off-center at radius rc to independently acquire the left (IL) and right (IR) cylindrical panoramas column-by-column. The acquisition model corresponds to the central stereoscopic rig method illustrated in Fig. 3(a), where a line-scanning sensor occupies the position of each of the cameras for one complete rotation, acquiring each eye’s viewpoint in succession over two complete rotations. Several authors have contributed to the understanding and modeling of omnistereoscopic imagery using line sensors, covering line-scanning camera calibration,68,70 a geometrical model for polycentric panoramas using this acquisition strategy,71 and omnistereoscopic image acquisition.72,73–75 Although the literature on line sensors is extensive and insightful, this approach cannot be adapted to acquire dynamic scenes.
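In the ideal case, this column-wise acquisition reduces to stacking one-pixel-wide columns over a full rotation. The following minimal sketch illustrates the assembly step; the grab_column callback is hypothetical, not a real camera API:

```python
import numpy as np

def scan_panorama(grab_column, n_steps, height):
    """Assemble a cylindrical panorama column-by-column from a
    rotating line sensor. grab_column(step) is a hypothetical
    callback returning the 1-D pixel column captured at rotation
    step 'step'; n_steps columns cover one full rotation."""
    pano = np.empty((height, n_steps), dtype=np.uint8)
    for step in range(n_steps):
        pano[:, step] = grab_column(step)
    return pano

# Left- and right-eye panoramas would each be built from one such
# rotation (two rotations in total, or two sensors for both views).
```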

Hybrid approaches have been proposed that use a laser range finder and rotating sensors to provide high-resolution depth mapping of the scene. For instance, Jiang and Lu76 proposed a method that combines an off-center CCD camera with a laser range sensor, which together produce a monoscopic panorama and its dense depth map. Their approach is shown in Fig. 15. Once again, the problem addressed was the reconstruction of a 3-D scene and not the production of binocular omnistereoscopic imagery. Besides its sequential conception, this idea cannot be used for an omnistereoscopic 2-D to 3-D conversion since occlusion information is not collected during the sequential acquisition. Similar limitations in handling occlusions arise in a recent proposal by Barnes,77 which combines ground-based LIDAR and monoscopic panoramas.

Graphic Jump LocationF15 :

Combined camera and laser range sensor in a rotating platform for sequential acquisition: an SVP panorama plus a panoramic depth map can be obtained for static scenes using this acquisition method.76

The rotating sensor for omnistereoscopic imagery has appeared with some variations over the last decade.67,78 A good summary of methods to create omnistereoscopic images based on rotating sensors was published in 2006 by Bourke.79

Pros and Cons of the Sequential Acquisition

Sequential acquisition strategies are an interesting starting point to devise multicamera configurations for real-time omnistereoscopic acquisition. The configuration proposed by Huang and Hung8 is interesting since it presents a solution to reduce parallax-induced errors while stitching multiple images. This method can be parallelized to simultaneously acquire a number of image patches and their corresponding stereoscopic counterparts in a non-SVP scheme. The sequential method of Peleg and Ben-Ezra using a single off-centered camera7 is attractive for reducing the hardware involved, but it is difficult to parallelize since it would require a prohibitively large number of cameras to prevent blending artifacts while mosaicking. In that case, the mosaicked image columns have to be constrained to a few pixels wide, which implies taking hundreds of snapshots. A multiple-camera configuration that attempts to take this number of simultaneous pictures is impractical, but, as will be shown in Sec. 6, some multiple-camera configurations that perform this parallel acquisition have been proposed.

The large overhead of acquiring hundreds of snapshots just to use two narrow slices of each is partially compensated for by line cameras, which scan the scene column-wise. Unfortunately, the long acquisition time of line cameras limits them to controlled static environments, e.g., mostly indoor scenarios.

However, it is possible to devise improvements in the acquisition speed of omnistereoscopic images using line sensors. For instance, using multiple stereoscopic line sensors, the acquisition speed can be increased linearly with the number of stereoscopic sets. This approach would enable the simultaneous capture of nonoverlapping stereoscopic images of the scene, which can then be mosaicked to create a full omnistereoscopic image in a fraction of the time that a single sensor requires. Unfortunately, a rotating camera system with such characteristics is still commercially unavailable and, even if it were available, it would not be suitable for acquiring dynamic scenes.

The panoramic-based methods use panoramic snapshots as raw material to synthesize omnistereoscopic images of a scene. This is a relatively new alternative resulting from the commercial availability of panoramic cameras during the last decade, which made capturing panoramic snapshots of the scene practical.

The idea of using multiple stereoscopic panoramas of a scene to perform 3-D mapping dates back to the late 1980s.80 For instance, Ishiguro et al.6 proposed rendering a sequence of omnistereoscopic images, not for human viewing but to estimate the 3-D relationships among objects in a scene. To do this, the authors proposed using a mobile platform (a robot) equipped with a rotating camera mounted on top. The mobile unit moved on a preprogrammed route, stopping at intervals to acquire omnistereoscopic images of the scene using the single rotating sensor. The central stereoscopic rig acquisition model was used. This sequence of omnistereoscopic views can be used to estimate the distance to obstacles from the traveling path by matching feature points between stereoscopic images obtained from successive panoramic views, e.g., exploiting motion parallax between samples. This method is constrained to static scenes, the panoramas have to be coplanar, the location of each panoramic sample has to be precisely known, and, more importantly, the accuracy of the estimation decreases as the viewing direction approaches the direction of motion (the robot’s acquisition path). The panoramas can be aligned by determining the cylindrical epipoles or, alternatively, by finding the focus-of-expansion81 direction using motion analysis between panoramic frames in the sequence. The method is only valid for a limited range of panning angles around the perpendicular to the planned trajectory. Furthermore, uncertainties in the stereoscopic baseline lead to inconsistencies in the depth perceived from different viewpoints.

Similar to Ishiguro, Kang and Szeliski presented a method82 to reconstruct the 3-D structure of a scene using multibase stereo obtained from randomly located panoramic samples (Fig. 16). The idea was to use stereo matching from multiple viewpoints, whose locations were mapped, to estimate an omnidirectional depth map. Fleck also described a method similar to Ishiguro’s to reconstruct a 3-D model of the scene.83 These methods were conceived for robot navigation and not to produce omnistereoscopic renderings for binocular human viewing.

Graphic Jump LocationF16 :

Multibase stereo reconstruction of a scene based on information acquired from multiple cylindrical panoramas.82

Many techniques were specifically conceived for the off-line navigation of image-based stereoscopic virtual environments. For instance, Yamaguchi et al.84 and, in a more recent follow-up, Hori et al.85 proposed a method to generate stereoscopic images based on light-field rendering of synthetic novel views using panoramic video frames as sources. This approach enables a smooth navigation of the scene, but creates a large data overhead. A key problem with this idea is that the exact position of each panoramic frame must be known a priori to find the best panoramic pair to render a binocular image for each user’s virtual location and gazing direction. In addition, it is not practically feasible to acquire the panoramic frames on an accurately and equally spaced grid so as to control the stereoscopic baseline. To cope with this problem, Hori et al.22 proposed capturing panoramic video by simultaneously using two panoramic cameras mounted on a rig. Unfortunately, this approach cannot provide a full omnistereoscopic rendition of the scene, but only in directions perpendicular to the traveling path, and it does not solve the data overhead problem.

Vanijja and Horiguchi proposed an alternative method86 specifically tailored to a CAVE-type display. Their idea relies on a limited set of four panoramas acquired in a controlled square sampling pattern, as illustrated in Fig. 17(a), from which four wide-FOV stereoscopic images are rendered. These four partial stereoscopic views were projected on the walls of a CAVE87 to produce an omnistereoscopic immersive experience. The image patches extracted from each panorama of the cluster were arranged according to the camera panning direction in azimuth; this mapping is illustrated in Fig. 17(b). The display, in this case a CAVE, imposed restrictions on the number and spatial distribution of the cylindrical panoramas. The authors constrained the viewable stereoscopic range to certain elevation angles because the distinct cylindrical projection centers of each stereoscopic pair produced undesirable vertical disparities at low and high elevation angles. Despite these problems, this method offers advantages in terms of acquisition time and depth consistency between sampled viewpoints, making it suitable for a practical stereoscopic telepresence system. Unfortunately, it does not satisfy the dynamic-scene acquisition constraint since the individual panoramas still need to be acquired sequentially. This omnistereoscopic acquisition approach can be modeled using the central stereoscopic rig model [Fig. 3(a)], where four wide-angle (90-deg FOV in azimuth) stereoscopic snapshots of the scene are sequentially or simultaneously acquired in the directions θ=0, 90, 180, and 270 deg to produce the image patches (iL,iR), for i=(1,…,8), as indicated in Fig. 17.

Fig. 17: Vanijja and Horiguchi proposed20 using clusters of four panoramic snapshots of the scene, taken in a square pattern, to extract eight wide-angle stereoscopic images and render a full omnistereoscopic image: (a) the mapping of the eight image sections from panoramas (I1,I2,I3,I4) and (b) the mosaicking of these sections to create a cylindrical omnistereoscopic pair (IL,IR).

An interesting antecedent of the method of Vanijja et al. can be found in a patent application filed by Baker et al.14 in 2003 and granted in 2007. This omnistereoscopic camera is built around four panoramic cameras in a square pattern, as shown in Fig. 18(a). The arrangement simultaneously acquires four cylindrical panoramas that are used to compose four wide-angle stereoscopic views of the scene, as illustrated in Fig. 18(b) and 18(c). Although this is a multicamera system, it is relevant to include it in this section since it is a parallelization of the sequential method proposed by Vanijja et al. three years later.20 It should be mentioned that this configuration has a minimum distance from the camera below which correct stereoscopic acquisition is not possible; nevertheless, it has the potential to produce omnistereoscopic images of dynamic scenes. Another drawback is its fixed stereoscopic baseline, which limits its use to certain scenes. This is a good candidate for the real-time omnistereoscopic acquisition of dynamic scenes, but no attempt has been made to formalize the stereoscopic image formation of this camera.

Fig. 18: The omnistereoscopic camera patented by Baker on behalf of Image Systems Inc.14 proposed the parallel acquisition of a panoramic cluster, preceding Vanijja and Horiguchi's proposal:86 (a) four panoramic cameras in a square pattern simultaneously acquire four overlapping cylindrical panoramic snapshots of the scene, (b) a possible rendering strategy using sections of each cylindrical panorama to render stereoscopic close-ups, and (c) an alternative with a slightly larger baseline, where the regions for stereoscopic rendering are farther from the camera.

A different approach, also based on a cluster of cylindrical panoramas acquired in a controlled pattern, was proposed by the authors of this paper.21 Our omnistereoscopic technique was designed to reduce the acquisition time and the data overhead. The idea improves upon the method proposed by Vanijja et al.86 by using clusters of three coplanar cylindrical panoramic snapshots of the scene. Each cluster has an equilateral triangular pattern, whose side length is directly related to the desired stereoscopic baseline. The triangular cluster is shown in Fig. 19(a), where the projection centers of the cylindrical panoramas are at the vertices of the triangle. Once the panoramas are aligned,81 it is possible to map and extract stereoscopic images from pairs of panoramas within the cluster as a function of the panning direction, similar to Vanijja's method. A similar idea based on a panoramic triad was suggested by Zhu in 2001.1 An example of this mapping for a triad of cylindrical images is illustrated in Fig. 19(b), where each image section of panoramas (I1,I2,I3) corresponds to a particular camera panning direction in azimuth. Mosaicking these six pairs of images renders a full omnistereoscopic view of the scene in a fraction of the time needed by sequential methods that acquire an omnistereoscopic image column-wise. This method can be modeled by the central stereoscopic rig acquisition method [Fig. 3(a)], where six stereoscopic pairs are sampled (sequentially or simultaneously) for the panning regions indicated as (iL,iR), for i=(1,…,6), as indicated in Fig. 19.

Fig. 19: Omnistereoscopic images using a cluster of three panoramas: (a) three cylindrical panoramas (I1,I2,I3) in a coplanar, triangular pattern can be used to extract six image sections, which are then mosaicked following the sequence illustrated in (b) to create two novel stereoscopic views (IL,IR).
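The azimuth-to-pair mapping of the triangular cluster can be sketched as a simple sector lookup. The pair table below is hypothetical (the actual assignment is the one in Fig. 19(b)); the point is only that each of the six 60-deg panning sectors selects an ordered (left, right) pair drawn from the three panoramas, and that the six ordered pairs of three panoramas are each used exactly once:

```python
def panorama_pair(theta_deg):
    """Map a panning direction (degrees, azimuth) to one of six 60-deg
    sectors and an ordered (left, right) pair of panoramas from a
    triangular cluster (I1, I2, I3). The pair table is hypothetical."""
    sector = int(theta_deg % 360 // 60)  # sectors 0..5
    pairs = [(1, 2), (1, 3), (2, 3), (2, 1), (3, 1), (3, 2)]
    return sector + 1, pairs[sector]
```
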

Pros and Cons of Using Panoramic Sources

Although the sequential methods based on panoramic sources are not suitable for acquiring dynamic scenes, some of them can be adapted to multisensor configurations using wide-angle cameras. This is the case for clusters of panoramas where the distance and spatial location between projection centers are precisely known. An example of this parallelization of the acquisition is the camera proposed by Baker et al.,14 which uses four panoramic sources in the same square pattern as the sequential approach proposed by Vanijja et al.20 A similar parallelization can be applied to the authors' own proposal of triangular clusters of coplanar cylindrical panoramas. The main drawbacks of using panoramic sources to extract wide-angle stereoscopic views of the scene are the natural self-occlusion between cameras in parallel acquisition approaches and the large data overhead.

The earliest antecedents of using multiple cameras to compose a wide-angle stereoscopic image date back to 1965, when Clay was granted a patent13 for a camera designed to capture wide-angle stereoscopic images. Although this camera was not conceived to produce omnistereoscopic imagery, it is, to the best of our knowledge, one of the earliest antecedents of panoramic (in the wide-angle sense) stereoscopic photography. The device, illustrated in Fig. 20(a), consists of multiple cameras whose optical axes converge in the direction of the object or ROI to be photographed; each camera captures the image from a different viewpoint. Each camera, paired with its immediate neighbor, constitutes a stereoscopic pair. This can be taken as a precursor of a parallelization of the central stereoscopic rig acquisition method [Fig. 3(a)]. The drawback is that this configuration uses a multiplicity of narrow-angle lenses, which creates stereoscopic pairs that can capture objects stereoscopically only when they are located far from the camera. In other words, this camera acquires stereoscopic views of scenes at distances so large that human binocular depth perception is only marginal. Conversely, the foreground of the scene is captured by individual cameras only; hence there is a significant stereoscopic blind region. In its original conception, this idea used off-the-shelf film cameras, but a similar idea has been reused to capture multiple overlapping sections of the image from different viewpoints using lenticular arrays in front of each individual camera.35,46,64

Fig. 20: Examples of omnistereoscopic cameras based on multiple sensors: (a) in an early patent from 1965, Clay exploited the overlapping FOVs between cameras with slightly different viewpoints to produce stereopsis.13 (b) Shimada and Tanahashi's multiple-camera configuration designed to produce omnidirectional depth maps in real time.88,89

An approach for constructing a panoramic depth map using multiple cameras distributed over an icosahedral structure [Fig. 20(b)] was proposed by Shimada et al.88 and by Tanahashi et al.89,90 Each face of the icosahedron houses three cameras, yielding an arrangement of 60 cameras, i.e., 20 sets of three. The authors proposed using the images from the three cameras on each face as three stereo pairs, i.e., grouping the three cameras into three pairs, to create disparity maps in three spatial directions. The strength of this idea is that only one camera per face is needed to compose a spherical panorama, while the depth map estimated for each face is registered on the final spherical image. The authors proposed this configuration to detect movement in every direction; however, the concept of independently creating a correct panorama and a panoramic depth map can be exploited to create a synthetic omnistereoscopic image.12 The geometric distribution of cameras makes this configuration attractive for rendering stereoscopic panoramas in a spherical format, unlike the majority of omnistereoscopic acquisition methods, which focus on cylindrical topologies. The problem of acquiring spherical stereoscopic views of dynamic scenes is still open to further research.

Firoozfam et al. presented a camera capable of producing omnistereoscopic images by mosaicking six stereoscopic snapshots of the scene.91 The authors proposed adding omnidirectional depth estimation capabilities to their previous panoramic camera design.92 To do so, a configuration based on six stereoscopic camera rigs in a star-like hexagonal pattern was used. This camera corresponds to the off-centered stereoscopic rig acquisition model illustrated in Fig. 3(d), where the stereoscopic rigs are located radially at six equally spaced angles θ. An illustration of their camera is shown in Fig. 21(a). A prototype of this omnistereoscopic camera, conceived for underwater visualization, was built circa 2002, and even the calibration of the stereoscopic camera pairs was reported. Although this camera was proposed for underwater robot navigation, the possibility of real-time omnistereoscopic visualization by a remotely located human operator was foreseen.

Fig. 21: Multicamera examples: (a) panoramic camera configuration using multiple stereoscopic pairs in a hexagonal pattern (Baker et al.93), (b) different configurations using narrower FOV lenses and a larger number of cameras, and (c) an alternative multicamera configuration proposed by the authors.12

Baker et al. filed a patent application93 in 2008 on an omnistereoscopic camera based on Firoozfam et al.'s concept. More specifically, their camera was also based on acquiring six partially overlapping stereoscopic images using a star-like configuration [Fig. 21(a)]. Unfortunately, this camera configuration still lacks a theoretical framework to justify the geometric distribution of the cameras. In terms of rendering, the problem of stitching images acquired from cameras with different projection centers in this configuration remains to be addressed.

In 2012, Baker and Constantin filed a patent application94 on a different multicamera configuration. An example of this camera is illustrated in Fig. 21(b) for 12 cameras, although the authors suggested using a larger number. As in the cases discussed above, the idea is to acquire partially overlapping stereoscopic snapshots of the scene, which can be mosaicked to render a cylindrical omnistereoscopic image. The authors suggested using 16 to 24 individual cameras with 45- to 30-deg FOV in azimuth, respectively. The distance between projection centers of adjacent cameras {[iL,(i+1)L] and [iR,(i+1)R], for i=(1,…,6)} can be kept smaller than in the star-like hexagonal distribution illustrated in Fig. 21(a). However, the price to pay for this increased proximity is a larger number of cameras with narrower-angle lenses to prevent self-occlusion. For example, using 24 cameras, the intercamera distance is approximately 0.15×b, where b = 65 mm for a normal interocular distance, and can be smaller for hypo-stereo. The main attraction of this configuration is that it reduces the parallax between the projection centers of adjacent cameras while maintaining a larger baseline than the configuration in Fig. 21(a). However, with narrow-angle lenses, the minimum distance to objects in the scene for stereoscopic acquisition, i.e., the distance beyond which the scene appears in the FOV of both cameras of a stereoscopic rig, is larger. The same configuration has appeared in another recent patent application,95 but with fewer stereoscopic pairs obtained by using wider-angle lenses [Fig. 21(b)]. This camera can be modeled as an off-centered stereoscopic rig [Fig. 3(d)].
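The quoted intercamera distance of roughly 0.15×b can be reproduced with a chord-length calculation, under our own assumptions (not stated in the patent) that the 24 cameras are equally spaced on a ring and that each stereoscopic pair is formed by cameras eight positions apart:

```python
import math

def ring_spacing_ratio(n_cameras, pair_offset):
    """Ratio of adjacent-camera spacing to the stereoscopic baseline for
    n_cameras equally spaced on a ring, where each stereo pair is formed
    by cameras pair_offset positions apart. Both distances are chords
    2*r*sin(k*pi/n), so the ring radius cancels out of the ratio."""
    adjacent = math.sin(math.pi / n_cameras)
    baseline = math.sin(pair_offset * math.pi / n_cameras)
    return adjacent / baseline
```

With these assumptions, ring_spacing_ratio(24, 8) ≈ 0.151, consistent with the 0.15×b figure; for b = 65 mm, the adjacent spacing is then just under 10 mm.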

Our own contribution to multicamera configurations12 is illustrated in Fig. 21(c). This camera addresses the problem of creating a monoscopic panorama with respect to a common cylindrical projection center O while acquiring the auxiliary information needed to render an omnistereoscopic counterpart. The stereoscopic information follows the idea behind the approaches of Huang et al.8 and Yamada et al.,9 but uses a multisensor configuration to satisfy the real-time omnidirectional constraint of the problem. This multisensor configuration can be modeled by the lateral-radial stereoscopic rig acquisition model [Fig. 3(c)], which in this case models the acquisition of six stereoscopic snapshots separated by equal panning increments (Δθ = 60 deg). The use of wide-angle lenses helps to reduce the number of stereoscopic pairs required.
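To illustrate the kind of synthesis this acquisition supports, the following minimal sketch (our simplification: integer disparities, naive background hole filling, no occlusion-map handling) forward-warps one scanline of a reference panorama by a per-pixel horizontal disparity to approximate the other eye's view:

```python
def synthesize_right_row(left_row, disparity_row):
    """Forward-warp one scanline of the reference (left) panorama by its
    per-pixel horizontal disparity to approximate the right-eye view.
    Minimal sketch: integer disparities, naive hole filling from the
    left neighbor, no occlusion-map handling."""
    w = len(left_row)
    right = [None] * w
    for x in range(w):
        xr = x - int(disparity_row[x])   # nearer points shift more
        if 0 <= xr < w:
            right[xr] = left_row[x]      # later samples overwrite earlier ones
    for x in range(w):                   # fill disoccluded holes
        if right[x] is None:
            right[x] = right[x - 1] if x > 0 else 0
    return right
```

For zero disparity the scanline is reproduced unchanged; nonzero disparities shift foreground samples and leave holes that a real renderer would fill from the occlusion maps rather than by neighbor propagation.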

The omnistereoscopic rendering based on the images acquired by the camera proposed in Fig. 21(c) can be done by mosaicking stereoscopic snapshots or by synthesizing a stereoscopic view in every direction based on the central panorama and the horizontal disparity and occlusion maps extracted from each stereoscopic image. The central panorama is always rendered by mosaicking images originating from cameras iL [i=(1,…,6)]. The mosaicking of the images originating from cameras iR [i=(1,…,6)] produces a right-eye omnidirectional view of the scene, but only when the radial distance rc and baseline b, which in this case are equivalent (rc≈b), are small (b≤3.5 cm). This is done to prevent excessive ghosting while mosaicking the right-eye panorama. The mosaicking is a low-complexity a