JEI Letters

High-speed parallel very large scale integration architecture for global stereo matching

Author Affiliations
Sungchan Park, Hong Jeong

POSTECH, Department of Electronic and Electrical Engineering, Pohang 790-784, Korea

J. Electron. Imaging. 17(1), 010501 (March 13, 2008). doi:10.1117/1.2892680
History: Received May 23, 2006; Revised October 06, 2007; Accepted January 07, 2008; Published March 13, 2008

Open Access

Although stereo matching algorithms based on belief propagation (BP) tend to show excellent matching performance, their huge computational complexity has been the major barrier to real-time applications. In this light, we propose a parallel very large scale integration (VLSI) architecture for BP computation that uses only simple integer operations and shows a low matching error rate on the Middlebury database.


Stereo matching algorithms find corresponding points in a pair of images to recover 3-D positions. They can be classified into local and global matching approaches.1 Local approaches, such as correlation and dynamic programming methods, deal only with subimages. These approaches have the advantage of real-time speed,2,3 but tend to produce high errors. In contrast, global approaches, such as graph cuts and belief propagation (BP),4 deal with full images. These approaches have the advantage of low errors, but tend to incur huge computational loads. In real-time applications such as robot vision, the stereo matching system should be compact and fast. In this context, we present a very large scale integration (VLSI) architecture for stereo matching with BP.

Given the left and right images $g_r$ and $g_l$ and the parameters $c_d$, $c_v$, $K_d$, and $K_v$, we describe the energy model on a 2-D Markov random field (MRF) as follows:4

$$\hat{d} = \arg\min_{d} E(d), \qquad E(d) = \sum_{(p,q) \in N} V(d_p, d_q) + \sum_{p \in P} D(d_p),$$

$$D(d_p) = \min\!\big[\, c_d \,\lvert g_r(p + d_p) - g_l(p) \rvert,\; K_d \,\big],$$

$$V(d_p, d_q) = \min\!\big(\, c_v \,\lvert d_p - d_q \rvert,\; K_v \,\big),$$

where $D(d_p)$ denotes the data cost of the label $d_p \in [0, d_{\max}-1]$ at the pixel $p$ of the image $P$, and $V(d_p, d_q)$ denotes the discontinuity cost between the labels $d_p$ and $d_q$ of the neighboring nodes $N$. The disparity $\hat{d}$ can be estimated using the BP message_update_function$(p, q, k, k+1)$ and decision_function$(p, K)$ as follows:

$$m_{pq}^{k+1}(d_q) = \min_{d_p} \Big\{ V(d_p, d_q) + D_p(d_p) + \sum_{u \in N(p) \setminus q} \big[ m_{up}^{k}(d_p) - \alpha \big] \Big\},$$

$$\hat{d}_p = \arg\min_{d_p} \Big[ D_p(d_p) + \sum_{q \in N(p)} m_{qp}^{K}(d_p) \Big], \tag{2}$$

$$\alpha = \frac{\sum_{d_p} m_{sp}^{k}(d_p)}{d_{\max}},$$

where $N(p) \setminus q$ denotes the neighbors of $p$ other than $q$, and $\alpha$ denotes the normalization value. At each node $p$, the message $m_{pq}^{k+1}(d_q)$ at iteration $k+1$ is updated synchronously using the neighboring messages $m_{up}^{k}(d_p)$ and sent from node $p$ to the neighboring node $q$. After $K$ iterations, $\hat{d}_p$ at each node is decided by Eq. 2.
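To make the min-sum update concrete, the following Python sketch evaluates the message update and the decision of Eq. 2 by brute force over disparity pairs. The function and variable names are ours, not part of the letter; the normalization here uses the mean of the costs, as in the general description above, whereas the hardware later uses the minimum; the efficient $O(D)$ form used by the chip appears further down.

```python
import numpy as np

def message_update(D_p, incoming, c_v, K_v):
    """Brute-force min-sum message update (quadratic in d_max).

    D_p      : (d_max,) data costs at node p
    incoming : list of (d_max,) messages from the neighbors of p other than q
    Returns the outgoing message m_pq^{k+1}, normalized by alpha.
    """
    d_max = len(D_p)
    m_sum = D_p + sum(incoming)                  # data cost plus incoming messages
    alpha = m_sum.sum() / d_max                  # mean-based normalization (one choice)
    d = np.arange(d_max)
    # truncated linear discontinuity cost V(d_p, d_q) = min(c_v*|d_p - d_q|, K_v)
    V = np.minimum(c_v * np.abs(d[:, None] - d[None, :]), K_v)
    # minimize over the sender's label (rows) for every receiver label (columns)
    return (V + m_sum[:, None]).min(axis=0) - alpha

def decide(D_p, final_messages):
    """Decision of Eq. 2: the label with minimum belief after K iterations."""
    return int(np.argmin(D_p + sum(final_messages)))
```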

Generally, BP can be separated into two methods, mainly according to the update style. The first method updates all the nodes synchronously on the 2-D MRF.4 The second method updates all the nodes sequentially, first in the inward direction from the leaf to the root and then in the reverse direction.5,6 The sequential method needs to update each node only once to obtain the final results; therefore, the messages can be propagated quickly with a small number of operations. In Ref. 5, the authors reported that sequential updates based on spanning trees in the MRF can achieve fast convergence. We apply this tree structure to each scan line, as shown in Fig. 1. The tree messages are updated using messages from the neighboring scan lines that were determined in the previous iteration. For an image with $M \times N$ pixels in Fig. 1, the $N$ scan lines can be separated into $G$ groups, and the $H$ lines of each group $g \in [0, G-1]$ can be processed in parallel with $H$ processors. This schedule is realized in our VLSI parallel sequence as follows. A node located at a pixel is denoted by the 2-D vector $p = [p_0 \; p_1]^T$. For synchronous iteration $k$ from 1 to $K$, for group $g$ from 0 to $G-1$ ($= N/H - 1$), and for each parallel processor $h$ from 0 to $H-1$ ($p = [gH+h \;\; p_1]^T$):

  1. Inward processing to the root: for $p_1 = 0, \ldots, M-1$, message_update_function$(p, p + [0\;1]^T, k, k)$ for the rightward message.
  2. Outward processing to the leaf: for $p_1 = M-1, \ldots, 0$, message_update_function$(p, p - [0\;1]^T, k, k)$ for the leftward message; message_update_function$(p, p + [1\;0]^T, k, k+1)$ for the downward message; message_update_function$(p, p - [1\;0]^T, k, k+1)$ for the upward message; and decision_function$(p, K)$ (a software sketch of this schedule follows the list).

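The sketch below renders the update order above in Python. The loop over $h$ runs on $H$ processors in parallel in hardware but is written sequentially here; message_update_function and decision_function are the abstract operations of the letter, passed in as callables; boundary handling at the image edges is omitted.

```python
def run_schedule(K, G, H, M, message_update_function, decision_function):
    """Software sketch of the inward/outward scan-line schedule.

    A node is p = (p0, p1), with p0 the scan line (row) and p1 the column.
    """
    for k in range(1, K + 1):                  # synchronous iterations
        for g in range(G):                     # scan-line groups, processed in turn
            for h in range(H):                 # in hardware: H processors in parallel
                p0 = g * H + h
                # inward processing to the root (left to right along the scan line)
                for p1 in range(M):
                    message_update_function((p0, p1), (p0, p1 + 1), k, k)      # rightward
                # outward processing to the leaf (right to left)
                for p1 in reversed(range(M)):
                    message_update_function((p0, p1), (p0, p1 - 1), k, k)      # leftward
                    message_update_function((p0, p1), (p0 + 1, p1), k, k + 1)  # downward
                    message_update_function((p0, p1), (p0 - 1, p1), k, k + 1)  # upward
                    decision_function((p0, p1), K)  # only the last iteration's result is kept
```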

Fig. 1. Update sequence at iteration $k$ on the 2-D MRF: (a) inward processing, (b) outward processing, and (c) parallel processing within a group.

As shown in Fig. 2, the $H$ processors calculate the messages in parallel, receiving the left and right pixel data from the $H$ scan-line buffers and reading from and writing to their message buffers. Each processor consists of the processing elements (PEs) $\mathrm{PE}_f$, $\mathrm{PE}_b$, $\mathrm{PE}_u$, $\mathrm{PE}_d$, and $\mathrm{PE}_o$. Using the image data and the neighboring messages, $\mathrm{PE}_f$ calculates the forward message during the inward pass, and $\mathrm{PE}_b$, $\mathrm{PE}_u$, $\mathrm{PE}_d$, and $\mathrm{PE}_o$ calculate each direction's message and the disparity $\hat{d}$ during the outward pass.

Fig. 2. Parallel and pipeline architecture: (a) processor array and (b) PE.

The PE is the basic logic that calculates the new message at each node as follows:

$$m_{pq}^{k+1}(d_p) = \min_{d_q \in [0,\, d_{\max}-1]} \big[ V(d_p, d_q) + m_{\mathrm{sum}}(d_q) \big] - \alpha,$$

$$m_{\mathrm{sum}}(d_q) = D_p(d_q) + \sum_{u \in N(p) \setminus q} m_{up}^{k}(d_q).$$

When $V(t,l) = \min(C_v \lvert t - l \rvert, K_v)$, the recursive forward and backward passes of the distance transform4 give a time complexity of $O(5D)$ for $D$ disparity levels. Owing to our pipeline structure, $2D$ clocks are needed to calculate the message $m_{pq}^{k+1}(d_p)$. In the forward initialization, $D_1(-1) = B$ and $D_2(-1) = B$ ($B$ is as large as possible).

For clock $t$ from 0 to $D-1$ in the forward PE,

$$D_1(t) = \min\!\big[ m_{\mathrm{sum}}(t),\; D_1(t-1) + C_v \big],$$

$$D_2(t) = \min\!\big[ m_{\mathrm{sum}}(t),\; D_2(t-1) \big],$$

$$m_f(t) = D_1(t), \qquad m_f(-1) = D_2(D-1) + K_v, \qquad \alpha = D_2(D-1). \tag{3}$$

In the backward initialization, $D_3(-1) = B$.

For clock $t$ from 0 to $D-1$ in the backward PE,

$$D_3(t) = \min\!\big[ m_f(D-1-t),\; D_3(t-1) + C_v \big], \qquad m_{pq}^{k+1}(t) = \min\!\big[ D_3(t),\; m_f(-1) \big] - \alpha.$$
Figure 2 shows the PE architecture. The data-cost PE calculates the data cost $D(t)$ from the left and right image pixels. The forward PE reads $m_{\mathrm{sum}}(t)$, which is the sum of the incoming messages and the data cost; outputs the forward cost $m_f(t)$, which is the minimum of $m_{\mathrm{sum}}(t)$ and $D_1(t-1) + C_v$; and saves it to the stack. In Eq. 3, the parameters needed by the backward pass are calculated. In our system, the minimum cost of $m_{\mathrm{sum}}(t)$ is used as the normalization parameter $\alpha$.

The backward PE reads $m_f(D-1-t)$ from the stack, calculates the minimum value $D_3(t)$ recursively, outputs the minimum of $D_3(t)$ and the parameter $m_f(-1)$, and then subtracts $\alpha$ from it for normalization.
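A software rendering of the forward/backward PE recursion may help. The sketch below follows the equations above directly; it returns the message indexed by disparity (the hardware pops the stack, so its raw outputs appear in reverse order), and a large constant stands in for $B$. Names are ours.

```python
import numpy as np

BIG = 10**9  # "B is as big as possible"

def pe_message(m_sum, c_v, K_v):
    """Forward/backward PE recursion: O(D) per message for D disparity levels.

    m_sum : (D,) array, m_sum(t) = D_p(t) + sum of incoming messages.
    Returns the normalized outgoing message, indexed by disparity.
    """
    D = len(m_sum)
    # --- forward PE: D1 propagates the linear cost, D2 tracks the running minimum ---
    D1, D2 = BIG, BIG
    mf = np.empty(D)
    for t in range(D):
        D1 = min(m_sum[t], D1 + c_v)
        D2 = min(m_sum[t], D2)
        mf[t] = D1                        # pushed onto the stack
    alpha = D2                            # normalization: minimum of m_sum
    mf_minus1 = D2 + K_v                  # m_f(-1): truncation ceiling
    # --- backward PE: pop the stack and propagate the linear cost the other way ---
    out = np.empty(D)
    D3 = BIG
    for t in range(D):
        D3 = min(mf[D - 1 - t], D3 + c_v)
        out[D - 1 - t] = min(D3, mf_minus1) - alpha   # message for disparity D-1-t
    return out
```

The forward pass accumulates the cost from lower disparities, the backward pass from higher ones, and the final $\min$ with $m_f(-1)$ applies the $K_v$ truncation, which reproduces the truncated linear model with only comparisons and additions.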

As shown in Fig. 3, we tested our system using four grayscale images and ground-truth data from the Middlebury database.1 The error rate is the percentage of pixels whose disparity error between the output $d(x,y)$ and the ground truth $d_{\mathrm{TRUE}}(x,y)$ exceeds 1:

$$\mathrm{error}(\%) = \frac{100}{N_m} \sum_{(x,y) \in P_m} \big[\, \lvert d(x,y) - d_{\mathrm{TRUE}}(x,y) \rvert > 1 \,\big], \qquad N_m = \sum_{(x,y) \in P_m} 1,$$

where $P_m$ is the pixel area excluding the occluded part, and $N_m$ is the number of pixels in this area. As shown in Fig. 4 and Table 1, our system, iterated a small number of times, produces results superior to the local methods2,3 and to BP on the Tsukuba image. Based on Fig. 4, our disparity error converges rapidly, in around 12 iterations.
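The error measure defined above can be evaluated with a few lines; the sketch below assumes NumPy disparity maps and a boolean occlusion mask (names are ours).

```python
import numpy as np

def error_rate(d, d_true, occlusion_mask):
    """Percentage of non-occluded pixels whose disparity error exceeds 1.

    d, d_true      : (H, W) disparity maps (system output and ground truth)
    occlusion_mask : (H, W) boolean array, True where the pixel is occluded
    """
    valid = ~occlusion_mask                      # pixel area P_m
    n_m = valid.sum()                            # N_m
    bad = np.abs(d - d_true) > 1                 # disparity error of more than 1
    return 100.0 * np.logical_and(bad, valid).sum() / n_m
```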

Table 1. Error rate comparison for several images.
Fig. 3. Left images: (a) Tsukuba, (b) Venus, (c) Sawtooth, and (d) Map.

Fig. 4. Comparison of outputs on Tsukuba: (a) ground truth, (b) convergence rate, (c) our system at 12 iterations, (d) BP at 12 iterations, (e) local method,2 and (f) local method.3

Given an $M \times N$ image, $D$ disparity levels, and $T$ iterations, we need only $2D$ forward and backward processing clocks in the PE and $2M$ inward and outward processing steps for each scan line. The $G$ ($= N/H$) groups are iterated $T$ times by the $H$ parallel processors at a frame rate of $F$. From Fig. 5, the total number of clocks needed to process $F$ frames in one second is 49M clocks [$= 2D \cdot 2M \cdot (N/H) \cdot T \cdot F = (16 \times 2) \times (256 \times 2) \times (240/24) \times 12 \times 25$]. Our system's 65-MHz clock was therefore sufficient for real-time processing.
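The clock budget can be checked directly from the parameters quoted in the text:

```python
# Clock budget for the reported configuration: D=16 disparities, M=256 columns,
# N=240 scan lines, H=24 processors, T=12 iterations, F=25 frames per second.
D, M, N, H, T, F = 16, 256, 240, 24, 12, 25
clocks_per_second = 2 * D * (2 * M) * (N // H) * T * F
print(clocks_per_second)   # 49152000, i.e., about 49M clocks, below the 65-MHz clock
```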

Fig. 5. Chip specifications: (a) overall system and (b) resource usage and output performance.

As shown in Fig. 2, the overall memory is used for the scan-line buffer and the message buffer. For real-time processing, our processors must be able to access a pair of images in the scan-line buffer while the buffer continuously loads new images from the cameras. Hence, buffer space for four images is allocated on the field-programmable gate array (FPGA), i.e., 2 Mbits [$= 4 \times (256 \times 240 \times 8)$]. Given $C$ ($= 4$) bits as the size of a message entry at each disparity level and $H$ ($= 24$) processors, the message buffer sizes are as follows: leftward messages, 1.5 kbits ($= HCD$); rightward messages, 0.4 Mbits ($= HCDM$); upward and downward messages, 8 Mbits ($= 2HCDMG$).

In outward processing, the newly calculated leftward message is used only for processing the next pixel. One scan line of rightward messages computed during the inward pass must be stored for the outward pass. Since the upward and downward messages are updated synchronously over the entire image, we need to store these messages for every pixel in the image. Because of this large memory requirement, 22 of the processors access external memory through an 8 × 22-bit data bus, and the remaining two processors use the internal block RAMs on the FPGA. The overall memory resource usage is described in Fig. 5.
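The buffer sizes quoted above follow directly from the parameters in the text; the sketch below reproduces the arithmetic (the Mbit and kbit figures in the letter are rounded).

```python
# Image and message buffer sizes for the reported configuration:
# C=4 bits per cost entry, D=16 disparities, M=256 columns, N=240 lines, H=24 processors.
C, D, M, N, H = 4, 16, 256, 240, 24
G = N // H                                 # number of scan-line groups
image_buffer = 4 * (256 * 240 * 8)         # four 8-bit images     -> 1966080 bits (~2 Mbits)
leftward     = H * C * D                   # one entry per processor ->   1536 bits (1.5 kbits)
rightward    = H * C * D * M               # one scan line per processor -> 393216 bits (~0.4 Mbits)
up_and_down  = 2 * H * C * D * M * G       # whole image, both directions -> 7864320 bits (~8 Mbits)
print(image_buffer, leftward, rightward, up_and_down)
```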

Although BP achieves good error performance in image processing, its VLSI architecture has not yet been fully studied. In this context, we propose a parallel VLSI architecture for stereo matching. The test system has only 16 disparity levels, which might not be satisfactory for 3-D recognition tasks. However, for applications such as real-time Z-keying and target-background separation, where a low disparity error at object boundaries is important, the proposed chip can be used effectively.

References

1. Scharstein D. and Szeliski R., "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," Int. J. Comput. Vis. 47(1/2/3), 7-42 (2002).
2. Hirschmuller H., "Improvements in real-time correlation-based stereo vision," in Proc. IEEE Workshop on Stereo and Multi-Baseline Vision, pp. 141-148 (2001).
3. Sukjune Y., Sung-Lee P., Sungchul K., and Yoon-Keun K., "Fast correlation-based stereo matching with the reduction of systematic errors," Pattern Recogn. Lett. 26(14), 2221-2231 (2005).
4. Felzenszwalb P. F. and Huttenlocher D. P., "Efficient belief propagation for early vision," in Proc. IEEE CVPR, Vol. 1, pp. I-261-I-268 (2004).
5. Wainwright M. J., Jaakkola T. S., and Willsky A. S., "Tree-based reparameterization framework for analysis of sum-product and related algorithms," IEEE Trans. Inf. Theory 49(5), 1120-1146 (2003).
6. Tappen M. F. and Freeman W. T., "Comparison of graph cuts with belief propagation for stereo, using identical MRF parameters," in Proc. ICCV, pp. 900-907 (2003).
© 2008 SPIE and IS&T




