This PDF file contains the front matter associated with SPIE Proceedings Volume 6811, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
An image processing path typically involves color correction or white balance, resulting in color gains greater than unity. A gain higher than unity increases the noise in the corresponding channel, and therefore degrades the SNR performance of the input signal. If the input signal does not have enough SNR to accommodate the extra gain, the resultant color image has increased color noise. This is the usual case for color processing in cell phone cameras, which have sensors with limited SNR and high color crosstalk. The degradation worsens as the illuminant departs from D65. In addition, the incomplete information for clipped pixels often results in unsightly artifacts during color processing. To correct this dual problem, we investigate the use of under-unity color gains, which, by increasing the exposure of the sensor, would improve the resultant SNR of the color-corrected image. The proposed method preserves the appearance of clipped pixels and the overall luminance of the image, while applying the appropriate color gains.
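To make the gain-normalization idea concrete, here is a minimal NumPy sketch (our own illustration, not the authors' implementation). It assumes the brightness lost by rescaling the gains is recovered upstream by a longer exposure, and it keeps clipped pixels untouched:

```python
import numpy as np

def apply_under_unity_gains(img, wb_gains, clip=1.0):
    """Apply white-balance gains rescaled below unity (illustrative sketch).

    img      : float array in [0, 1], shape (H, W, 3)
    wb_gains : per-channel gains, e.g. from an AWB estimate (assumed given)
    """
    gains = np.asarray(wb_gains, dtype=np.float64)
    gains = gains / gains.max()            # all gains <= 1: no noise amplification
    out = img * gains                      # per-channel scaling
    clipped = (img >= clip).any(axis=-1)   # pixels with incomplete information
    out[clipped] = img[clipped]            # preserve clipped pixels' appearance
    return np.clip(out, 0.0, clip)
```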
In cardiovascular minimally invasive interventions, physicians require low-latency X-ray imaging applications, as their actions must be directly visible on the screen. The image-processing system should enable the simultaneous execution of a plurality of functions. Because dedicated hardware lacks flexibility, there is a growing interest in using off-the-shelf computer technology. Because memory bandwidth is a scarce resource, we focus on optimization methods for bandwidth reduction within multiprocessor systems at the chip level. We create a practical, realistic model of the compute power and memory bandwidth required by a given set of image-processing functions, and apply similar modeling to the available system resources. We concentrate in particular on X-ray image processing based on multi-resolution decomposition, noise reduction, and image-enhancement techniques. We derive formulas with which the mapping of the application onto processors, cache, and memory can be optimized for different configurations. The data-block granularity is matched to the memory hierarchy, so that caching is optimized for low latency. More specifically, we exploit the locality of the signal-processing functions to streamline the memory communication. A substantial performance improvement is realized by a new memory-communication model that incorporates the data dependencies of the image-processing functions. Results show a memory-bandwidth reduction on the order of 60% and a latency reduction on the order of 30-60% compared to straightforward implementations.
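The paper's model is more detailed, but a first-order version of the required-bandwidth formula is easy to state: each processing pass that does not stay in cache reads and writes one full frame per input frame. A sketch, with illustrative parameters:

```python
def required_bandwidth_gbs(width, height, fps, bytes_per_pixel, passes):
    """First-order memory-bandwidth model: every cache-missing processing
    pass reads and writes one full frame per input frame.

    Parameters are assumptions for illustration; the paper's model also
    accounts for data-block granularity and inter-function dependencies.
    """
    frame_bytes = width * height * bytes_per_pixel
    return 2 * passes * frame_bytes * fps / 1e9   # one read + one write per pass

# e.g. a 1024x1024 X-ray stream at 30 fps, 2 bytes/pixel, 5 passes -> ~0.63 GB/s
print(required_bandwidth_gbs(1024, 1024, 30, 2, 5))
```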
Chaos-based image encryption techniques are very useful for protecting the contents of digital images and videos. They use traditional block-cipher principles known as chaotic confusion, pixel diffusion, and number of rounds. The complex structure of traditional block ciphers makes them unsuitable for real-time encryption of digital images and videos. Real-time applications require fast algorithms with acceptable security strength. This paper presents a simple chaos-based image encryption scheme using cryptographic operations over the Galois field of order 2^n with combinations of a master key, a session key, and a key image. A discretized 2D chaotic map and pseudo-random noise are also used to thwart statistical and differential attacks. The proposed approach, in contrast to traditional chaos-based schemes, generates a chaotic map from the key image and then uses this chaotic image to destroy the statistical and perceptual properties of the image to be encrypted by means of the Galois field operations. The simulation tests use real and synthetic images (a gradient image as the key image) to demonstrate the performance of the proposed approach. The results show that the proposed approach is fast enough for real-time applications and provides acceptable security strength.
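For readers unfamiliar with the ingredients, the following toy sketch (our own, not the paper's scheme) shows GF(2^8) multiplication and a discretized chaotic keystream combined into one confusion/diffusion pass; the reduction polynomial 0x11B and the master-key byte are assumptions:

```python
import numpy as np

def gf256_mul(a, b, poly=0x11B):
    """Multiplication in GF(2^8) modulo an irreducible polynomial
    (0x11B is the AES polynomial; the paper's choice may differ)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def logistic_keystream(n, x0=0.3141, r=3.9999):
    """Discretized chaotic (logistic-map) byte stream; purely illustrative."""
    out, x = np.empty(n, dtype=np.uint8), x0
    for i in range(n):
        x = r * x * (1.0 - x)
        out[i] = int(x * 255)
    return out

def encrypt(img, key_img, master_key=0x5A):
    """One toy confusion/diffusion pass over flattened uint8 pixel data."""
    ks = logistic_keystream(img.size)
    out = np.empty(img.size, dtype=np.uint8)
    for i, (p, k, s) in enumerate(zip(img.flat, key_img.flat, ks)):
        # XOR-mask, then multiply by a nonzero key-image byte (|1 keeps it invertible)
        out[i] = gf256_mul(int(p) ^ int(s) ^ master_key, int(k) | 1)
    return out.reshape(img.shape)
```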
It has been shown that one can make use of local instabilities in turbulent video frames to enhance image resolution beyond the limit defined by the image sampling rate. This paper outlines a real-time solution for the implementation of a super-resolution algorithm on MPEG-4 platforms. The MPEG-4 video compression standard offers, in real time, several features, such as motion extraction with quarter-pixel accuracy, scene segmentation into video object planes, global motion compensation, and de-blocking and de-ringing filters, which can be incorporated into the super-resolution process to produce enhanced visual output. Experimental verification on real-life videos is also provided.
A distance transformation (DT) takes a binary image as input and generates a distance map image in which the value of each pixel is its distance to a given set of object pixels in the binary image. In this research, DTs for multi-class data (MCDTs) are developed which generate both a distance map and a class map containing, for each pixel, the class of the closest object. Results indicate that the MCDT based on the Fast Exact Euclidean Distance (FEED) method is a factor of 2 to 4 faster than MCDTs based on exact or semi-exact Euclidean distance (ED) transformations, and is only a factor of 2 to 4 slower than the MCDT based on the crude city-block approximation of the ED. In the second part of this research, the MCDTs were adapted so that they could be used for the fast generation of distance and class maps for video sequences. The frames of the sequences contain a number of fixed objects and a moving object, where each object has a separate label. Results show that the FEED-based version is a factor of 2 to 3.5 faster than the fastest of all the other video MCDTs, which is based on the chamfer 3,4 distance measure. FEED is even a factor of 3.5 to 10 faster than another fast exact ED transformation. With video multi-class FEED it will be possible to measure distances from a moving object to various identified stationary objects at nearly the frame rate of a webcam. This will be very useful when the risk exists that objects move outside surveillance limits.
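FEED itself is more involved; for orientation, here is the crude city-block baseline mentioned above, extended with a class map. It is a standard two-pass chamfer sweep that propagates both distance and the label of the nearest object (our own sketch):

```python
import numpy as np

def mcdt_cityblock(labels):
    """Multi-class city-block distance transform (illustrative baseline).

    labels : 2-D int array, 0 = free space, >0 = object class id.
    Returns (dist, cls): distance to, and class of, the nearest object pixel.
    """
    h, w = labels.shape
    INF = h + w + 1
    dist = np.where(labels > 0, 0, INF).astype(int)
    cls = labels.copy()
    # forward pass: propagate from top-left neighbors
    for y in range(h):
        for x in range(w):
            for dy, dx in ((-1, 0), (0, -1)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w and dist[yy, xx] + 1 < dist[y, x]:
                    dist[y, x] = dist[yy, xx] + 1
                    cls[y, x] = cls[yy, xx]
    # backward pass: propagate from bottom-right neighbors
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            for dy, dx in ((1, 0), (0, 1)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w and dist[yy, xx] + 1 < dist[y, x]:
                    dist[y, x] = dist[yy, xx] + 1
                    cls[y, x] = cls[yy, xx]
    return dist, cls
```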
On-board video analysis has attracted a lot of interest over the last two decades, with the main goal of improving safety by detecting obstacles or assisting the driver. Our study aims at providing a real-time understanding of urban road traffic. Considering a video camera fixed on the front of a public bus, we propose a cost-effective approach to estimate the speed of the vehicles on the adjacent lanes when the bus operates on a dedicated lane. We work on 1-D segments drawn in the image space, aligned with the road lanes. The relative speed of the vehicles is computed by detecting and tracking features along each of these segments. The absolute speed can be estimated from the relative speed if the camera speed is known, e.g. thanks to an odometer and/or GPS. Using pre-defined speed thresholds, the traffic can be classified into categories such as 'fluid' or 'congested'. The solution offers both good performance and low computational complexity, and is compatible with cheap video cameras, which allows its adoption by city traffic management authorities.
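The relative-to-absolute speed conversion reduces to a small calculation; the helper below is hypothetical (calibration constant, threshold, and function names are ours), sketched under the assumption of a per-frame feature position along one lane segment:

```python
import numpy as np

def lane_speed_kmh(positions_px, fps, metres_per_px, bus_speed_kmh):
    """Absolute speed of a tracked vehicle from its positions along a 1-D
    lane segment (illustrative; metres_per_px is an assumed calibration).

    positions_px : feature position along the segment, one entry per frame.
    """
    dp = np.diff(positions_px)                    # px / frame along the lane
    rel_ms = np.median(dp) * fps * metres_per_px  # relative speed, m/s
    return bus_speed_kmh + rel_ms * 3.6           # camera speed from odometer/GPS

def classify_traffic(speed_kmh, fluid_above=30.0):
    """Pre-defined threshold classification, as described above."""
    return "fluid" if speed_kmh >= fluid_above else "congested"
```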
There is growing interest in video-based solutions for people monitoring and counting in business and security applications. Compared to classic sensor-based solutions, video-based ones allow for more versatile functionality and improved performance at lower cost. In this paper, we propose a real-time system for people counting based on a single low-end, non-calibrated video camera.
The two main challenges addressed in this paper are robust estimation of the scene background and of the number of real persons in merge-split scenarios. The latter are likely to occur whenever multiple persons move closely together, e.g. in shopping centers. Several persons may be considered a single person by automatic segmentation algorithms, due to occlusions or shadows, leading to under-counting. Therefore, to account for noise, illumination changes, and changes in static objects, background subtraction is performed using an adaptive background model (updated over time based on motion information) and automatic thresholding. Furthermore, post-processing of the segmentation results is performed in the HSV color space to remove shadows. Moving objects are tracked using an adaptive Kalman filter, allowing robust estimation of the objects' future positions even under heavy occlusion. The system is implemented in Matlab and gives encouraging results even at high frame rates. Experimental results obtained on the PETS2006 datasets are presented at the end of the paper.
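The adaptive background model with motion-gated updating can be sketched as a running average that adapts only where no foreground is detected; the learning rate and threshold below are assumptions, not the authors' values:

```python
import numpy as np

def update_background(bg, frame, alpha=0.02, thresh=25):
    """Adaptive background model: running average + foreground mask (sketch).

    bg, frame : uint8 grayscale images of equal shape.
    alpha     : learning rate; background adapts, detected foreground is frozen.
    """
    diff = np.abs(frame.astype(np.int16) - bg.astype(np.int16))
    fg = diff > thresh                       # simple automatic thresholding
    new_bg = bg.astype(np.float32)
    new_bg[~fg] = (1 - alpha) * new_bg[~fg] + alpha * frame[~fg]
    return new_bg.astype(np.uint8), fg
```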
In the H.264/AVC video coding standard, the mode decision component involves a large amount of computation. This paper presents a fast, computationally efficient mode prediction and selection approach with the following attributes: (a) both spatial and temporal information are used to achieve early termination using adaptive thresholds, (b) a modulator capable of trading off computational efficiency and accuracy is included, and (c) a homogeneous-region detection procedure for 8×8 blocks is based on adaptive thresholds. The developed approach consists of three main steps: (1) mode prediction, (2) early termination based on adaptive thresholds, and (3) refinement by checking all the modes. In addition, texture information is utilized to avoid sub-partitioning 8×8 blocks into smaller block sizes. It is shown that the developed approach leads to a computationally more efficient video coding implementation than previous fast approaches. Results obtained on QCIF, CIF, and HD format video sequences based on x264 are presented to demonstrate the computational efficiency of the developed approach at the expense of acceptably low losses in video quality.
Currently, no information about different shots is used in the H.264/AVC video coding standard. This kind of information can help us choose the size of GOPs and thereby reduce the bit-rate and PSNR fluctuation when video sequences contain multiple shots. We developed an MPEG-7-based rate control scheme for the H.264/AVC standard. Our proposed scheme significantly outperforms the rate control of H.264/AVC, reducing the average bit-rate fluctuation (variance) between shots by 7-60% and the average PSNR fluctuation (variance) by 24-90%. It is also applicable in computationally and memory-restricted devices, since it needs at most two frames of buffer space for the MPEG-7 descriptor calculation, while the average amount of extra processing is only 5.8% of the total CPU cycles.
This paper presents an efficient VLSI architecture for the intra prediction of the H.264 video compression standard. To address the computational complexity issue, we propose a dedicated processor that can compute multiple intra prediction modes in parallel. The proposed architecture accelerates the intra coding process and can support large video formats at high frame rates in real time.
This paper presents a fast implementation of a wavelet-based video codec. The codec consists of motion-compensated temporal filtering (MCTF), a 2-D spatial wavelet transform, and SPIHT for wavelet coefficient coding. It offers compression efficiency competitive with H.264. The codec is implemented in software running on a general-purpose PC, using the C programming language and streaming SIMD extension intrinsics, without assembly language. This high-level software implementation allows the codec to be ported to other general-purpose computing platforms. Testing on a Pentium 4 HT at 3.6 GHz (running under Linux and using the GCC compiler, version 4) shows that the software decoder is able to decode 4CIF video in real time, over 2 times faster than software written only in C. This paper describes the structure of the codec, the fast algorithms chosen for the most computationally intensive elements in the codec, and the use of SIMD to implement these algorithms.
The rank filter is a non-linear filter used in image processing for impulse-noise removal, morphological operations, and image enhancement. Real-time applications, such as video and high-speed acquisition cameras, often require the rank filter or its simpler special case, the median filter. Implementing the rank filter in hardware can achieve the speeds these applications require. A bit-serial algorithm can increase the speed of the rank filter by eliminating the time-consuming sorting network. In this paper, an 8-stage pipelined architecture for the rank filter is described using the bit-serial algorithm. It also includes an efficient window extraction and boundary-processing scheme. This rank filter design was simulated and synthesized on the Xilinx family of FPGAs. For a 3×3 window size, the maximum operating frequency achieved was 75 MHz on a low-end device (XC3S200, Spartan-3 family) and 180 MHz on a high-end device (XC4VSX25, Virtex-4 family). For a 5×5 window size, the maximum operating frequency achieved was 67 MHz on the XC3S200 and 138 MHz on the XC4VSX25. With one pixel filtered per clock cycle, the achieved speeds are sufficient for most video applications. The 3×3 window design used 31% of the slices on the XC3S200 and 5% on the XC4VSX25; the 5×5 window design used 60% and 11%, respectively. This IP design may be used as a hardware accelerator in a fast image-processing SoC.
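The core idea of bit-serial rank selection is to build the output value one bit at a time, from MSB to LSB, using only counting and comparisons instead of a sorting network. The hardware version works with per-sample flags and majority logic; the sketch below (ours) captures the same arithmetic in software:

```python
def bitserial_rank(samples, rank, bits=8):
    """Bit-serial rank selection: returns the rank-th smallest sample
    (1-indexed; rank = (len(samples) + 1) // 2 gives the median for odd N).

    At each bit plane, tentatively keep the bit at 0 and count how many
    samples fit under that prefix; if fewer than `rank` do, the bit is 1.
    """
    result = 0
    for k in range(bits - 1, -1, -1):
        prefix = result >> k                  # candidate high bits, bit k = 0
        count = sum(1 for s in samples if (s >> k) <= prefix)
        if count < rank:                      # not enough samples below: bit is 1
            result |= 1 << k
    return result

# median of a 3x3 window flattened into a list
print(bitserial_rank([12, 200, 7, 45, 99, 3, 45, 8, 60], rank=5))  # -> 45
```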
This paper describes an implementation of the Hough Transform (HT) that uses a hybrid-log structure for the main arithmetic components instead of fixed- or floating-point architectures. A major advantage of this approach is a reduction in the overall computational complexity of the HT without adversely affecting its performance when compared to fixed-point solutions. The proposed architecture is compatible with the latest FPGA architectures, allowing multiple units to operate in parallel without exhausting the dedicated (but limited) on-chip signal-processing resources, which can instead be allocated to other image processing and classification tasks. The proposed solution is capable of performing a real-time HT on megapixel images at frame rates of up to 25 frames per second using a Xilinx Virtex architecture.
For industrial print-flaw detection, images are acquired and then compared to a specimen (master image). Due to the production process, the images are not exactly aligned with each other. Therefore, preceding a pixel-by-pixel comparison, the acquired image has to be rectified in order to match the master image's properties: it has to be warped into the master image's coordinate system. To achieve the required detection speed, several megapixels per second have to be processed. It proved to be very advantageous to continuously process the stream of image data in an image-processing pipeline, the first stage of which is the warping process. In this paper we introduce a streaming warper unit which implements affine backward mapping and cubic spline interpolation. Since a complete pixel transformation is computed per clock cycle, the performance, when implemented on contemporary FPGA devices, can be up to 200 megapixels per second. The implementation of several streaming warper units within a single FPGA is possible. This enables image processing systems that sustain high data rates even under real-time constraints.
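Affine backward mapping means each output pixel is traced back through the affine transform into the source image and interpolated there. The NumPy sketch below uses bilinear interpolation as a stand-in for the unit's cubic-spline kernel (our simplification):

```python
import numpy as np

def affine_backward_warp(src, A, out_shape):
    """Backward mapping: sample the source at A @ (x, y, 1) for each
    output pixel. Bilinear interpolation replaces the cubic-spline
    kernel of the streaming unit described above (sketch only).

    A : 2x3 affine matrix, output coordinates -> source coordinates.
    """
    H, W = out_shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    sx = A[0, 0] * xs + A[0, 1] * ys + A[0, 2]
    sy = A[1, 0] * xs + A[1, 1] * ys + A[1, 2]
    x0 = np.clip(np.floor(sx).astype(int), 0, src.shape[1] - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, src.shape[0] - 2)
    fx = np.clip(sx - x0, 0.0, 1.0)          # sub-pixel fractions
    fy = np.clip(sy - y0, 0.0, 1.0)
    top = src[y0, x0] * (1 - fx) + src[y0, x0 + 1] * fx
    bot = src[y0 + 1, x0] * (1 - fx) + src[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bot * fy
```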
The System-on-Chip design of specific image analysis architectures based on massively parallel Markov Random Field (MRF) processing principles has so far been an unstructured, fault-prone, and complex task. Up to now, neither a systematically derived architecture template nor an industrially approved tool chain has been available to support the VLSI design task for this kind of digital architecture. In this contribution, we report on a theoretically sound and systematically derived architecture template for massively parallel MRF processing devices. The paper concludes with prototypical implementations of selected architecture parts using FPGA technologies. These results demonstrate the capability of the proposed architecture template and manifest its industrial relevance.
Video encoding algorithms require processing of data arranged in blocks of pixels. For efficient computation, pixel blocks are expected to be stored contiguously in memory, and within each block, pixels are to be arranged in a raster-scan (left-to-right, top-to-bottom) order. Since data captured from the video port is linearly arranged in memory (one line after the other), it is necessary to arrange the data in this two-dimensional form before encoding. A common approach to achieving the two-dimensional arrangement is through optimized functions (in C or assembly) that arrange the captured data, stored in an intermediate buffer, into an input buffer from which it is ready for encoding. However, this approach has two main drawbacks. First, a portion of the CPU MHz budget is consumed solely on data arrangement. Second, an intermediate data buffer is required to hold the data before the arrangement into the input buffer takes place, which increases the memory requirements. In this paper, a memory- and MHz-efficient EDMA transfer scheme is introduced for simultaneous data transfer and two-dimensional arrangement from the video port to the DSP memory. The proposed scheme is described in detail for the TI TMS320DM642.
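The permutation that the EDMA performs during the transfer is easy to specify as a CPU reference; this NumPy version (ours, for documentation only) shows the target layout the encoder expects:

```python
import numpy as np

def raster_to_blocks(frame, bh=16, bw=16):
    """Rearrange a raster-order frame into contiguous pixel blocks.

    The EDMA scheme above performs this permutation during the transfer
    itself; this reference just documents the target layout. Returns an
    array of shape (num_blocks, bh*bw), each block in raster order.
    Frame dimensions are assumed to be multiples of the block size.
    """
    H, W = frame.shape
    return (frame.reshape(H // bh, bh, W // bw, bw)
                 .transpose(0, 2, 1, 3)       # group pixels by block
                 .reshape(-1, bh * bw))       # each block contiguous

# e.g. a QCIF luma plane split into 16x16 macroblocks
blocks = raster_to_blocks(np.zeros((144, 176), dtype=np.uint8))
print(blocks.shape)   # (99, 256)
```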
The level set method for curve evolution is a popular technique used in image processing applications. However, the numerics involved make its use in high-performance systems computationally prohibitive. This paper proposes an approximate level set scheme that removes much of the computational burden while maintaining accuracy. Abandoning a floating-point representation of the signed distance function, we use integer values to represent the interior, the zero level set, and the exterior. We detail rules governing the evolution and maintenance of these three regions. Arbitrary energies can be implemented with the definition of three operations: initialize iteration, move points in, and move points out.
This scheme has several nice properties. First, computations are performed only along the zero level set. Second, the approximate distance function representation requires only a few simple integer comparisons for maintenance. Third, smoothness regularization involves only a few integer calculations and may be handled apart from the energy itself. Fourth, the zero level set is represented exactly, removing the need for interpolation off the interface. Lastly, evolution proceeds on the order of milliseconds per iteration on conventional uniprocessor workstations.
To highlight its accuracy, flexibility, and speed, we demonstrate the technique on standard intensity tracking and stand-alone segmentation.
A novel method to accelerate the application of linear filters that have multiple identical coefficients on arbitrary kernels is presented. Such filters, including Gabor filters, gray-level morphological operators, and volume smoothing functions, are widely used in many computer vision tasks. By taking advantage of the overlap between the kernels of neighboring points, the reshuffling technique avoids redundant multiplications when the filter response is computed. It finds the set of unique coefficients, constructs a set of relative links for each coefficient, and then sweeps through the input data, accumulating the responses at each point while applying the coefficients through their relative links. Dual solutions, single input access and single output access, that achieve a 40% performance improvement are provided. In addition to its computational advantage, this method keeps a minimal memory footprint, which makes it ideal for embedded platforms. The effects of quantization, kernel size, and symmetry on the computational savings are discussed. Results show that reshuffling is superior to the conventional approach.
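A 1-D sketch of the single-input-access variant (our reading of the idea, not the paper's code): each input sample is multiplied once per unique coefficient, and the product is scattered through the relative links to every output that uses that coefficient:

```python
import numpy as np

def reshuffle_filter(signal, kernel):
    """Single-input-access reshuffled correlation filter (1-D sketch).

    Each unique kernel coefficient is multiplied once per input sample;
    identical coefficients reuse the product via their relative links.
    """
    # relative links: coefficient value -> kernel offsets where it occurs
    links = {}
    for off, c in enumerate(kernel):
        links.setdefault(c, []).append(off)
    out = np.zeros(len(signal) - len(kernel) + 1)
    for i, x in enumerate(signal):
        for c, offs in links.items():
            prod = c * x                      # one multiply per unique coefficient
            for off in offs:                  # scatter to all outputs using it
                j = i - off
                if 0 <= j < len(out):
                    out[j] += prod
    return out

# matches np.correlate(signal, kernel, mode='valid')
print(reshuffle_filter([1., 2., 3., 4., 5.], [1., 2., 1.]))
```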
Motion estimation in video sequences is a classical computation-intensive task that is required for a wide range of applications. Many different methods have been proposed to reduce the computational complexity, but the achieved reduction is not enough to allow real-time operation on non-specialized hardware. In this paper, an efficient selection of singular points for fast matching between consecutive images is presented, which enables real-time operation. The selection of singular points consists in finding image points that are robust to noise and to the aperture problem. This is accomplished by imposing restrictions on the gradient magnitude and the cornerness. The neighborhood of each singular point is characterized by a complex descriptor vector, which exhibits high robustness to illumination changes and to small variations in the 3D camera viewpoint. The matching between singular points of consecutive images is performed by maximizing a similarity measure based on this descriptor vector. The set of correspondences yields a sparse motion vector field that accurately outlines the image motion. To demonstrate the efficiency of this approach, a video stabilization application has been developed, which uses the sparse motion vector field as input. Excellent results have been obtained, demonstrating the efficiency of the proposed motion estimation technique.
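The gradient-magnitude and cornerness restrictions can be illustrated with a Harris-style structure-tensor test; thresholds and window sizes below are assumptions, not the paper's values:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def singular_points(gray, grad_thresh=20.0, corner_thresh=1e6, k=0.04):
    """Select points robust to noise and the aperture problem (sketch).

    Combines a gradient-magnitude restriction with a Harris-style
    cornerness measure; all thresholds are illustrative assumptions.
    """
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    # structure tensor components, box-smoothed over a 3x3 neighborhood
    sxx = uniform_filter(gx * gx, 3)
    syy = uniform_filter(gy * gy, 3)
    sxy = uniform_filter(gx * gy, 3)
    cornerness = sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2
    mask = (mag > grad_thresh) & (cornerness > corner_thresh)
    return np.argwhere(mask)                  # (row, col) candidate points
```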
The conceptual configuration and special features of a new high-precision real-time signal processing system for a cooled Infrared Focal Plane Array (IRFPA) with 320×240 detectors are presented in this note. The most critical issue in image detectors based on array elements is the Non-Uniformity Correction (NUC) between the sensitive elements, caused by differing material characteristics in the fabrication phase; it is especially severe for IRFPAs with high element counts, whose non-uniformities are inherently larger. A mechanism for NUC between the sensitive elements therefore needs a structure that compensates for gradual drift in the detectors' output by regularly updating the correction factors. A feasible method, Least-Mean-Square adaptation on compact hardware, is introduced. Theoretical analysis shows that the correction formula deduced from this approach is the best polynomial approximation of the analytic formula. Since the intended detector does not natively provide timing suitable for standard display equipment, hardware for frame-by-frame grabbing is essential, and it also supports further processing applications. Exploiting the capabilities of sophisticated FPGAs, a contrast enhancement based on Bi-Histogram Equalization, which precisely preserves the brightness of the infrared image, is also described.
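One common way to phrase LMS-based NUC (a scene-based formulation in the spirit of the above, not necessarily the authors' exact update) nudges each pixel's gain and offset so its corrected output tracks a locally smoothed estimate of the true scene:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lms_nuc_update(raw, gain, offset, mu=1e-3):
    """One LMS step of scene-based non-uniformity correction (sketch).

    Each pixel's correction y = gain * x + offset is adapted toward the
    local spatial mean of the corrected frame, used as the desired
    response; mu is the step size. All constants are assumptions.
    """
    corrected = gain * raw + offset
    desired = uniform_filter(corrected, size=5)   # smoothed scene estimate
    err = desired - corrected
    gain = gain + mu * err * raw                  # LMS gradient steps
    offset = offset + mu * err
    return corrected, gain, offset
```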
Using spatial and temporal information jointly is more efficient than using either separately. We have designed a new adaptive fuzzy-logic scheme that applies directional and fuzzy processing techniques with motion detection and spatio-temporal filtering of video sequences. The proposed method can distinguish uniform regions, edges, and detail features in the images, decreasing the processing time by taking into account only the samples that demonstrate a high level of corruption or motion. The algorithm adapts spatio-temporal information to smooth additive noise. The non-stationary noise left after the temporal stage is removed by a magnitude algorithm that is adapted using parameters obtained during the filtering. The designed algorithm is compared with other filters found in the literature, showing the effectiveness of the proposed fuzzy-logic filtering approach.
For real-time imaging with digital video cameras, good tonal rendition of video is important to ensure high visual comfort for the user. Besides local contrast improvements, High Dynamic Range (HDR) scenes require adaptive gradation correction (a tone-mapping function) that enables good visualization of details at lower brightness. We discuss how to construct and control optimal tone-mapping functions, which enhance the visibility of image details in the dark regions while not excessively compressing the image in the bright parts. The result of this method is a 21-dB expansion of the dynamic range. The new algorithm was successfully evaluated in hardware and, although suited for any video system, is particularly beneficial for those processing HDR video.
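The shape of such a curve can be illustrated with a simple global example (entirely ours, not the authors' function): a log-like lift brightens dark regions, blended back to identity above a knee so highlights are not over-compressed:

```python
import numpy as np

def tone_map(y, strength=0.7, knee=0.9):
    """Illustrative global tone-mapping curve for HDR luma in [0, 1].

    strength and knee are assumed parameters; the 21-dB figure quoted
    above is specific to the authors' system, not this sketch.
    """
    lifted = np.log1p(strength * 255.0 * y) / np.log1p(strength * 255.0)
    w = np.clip((y - knee) / (1.0 - knee), 0.0, 1.0)   # highlight protection
    return (1 - w) * lifted + w * y
```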
We develop a rapid object-candidate detector using Increment Sign Correlation (ISC). Our method aims to detect candidate objects such as people or vehicles in real time using ISC and a simple shape model. Our method is similar to the Generalized Hough Transform (GHT), but we modify its voting process: we use ISC to detect object candidates instead of the shape voting done by the GHT. ISC is robust against shading and low image contrast due to lighting changes, because the Increment Sign (IS) is insensitive to perturbations of the direction of the intensity gradient. The computational cost of the IS is also lower than that of the gradient. In our experiments, the detector runs on a 320×240-pixel image within 32 milliseconds on a Pentium 4 processor at 2.8 GHz. Given an initial template size of 10×20 pixels, the number of candidates decreases from 170,196 sub-windows in a 320×240-pixel image to at most 400, with a miss rate of 0.2%. This is sufficient as a front end for more precise detectors that use richer image features. Experimental results on real image sequences are reported.
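Increment Sign Correlation compares only the direction of intensity change between neighboring pixels, which is what makes it cheap and lighting-insensitive. A minimal sketch (ours), using horizontal increments:

```python
import numpy as np

def increment_sign(img):
    """Horizontal increment sign: 1 where intensity is non-decreasing."""
    return (img[:, 1:] >= img[:, :-1]).astype(np.uint8)

def isc(template, window):
    """Increment Sign Correlation between same-sized patches (sketch).

    Counts agreeing increment signs; robust to shading and contrast
    changes because only the direction of change is compared.
    Returns 1.0 for perfect sign agreement.
    """
    a, b = increment_sign(template), increment_sign(window)
    return float(np.mean(a == b))
```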
Normal mapping is a powerful technique for simulating surface roughness by means of normal maps. A high-polygon-count model is represented by a coarse polygon mesh, with fine details stored in the normal map. The technique thus greatly reduces the geometric complexity of models and shifts the demands onto effective normal-map compression algorithms. In this paper we present a normal-map compression algorithm that extends the commonly used 3Dc algorithm introduced by ATI with wavelet compression based on the Haar basis. Each block component is coded by one of two modes, and the mode that introduces the smaller error is chosen for the block component representation. This allows better adaptation to the normal-map data and improves the peak signal-to-noise ratio compared to standalone 3Dc.
The wavelet transform is currently used in many engineering fields. The real-time implementation of the Discrete Wavelet Transform (DWT) is a current area of research, as it is one of the most time-consuming steps in the JPEG2000 standard. The standard implements two different wavelet transforms: irreversible and reversible Daubechies. The former is a lossy transform, whereas the latter is a lossless transform. Many current JPEG2000 implementations are software-based and not efficient enough to meet real-time deadlines. Field Programmable Gate Arrays (FPGAs) are revolutionizing image and signal processing. Major FPGA vendors such as Altera and Xilinx have recently developed Simulink tools to support their FPGAs. These tools are intended to provide a seamless path from system-level algorithm design to FPGA implementation. In this paper, we investigate FPGA implementation of the 2-D lifting-based Daubechies 9/7 and Daubechies 5/3 transforms using a Matlab/Simulink tool that generates synthesizable VHSIC Hardware Description Language (VHDL) code. The goal is to study the feasibility of this approach for real-time image processing by comparing the performance of the high-level toolbox with a handwritten VHDL implementation. The hardware platform used is an Altera DE2 board with a 50-MHz Cyclone II FPGA chip, and the Simulink tool chosen is Altera's DSP Builder.
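For reference, the reversible 5/3 lifting steps are just two integer predict/update passes, which is what makes them attractive for hardware. A 1-D sketch (ours; circular boundaries via roll instead of JPEG2000's symmetric extension, even-length input assumed):

```python
import numpy as np

def dwt53_1d(x):
    """One level of the reversible 5/3 lifting wavelet (integer path).

    d[n] = x[2n+1] - floor((x[2n] + x[2n+2]) / 2)        (predict)
    s[n] = x[2n]   + floor((d[n-1] + d[n] + 2) / 4)      (update)
    """
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2].copy(), x[1::2].copy()
    odd -= (even + np.roll(even, -1)) >> 1     # detail coefficients
    even += (odd + np.roll(odd, 1) + 2) >> 2   # approximation coefficients
    return even, odd
```

Reversing the two steps in opposite order with subtraction swapped for addition reconstructs the input exactly, which is the sense in which the transform is lossless.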
In this paper, we propose algorithms that map the low-level motion compensation and transformation functions of the MPEG-1/2, H.263/MPEG-4 ASP, and H.264/MPEG-4 AVC video codecs onto common workflows. This way, a single discrete implementation of the luma prediction, chroma prediction, and residual transform stages is sufficient for all covered video coding standards.
The proposed luma prediction is based on 4×4 blocks to cover the H.264 specification as well as the older standards. The design consists of a single four-stage pipeline comprising two block interpolation and two block averaging stages. Targeted for hardware implementation, a strictly linear execution is provided, avoiding branch operations. The algorithmic behavior is entirely dictated by the contents of the parameter ROM.
Since chrominance prediction must cover blocks as small as 2×2 pixels, a distinct operation is proposed for chroma. The bilinear operation scheme of H.264 is able to carry out the operations of the older standards with only minor changes.
In H.264, the classic 8×8 DCT was replaced by a simplified 4×4 integer transform based on a heavily quantized DCT scheme. By modifying a well-known multiplier-adder-based scheme, a generalized transformation stage can be derived.
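For concreteness, the H.264 forward 4×4 integer core transform is W = C X Cᵀ with the matrix below (the DCT-derived scaling is folded into quantization, omitted here):

```python
import numpy as np

# H.264 forward 4x4 integer core transform matrix
C = np.array([[1,  1,  1,  1],
              [2,  1, -1, -2],
              [1, -1, -1,  1],
              [1, -2,  2, -1]])

def h264_core_transform(block4x4):
    """Forward 4x4 core transform; plain matmul here, though in hardware
    the same result needs only additions and shifts (no multipliers)."""
    X = np.asarray(block4x4, dtype=np.int64)
    return C @ X @ C.T
```

The multiplier-free property is what makes sharing a generalized multiplier-adder stage across the older DCT-based standards attractive.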
In this paper, we introduce an FPGA implementation for correcting radial distortion, which is non-linear, non-uniform, and inherent in images taken with wide-angle lenses. In the implementation, the correction is performed on the fly by a parallel architecture that focuses on efficient manipulation of the look-up table (LUT) for coordinate translation, via LUT decomposition and a single-LUT-multiple-access method. The 2D LUT is decomposed into three 1D LUTs to reduce resource usage. The single-LUT-multiple-access strategy is inspired by the fact that there is spatial and temporal proximity among the LUT accesses, even though the mapping itself is non-linear and non-uniform. In addition, a way to eliminate the redundancy that occurs where the backward mappings and the interpolations overlap is incorporated into the implementation. This series of efforts aims to alleviate the problems observed in conventional FPGA implementations of image-handling algorithms, namely parallelization of function blocks for higher throughput and minimization of the number of accesses to off-chip memory. As a result, the corrected image for a distorted input frame can be stored within a vertical blanking interval, with less hardware resource usage and without unnecessary access to off-chip memory.
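To illustrate why LUT decomposition pays off (using a radially symmetric polynomial model as one example; this is our simplification, not the paper's three-LUT scheme): when the distortion depends only on radius, a single 1-D scale-factor LUT can replace a full 2-D coordinate table:

```python
import numpy as np

def build_radial_lut(w, h, k1, n=1024):
    """1-D LUT of radial scale factors for r_d = r_u * (1 + k1 * r_u^2).

    k1 and the table size n are assumed calibration values.
    """
    r = np.linspace(0.0, np.hypot(w / 2.0, h / 2.0), n)
    return r, 1.0 + k1 * r ** 2            # distorted / undistorted radius

def backward_map(x, y, w, h, r_lut, s_lut):
    """Backward mapping: output (undistorted) pixel -> source coordinate."""
    cx, cy = w / 2.0, h / 2.0
    r = np.hypot(x - cx, y - cy)
    s = np.interp(r, r_lut, s_lut)         # one 1-D LUT access per pixel
    return cx + (x - cx) * s, cy + (y - cy) * s
```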
A method of real-time object detection for video surveillance systems has been developed. The method aims to realize robust object detection by using Radial Reach Correlation (RRC). We also apply statistical background estimation to cope with dynamic and complex environments. The computational cost of RRC is higher than that of the simple subtraction method, and the statistical background estimation requires a large amount of memory; the calculation cost must therefore be reduced for the method to run on an embedded image processing device. Our method is composed of two techniques: a fast RRC algorithm and statistical background estimation with a cumulative averaging process. As a result, without any deterioration in detection accuracy, the processing time of object detection is reduced to about one quarter of that of normal RRC.
All-zero blocks (AZBs) are blocks whose DCT coefficients are all zero after quantization. Early determination of AZBs can avoid unnecessary DCT/Q/IQ/IDCT computation. Existing techniques in the literature primarily address more efficient thresholds for early determination of AZBs. This paper deals with the selection of such thresholds based on low-level features including motion activity and texture information. This information is then utilized to avoid (1) unnecessary quarter-pel-accuracy motion estimation, (2) unnecessary multiple-reference-frame motion estimation, and (3) unnecessary DCT/Q/IQ/IDCT computation. The developed approach has been applied to video sequences in two formats, CIF and QCIF. The results show that the computational complexity is significantly reduced while the video quality is maintained at a tolerable loss level.
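The structure of such a test is simple: if the residual energy of a block falls below a quantizer-dependent threshold, the quantized coefficients cannot be nonzero and the transform chain is skipped. A generic sketch (threshold constants are illustrative placeholders, not derived bounds or the paper's values):

```python
def early_azb(residual_sad, qstep, low_activity):
    """Early all-zero-block test (illustrative structure only).

    residual_sad : SAD of the motion-compensated residual block.
    qstep        : quantization step size.
    low_activity : flat-texture / slow-motion flag from low-level features,
                   used here to loosen the skip threshold, as proposed above.
    """
    thresh = 2.0 * qstep if low_activity else 1.5 * qstep   # placeholder constants
    return residual_sad < thresh   # True -> skip DCT/Q/IQ/IDCT for this block
```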
Here, a new and efficient strategy is introduced that allows moving-object detection and segmentation in video sequences. Other strategies use a mixture of Gaussians to detect static and dynamic areas within the images so that moving objects are segmented [1], [2], [3], [4]. For this purpose, all these strategies use a fixed number of Gaussians per pixel; typically, more than two or three Gaussians are needed to obtain good results when images contain noise and movement not related to objects of interest. Nevertheless, the use of more than one Gaussian per pixel involves a high computational cost and, in many cases, adds no advantage over single-Gaussian segmentation. This paper proposes a novel automatic moving-object segmentation that uses an adaptively variable number of Gaussians to reduce the overall computational cost. An automatic strategy is applied to each pixel to determine the minimum number of Gaussians required for its classification. Taking into account the temporal context that identifies the reference image pixels as background (static) or moving (dynamic), either the full set of Gaussians or just one Gaussian is used. Pixels classified with the full set are called MGPs (Multiple Gaussian Pixels), while those classified with just one Gaussian are called SGPs (Single Gaussian Pixels); the computation reduction achieved depends on the size of this last set. Pixels with a dynamic reference are always MGPs. They are Dynamic MGPs (DMGPs) when they belong to the dynamic areas of the image; however, if the classification shows that the pixel matches one of the Gaussians in the set, the pixel is labeled static and is called a Static MGP (SMGP). The latter are usually noise pixels, although they may belong to areas with movement not related to objects of interest. Finally, pixels with a static reference that still match the same Gaussian are SGPs and belong to the static background of the image; if they do not match the associated Gaussian, they are changed to either SMGP or DMGP. In addition, any pixel can maintain its status, and an SMGP can be changed to DMGP or SGP. A state diagram shows the transition scheme and its characterization, allowing the forecasting of the reduction of the computational cost of the segmentation process. Tests have shown that the proposed strategy implies a limited loss of accuracy in the segmentations obtained, compared with strategies that use a fixed number of Gaussians per pixel, while achieving very high reductions in the overall computational cost of the process.
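A minimal sketch of the SGP-to-MGP escalation (ours; the mixture evaluation is abstracted behind a hypothetical `mixture_test` callable, and the match threshold is an assumption):

```python
import numpy as np

class AdaptivePixelModel:
    """Per-pixel classifier that escalates from one Gaussian to a mixture.

    SGP: matched by its single background Gaussian -> static background,
    one comparison per frame. On a mismatch the pixel becomes an MGP and
    the full mixture is evaluated, following the state scheme above.
    """
    def __init__(self, mean, var=100.0, k_sigma=2.5):
        self.mean, self.var, self.k = mean, var, k_sigma
        self.is_mgp = False                   # start as single-Gaussian pixel

    def classify(self, value, mixture_test):
        if not self.is_mgp:
            if abs(value - self.mean) <= self.k * np.sqrt(self.var):
                return "static"               # SGP: cheap single-Gaussian match
            self.is_mgp = True                # mismatch: escalate to MGP
        # MGP: delegate to the full mixture-of-Gaussians test (hypothetical)
        return "static" if mixture_test(value) else "dynamic"
```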
Video tracking is widely used for surveillance, security, and defense purposes. In cases where the camera is not fixed, due to pans and tilts or to being mounted on a moving platform, tracking becomes more difficult. Camera motion must be taken into account, and objects that come and go from the field of view should be continuously and uniquely tracked. We propose a tracking system that meets these needs by using a frame-registration technique to estimate camera motion. This estimate is then used as the input control signal to a Kalman filter, which estimates the target's motion model based on measurements from a mean-shift localization scheme. Thus we decouple camera and object motion and recast the problem in terms of a principled control-theory solution. Our experiments show that a controller built on these principles is able to track multiple objects in sequences with moving cameras. Furthermore, the techniques are computationally efficient, allowing these results to be achieved in real time. Of specific importance is that when objects are lost off-frame, they can still be uniquely identified and reacquired when they return to the field of view.
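The decoupling amounts to feeding the camera-motion estimate into the control term of a textbook Kalman iteration, with the mean-shift output as the measurement; a generic step matching that structure (ours, with standard symbols):

```python
import numpy as np

def kalman_step(x, P, z, u, F, B, H, Q, R):
    """One Kalman iteration with camera motion as the control input u.

    x, P : state estimate and covariance;  z : mean-shift measurement.
    F, B : object-motion and control models;  H : measurement matrix.
    Q, R : process and measurement noise covariances.
    """
    # predict: object motion plus compensation for estimated camera motion
    x = F @ x + B @ u
    P = F @ P @ F.T + Q
    # update with the mean-shift localization measurement
    y = z - H @ x                             # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)            # Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```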