KEYWORDS: Scalable video coding, Video, Volume rendering, Digital photography, Associative arrays, Quantization, Resolution enhancement technologies, Signal processing, Electroluminescence, Internet
We present a smoothed reference inter-layer texture prediction mode for bit depth scalability based on the
Scalable Video Coding extension of the H.264/MPEG-4 AVC standard. In our approach, the base layer encodes
an 8-bit signal that can be decoded by any existing H.264/MPEG-4 AVC decoder and the enhancement layer
encodes a higher bit depth signal (e.g. 10/12-bit) which requires a bit depth scalable decoder. The approach
presented uses base layer motion vectors to conduct motion compensation upon enhancement layer reference
frames. Then, the motion compensated block is tone mapped and summed with the co-located base layer residue
block prior to being inverse tone mapped to obtain a smoothed reference predictor. In addition to the original
inter-/intra-layer prediction modes, the smoothed reference prediction mode enables inter-layer texture prediction
for blocks whose co-located base layer block is inter-coded. The proposed method is designed to improve coding
efficiency for sequences with non-linear tone mapping, for which it achieves gains of up to 0.4 dB over the
CGS-based BDS framework.
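As a rough illustration of the smoothed reference construction described above, the following sketch forms the predictor from a motion-compensated enhancement-layer block and the co-located base-layer residue. The linear tone mapping used here is a hypothetical stand-in (real BDS systems target arbitrary, often non-linear mappings); the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def smoothed_reference_predictor(el_ref_block, bl_residue_block,
                                 tone_map, inverse_tone_map):
    """Form the smoothed reference predictor: tone-map the
    motion-compensated enhancement-layer block, add the co-located
    base-layer residue, then inverse tone map the result."""
    mapped = tone_map(el_ref_block)          # 10-bit -> 8-bit domain
    smoothed = mapped + bl_residue_block     # add BL residue (8-bit domain)
    return inverse_tone_map(smoothed)        # back to 10-bit domain

# Hypothetical linear tone mapping between 10-bit and 8-bit signals.
tm = lambda x: x / 4.0    # 10-bit -> 8-bit
itm = lambda x: x * 4.0   # 8-bit -> 10-bit

el_block = np.full((4, 4), 512.0)   # motion-compensated 10-bit block
bl_resid = np.full((4, 4), 2.0)     # co-located base-layer residue
pred = smoothed_reference_predictor(el_block, bl_resid, tm, itm)
```

With this linear mapping the residue is simply rescaled; the mode pays off precisely when the real mapping is non-linear, as the abstract notes.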
Modern video coding schemes such as H.264/AVC employ multi-hypothesis motion compensation to improve
coding efficiency. However, the improved prediction performance in these schemes comes at an additional cost.
Based on the observed high correlation among the multiple hypotheses in H.264/AVC, in this paper
we propose a new method (Prediction Matching) to jointly combine explicit and implicit prediction approaches.
The first motion hypothesis on a predicted block is explicitly coded, while any additional hypotheses are
implicitly derived at the decoder from the first one and the data available from previously decoded frames.
Thus, the overhead needed to signal motion information is reduced, while prediction accuracy can be better than
with fully implicit multi-hypothesis prediction. Proof-of-concept simulation results show that bitrate savings of up to
7.06% with respect to state-of-the-art H.264/AVC can be achieved using our Prediction Matching.
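The decoder-side derivation can be sketched as follows. This is a simplified, hypothetical stand-in for the scheme described above: the second hypothesis is found by SAD matching against the explicitly signaled first predictor, so no extra motion bits are needed for it; all names and the tiny search range are illustrative.

```python
import numpy as np

def implicit_second_hypothesis(ref0, ref1, mv0, pos, block=4, search=2):
    """Given the explicitly coded first hypothesis (mv0 into ref0),
    derive a second hypothesis in ref1 by matching candidate blocks
    against the first predictor, then average the two hypotheses."""
    y, x = pos
    h0 = ref0[y + mv0[0]: y + mv0[0] + block,
              x + mv0[1]: x + mv0[1] + block]
    best, best_sad = None, np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = ref1[y + dy: y + dy + block, x + dx: x + dx + block]
            if cand.shape != h0.shape:
                continue  # candidate falls outside the frame
            sad = np.abs(cand - h0).sum()
            if sad < best_sad:
                best_sad, best = sad, cand
    return (h0 + best) / 2.0

# Toy example: with identical reference frames, the implicit search
# finds the same block, so the combined predictor equals it.
ref = np.arange(144, dtype=float).reshape(12, 12)
pred = implicit_second_hypothesis(ref, ref, (0, 0), (4, 4))
```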
KEYWORDS: Cameras, Video, Video compression, Video coding, Imaging systems, Transform theory, Error analysis, 3D video compression, Computer programming, Virtual colonoscopy
We consider the effect of depth-image compression artifacts on the quality of virtual views rendered using neighboring views. Such view rendering processes are utilized in new video applications such as 3D television (3DTV) and free viewpoint video (FVV). We first analyze how compression artifacts in compressed depth-images result in distortions in rendered views. We show that the rendering position error is a monotonic function of the coding error. For the scenario in which cameras are arranged with parallel optical axes, we further demonstrate specific properties of rendering position error. Exploiting special characteristics of depth-images, namely smooth regions separated by sharp edges, we investigate a possible solution to suppress compression artifacts by encoding depth-images with a recently published sparsity-based in-loop de-artifacting filter. Simulation results show that applying such techniques not only provides significantly higher coding efficiency for depth-image coding, but, more importantly, also improves the quality of rendered views in terms of PSNR and subjective quality.
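The monotonicity result above can be illustrated with a small sketch. Assuming (as is common, though not stated explicitly here) that the depth image stores inverse depth quantized to 8 bits and that cameras have parallel optical axes, the warping disparity is f·B/z, so the rendering position error is linear, and hence monotonic, in the depth coding error. The focal length, baseline, and depth range below are hypothetical.

```python
def rendering_position_error(depth_code_error, f=1000.0, baseline=0.1,
                             z_near=1.0, z_far=100.0, levels=255):
    """Position error (in pixels) in the rendered view caused by a
    coding error (in code levels) of an 8-bit depth image that stores
    quantized inverse depth. For parallel optical axes the disparity
    is f * baseline / z, so the error is linear in the coding error."""
    inv_step = (1.0 / z_near - 1.0 / z_far) / levels
    return f * baseline * inv_step * depth_code_error
```

A larger coding error always produces a larger rendering position error, which is the monotonic relationship the analysis establishes.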
In this paper, we analyze focus mismatches among cameras utilized in a multiview system, and propose techniques
to efficiently apply our previously proposed adaptive reference filtering (ARF) scheme to inter-view prediction in
multiview video coding (MVC). We show that, with heterogeneous focus settings, the differences exhibited in images
captured by different cameras can be represented in terms of the focus setting mismatches (view-dependency) and
the depths of objects (depth-dependency). We then analyze the performance of the previously proposed ARF
in MVC inter-view prediction. The gains in coding efficiency show a strong view-wise variation. Furthermore,
the estimated filter coefficients demonstrate strong correlation when the depths of objects in the scene remain
similar. By exploiting the properties derived from the theoretical and performance analysis, we propose two
techniques to achieve an efficient ARF coding scheme: i) view-wise ARF adaptation based on RD-cost prediction,
which determines whether ARF is beneficial for a given view, and ii) filter updating based on depth-composition
change, in which the same set of filters will be used (i.e., no new filters will be designed) until there is significant
change in the depth-composition within the scene. Simulation results show that significant complexity savings
are possible (e.g., the complete ARF encoding process needs to be applied to only 20% to 35% of the frames)
with negligible quality degradation (e.g., around 0.05 dB loss).
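Technique (ii) above can be sketched as a simple trigger. The proxy used here, an L1 distance between normalized depth histograms of consecutive frames, and the threshold value are hypothetical choices standing in for whatever depth-composition measure the full scheme uses.

```python
def should_redesign_filters(prev_hist, cur_hist, threshold=0.2):
    """Re-run the full ARF filter design only when the scene's depth
    composition changes significantly; otherwise reuse the current
    filter set. Histograms are assumed normalized (sum to 1)."""
    change = sum(abs(p - c) for p, c in zip(prev_hist, cur_hist))
    return change > threshold
```

Because redesign is skipped whenever the depth composition is stable, the expensive ARF estimation runs on only a fraction of the frames, matching the complexity savings reported above.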
KEYWORDS: Digital filtering, Cameras, Video, Video coding, Optical filters, Video surveillance, Expectation maximization algorithms, Computer programming, Electronic filtering, Video compression
We consider the problem of coding multi-view video that exhibits mismatches in frames from different views.
Such mismatches could be caused by heterogeneous cameras and/or different shooting positions of the cameras.
In particular, we consider focus mismatches across views, i.e., different portions of a video frame can
undergo different blurriness/sharpness changes with respect to the corresponding areas in frames from the other
views. We propose an adaptive filtering approach for cross-view prediction in multi-view video coding. The
disparity fields are exploited as an estimate of scene depth. An expectation-maximization (EM) algorithm
is applied to classify the disparity vectors into groups. Based on the classification result, a video frame is
partitioned into regions with different scene-depth levels. Finally, for each scene-depth level, a two-dimensional
filter is designed to minimize the average residual energy of cross-view prediction for all blocks in the class.
The resulting filters are applied to the reference frames to generate better matches for cross-view prediction.
Simulation results show that, when encoding across views, the proposed method achieves gains of up to 0.8 dB
over standard H.264 video coding.
It is highly desirable for many broadcast video applications to support diverse user devices, such as devices with different display resolutions, without incurring the bitrate penalty of simulcast encoding. On the other hand, video decoding is a very complex operation, and its complexity depends strongly on the resolution of the coded video. Low-power portable devices typically have very strict complexity restrictions and reduced-resolution displays. For such environments, the total bitrate efficiency of the combined layers is an important requirement, but the bitrate efficiency of a lower layer individually, although desirable, is not a requirement. In this paper, we propose a complexity-constrained scalable system, based on the Reduced Resolution Update mode, that enables low decoding complexity while achieving better rate-distortion performance than an equivalent simulcast-based system. Our system targets broadcast environments in which some terminals have very limited computational and power resources.
The new H.264 video coding standard supports picture- and macroblock-level adaptive frame/field coding, which can improve coding efficiency when coding interlaced sequences. A well-designed encoder needs to support all of these modes and be able to decide which one is the most appropriate for encoding a given macroblock or picture. It could be argued that the optimal solution can be found by employing a multi-pass strategy, that is, encoding a macroblock or picture using all possible coding modes and selecting the one that yields the best coding performance. Unfortunately, the computational complexity of such a multi-pass encoder is rather high. In this paper, we propose a novel single-pass algorithm based on motion activity detection. The proposed scheme is performed in a pre-analysis stage and can reduce complexity by approximately 40%-60% compared to the two-pass frame/field encoder, while maintaining similar coding efficiency.
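A minimal sketch of the single-pass idea, under the assumption (made here for illustration, not taken from the paper) that motion activity is measured as the mean absolute difference between co-located samples of consecutive pictures: high activity favors field coding, since the two fields of a moving picture differ substantially, while static content favors frame coding. The threshold is a hypothetical tuning parameter.

```python
def choose_picture_coding(prev_frame, cur_frame, threshold=5.0):
    """Single-pass frame/field decision from a pre-analysis stage:
    measure motion activity (mean absolute difference) and code
    fast-moving pictures as fields, static pictures as frames."""
    activity = sum(abs(a - b) for a, b in zip(prev_frame, cur_frame)) / len(cur_frame)
    return "field" if activity > threshold else "frame"
```

Unlike the two-pass approach, no trial encodings are performed; the decision costs one pass over the samples.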
This paper addresses the problem of selecting a single transcoding method from multiple transcoding possibilities that satisfy a specified delivery constraint. We first discuss the rate-distortion modeling of DCT coefficients. Then, we quantify the distortion of the video transcoding output by examining the rate-distortion behavior of popular transcoding techniques, including requantization, spatial downsampling and temporal downsampling. We illustrate the use of our model in several typical applications.
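Once each candidate technique has a modeled rate-distortion operating point, the selection step reduces to picking the lowest-distortion method that meets the delivery constraint. The sketch below assumes hypothetical modeled values; the method names echo the techniques listed above.

```python
def select_transcoding(options, rate_budget):
    """Pick the transcoding method with the lowest modeled distortion
    among those meeting the delivery (rate) constraint.
    `options` maps method name -> (rate_kbps, modeled_distortion)."""
    feasible = {m: d for m, (r, d) in options.items() if r <= rate_budget}
    if not feasible:
        return None  # no method satisfies the constraint
    return min(feasible, key=feasible.get)

# Hypothetical modeled operating points for one input bitstream.
options = {
    "requantization": (500, 2.0),
    "spatial_down":   (400, 3.5),
    "temporal_down":  (450, 3.0),
}
```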
KEYWORDS: Video, Error analysis, Data hiding, Internet, Error control coding, Forward error correction, Video coding, Computer programming, Visualization, Video compression
In network delivery of compressed video, packets may be lost if the channel is unreliable, and such losses tend to occur in bursts. In this paper, we develop an error-resilient video encoding approach that helps error concealment at the decoder. We introduce a new block shuffling scheme to isolate erroneous blocks caused by packet losses, and we apply data hiding to add protection for motion vectors. Incorporating these schemes adds little complexity to a standard encoder. Experimental results suggest that our approach can achieve reasonable quality for packet loss rates of up to 30% over a wide range of video material.
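The shuffling idea can be sketched as a simple interleaver. This round-robin assignment is a hypothetical, minimal variant: losing one packet (or even a burst of packets) then removes spatially scattered blocks rather than a contiguous region, so each lost block keeps intact neighbors for concealment.

```python
def shuffle_blocks(block_ids, n_packets):
    """Interleave blocks across packets so that a packet loss removes
    scattered, isolated blocks instead of a contiguous region."""
    packets = [[] for _ in range(n_packets)]
    for i, b in enumerate(block_ids):
        packets[i % n_packets].append(b)  # round-robin assignment
    return packets
```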
This paper discusses the problem of reduced-resolution transcoding of compressed video bitstreams. An analysis of drift errors is provided to identify the sources of quality degradation when transcoding to a lower spatial resolution. Two types of drift error are considered: a reference picture error, which has been identified in previous works, and an error due to the non-commutative property of motion compensation and down-sampling, which is unique to this work. To overcome these sources of error, four novel architectures are presented. One architecture attempts to compensate for the reference picture error in the reduced resolution, while another attempts to do the same in the original resolution. We present a third architecture that attempts to eliminate the second type of drift error, and a final architecture that relies on an intra block refresh method to compensate for all types of errors. In all of these architectures, a variety of macroblock-level conversions are required, such as motion vector mapping and texture down-sampling. These conversions are discussed in detail. Another important issue for the transcoder is rate control. This is especially important for the intra refresh architecture, since it must find a balance between the number of intra blocks used to compensate for errors and the associated rate-distortion characteristics of the low-resolution signal. The complexity and quality of the architectures are compared. Based on the results, we find that the intra refresh architecture offers the best trade-off between quality and complexity, and is also the most flexible.
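The motion vector mapping mentioned above can be sketched for the common 2:1 case: the four full-resolution macroblock vectors covering one reduced-resolution macroblock are collapsed into a single vector. Median-and-halve is one well-known mapping strategy, used here purely as an illustration; the paper's own conversions may differ in detail.

```python
import statistics

def map_motion_vectors(mvs_4):
    """Map the four full-resolution motion vectors that cover one
    reduced-resolution macroblock to a single vector: take the
    component-wise (low) median, then halve it for 2:1 downsampling."""
    mvx = statistics.median_low([mv[0] for mv in mvs_4]) / 2
    mvy = statistics.median_low([mv[1] for mv in mvs_4]) / 2
    return (mvx, mvy)
```

The median suppresses outlier vectors before scaling, which reduces the reference picture error that the drift analysis attributes to poor vector reuse.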
The development and spread of multimedia services require authentication techniques to prove the originality and integrity of multimedia data and/or to localize alterations made to the media. A wide variety of authentication techniques have been proposed in the literature, but most studies have focused primarily on still images. In this paper, we mainly address video authentication. We first summarize a classification of video tampering methods. Based on our proposed classification, the quality of existing authentication techniques can be evaluated. We then propose our own authentication system to combat those tampering methods. We compare two basic authentication categories, fragile watermarking and digital signatures, and discuss the need to combine them. Finally, we address some issues in authenticating video in a broad sense, i.e., a mixture of visual, audio and text data.