
Reduced-reference video quality assessment using a static video pattern

Author Affiliations
Michail-Alexandros Kourtis, Harilaos Koumaras

National Center for Scientific Research “Demokritos,” Institute of Informatics and Telecommunications, Agia Paraskevi, Patriarxou Grigoriou, Athens 15310, Greece

Fidel Liberal

University of the Basque Country, Department of Communications Engineering, Alameda de Urquijo, Bilbao 48013, Spain

J. Electron. Imaging. 25(4), 043011 (Jul 19, 2016). doi:10.1117/1.JEI.25.4.043011
History: Received October 9, 2015; Accepted June 30, 2016

Open Access

Abstract. A reduced-reference video quality assessment (VQA) method is proposed that uses the structural similarity (SSIM) index as a tool to extract features from both the original and the target video sequences by means of a reference video pattern. The method is suitable for monitoring video quality in real time and across the service provision chain. The performance of the proposed method was evaluated using a large experimental set of reference and nonreference video sequences, achieving a mean absolute percentage deviation from SSIM lower than 2.56%. Additionally, comparison to the subjectively evaluated difference mean opinion scores of the Laboratory for Image and Video Engineering video quality dataset shows that the performance of the proposed method is within the range of the full reference VQA methods.


Consumers' thriving interest in video content has brought the world closer to the universe of digital video provision than ever before. The recent market success of media services such as internet protocol television and video on demand through consumer electronic products has considerably increased network traffic, leading network operators to apply restriction policies and raising controversy over network neutrality and traffic differentiation.

This situation has created the need for more advanced encoding techniques (e.g., H.265/HEVC) and adaptation schemes,1 which achieve higher compression ratios and provide agility in adapting the media service,2 relieving network operators of media-related network traffic. Thus, during service provision, the video stream may need to be dynamically transcoded to different formats/profiles (e.g., as in the paradigm of mobile-edge computing), resulting in adapted media services that dynamically fit the current network conditions and the terminal device specifications. However, this in-service transcoding process, which today can be supported by the emerging software-defined networking and network function virtualization techniques,3 introduces to the media service a wide variety of encoding impairments that degrade the perceived quality of the encoded media service. The quality degradation caused by in-service transcoding therefore creates the need for flexible video quality assessment (VQA) methods that can evaluate quality with the same accuracy as well-established video quality metrics [e.g., the structural similarity (SSIM) index4]. Such VQA methods would be able to assess the quality not only during the initial coding of the source signal, but also across the media service delivery path to the end user, providing useful feedback both to the content provider for service adaptation actions and to the network operator for optimal traffic steering decisions.

To monitor the quality in real time and across the service provision chain,5 it is necessary to use flexible VQA tools, which are suitable for in-service integration, evaluating the media service along its network delivery path.

Currently, the available VQA methods are divided into two categories: subjective and objective. More specifically, the subjective VQA methods6 are based upon the opinion scores of a group of viewers regarding the visual degradation of an encoded video sequence compared to the original uncompressed sequence, which establishes them as the primary choice for video quality evaluation in terms of reliability, but not from a practical, commercial perspective. Subjective video quality evaluation methods are expensive and time-consuming, mainly due to their demanding setup within a controlled room/environment with sophisticated apparatus, so they cannot be commercially exploited, especially within the service provision chain for monitoring the delivered video service.

Correspondingly, the objective VQA techniques are mathematical computational models that utilize various image characteristics (e.g., luma and chroma)7–16 or other image statistics (e.g., blockiness)17 to approximate the subjective test results as closely as possible in an efficient and cost-effective way. The objective methods are categorized into three groups determined by their approach and the metric used for the quality assessment: the full reference (FR), the reduced reference (RR), and the no-reference (NR) methods.

The FR methods evaluate video quality by comparing the frames of the original and the target video. These methods perform multichannel decomposition of the video signal, applying the objective method on each channel with a different weight factor according to the characteristics of the human visual system (HVS), using contrast sensitivity functions, channel decomposition, error normalization, weighting, and finally Minkowski error pooling to combine the error measurements into a single perceived quality estimation.18 Single-channel FR methods have also been proposed in the literature, where the objective metric is applied on the video signal without considering varying weight functions. Some FR metrics based on the structural distortion of the video have been proposed,19 among which is the widely known SSIM index, which has a very wide range of applicability across many different fields.20–22 All FR methods, including SSIM, provide higher accuracy and credibility than the other categories (RR and NR), but the evaluation process requires both the original and the encoded video sequences at the same site, making them inappropriate for integration in the service provision chain, where the original signal is not available at the end-user site.

The RR methods evaluate the video quality level based on metrics that use only some extracted structural features of the original signal.23,24 The concept of RR metrics was introduced in Refs. 7, 24, and 25, where the RR metric was based upon the extraction of various spatial and temporal features of the reference video that are easily exposed to the distortions added by the standard video compression process. RR metrics can be roughly divided into three categories.

The first category includes all methods based upon models of low-level statistical properties of the original natural image. An RR metric that belongs to this category is described in Ref. 26; it provides a condensed amount of RR information obtained by comparing the marginal probability distribution of wavelet coefficients in different wavelet subbands with the probability density function of the wavelet coefficients of the decoded signal, using the Kullback–Leibler divergence as a distance between distributions.
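To make the statistical-prior idea concrete, the following minimal Python sketch computes the Kullback–Leibler divergence between two coefficient histograms. It is a generic illustration, not the implementation of Ref. 26, and the Laplacian samples merely stand in for real wavelet subband coefficients.

```python
import numpy as np

def kl_divergence(p_counts, q_counts, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two discrete
    distributions given as histogram counts."""
    p = np.asarray(p_counts, dtype=float) + eps
    q = np.asarray(q_counts, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Stand-ins for the coefficient distributions of a reference subband and
# of the corresponding subband of the decoded (distorted) signal.
rng = np.random.default_rng(0)
ref_coeffs = rng.laplace(0.0, 1.0, 10_000)
dec_coeffs = rng.laplace(0.0, 1.3, 10_000)
bins = np.linspace(-10.0, 10.0, 65)
p_hist, _ = np.histogram(ref_coeffs, bins=bins)
q_hist, _ = np.histogram(dec_coeffs, bins=bins)
print(f"KL divergence: {kl_divergence(p_hist, q_hist):.4f}")
```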

The second category of RR metrics includes methods that capture visual distortions to quantify the decoded signal's quality.27–30 However, this type of metric only performs well when there is sufficient knowledge about the degradation process that the signal underwent; it is not efficient to apply these techniques to general cases whose distortion has not been previously assumed.

The third and last category of RR metrics is based upon models of the viewer's perception, e.g., the HVS.31–34 These models exploit and apply different psychophysical and psychological vision studies on end users in an attempt to imitate the behavior of subjective test groups.

RR methods are more flexible for in-service integration since they require only partial information of the original video signal, but they have reduced accuracy and credibility in comparison to the FR metrics.

Finally, the NR methods evaluate the video quality by processing the frames of the target video alone. As they do not require any information from the original video sequence, they can be easily integrated within the service provision chain. However, their performance is limited to specific visual artifacts (e.g., tiling), restricting their range of applicability to special cases only. Thus, from the family of objective methods, the RR and NR metrics are more suitable for in-service integration but suffer from limited efficiency in comparison to the FR methods, which offer high accuracy but have applicability limitations, since they require the original video sequence for assessing the video quality.

It is evident that RR methods require an additional communication channel within the network architecture to transmit the extracted features from the video provider site to the end-user terminals. The required bandwidth depends on the specific RR method; a bandwidth in the range of 1 to 150 kbps is usually required, depending on the RR method and the feature extraction type.23,35 However, the method described in Ref. 35, which requires under 1-kbps bandwidth, performs very poorly in terms of absolute difference compared to subjective tests.

An alternative technique is to encapsulate the extracted features (or, in general, the reference information) inside the forward link, along with the video transmission. The more features are extracted and transmitted to the end user, the more accurate the objective video quality estimate becomes; however, more features require higher bandwidth from the communication network. So in RR methods, there is a trade-off between the accuracy of the VQA and the constraints on the network bandwidth.

In this paper, an RR method is proposed that is suitable for in-service use. The features extracted from both the original and the target video frames are based on the evaluation of the SSIM index. In this respect, the SSIM index is calculated for each frame of the original video using as reference a static white pattern, i.e., a video whose frames are all white. The SSIM index for each frame is transmitted to the end-user site. At the end-user terminal, the SSIM index is calculated between each frame of the received (target) video and the same static white pattern. The proposed RR metric is the ratio of the two SSIM indices. Through experimental measurements, it is shown that the proposed metric has a value very close to the SSIM index calculated from the comparison of the original and the target video frames. The accuracy of the proposed metric relative to SSIM depends on the video degradation: the higher the degradation, the worse the accuracy. Experimental measurements show that for an acceptable video quality level,36,37 the mean absolute percentage deviation (MAPD) of the proposed method is lower than 2.56%.

Another advantage of the proposed method is the low bit rate reference information signal that needs to be sent to the end user, which ranges between 400 and 600 bps.

The rest of the paper is organized as follows: Sec. 2 describes the use of SSIM as a feature extraction method from both the original and the target videos and introduces the metric SSIM RR (SRR), which can be directly compared to the original SSIM index. Section 3 presents a qualitative interpretation of the proposed video quality metric SRR. In Sec. 4, the performance evaluation of the proposed method is presented and analyzed, and finally Sec. 5 concludes the paper.

Among the most reliable objective evaluation metrics is the SSIM index, which measures the structural similarity between two image sequences, exploiting the general principle that the main function of the HVS is the extraction of structural information from the viewing field. If x and y are two video signals, then SSIM is defined as

$$\mathrm{SSIM}(x,y)=\frac{(2\mu_x\mu_y+C_1)(2\sigma_{xy}+C_2)}{(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)},\tag{1}$$

where $\mu_x$ and $\mu_y$ are the means of x and y, and $\sigma_x^2$, $\sigma_y^2$, and $\sigma_{xy}$ are the variances of x and y and the covariance of x and y, respectively. The constants $C_1$ and $C_2$ are defined as

$$C_1=(K_1 L)^2,\qquad C_2=(K_2 L)^2,\tag{2}$$

where L is the dynamic pixel range, and $K_1=0.01$ and $K_2=0.03$, respectively.
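For illustration, a minimal single-window Python implementation of Eqs. (1) and (2) is sketched below. It computes the statistics over the whole frame, whereas production SSIM implementations typically evaluate Eq. (1) over local sliding windows and average the resulting map; the function name ssim_global is ours.

```python
import numpy as np

K1, K2, L = 0.01, 0.03, 255  # constants of Eq. (2) for 8-bit video

def ssim_global(x, y):
    """Single-window SSIM of Eq. (1) over two grayscale frames of
    identical shape (local-window averaging is omitted for brevity)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / (
           (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
```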

In the typical SSIM index evaluation process, it is assumed that both the original and the target video sequences are available at the same site, as shown in Fig. 1. The SSIM(x,y) evaluation for every frame can be based on any software implementation of Eq. (1), where x is the original video sequence (VSo) in Fig. 1, y is the target video sequence (VSt), and SSIMot is their SSIM(x,y) index.

Fig. 1: Typical SSIM index evaluation with original and target video sequences available at the same site.

According to Ref. 4, where the SSIM index is defined and introduced, SSIM comprises three image comparison components: luminance, contrast, and structure. Their combination results in the widely used SSIM index. The three separate comparison components are

$$l(x,y)=\frac{2(1+R)}{1+(1+R)^2+\dfrac{C_1}{\mu_x^2}},\tag{3}$$

$$c(x,y)=\frac{2\sigma_x\sigma_y+C_2}{\sigma_x^2+\sigma_y^2+C_2},\tag{4}$$

$$s(x,y)=\frac{\sigma_{xy}+C_3}{\sigma_x\sigma_y+C_3},\tag{5}$$

where R denotes the relative luminance change, i.e., $\mu_y=(1+R)\mu_x$. The combination of the luminance [Eq. (3)], contrast [Eq. (4)], and structure [Eq. (5)] comparisons results in the SSIM index equation

$$\mathrm{SSIM}(x,y)=[l(x,y)]^{\alpha}\cdot[c(x,y)]^{\beta}\cdot[s(x,y)]^{\gamma}.\tag{6}$$

According to the SSIM index equation, the parameters α, β, and γ adjust the relative importance of each of the three components in the calculation of the SSIM, but the authors of Ref. 4 selected the case α = β = γ = 1, which results in the well-known expression of the SSIM index. However, the decision that the three components participate equally in the final calculation of the SSIM index is not sufficiently justified by the authors in Ref. 4 and appears to have been reached only for simplicity.
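The decomposition can be sketched in Python as follows, under the common choice C3 = C2/2, for which Eq. (6) with α = β = γ = 1 collapses to Eq. (1); the luminance term is evaluated in its standard form, which is algebraically equivalent to Eq. (3) with R = μy/μx − 1, and the function names are ours.

```python
import numpy as np

def ssim_components(x, y, K1=0.01, K2=0.03, L=255):
    """Luminance, contrast, and structure terms of Eqs. (3)-(5),
    computed globally over two grayscale frames."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    C3 = C2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    s_x, s_y = x.std(), y.std()
    s_xy = ((x - mu_x) * (y - mu_y)).mean()
    l = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)
    c = (2 * s_x * s_y + C2) / (s_x ** 2 + s_y ** 2 + C2)
    s = (s_xy + C3) / (s_x * s_y + C3)
    return l, c, s

def weighted_ssim(x, y, alpha=1.0, beta=1.0, gamma=1.0):
    """Eq. (6). Note that s(x,y) can be negative, so noninteger gamma
    requires care; gamma -> 0 removes the only term that needs the
    joint statistic sigma_xy, which motivates the SRR approach."""
    l, c, s = ssim_components(x, y)
    return (l ** alpha) * (c ** beta) * (s ** gamma)
```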

This motivated us to further investigate the sensitivity of the SSIM accuracy under different α, β, and γ weights. More specifically, considering that the purpose of this paper is to develop a flexible metric suitable for in-service applicability, we note that the parameter γ specifies the importance of the structure comparison between signals x and y, which is the most influential factor for the FR requirement in the applicability of the SSIM, since the σxy factor must be calculated for the measurement of Eq. (5).

In the proposed method, in order to investigate the in-service applicability of the SSIM [i.e., in in-service cases where the calculation of Eq. (5) is not feasible due to the lack of the reference signal], we examine the case γ → 0, so that the relative importance of Eq. (5) in the SSIM calculation is limited. Therefore, the requirement for the reference signal to be available together with the encoded signal for the estimation of Eq. (5) ceases to exist, allowing the SSIM index to be decomposed exclusively into the parameters of Eqs. (3) and (4), where no joint parameters (such as σxy) exist that require the presence of both the reference and the encoded signal at the same place.

Based on this analysis, in the proposed method (see Fig. 2), the SSIM index is used as a tool to extract features from both the original and the target video sequences using a reference video pattern. In this respect, an initial SSIMor value is evaluated for every frame at the service provider site by comparing VSo with a video reference pattern (VSr), e.g., an artificially generated video sequence of white frames with the same resolution and frame rate [later in this section, the authors test the primary colors (i.e., white, black, red, green, and blue) as reference video patterns and show that the white video pattern performs better than the others]. The evaluation of SSIMor can be based on any software implementation of Eq. (1), where x is VSo and y is VSr, applied to each frame. SSIMor can be considered a feature of each frame of the original video and is sent to the end-user site by any means, i.e., either through the same communication channel as the video, through any other communication channel with sufficient bandwidth, or by embedding it inside the transport stream of the video sequence. In any case, it is considered that the SSIMor value is recovered at the end-user site.

Fig. 2: SRR evaluation method using a reference video pattern at both the service provider and end-user sites.

Referring to Fig. 2, an SSIMtr value is evaluated at the end-user site by comparing the received (target) video signal (VSt) with a reference video pattern (VSr) that is identical to the one used at the service provider site and is also artificially generated on site. The evaluation of SSIMtr is based on the same software implementation of Eq. (1) that is used at the transmitter site and refers to each frame of the target video sequence. SSIMtr can be considered a feature of each frame of the target video.

The ratio of these two SSIM values can be considered as a new metric based on SSIM, namely SRR, i.e.,

$$\mathrm{SRR}=\frac{\mathrm{SSIM}_{or}}{\mathrm{SSIM}_{tr}}.\tag{7}$$

Comparing SRR with the SSIM index between the original and the target video sequences, the experimental results in Sec. 4 show that SRR efficiently approximates SSIM, with an MAPD < 2.56% and satisfactory correlation coefficient values with subjective mean opinion scores (MOS).
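A minimal sketch of the resulting two-site computation of Eq. (7) is given below, reusing the ssim_global() helper sketched in the previous section. The names are illustrative, not the authors' implementation; in deployment, the two halves run at the service provider and end-user sites, with SSIMor carried over the low-rate side channel.

```python
import numpy as np

def white_pattern(height, width, level=255):
    """Static white reference frame with the video's resolution."""
    return np.full((height, width), level, dtype=np.uint8)

def provider_feature(original_frame):
    """Service-provider site: SSIM_or of Eq. (1) against the pattern."""
    h, w = original_frame.shape
    return ssim_global(original_frame, white_pattern(h, w))

def srr_per_frame(ssim_or, target_frame):
    """End-user site: SSIM_tr against the same pattern, then Eq. (7)."""
    h, w = target_frame.shape
    ssim_tr = ssim_global(target_frame, white_pattern(h, w))
    return ssim_or / ssim_tr
```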

Concerning the kind of reference video pattern, the applicability of the SRR method was evaluated using the primary colors (i.e., white, black, red, green, and blue) as reference video patterns. The selected primary colors portray a distinct RGB diversity, which is essential to demonstrate and evaluate the behavior of the proposed method, while maintaining simplicity in the implementation and representation of the pattern.

In this framework, the proposed method was applied to the experimental set of 40 test signals, encoded with H.264 at three distinct quantization parameter (QP) values, specifically QP = 12, 22, and 32, each time utilizing a different primary color as the reference video pattern. The QP value regulates how much spatial detail is maintained during the encoding/compression process (modifying, correspondingly, the quantization step; in H.264 the quantization step size approximately doubles for every increase of 6 in QP). A low QP value denotes a small quantization step, so almost all the spatial detail of the video is retained, while a high QP value corresponds to a large quantization step and the spatial detail is aggregated, resulting in increased distortion and degradation of video quality. For each color and QP value, the proposed SRR method was applied, and each time the MAPD value relative to the SSIM was calculated.

The experimental results of this process are provided in Table 1, where it is observed that, among the primary colors, the white video pattern provides better performance across different QP values, while the rest of the primary colors perform notably worse. Therefore, considering its better performance and the fact that it has the simplest representation, with only one luminance parameter, the white reference video pattern is recommended for implementing the proposed SRR method and is selected for the experimental parts of this paper.

Table 1: MAPD for 40 test signals with different reference video patterns.

Concerning the channel requirements and the overhead of the proposed method, SSIMor is a number < 1, and it can be represented by 2 bytes per frame for an accuracy of four decimal places (10⁻⁴). In this case, at 25 frames/s, the required bit rate to be transmitted to the end-user site is 2 bytes × 8 bits × 25 frames/s = 400 bps. For an increased accuracy of six decimal places (10⁻⁶) per frame, 3 bytes are required, resulting in 600 bps for SSIMor. Even 600 bps is significantly lower than the 1 to 150 kbps required by other RR methods, as mentioned in Sec. 1.
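The arithmetic behind these figures can be checked with a few lines, assuming fixed-point coding of SSIMor at 25 frames/s as in the text:

```python
FPS = 25  # frames per second assumed in the text

def side_channel_bps(decimal_places):
    # 4 decimal places fit in 2 bytes (10**4 <= 2**16 levels);
    # 6 decimal places need 3 bytes (10**6 <= 2**24 levels).
    bytes_per_frame = 2 if decimal_places <= 4 else 3
    return bytes_per_frame * 8 * FPS

print(side_channel_bps(4))  # 400 bps
print(side_channel_bps(6))  # 600 bps
```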

In order to interpret the proposed quality metric and its relation to the SSIM index, we consider a video sequence that is encoded at various QP values, resulting in different bit rates and quality levels. The SSIM index between the original and the target video sequence for each frame is denoted as SSIMot(QP), which is a descending function of QP. Given that the maximum value of SSIMot is equal to 1, the SSIM variation is an exponential function, as shown in Ref. 38. Therefore,

$$\mathrm{SSIM}_{ot}(QP)=e^{-\alpha\cdot QP},\tag{8}$$

where α is a coefficient that depends on the content of the video signal.38

The plot of SSIMot(QP) versus QP is shown in Fig. 3, curve a. As QP increases, more and more information from the original video frame is lost, i.e., the spatial information (the color and intensity of each pixel) of VSt is lost. In the extreme case where the information is completely lost, all pixels are equal in color and intensity, i.e., VSt is degraded to a sequence of uniform frames, such as a white video pattern (similar to the reference video pattern used). In this extreme case, the lowest value of SSIMot(QP) is the SSIM index between the original video sequence and the reference video pattern, which can be denoted as SSIMor.

Fig. 3: Variation of (a) SSIMot, (b) ideal SSIMtr, and (c) real-conditions SSIMtr versus QP.

The previous analysis refers to the comparison between the original and the target video frames. If we consider the comparison between the target video frames and the reference video (i.e., the white video pattern explained earlier), the SSIMtr(QP) index versus QP will be an ascending function.

This is because at low QPs the target frame is almost identical to the original and, therefore, will greatly differ from the (white) reference video pattern, resulting in a low SSIMtr value. Furthermore, as QP increases, the target frame will gradually become similar to the reference white video pattern, resulting in a higher value of SSIMtr. For very high QP, the target frame is identical to the reference one and the maximum SSIMtr is equal to 1.

Considering an exponential variation of SSIMtr (similar to SSIMot), it is deduced that

$$\mathrm{SSIM}_{tr}(QP)=\mathrm{SSIM}_{or}\cdot e^{\alpha\cdot QP}.\tag{9}$$

The plot of SSIMtr(QP) versus QP is shown in Fig. 3, curve b.

From Eqs. (8) and (9), it can be deduced that

$$\mathrm{SSIM}_{ot}(QP)\cdot\mathrm{SSIM}_{tr}(QP)=\mathrm{SSIM}_{or}\;\Longrightarrow\;\mathrm{SSIM}_{ot}=\mathrm{SSIM}_{or}/\mathrm{SSIM}_{tr}.\tag{10}$$

Comparing Eqs. (7) and (10), it is evident that

$$\mathrm{SRR}=\mathrm{SSIM}_{ot}.\tag{11}$$

That is, in the ideal case, the proposed SRR is equal to the SSIM index. However, in real conditions, the SSIMtr variation differs from Eq. (9), because at high QP values the degraded frame does not become identical to the white reference frame; the maximum value of SSIMtr in real conditions does not reach the ideal value of 1, as shown by the plot of SSIMtr in real conditions in curve c of Fig. 3 (dotted line). This deviation affects the accuracy of Eq. (11), and SRR differs from the SSIM index by an amount that depends on the difference between curves b and c of Fig. 3.
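The ideal-case identity of Eq. (11) can also be checked numerically under the exponential model, as in the following sketch; the values α = 0.05 and SSIMor = 0.2 are illustrative, since both are content dependent.

```python
import numpy as np

alpha, ssim_or = 0.05, 0.2
qp = np.arange(0, 33)  # range kept small enough that SSIM_tr <= 1

ssim_ot = np.exp(-alpha * qp)           # Eq. (8)
ssim_tr = ssim_or * np.exp(alpha * qp)  # Eq. (9), ideal conditions
srr = ssim_or / ssim_tr                 # Eq. (7)

assert np.allclose(srr, ssim_ot)        # Eq. (11) holds exactly
```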

Performance Comparison of Structural Similarity Reduced Reference to Structural Similarity

The performance of the proposed method is evaluated by comparing SRR with the original SSIM index (SSIMot) for a large number of video frames. In this respect, a wide range of video sets was selected, comprising 40 video sequences of various lengths, resolutions, and content. The selected video sequences include 11 reference video sequences,39 2 long-duration sequences (Big Buck Bunny and Elephants Dream), and 27 nonreference video sequences retrieved from movie trailers. The total number of unique frames used for evaluating the proposed method is 60,866.

The original uncompressed video sequences were encoded at three QP values, 12, 22, and 32, which satisfactorily cover the achieved video quality range of the encoded/compressed video signals. Higher QP values were not examined because they lead to unacceptable video quality.36 An initial qualitative comparison between SRR and the SSIM index is shown in Fig. 4, which presents the variation of SRR and the SSIM index for each frame of a video sequence (Kristen&Sara) with QP = 32. From Fig. 4, the qualitative similarity of SRR and the SSIM index is evident.

Fig. 4: Qualitative comparison of SRR and SSIM index applied on the Kristen&Sara video sequence with QP = 32.

For the quantitative measurement of the performance of the proposed method, the MAPD between SRR and the SSIM index is calculated over all frames. MAPD is a widely used measure of the accuracy of a prediction method, particularly in trend estimation such as the proposed method:

$$\mathrm{MAPD}=\frac{1}{n}\sum_{i=1}^{n}\frac{\left|\mathrm{SSIM}_i-\mathrm{Predicted\_SSIM}_i\right|}{\mathrm{SSIM}_i},\tag{12}$$

where SSIMi is the SSIM index of frame i and Predicted_SSIMi is the SRR value for frame i, according to Eq. (7).
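A direct Python transcription of Eq. (12) is shown below, with an illustrative call on made-up per-frame scores:

```python
import numpy as np

def mapd(ssim, predicted_ssim):
    """Eq. (12): mean absolute percentage deviation, as a fraction
    (multiply by 100 for percent)."""
    ssim = np.asarray(ssim, dtype=float)
    predicted_ssim = np.asarray(predicted_ssim, dtype=float)
    return float(np.mean(np.abs(ssim - predicted_ssim) / ssim))

# Made-up per-frame SSIM and SRR values, for illustration only.
print(100 * mapd([0.95, 0.93, 0.90], [0.94, 0.94, 0.89]))
```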

Table 2 presents the MAPD for the experimental set of the 40 video sequences at the three QP values (i.e., 12, 22, and 32) and the mean value of MAPD for each QP.

Table 2: MAPD for SRR and SSIM index for a white video reference pattern.

Table 2 shows that the accuracy of the proposed method ranges from 0.62% (QP = 12) to 2.56% (QP = 32), the latter representing the worst-case performance; the proposed SRR method thus maintains a satisfactory performance across the whole potential range of QP values, although better accuracy is achieved at lower QP values. Comparing the correlation of QP with the ground truth (i.e., MOS) against the correlation of the SRR scores with the ground truth yields a result of 28.65 in favor of SRR, showing the advantage and the better performance of the proposed metric.

The experimental variation of MAPD versus QP is shown in Fig. 5(a), where the trend line is the dashed line. For comparison reasons, the theoretical MAPD is also shown in Fig. 5(b).

Fig. 5: Variation of MAPD versus QP: (a) experimental and (b) theoretical.

The theoretical MAPD is calculated from Eq. (12), where SSIMi equals SSIMot(QP), as calculated from Eq. (8), and Predicted_SSIMi equals SRR, i.e.,

$$\mathrm{SRR}=\frac{\mathrm{SSIM}_{or}(QP)}{\mathrm{SSIM}_{tr}(QP)},\tag{13}$$

where SSIMtr(QP) corresponds to curve c of Fig. 3, which refers to an exemplary practical case.

The slopes of the two lines of Figs. 5(a) and 5(b) are the same, which shows that the theoretically calculated SRR is very close to the experimental results.

Performance Comparison of Structural Similarity Reduced Reference to Subjective Difference Mean Opinion Scores and Other Assessment Methods

According to the Video Quality Experts Group research,40 in order to obtain a linear relationship between an objective assessment method score and its corresponding subjective score, each metric score x is mapped to q(x). The nonlinear best-fitting logistic function q(x) is given as

$$q(x)=\beta_1\left\{\frac{1}{2}-\frac{1}{1+\exp\left[\beta_2\left(x-\beta_3\right)\right]}\right\}+\beta_4 x+\beta_5.\tag{14}$$

The parameters (β1, β2, β3, β4, and β5) are calculated by minimizing the sum of squared differences between the subjective and the mapped scores. To compare the performance of the newly proposed SRR method with existing ones, performance evaluation metrics are used, such as Pearson's linear correlation coefficient (LCC) between the predicted MOS and the subjective MOS. LCC is a measure of the prediction accuracy of an objective assessment metric, i.e., the capability of the metric to predict the subjective scores with low error. The LCC can be calculated as

$$\mathrm{LCC}=\frac{\sum_{i=1}^{M_d}(q_i-\bar q)(s_i-\bar s)}{\left[\sum_{i=1}^{M_d}(q_i-\bar q)^2\right]^{1/2}\left[\sum_{i=1}^{M_d}(s_i-\bar s)^2\right]^{1/2}},\tag{15}$$

where si and qi are the subjective and the mapped scores for the i'th frame of a video of size Md, respectively, and s̄ and q̄ are the means of the subjective and mapped scores, respectively. A good objective assessment metric is expected to have a high LCC (close to 1), in contrast to the MAPD, which should have low values (close to 0), as shown in Sec. 4.1.

Moreover, the Spearman rank order correlation coefficient (SROCC), which measures the monotonicity of the proposed method against subjective human scores, was also applied. SROCC is a nonparametric measure of statistical dependence between two variables, which assesses how well the relationship between two variables can be described using a monotonic function. If there are no repeated data values, a perfect Spearman correlation (equal to 1) occurs when each of the variables is a perfect monotone function of the other.
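A compact sketch of this evaluation protocol, assuming SciPy for the least-squares fit of Eq. (14) and for the two correlation coefficients, is given below; the helper names are ours, and in real use the inputs would be the per-video SRR scores and the corresponding LIVE DMOS values.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr

def logistic(x, b1, b2, b3, b4, b5):
    """Eq. (14): VQEG nonlinear mapping of objective scores."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (x - b3)))) + b4 * x + b5

def evaluate_metric(objective, subjective):
    """Fit Eq. (14) by least squares, then report the LCC of Eq. (15)
    on the mapped scores and the SROCC on the raw scores (SROCC is
    rank based, so a monotonic mapping does not affect it)."""
    objective = np.asarray(objective, dtype=float)
    subjective = np.asarray(subjective, dtype=float)
    p0 = [np.ptp(subjective), 1.0, float(np.mean(objective)),
          0.0, float(np.mean(subjective))]
    params, _ = curve_fit(logistic, objective, subjective,
                          p0=p0, maxfev=20000)
    mapped = logistic(objective, *params)
    lcc, _ = pearsonr(mapped, subjective)
    srocc, _ = spearmanr(objective, subjective)
    return lcc, srocc
```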

Therefore, in order to evaluate the performance of the SRR method and derive the LCC, following the aforementioned methodology, the Laboratory for Image and Video Engineering (LIVE) video quality dataset6,41 was used. The LIVE video quality database (VQDB) uses 10 uncompressed high-quality videos with a wide variety of content as reference videos. A set of 150 distorted videos was created from these reference videos (15 distorted videos per reference) using H.264-based compression. Each degraded video in the LIVE VQDB was then assessed by 38 human subjects in a single-stimulus study with hidden reference removal, where the subjects scored the video quality on a continuous quality scale. The mean and variance of the difference mean opinion scores (DMOS) were obtained from the subjective evaluations.

A scatter plot of proposed objective SRR scores versus DMOS for all H.264 videos in the LIVE VQDB is shown in Fig. 6 along with the best-fitting logistic function.

Fig. 6: Scatter plot of objective SRR scores versus DMOS with the best-fitting logistic function.

The SROCC and the LCC are computed between the objective and the subjective scores. Table 3 shows the performance of the proposed SRR model against other VQA methods, both FR and RR, in terms of the SROCC and LCC.

Table 3: Comparison of the performance of VQA algorithms in terms of LCC and SROCC.

According to Table 3, the accuracy of the proposed SRR method is higher than that of the RR VQA methods RR-LHS,42 J.246,33 and Yang's RR VQA,43 both in terms of LCC and in terms of monotonicity (SROCC), except for the RR metric of Ref. 35, which provides better results. Similarly, the accuracy of the proposed SRR method is better than that of the PSNR and VSNR FR VQA methods in terms of LCC, while it is slightly lower than the performance of VQM and the SSIM index, as expected, due to the reduced-reference nature of the proposed methodology. In terms of monotonicity (SROCC), the proposed method performs better than PSNR but lower than the rest of the VQA methods, without, however, significantly deviating from their performance range.

This paper proposes an RR VQA method that uses the SSIM index as a tool to extract features from both the original and the target video sequences by means of a reference video pattern. The method is suitable for monitoring video quality in real time and across the service provision chain. The performance of the proposed method was evaluated using a large experimental set of 40 reference and nonreference video sequences, with spatial resolutions ranging from common intermediate format up to high definition, utilizing a static white video as the reference pattern. The proposed method maintains a satisfactory performance across the whole potential range of QP values, although better accuracy is achieved at lower QP values. Moreover, comparison to the subjectively evaluated scores of the LIVE video quality dataset shows that the accuracy of the proposed method is better than the average performance of RR VQA methods and is within the performance range of the FR VQA methods. Finally, another advantage of the proposed method is the low bit rate of the reference information signal that must be sent to the end user, which ranges between 400 and 600 bps.

This work was undertaken under the Information Communication Technologies (FP7-ICT-2013-11) EU FP7 T-NOVA project, which is funded by the European Commission under Grant No. 619520.

References

1. Koumaras H., Kourtis M. A., and Martakos D., "Benchmarking the encoding efficiency of H.265/HEVC and H.264/AVC," in Future Network & Mobile Summit 2012, 4–6 July 2012, Berlin, Germany (2012).
2. Koumaras H. et al., "Impact of H.264 advanced video coding inter-frame block sizes on video quality," in VISAPP 2012, 24–26 February 2012, Rome, Italy (2012).
3. Liberal F. et al., "Multimedia content delivery in SDN and NFV-based towards 5G networks," IEEE COMSOC MMTC E-Lett. 10(4), 6–10 (2015).
4. Wang Z. et al., "Image quality assessment: from error visibility to structural similarity," IEEE Trans. Image Process. 13(4), 600–612 (2004).
5. Koumaras H. et al., "A framework for end-to-end video quality prediction of MPEG video," J. Visual Commun. Image Represent. 21, 139–154 (2010).
6. Seshadrinathan K. et al., "Study of subjective and objective quality assessment of video," IEEE Trans. Image Process. 19(6), 1427–1441 (2010).
7. Wang Z. and Simoncelli E. P., "Reduced-reference image quality assessment using a wavelet-domain natural image statistic model," Proc. SPIE 5666, 149 (2005).
8. Li Q. and Wang Z., "Reduced-reference image quality assessment using divisive normalization-based image representation," IEEE J. Sel. Top. Signal Process. 3(2), 201–211 (2009).
9. Cover T. M. and Thomas J. A., Elements of Information Theory, Wiley, New York (1991).
10. Ma L. et al., "Reduced-reference image quality assessment using reorganized DCT-based image representation," IEEE Trans. Multimedia 13(4), 824–829 (2011).
11. Soundararajan R. and Bovik A. C., "RRED indices: reduced reference entropic differencing framework for image quality assessment," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, pp. 1149–1152 (2011).
12. Redi J. A. et al., "Color distribution information for reduced-reference assessment of perceived image quality," IEEE Trans. Circuits Syst. Video Technol. 20(12), 1757–1769 (2010).
13. Zeng K. and Wang Z., "Temporal motion smoothness measurement for reduced-reference video quality assessment," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, pp. 1010–1013 (2010).
14. Zeng K. and Wang Z., "Quality-aware video based on robust embedding of intra- and interframe reduced-reference features," in Proc. IEEE Int. Conf. Image Processing, pp. 3229–3232 (2010).
15. Le Callet P., Gaudin C. V., and Barba D., "Continuous quality assessment of MPEG2 video with reduced reference," in Proc. Int. Workshop on Video Processing Quality Metrics for Consumer Electronics, pp. 11–16 (2005).
16. Wang Z., Sheikh H. R., and Bovik A. C., "Objective video quality assessment," in The Handbook of Video Databases: Design and Applications, Furht B. and Marques O., Eds., pp. 1041–1078, CRC Press, Boca Raton, Florida (2003).
17. Wang Z., Lu L., and Bovik A. C., "Video quality assessment based on structural distortion measurement," Signal Process. Image Commun. 19(2), 121–132 (2004).
18. Gunawan I. P. and Ghanbari M., "Reduced-reference picture quality estimation by using local harmonic amplitude information," in Proc. London Communications Symp. 2003, pp. 137–140, University College London, United Kingdom (2003).
19. Gunawan I. P. and Ghanbari M., "Reduced-reference video quality assessment using discriminative local harmonic strength with motion consideration," IEEE Trans. Circuits Syst. Video Technol. 18(1), 71–83 (2008).
20. Wang Y. K. et al., "Supervised automatic detection of UWB ground-penetrating radar targets using the regression SSIM measure," IEEE Geosci. Remote Sens. Lett. 13(5), 621–625 (2016).
21. Chang T.-C., Xu S. S.-D., and Su S.-F., "SSIM-based quality-on-demand energy-saving schemes for OLED displays," IEEE Trans. Syst. Man Cybern. Syst. 46(5), 623–635 (2016).
22. Fang Y. et al., "Optimized multioperator image retargeting based on perceptual similarity measure," IEEE Trans. Syst. Man Cybern. Syst. 99, 1–11 (2016).
23. Webster A. et al., "An objective video quality assessment system based on human perception," Proc. SPIE 1913, 15 (1993).
24. Wolf S. and Pinson M. H., "Spatial-temporal distortion metric for in-service quality monitoring of any digital video system," Proc. SPIE 3845, 266–277 (1999).
25. Wolf S. and Pinson M. H., "Low bandwidth reduced reference video quality monitoring system," in Proc. Int. Workshop on Video Processing Quality Metrics for Consumer Electronics, pp. 76–79 (2005).
26. Kusuma T. and Zepernick H.-J., "A reduced-reference perceptual quality metric for in-service image quality assessment," in Joint First Workshop on Mobile Future and Symp. on Trends in Communications (SympoTIC '03), pp. 71–74 (2003).
27. Gunawan I. P. and Ghanbari M., "Reduced reference picture quality estimation by using local harmonic amplitude information," in Proc. London Communications Symp., pp. 137–140 (2003).
28. Chono K. et al., "Reduced-reference image quality assessment using distributed source coding," in 2008 IEEE Int. Conf. on Multimedia and Expo, pp. 609–612 (2008).
29. Carnec M., Le Callet P., and Barba D., "An image quality assessment method based on perception of structural information," in Proc. 2003 Int. Conf. on Image Processing (ICIP '03), Vol. 3, pp. 185–188 (2003).
30. Carnec M., Le Callet P., and Barba D., "Visual features for image quality assessment with reduced reference," in Proc. IEEE Int. Conf. Image Processing, Vol. 1, pp. 421–424 (2005).
31. Barlow H. B., "Possible principles underlying the transformation of sensory messages," in Sensory Communication, Rosenblith W. A., Ed., pp. 217–234, MIT Press, Cambridge, Massachusetts (1961).
32. Simoncelli E. P. and Olshausen B., "Natural image statistics and neural representation," Annu. Rev. Neurosci. 24, 1193–1216 (2001).
33. ITU-T, "Perceptual visual quality measurement techniques for multimedia services over digital cable television networks in the presence of a reduced bandwidth reference," ITU-T Rec. J.246 (2008), http://www.itu.int/rec/T-REC-J.246/en (22 June 2016).
34. Pinson M. H. and Wolf S., "A new standardized method for objectively measuring video quality," IEEE Trans. Broadcast. 50(3), 312–322 (2004).
35. Ma L., Li S., and Ngan K. N., "Reduced-reference video quality assessment of compressed video sequences," IEEE Trans. Circuits Syst. Video Technol. 22(10), 1441–1456 (2012).
36. Razaak M., Martini M., and Savino K., "A study on quality assessment for medical ultrasound video compressed via HEVC," IEEE J. Biomed. Health Inform. 18(5), 1552–1559 (2014).
37. Koumaras H., "Method for predicting perceived quality of encoded video in relation to encoding bit rate and spatiotemporal content dynamics," PhD Dissertation, University of Athens, Computer Science and Telecommunications Department, Athens, Greece (2007).
38. Koumaras H. et al., "Quantified PQoS assessment based on fast estimation of the spatial and temporal activity level," Multimedia Tools Appl. 34(3), 355–374 (2007).
39. Hannover University, ftp://ftp.tnt.uni-hannover.de (22 June 2016).
40. Video Quality Experts Group (VQEG), "Final report from the video quality experts group on the validation of objective models of video quality assessment II," http://www.vqeg.org (22 June 2016).
41. Seshadrinathan K. et al., "A subjective study to evaluate video quality assessment algorithms," Proc. SPIE 7527, 75270H (2010).
42. Gunawan I. P. and Ghanbari M., "Reduced-reference video quality assessment using discriminative local harmonic strength with motion consideration," IEEE Trans. Circuits Syst. Video Technol. 18(1), 71–83 (2008).
43. Yang S., "Reduced reference MPEG-2 picture quality measure based on ratio of DCT coefficients," Electron. Lett. 47(6), 382–383 (2011).
44. Chandler D. M. and Hemami S. S., "VSNR: a wavelet-based visual signal-to-noise ratio for natural images," IEEE Trans. Image Process. 16(9), 2284–2298 (2007).

Michail-Alexandros Kourtis received his diploma and master's degree in computer science from the Athens University of Economics and Business in 2011 and 2014, respectively. Since January 2010, he has been working at the National Center for Scientific Research (NCSR) "Demokritos" on various research projects. Currently, he is pursuing his PhD at the University of the Basque Country [Universidad del País Vasco/Euskal Herriko Unibertsitatea (UPV/EHU)]. His research interests include video processing, video quality assessment, image processing, network function virtualization, and software-defined networking.

Harilaos Koumaras received his BSc degree in physics in 2002, his MSc degree in electronic automation and information systems in 2004, and his PhD in computer science in 2007, all awarded by the University of Athens. He is a research associate at NCSR "Demokritos," with publications in international conferences, journals, and book chapters. He is the author or coauthor of more than 80 scientific papers, which have received at least 350 nonself-citations.

Fidel Liberal received his MSc degree in telecommunication engineering in 2001. He received his PhD from the University of the Basque Country (UPV/EHU) in 2005 for his work in the area of holistic management of quality in telecommunications services. He has coauthored more than 80 conference and journal papers with more than 300 citations. His current research interests include quality management and multicriteria optimization for multimedia services in 5G.

© The Authors. Published by SPIE under a Creative Commons Attribution 3.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
