
Three-dimensional visual comfort assessment via preference learning

Qiuping Jiang, Feng Shao, Gangyi Jiang, Mei Yu, Zongju Peng

Ningbo University, Faculty of Information Science and Engineering, Fenghua Road 818, Ningbo 315211, China

J. Electron. Imaging. 24(4), 043002 (Jul 20, 2015). doi:10.1117/1.JEI.24.4.043002
History: Received December 15, 2014; Accepted June 18, 2015

Open Access

Abstract.  Three-dimensional (3-D) visual comfort assessment (VCA) is a particularly important and challenging topic, which involves automatically predicting the degree of visual comfort in line with human subjective judgment. State-of-the-art VCA models typically focus on minimizing the distance between predicted visual comfort scores and subjective mean opinion scores (MOSs) by training a regression model. However, obtaining precise MOSs is often expensive and time-consuming, which greatly constrains the extension of existing MOS-aware VCA models. This study is inspired by the fact that humans tend to conduct a preference judgment between two stereoscopic images in terms of visual comfort. We propose to train a robust VCA model on a set of preference labels instead of MOSs. The preference label, representing the relative visual comfort of preference stereoscopic image pairs (PSIPs), is generally precise and can be obtained at much lower cost compared with MOS. More specifically, some representative stereoscopic images are first selected to generate the PSIP training set. Then, we use a support vector machine to learn a preference classification model by taking a differential feature vector and the corresponding preference label of each PSIP as input. Finally, given a testing sample, by considering a full-round paired comparison with all the selected representative stereoscopic images, the visual comfort score can be estimated via a simple linear mapping strategy. Experimental results on our newly built 3-D image database demonstrate that the proposed method can achieve a better performance compared with the models trained on MOSs.


With the significant progress in three-dimensional (3-D) display and related technologies, 3-D multimedia services have witnessed emerging deployments in recent years. The additional depth information contained in 3-D content can greatly enhance the visual experience. However, despite this immersive experience, an increasing number of people report the annoyance of visual discomfort or visual fatigue when watching 3-D visual signals. Various factors, e.g., excessive disparity, unnatural blur, spatial frequency, inter-view mismatch, crosstalk, and so on, can all evoke visual discomfort/fatigue in viewers.1–4 Visual comfort thus remains a prominent and unresolved problem in various 3-D applications. Therefore, how to quantify the impacts of these factors on visual comfort and how to design efficient visual comfort assessment (VCA) metrics are of great significance for the further development of advanced 3-D video systems.

These issues have been widely investigated through subjective experiments aimed at understanding the relevant factors affecting visual comfort.5,6 The literature has demonstrated that 3-D scenes with a larger amount of disparity are more likely to evoke visual discomfort. Additionally, crossed disparities (perceived in front of the screen) and uncrossed disparities (perceived behind the screen) have different impacts on visual comfort, and crossed disparities play a more important role.7–9 When viewing 3-D images, according to the binocular fusion mechanism in the human visual system (HVS), stereoscopic depth perception occurs when depth cues created by binocular disparity exhibit distribution regularities similar to those in the real world. However, binocular fusion takes place only in a small finite region around the fixation point, known as the binocular fusion limit. In Refs. 10 and 11, a zone of clear single binocular vision (ZCSBV) is defined to quantitatively describe the binocular fusion limit within which objects can be fused into a single and clear vision. Regions with excessive binocular disparities may lie outside the ZCSBV, inducing severe diplopia symptoms in viewers. Besides binocular disparity, spatial frequency has also been reported to correlate with visual discomfort, as demonstrated in Refs. 12 and 13. By conducting subjective experiments on visual stimuli smoothed with a difference-of-Gaussians filter, previous research revealed that the binocular fusion limit increases as the spatial frequency decreases. In general, stereoscopic images with larger spatial frequency have a greater probability of inducing visual discomfort under the same binocular disparity. Therefore, binocular disparities (especially crossed disparity) together with spatial frequency are usually computed as important predictive features for objective VCA. From another perspective, the unnatural conflict between the human eyes’ accommodation and convergence mechanisms is considered the essential cause of 3-D visual discomfort.14–17 Viewers exhibit different accommodation and convergence behaviors under different viewing conditions, such as natural viewing and 3-D viewing. In natural viewing, the accommodation and convergence distances are always identical and therefore consistent with one another. In 3-D viewing, the convergence distance is unconsciously adjusted to the depth of the perceived object, which may lie behind or in front of the screen plane, whereas accommodation remains fixed on the screen. This unnatural conflict between accommodation and convergence created by 3-D viewing goes against the normal visual physiological mechanism of the HVS, leading to the occurrence of visual discomfort. For a more comprehensive treatment of factors relevant to visual comfort, readers can refer to Refs. 1–4.

Recently, some objective VCA metrics have been proposed by exploiting the relationship between visual-comfort-related features and corresponding human subjective scores [e.g., mean opinion scores (MOSs)] via different regression models.18–25 For brevity, we call them MOS-aware models throughout this paper. The MOS-aware models aim to train a robust VCA model by treating the MOS as a benchmark. For example, Sohn et al.18 investigated the effect of depth perception on 3-D visual comfort by applying a polynomial regression model to establish the relationship between binocular disparity and the degree of visual comfort. Jung et al.19 proposed a VCA index that estimates the degree of visual comfort by using a logarithmic function to map disparity features to degrees of visual comfort. Kim and Sohn20 proposed a visual fatigue prediction metric that considers excessive horizontal and vertical disparities; in this metric, an additive first-order linear regression was adopted as the combination scheme to predict the overall degree of visual fatigue. Lee et al.21 exploited the effect of stimulus width on perceived visual comfort by measuring subjective visual comfort and binocular fusion time. Sohn et al.22 derived an efficient VCA model based on object-dependent disparity features (i.e., relative disparity and object thickness) and used support vector regression (SVR) to construct the relationship between disparity features and MOSs. Recently, Park et al.23 proposed to extract features predictive of the vergence-accommodation conflict experienced when viewing a stereoscopic display and derived an efficient VCA model named the 3-D accommodation-vergence mismatch (AVM) predictor by deploying SVR as the regression tool between the predictive features and MOSs. Other relevant works can be found in Refs. 24 and 25.

The aforementioned MOS-aware VCA metrics rely on 2-norm optimization to minimize the distance between the predicted scores and the human subjective MOSs, which are generated from large-scale standard subjective studies. In standard subjective studies of visual comfort, each observer is asked to assign each image a number within a specific range, e.g., 1 to 5, reflecting a specific degree of experienced visual discomfort. Afterwards, all the graded scores are collected and screened for MOS generation. Although these metrics seem rather effective for VCA, they suffer from the following limitations. First, observers cannot reliably rate an image on such a fine-grained absolute scale. In order to obtain precise MOS values, the raw absolute graded judgments must be aligned, a process that is usually sophisticated.26 Second, the quality scales produced by different subjective testing methodologies,27 e.g., absolute category rating (ACR) and the subjective assessment methodology for video quality, are mutually inconsistent. Hence, an MOS-aware VCA model trained on a specific database may not work well on other databases. Third, a large-scale subjective assessment must be conducted to obtain graded judgments, which is generally time-consuming, cumbersome, and expensive. These limitations motivate us to train a VCA model without MOSs while preserving comparable or even better performance than the models trained on MOSs. In fact, the same attempt has been made in blind image quality assessment.28–30

Based upon such considerations and inspired by the fact that humans tend to conduct a preference judgment between two stereoscopic images, we propose to train a robust VCA model from a set of preference labels. The preference label, representing the relative visual comfort of a preference stereoscopic image pair (PSIP), is generally precise and consistent.31–33 After constructing the PSIP set with reliable preference labels, the task of VCA can be simplified as a preference classification problem. In brief, the main contributions of this work are summarized as follows:

  1. A new stereoscopic image database for the VCA task is constructed, and stereoscopic images with significant subjective visual comfort differences are selected to construct PSIPs as the training database.
  2. The positive- and negative-labeled PSIPs are simultaneously contained in the training database, and visual comfort related features from zone of comfort, depth of focus (DoF), and spatial frequency are integrated to represent the visual comfort properties of a stereoscopic image.
  3. Once the preference classification model is constructed, the task of VCA can be simplified as a preference classification problem, which is demonstrated to be much more robust and consistent with human subjective perception.

The remainder of this paper is organized as follows. Section 2 discusses the motivation and problem descriptions. Details of the proposed VCA metric are presented in Sec. 3. In Sec. 4, we illustrate some details about the newly built 3-D image database. Extensive experiments are conducted and analyzed in this section as well. Conclusions are drawn in Sec. 5.

It is worth noting that generating positive- and negative-labeled PSIPs in an efficient way is of great importance in this paper. In this section, we discuss the disadvantages of the existing MOS-aware VCA metrics, introduce the concept of pairwise comparison, and investigate how to generate PSIPs with reliable preference labels through a pairwise comparison methodology from the existing database.

Pairwise Comparison

From the perspective of objective VCA, based on the obtained human subjective scores (e.g., MOSs), many MOS-aware VCA metrics were proposed by training feature-score regression models.18–25 These metrics focus on minimizing the distance between the predicted and human subjective scores using 2-norm optimization. This 2-norm optimization process can be formulated as
\[
\alpha^{*} = \arg\min_{\alpha} \left\{ \sum_{i} \left\| \Phi_{\alpha}(x_{i}) - m_{i} \right\|_{2} \right\}, \tag{1}
\]
where Φα(·) denotes a regression function to predict the visual comfort score, α is the regression parameter vector, and mi is the human subjective score, usually given by the MOS. The regression function takes the visual comfort feature vector xi as input and outputs the predicted visual comfort score. The regression is an iterative learning process that finds the optimal regression parameter vector α* over the entire training set.
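To make Eq. (1) concrete, the following minimal Python sketch fits such a feature-to-score regression, with scikit-learn's SVR standing in for Φα; the feature matrix and MOS vector here are synthetic placeholders, not data from the paper.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((200, 10))        # hypothetical comfort feature vectors x_i
m = 1.0 + 4.0 * rng.random(200)  # hypothetical MOS values m_i in [1, 5]

# Phi_alpha in Eq. (1): fitting iteratively searches for the parameters
# (alpha*) that minimize the prediction error over the training set.
reg = SVR(kernel="rbf")
reg.fit(X, m)
predicted_scores = reg.predict(X[:5])  # predicted visual comfort scores
```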

However, this kind of 2-norm optimization requires a large number of training samples associated with MOSs, which are generally imprecise and expensive to obtain. Moreover, due to the inconsistency of the quality scales produced by different subjective testing methodologies, an MOS-aware VCA model trained on a specific database may not perform well on other databases. From this standpoint, we consider that the MOS may not be an optimal benchmark for training an assuredly robust VCA model.

On the other hand, previous studies have revealed that humans tend to conduct preference judgments rather than absolute scale measurements.31–33 In order to better imitate this preference behavior of the HVS, the pairwise comparison methodology has been widely used in subjective quality evaluation.34,35 Compared with the traditional ACR testing methodology,36,37 in pairwise comparison experiments each observer is only asked to judge which of two stereoscopic images is more comfortable based on their relative experienced visual comfort. In subjective VCA using the pairwise comparison methodology, when the visual comfort levels of the two stereoscopic images are similar, observers find it difficult and confusing to make such a judgment, especially when the disparity ranges of the two stereoscopic images are similar while their contents are diverse, as shown in Fig. 1. In this case, it is unreasonable and confusing to label the preference. Thus, in order to produce reliable preference labels, a large number of observers and sophisticated data processing are needed, leading to the same limitations involved in traditional ACR subjective testing. However, if the difference between two stereoscopic images in terms of visual comfort is sufficiently large, observers can easily discriminate between them, regardless of factors arising from image content or viewing conditions. Moreover, different observers tend to offer consistent and unanimous preference labels. This is consistent with the observations in Fig. 2: since the visual comfort difference of the presented two stereoscopic images is sufficiently large, observers consistently prefer the left-side stereoscopic image over the right-side one without exception. In other words, only a very small number of observers is required to generate valid preference labels for stereoscopic image pairs with a sufficiently large difference in terms of visual comfort.

Fig. 1: Example of two stereoscopic images with a small difference in terms of visual comfort. In this situation, it is difficult and confusing for observers to label the preference. Therefore, in order to obtain a reliable preference label, a large number of observers is required, leading to the same limitation as traditional absolute category rating subjective testing.

Fig. 2: Example of two stereoscopic images with a large difference in terms of visual comfort. In this situation, it is easy for observers to label the preference consistently. Therefore, only a small number of observers is needed to generate a reliable preference label.

The previous discussions have shown that the MOS-aware VCA model is both inefficient and sensitive to different databases, whereas the preference label obtained by preference judgment, representing the relative visual comfort of two stereoscopic images, is generally precise and robust. Furthermore, the preference label of a stereoscopic image pair with a sufficiently large difference in visual comfort can be generated at lower cost (only a rather small number of observers is needed). Taking all these issues into account, we are inspired to train a better VCA model on a training database composed of image pairs with sufficiently large visual comfort differences. For brevity, a stereoscopic image pair with a sufficiently large visual comfort difference is termed a PSIP hereafter. Since we aim to train a VCA model on a training database composed of a set of PSIPs, how to generate PSIPs efficiently via pairwise comparison is of great importance to the success of the proposed framework.

PSIP Training Set Generation

The proposed VCA metric is largely dependent on the selected PSIPs and their corresponding preference labels. In this paper, we introduce a strategy to generate reliable PSIPs from our NBU 3D-VCA image database. This database consists of 200 stereoscopic images in all (82 indoor and 118 outdoor) with a full HD resolution of 1920×1080 pixels. All the images were captured at the campus of Ningbo University using a 3-D digital camera with dual lenses (Sony HDR-TD30E). The corresponding MOS is provided for each stereoscopic image in the database. More information about this database is presented in Sec. 4.1.

We first divide all the stereoscopic images into five classes based on their subjective scores (i.e., MOSs), denoted by C1, C2, C3, C4, and C5, respectively. In the detailed implementation, M representative stereoscopic images are selected from the five classes (M/5 stereoscopic images from each class) to cover all possible ranges of visual comfort. Here, M is set to 50. The details of the selection procedure are listed in Table 1, and the selected training stereoscopic images and the corresponding MOSs are shown in Fig. 3. The selected training set clearly covers a wide range of visual comfort (from extremely uncomfortable to very comfortable), which ensures that stereoscopic image pairs with sufficient discrimination can be generated from it.

Table 1: The selection procedure of representative training stereoscopic images.

Fig. 3: The selected representative training stereoscopic images and the corresponding mean opinion scores (MOSs).

In order to guarantee sufficient discrimination within each pair, only stereoscopic images with a sufficiently large MOS difference are selected to construct the PSIP training set, whereas pairs with similar or only slightly different MOSs are excluded. The generation of the PSIP training set is formulated as follows.

  • Let Ik(i) represent the kth stereoscopic image in Ci and Ih(j) represent the hth stereoscopic image in Cj. If the visual comfort level of Ik(i) is significantly better than that of Ih(j), +1 is assigned as the preference label for (Ik(i), Ih(j)) (+1 denotes a positive-labeled PSIP); otherwise, −1 is assigned as the preference label for (Ih(j), Ik(i)) (−1 denotes a negative-labeled PSIP).
  • The positive-labeled PSIP subset P+ is given by
\[
P^{+} \leftarrow \left\{ (I_{k}^{(i)}, I_{h}^{(j)}) \,\middle|\, j - i \geq 2;\; k, h = 1, \ldots, M \right\}. \tag{2}
\]
  • The negative-labeled PSIP subset P− is given by
\[
P^{-} \leftarrow \left\{ (I_{h}^{(j)}, I_{k}^{(i)}) \,\middle|\, i - j \leq -2;\; k, h = 1, \ldots, M \right\}. \tag{3}
\]

Each PSIP in P+ always has an opposite counterpart in P−. Consequently, the constructed PSIP set P contains two subsets (i.e., the positive-labeled subset P+ and the negative-labeled subset P−). To be more specific, we randomly select N/2 PSIPs from P+ and N/2 PSIPs from P− (if a PSIP is selected from P+, its opposite version is not included from P−, to eliminate possible overlaps). Finally, a PSIP training set with N PSIPs (N/2 positive-labeled and N/2 negative-labeled) is obtained. The influence of different ratios between positive- and negative-labeled PSIPs is further analyzed in Sec. 4.3.
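A minimal sketch of this construction, assuming each representative image is tagged with a comfort class index from 1 to 5 that increases with visual comfort (the class-gap threshold of 2 follows Eqs. (2) and (3)); all function and variable names are illustrative:

```python
import itertools
import random

def build_psip_training_set(images, class_of, n_pairs):
    """images: list of image ids; class_of: dict id -> comfort class in 1..5."""
    pos, neg = [], []
    for a, b in itertools.combinations(images, 2):
        if abs(class_of[a] - class_of[b]) >= 2:          # large comfort gap
            better, worse = (a, b) if class_of[a] > class_of[b] else (b, a)
            pos.append((better, worse, +1))              # positive-labeled PSIP
            neg.append((worse, better, -1))              # its opposite in P^-
    random.shuffle(pos)
    chosen_pos = pos[: n_pairs // 2]
    mirrors = {(w, b) for b, w, _ in chosen_pos}         # opposites to exclude
    candidates = [p for p in neg if (p[0], p[1]) not in mirrors]
    random.shuffle(candidates)
    return chosen_pos + candidates[: n_pairs // 2]       # N/2 + N/2 PSIPs
```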

Let L denote the set of preference labels in P, such that
\[
L = \{ L_{1}, \ldots, L_{N} \} \in \{ +1, -1 \}^{N}. \tag{4}
\]

A point worth emphasizing is that the generated PSIP training set is based on MOSs, so it may seem that the PSIP generation strategy still suffers from the same limitations as traditional MOS-aware metrics. However, as an alternative, the PSIPs can also be generated by pairwise comparisons with only a small number of observers. For example, we first collect n stereoscopic images that are diverse in visual comfort and image content. Then, we randomly construct m stereoscopic image pairs from the collected images. Finally, we invite one observer to assign preference labels to the constructed pairs: if the observer can easily and certainly discriminate the relative visual comfort, the preference label is assigned as +1 or −1; otherwise, it is assigned as 0. Only those stereoscopic image pairs with positive and negative preference labels are selected to generate the PSIP training set. This pairwise comparison procedure is rather convenient and flexible.38 Only one observer is needed to conduct the preference judgments, and the observer is not required to assess all the pairs in a single session, because there is practically no contextual effect or scale mismatch problem; PSIPs judged in different sessions can be directly aggregated into a single data set without complex processing. Therefore, the generation of PSIPs is convenient and the preference labels are reliable, avoiding the limitations in the acquisition of MOSs.
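This single-observer protocol can be written in a few lines; observer_judgment is a hypothetical callback returning +1, −1, or 0 ("cannot decide"):

```python
import random

def collect_psips(images, m_pairs, observer_judgment):
    """Randomly construct m_pairs candidate pairs and keep only those the
    observer labels decisively (+1 or -1); 0-labeled pairs are dropped."""
    candidates = [tuple(random.sample(images, 2)) for _ in range(m_pairs)]
    labeled = [(a, b, observer_judgment(a, b)) for a, b in candidates]
    return [(a, b, lab) for a, b, lab in labeled if lab != 0]
```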

The high-level diagram of the proposed VCA method is shown in Fig. 4. The overall process is composed of two stages: training and testing. We aim to train a robust VCA model from a set of PSIPs associated with preference labels. The essence of the proposed metric is to construct a mapping function from the differential feature vectors of a set of PSIPs to their corresponding preference labels. Specifically, in the training stage, we first construct a PSIP training set from an existing database to learn a binary preference classification model via a support vector machine (SVM). In particular, features from the zone of comfort, DoF, and spatial frequency are extracted and integrated to represent the visual comfort properties of a stereoscopic image. In the testing stage, for a given testing stereoscopic image, its preference labels relative to each training stereoscopic image are predicted with the learned preference classification model, and the final visual comfort score is obtained from all the predicted preference labels via a simple mapping strategy.

Fig. 4: The high-level diagram of the proposed visual comfort assessment (VCA) method.

Integrated Visual Comfort Features

As investigated in previous studies, visual discomfort can be induced by many factors, such as excessive binocular disparity, accommodation-convergence conflict, binocular mismatch, depth motion, and so on. In this paper, we adopt a fusion of the features that have been investigated in the existing studies, i.e., zone of comfort, DoF, and spatial frequency.

Zone of comfort

Binocular disparity is an important factor for 3-D visual comfort because excessive binocular disparity that exceeds the maximum tolerated depth perception may fail to form binocular fusion. If the disparity does not lie within the zone of comfort, the viewer will experience visual discomfort. Based on findings in cognitive psychology, the zone of comfort is typically defined as the region where the angular disparity is in the range of (−1 deg, 1 deg). It is known that objects producing crossed disparity are perceived in front of the screen, whereas objects producing uncrossed disparity are perceived behind the screen; crossed disparities are especially important in affecting visual comfort. Therefore, to characterize the zone of comfort, we compute the following features: disparity range (d1), mean crossed disparity (d2), mean uncrossed disparity (d3), and relative depth (d4), defined as
\[
d_{1} = \max\{ d(x,y) \} - \min\{ d(x,y) \}, \tag{5}
\]
\[
d_{2} = \frac{1}{A_{1}} \sum_{(x,y) \in \Omega^{-}} \left| d(x,y) \right|, \tag{6}
\]
\[
d_{3} = \frac{1}{A_{2}} \sum_{(x,y) \in \Omega^{+}} \left| d(x,y) \right|, \tag{7}
\]
\[
d_{4} = d_{2} / d_{3}, \tag{8}
\]
where Ω− and Ω+ are the pixel sets with crossed and uncrossed disparities, respectively, and A1 and A2 are the numbers of pixels in each set.
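A sketch of Eqs. (5)-(8), assuming the angular disparity map is a NumPy array in degrees with negative values for crossed disparities (the sign convention is an assumption consistent with the DoF section below):

```python
import numpy as np

def zone_of_comfort_features(d):
    """d: angular disparity map (deg); d < 0 taken as crossed disparity."""
    crossed = np.abs(d[d < 0])       # Omega^- : perceived in front of screen
    uncrossed = np.abs(d[d >= 0])    # Omega^+ : perceived behind the screen
    d1 = d.max() - d.min()                              # Eq. (5)
    d2 = crossed.mean() if crossed.size else 0.0        # Eq. (6)
    d3 = uncrossed.mean() if uncrossed.size else 0.0    # Eq. (7)
    d4 = d2 / d3 if d3 > 0 else 0.0                     # Eq. (8)
    return d1, d2, d3, d4
```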

Depth of focus

In order to capture human eye fixation, the HVS attempts to perceive a sharp image by focusing accommodation within the DoF.39 The DoF can serve as a threshold on accommodation to form a comfortable viewing zone. When accommodation focuses on an object, the object is sharply perceived, whereas objects placed outside the focal range (away from the accommodation) appear with out-of-focus blur. Thus, the DoF measures the human eyes’ capacity to tolerate retinal defocus and still perceive a sharp object without any accommodation adjustment. To quantify the range of DoF, the retinal defocus circle (RDC) is defined. The radius of the RDC increases with the distance of the perceived object from the screen plane (i.e., the accommodation point) and can be used to capture the occurrence of blur, as depicted in Fig. 5. The radius γ of the RDC is calculated as40
\[
\gamma =
\begin{cases}
\rho \cdot (s/V) \cdot \left| 1 - V/P \right|, & \text{pixels with uncrossed disparities} \\
\rho \cdot (s/V) \cdot \left| 1 - V/N \right|, & \text{pixels with crossed disparities},
\end{cases} \tag{9}
\]
where P and N are the depths from uncrossed and crossed (positive and negative) disparities, respectively; ρ denotes the pupil diameter (approximately 0.3 cm); s denotes the nodal length (approximately 0.16 cm); and V represents the viewing distance (three times the height of the display in our experiment). The DoF map is defined at each pixel (x, y) in terms of the radius γ. Since depths created by crossed disparities have a greater probability of inducing visual discomfort, we separately compute the mean values of the DoF map over pixels with uncrossed and crossed disparities. As a result, we use γ̄+ and γ̄− to characterize the DoF.
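A sketch of Eq. (9); since both branches share the same algebraic form, a per-pixel depth map (holding P for uncrossed pixels and N for crossed pixels) lets one expression cover both cases. The constants follow the values quoted in the text; the units and the construction of the depth map are assumptions:

```python
import numpy as np

RHO = 0.3   # pupil diameter, approx. 0.3 cm (value quoted in the text)
S = 0.16    # nodal length, approx. 0.16 cm (value quoted in the text)

def dof_features(depth, d, V):
    """depth: perceived depth per pixel (> 0, same unit as viewing distance V);
    d: angular disparity map, with d < 0 taken as crossed disparity."""
    gamma = RHO * (S / V) * np.abs(1.0 - V / depth)   # Eq. (9), both branches
    gamma_plus = gamma[d >= 0].mean() if (d >= 0).any() else 0.0   # uncrossed
    gamma_minus = gamma[d < 0].mean() if (d < 0).any() else 0.0    # crossed
    return gamma_plus, gamma_minus   # the features gamma_bar^+, gamma_bar^-
```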

Fig. 5: Geometrical illustration of the depth of focus and retinal defocus circle for different depth distances (different disparities).

Spatial frequency

The binocular fusion limit also depends on various spatial and temporal fixation properties, e.g., exposure duration,41 spatial frequency,12,13 and motion velocity in the depth direction.42 The binocular fusion limit has been shown to increase with decreasing spatial frequency.12,13 Generally speaking, larger spatial frequency has a greater probability of inducing visual discomfort. Here, we investigate the role of spatial frequency as an important factor in 3-D visual comfort. For simplicity, we directly use the method presented in Ref. 43 to calculate the spatial frequency fr by applying the Sobel operator to the right-view image. The ratios between spatial frequency and binocular disparity features are computed to capture their interaction effects. The spatial frequency related features are defined as
\[
f_{r} = \frac{1}{H} \sum_{(x,y)} \frac{S_{B}(x,y)}{255}, \tag{10}
\]
\[
\tau_{1} = f_{r} / \mu_{D}, \quad \tau_{2} = f_{r} / d_{1}, \quad \tau_{3} = f_{r} / d_{4}, \tag{11}
\]
where SB(x, y) denotes the Sobel response, H is the number of pixels in the image, and μD is the mean absolute disparity.

Finally, all features are concatenated into a single vector Fall:
\[
F^{\text{all}} = [ d_{1}, d_{2}, d_{3}, d_{4}, \bar{\gamma}^{+}, \bar{\gamma}^{-}, f_{r}, \tau_{1}, \tau_{2}, \tau_{3} ]. \tag{12}
\]
Thus, Fall is of dimension 10.
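A sketch of Eqs. (10)-(12), using SciPy's Sobel filter on the right-view luminance image; writing the interaction terms as divisions follows the "ratios" wording above and is our reading of Eq. (11):

```python
import numpy as np
from scipy import ndimage

def spatial_frequency_features(right_gray, d, d1, d4):
    """right_gray: right-view luminance image in [0, 255]; d: disparity map."""
    img = right_gray.astype(float)
    sb = np.hypot(ndimage.sobel(img, axis=1), ndimage.sobel(img, axis=0))
    fr = (sb / 255.0).mean()                 # Eq. (10); mean over the H pixels
    mu_d = np.abs(d).mean()                  # mean absolute disparity
    return fr, fr / mu_d, fr / d1, fr / d4   # f_r and tau_1..tau_3, Eq. (11)

# Eq. (12): concatenate all ten features into F^all, e.g.,
# f_all = np.array([d1, d2, d3, d4, g_plus, g_minus, fr, tau1, tau2, tau3])
```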

Preference Learning via SVM

For each PSIP in the training set, Pk = (Ii, Ij) ∈ P, k = 1, …, N, we first compute its differential feature vector F̃kall as
\[
\tilde{F}_{k}^{\text{all}} = F_{i}^{\text{all}} - F_{j}^{\text{all}}, \tag{13}
\]
where Fiall and Fjall are the visual comfort feature vectors of Ii and Ij, respectively.

The training stage then involves learning a mapping function from the differential feature vectors F̃kall to their preference labels Lk ∈ {+1, −1}, which can be achieved with existing classification algorithms. In the detailed implementation, we use an SVM to solve this classification problem due to its high accuracy and low risk of overfitting compared with other competing algorithms. Specifically, for all samples in the PSIP training set {F̃kall, Lk}, k = 1, …, N, we formulate the procedure as learning the mapping from F̃kall to Lk:
\[
L_{k} = \sum_{l=1}^{N} w_{l} K( \tilde{F}_{k}^{\text{all}}, \tilde{F}_{l}^{\text{all}} ) + b, \tag{14}
\]
where wl is a weight, b is a real constant bias, and K(F̃kall, F̃lall) = φ(F̃kall)T · φ(F̃lall) is the kernel function, with φ(·) defining a feature mapping from the original input space to a high-dimensional feature space. By training on the selected PSIP training set, the optimal wl and b are obtained, denoted as wlopt and bopt, respectively. In the implementation, the popular LIBSVM package44 was adopted with the default radial basis function kernel.
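A sketch of this training stage (Eqs. (13) and (14)) using scikit-learn's SVC, which wraps LIBSVM; probability=True enables the class-probability estimates needed later in the testing stage. The names are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

def train_preference_model(feature_of, psips):
    """feature_of: dict image id -> 10-dim F^all vector (Eq. (12));
    psips: list of (first_id, second_id, label) with label in {+1, -1}."""
    X = np.array([feature_of[a] - feature_of[b] for a, b, _ in psips])  # Eq. (13)
    y = np.array([lab for _, _, lab in psips])
    clf = SVC(kernel="rbf", probability=True)  # RBF kernel, as in the paper
    clf.fit(X, y)                              # learns w_l and b of Eq. (14)
    return clf
```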

Visual Comfort Score Prediction

For a testing stereoscopic image It, we consider a full-round comparison with all the representative training stereoscopic images {Ii}, i = 1, …, M. A set of preference labels {Lt,i} ⊂ {+1, −1} and the associated probabilities {pt,i} can be predicted with the previously learned preference classification model, such that
\[
[ L_{t,i}, p_{t,i} ] = \sum_{l=1}^{N} w_{l}^{\text{opt}} K( \tilde{F}_{t,i}^{\text{all}}, \tilde{F}_{l}^{\text{all}} ) + b^{\text{opt}}, \tag{15}
\]
where F̃t,iall denotes the differential feature vector between It and Ii.

Then, based on the predicted preference labels {Lt,i} and the corresponding probabilities {pt,i}, a preference gain with respect to the different ranges of visual comfort (from C1 to C5) is calculated as
\[
g_{t} = \sum_{i=1}^{M} \left[ L_{t,i} \cdot p_{t,i} + ( -L_{t,i} ) \cdot ( 1 - p_{t,i} ) \right]. \tag{16}
\]

The remaining issue is how to map the predicted preference gain to the final visual comfort score. In order to establish the mapping function between gt and the final visual comfort score qt, we consider all preference gains in the training stage. For each representative training stereoscopic image Ii, its preference gain is similarly calculated by summing all the preference labels {Li,j}, j = 1, …, M, j ≠ i, associated with the pairs {(Ii, Ij)}, such that
\[
g_{i} = \sum_{j=1, j \neq i}^{M} L_{i,j}. \tag{17}
\]

The maximum and minimum preference gains in Eq. (17) correspond to the two extreme cases (extremely uncomfortable and very comfortable) and thus cover the visual comfort scores from 1 to 5. A mapping function (a linear function is used for simplicity) can then be established from [A, B] to [1, 5] to quantify the perceived visual comfort score:
\[
R(\cdot) : [A, B] \rightarrow [1, 5], \tag{18}
\]
where A = max{gi} and B = min{gi}.

Finally, for the testing stereoscopic image It with the computed preference gain gt, the visual comfort score is estimated as
\[
q_{t} = R( g_{t} ) \big|_{1}^{5}, \tag{19}
\]
where |15 denotes the normalization operation clipping the result to the range [1, 5].
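The full testing stage (Eqs. (15)-(19)) thus reduces to M paired comparisons followed by a linear remapping of the preference gain. A sketch, with A and B the maximum and minimum training gains from Eq. (17); using the maximum class probability as p_{t,i} is our interpretation:

```python
import numpy as np

def predict_comfort_score(clf, f_test, train_features, A, B):
    """train_features: list of F^all vectors of the M representative images."""
    diffs = np.array([f_test - f for f in train_features])  # differential vectors
    labels = clf.predict(diffs)                              # L_{t,i}
    probs = clf.predict_proba(diffs).max(axis=1)             # p_{t,i}
    gain = float(np.sum(labels * probs + (-labels) * (1.0 - probs)))  # Eq. (16)
    score = 1.0 + 4.0 * (gain - A) / (B - A)       # linear map R: [A,B] -> [1,5]
    return float(np.clip(score, 1.0, 5.0))         # Eq. (19): clamp to [1, 5]
```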

In this section, we first describe the construction of our NBU 3D-VCA image database, followed by comprehensive experiments on two databases to evaluate the performance of the proposed VCA model against other representative models. Finally, the influences of different training set generation methods and classification algorithms are analyzed.

NBU 3D-VCA Image Database
3-D image acquisition

A total of 200 stereoscopic images with full HD (1920×1080 pixels) resolution are contained in our NBU 3D-VCA image database. All images were captured at the campus of Ningbo University using a Sony HDR-TD30E dual-lens 3-D camera. The content includes indoor and outdoor scenes with a large variety of color, texture, and depth ranges (82 indoor scenes and 118 outdoor scenes). The maximum crossed disparity in the dataset ranges from 0.02 to 4.79 deg to comprehensively reflect various degrees of visual comfort, even though such a disparity range exceeds that of current 3-D TV or cinema content. Forty selected right-view images from the database are shown in Fig. 6. We use the stereo matching algorithm presented in Ref. 45 for disparity estimation since its performance is prominent for high-quality stereoscopic images.

Fig. 6: Forty selected right-view images in the NBU 3D-VCA image database.

Participants

Sixteen nonexpert adult viewers (seven females and nine males), aged from 22 to 38, participated in the subjective evaluation of the database. All participants met the minimum stereo acuity requirement of less than 60 arcsec and passed a color vision test. The participants were asked to rate the stereoscopic images based on their experienced visual comfort.

Environment

The subjective tests were conducted in a laboratory designed for subjective quality tests according to Recommendations ITU-R BT.500-11 (Ref. 46) and ITU-R BT.1438 (Ref. 47). All stereoscopic images were randomly displayed on a Samsung UA65F9000 65-in. Ultra HD 3-D LED TV. The resolution of the Ultra HD display was 3840×2160, and all full HD stereoscopic images were shown at this resolution. Three-dimensional shutter glasses were used in the subjective experiments. The display had low crosstalk levels (left: 0.38%; right: 0.15%) compared with the visibility threshold of crosstalk in Ref. 48. The peak luminance of the display was adjusted to 50 cd/m2. The viewing distance was three times the height of the display screen.

Test methodology

A single-stimulus ACR test methodology described in ITU-T P.910 (Ref. 36) and ITU-T P.911 (Ref. 37) was used in the experiment. Each stereoscopic image was randomly displayed on the screen for 10 s, followed by 5 s for voting. The subjective ratings were obtained on a scale of 1 to 5 (5 = very comfortable, 4 = comfortable, 3 = mildly comfortable, 2 = uncomfortable, and 1 = extremely uncomfortable). In the data processing stage, after detecting and discarding outliers among all opinion scores, the final MOS for each stereoscopic image was calculated as the mean of the remaining opinion scores. Figure 7 shows the MOS distribution of all stereoscopic images in the database (the red error bars represent the standard deviation of all opinion scores for each stereoscopic image). The selected stereoscopic images cover all degrees of visual comfort: 16 images in the range 1 to 2, 42 images in the range 2 to 3, 63 images in the range 3 to 4, and 79 images in the range 4 to 5.

Fig. 7: MOS distribution of all stereoscopic images in the NBU 3D-VCA database.

Subjective assessment results analysis

To further investigate the consistency of the subjective assessment results, we conducted two consistency analyses as in Ref. 49. In the first experiment, we randomly divide all participants into two nonoverlapping groups and evaluate the Spearman rank-order correlation coefficient (SRCC) between the MOSs obtained by the participants in the two groups. Figure 8(a) shows the curve of SRCC versus the number of participants; the error bars represent the standard deviation of SRCC over 50 random trials. A larger SRCC with a lower standard deviation is achieved as the number of participants grows. In the second experiment, we randomly divide all participants into two nonoverlapping groups of equal size, denoted as group 1 and group 2. The top N ranked images are selected based on the MOSs in group 1, and for these selected images, their corresponding MOSs in group 2 are averaged. The process is repeated 50 times. Figure 8(b) shows the curve of average MOS versus N; the error bars represent the standard deviation of MOS over the 50 random trials. The average MOS values in the two groups are highly consistent. Therefore, our subjective assessment results have high consistency and reliably reflect the degree of experienced visual comfort.
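A sketch of the first consistency analysis, assuming a matrix of raw opinion scores (observers × images) is available:

```python
import numpy as np
from scipy import stats

def split_half_srcc(scores, n_trials=50, seed=0):
    """scores: array of shape (n_observers, n_images) of opinion scores."""
    rng = np.random.default_rng(seed)
    srccs = []
    for _ in range(n_trials):
        perm = rng.permutation(scores.shape[0])    # random observer split
        half = scores.shape[0] // 2
        mos1 = scores[perm[:half]].mean(axis=0)    # group-1 MOSs
        mos2 = scores[perm[half:]].mean(axis=0)    # group-2 MOSs
        srccs.append(stats.spearmanr(mos1, mos2)[0])
    return float(np.mean(srccs)), float(np.std(srccs))
```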

Fig. 8: Consistency analyses of human subjective assessment scores. (a) Spearman’s rank correlation between the MOSs given by the two nonoverlapping halves as a function of the mean number of opinion scores per image. (b) Images were ranked by MOSs from subjects in group 1 and plotted against the corresponding average MOSs given by subjects in group 2. Error bars represent the standard deviation over 50 random trials.

Overall Performance Comparison
Performance indicators and experiment protocols

To quantify the performance of different VCA metrics, four commonly used performance indicators are employed: the Pearson linear correlation coefficient (PLCC), SRCC, Kendall rank-order correlation coefficient (KRCC), and root mean squared error (RMSE) between the objective and subjective scores. Among the four criteria, PLCC and RMSE measure the prediction accuracy, whereas SRCC and KRCC benchmark the prediction monotonicity. For a perfect match between the objective and subjective scores, PLCC = SRCC = KRCC = 1 and RMSE = 0.
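The four indicators can be computed directly with SciPy; a minimal sketch (no nonlinear score mapping is applied here):

```python
import numpy as np
from scipy import stats

def performance_indicators(pred, mos):
    pred, mos = np.asarray(pred), np.asarray(mos)
    plcc = stats.pearsonr(pred, mos)[0]     # prediction accuracy
    srcc = stats.spearmanr(pred, mos)[0]    # prediction monotonicity
    krcc = stats.kendalltau(pred, mos)[0]   # prediction monotonicity
    rmse = float(np.sqrt(np.mean((pred - mos) ** 2)))
    return plcc, srcc, krcc, rmse
```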

The proposed VCA model is also tested on the publicly available IVY LAB stereoscopic 3-D image database.50 This database was likewise built for the visual comfort prediction task and contains 120 stereoscopic image pairs with different binocular disparity ranges; the MOSs of visual comfort are provided to evaluate prediction performance. For details of the database, refer to Ref. 50. In the detailed implementation, we select a subset of the NBU 3D-VCA image database (M = 50) to learn a preference model and use the learned model to test the remaining stereoscopic images in this database as well as the entire IVY LAB database. The reported performances of the proposed method are the average results of 100 random trials; for each trial, the positive- and negative-labeled PSIP subsets are randomly selected from the original PSIP training set.

Performance comparisons

We compare the proposed VCA model with five state-of-the-art models on the two databases, denoted as model 1,18 model 2,8 model 3,20 model 4,22 and model 5,25 respectively. In model 1, perceptually significant regions of stereoscopic images are extracted, and the relationship between binocular disparity and visual comfort is established using a polynomial regression model. In model 2, various visual fatigue factors are selected to predict the visual fatigue score using a linear combination model. In model 3, in our implementation, only horizontal disparity characteristics (method 1 in the original paper) are utilized to estimate the visual comfort score. In model 4, the mean disparity, variance of disparity, maximum crossed disparity, relative disparity, and object thickness are extracted as representative visual comfort features. In model 5, based on the estimated 3-D visual importance map, perceptually significant disparity features (saliency-weighted absolute disparity and differential disparity features) are extracted to represent visual comfort; in our implementation, the 3-D visual importance map was estimated by averaging the graph-based visual saliency (GBVS) map51 and the disparity map. The same SVR regression model (100 trials of 10-fold cross-validation) is used in models 4 and 5.

In order to investigate the effect of the different integrated visual comfort features, we design three integration schemes for comparison, denoted as model disparity, model DoF, and model frequency. Here, the zone-of-comfort features ([d1, d2, d3, d4]), DoF features ([γ̄+, γ̄−]), and spatial frequency features ([fr, τ1, τ2, τ3]) are each used independently to learn the preference model, with all other settings the same as in the proposed scheme. In addition, to demonstrate the superiority of the proposed preference model, a direct SVR-based scheme using the same feature vector Fall defined in Eq. (12) is designed for comparison, denoted as Fall-SVR. For a fair comparison, the performance of the Fall-SVR model on the IVY LAB database is measured by training on the entire NBU 3D-VCA database and testing on the entire IVY LAB database.

Table 2 shows the performance comparison results on both the NBU 3D-VCA and IVY LAB databases. The proposed model demonstrates the best performance on the NBU 3D-VCA database in terms of all indicators. Even though model 4 is competitive on the IVY LAB database in terms of PLCC and RMSE, the proposed model performs best on SRCC and KRCC. The comparison among the model disparity, model DoF, and model frequency schemes shows that the zone-of-comfort and DoF features are especially significant in characterizing visual comfort. The comparison with the Fall-SVR model shows that, as expected, the proposed preference learning model is insensitive to the choice of database and delivers prominent performance.

Table 2: Performance comparison results of different visual comfort assessment (VCA) models on both the NBU 3D-VCA and IVY LAB databases (bold indicates the best performance).
Influences of Training Set

Since the proposed VCA model is essentially a learning-based model, it is necessary to verify whether its performance is sensitive to the training set. Toward this end, we design schemes with different ratios between positive- and negative-labeled PSIP training samples. Let K+ and K− denote the numbers of training samples in P+ and P−, respectively; the ratio is defined as ρ = K+/K−. Table 3 shows the comparison results of schemes with different ratios on the NBU 3D-VCA database. All the performances shown in Table 3 are averaged over 100 random trials; for each trial, the positive- and negative-labeled samples are randomly selected according to the designated ratio. The influence of different ratios on the proposed preference learning model is not significant, and, more importantly, the adopted 1∶1 ratio performs best.

Table 3: The comparison results of schemes with different ratios between training samples in the positive-labeled and negative-labeled subsets (bold indicates the best performance).
Influences of Classification Algorithm

In the proposed VCA model, we use an SVM classification model to conduct preference classification. In this subsection, we further conduct experiments to demonstrate the superiority of the SVM over other existing classification algorithms. To be more specific, we replace the SVM with other classification algorithms, e.g., K-nearest neighbor (KNN),52 the naïve Bayes classifier (NB),53 and random forest (RF),54 while keeping all other parts the same as in the proposed model. For brevity, we denote the three resulting VCA models as proposed-KNN, proposed-NB, and proposed-RF, respectively. Table 4 shows the comparison results of these models with different ratios on the NBU 3D-VCA database. Although the proposed-NB and proposed-RF models obtain performance comparable to the proposed model at a 1∶1 ratio, their performance decreases rapidly as the training samples become more imbalanced. The proposed model outperforms all three alternatives at the different ratios. Therefore, the SVM is more suitable as the preference classification algorithm in the proposed VCA model.

Table 4: The influence of different classification algorithms (performances under different ratios are given).

This paper has presented a visual comfort assessment method via preference learning. Compared with traditional MOS-aware metrics, the distinguishing feature of the proposed method is that a robust VCA model is trained on a set of preference labels instead of MOSs. The main advantages of the proposed method are summarized as follows: (1) a new stereoscopic image database for the VCA task is constructed, and stereoscopic images with significant subjective visual comfort differences are selected as PSIPs to construct the training database; (2) the positive- and negative-labeled PSIPs are simultaneously contained in the training database, and visual comfort related features from the zone of comfort, DoF, and spatial frequency are integrated to represent the visual comfort properties of a stereoscopic image; and (3) once the preference classification model is constructed, the task of VCA can be simplified into a preference classification problem, which is demonstrated to be much more robust and consistent with human subjective perception. In future work, to further improve the performance of the proposed framework, the following issues should be addressed: (1) the diverse features relevant to perceived visual comfort should be more comprehensively exploited; (2) multiple kernel learning may handle different features more effectively by designing a dedicated kernel for each; and (3) a more comprehensive database addressing various perceptual scales (e.g., image distortion, depth perception, and visual comfort) should be constructed via the paired comparison method.

This work was supported by the Natural Science Foundation of China (Grant Nos. 61271021, 61271270, and U1301257). It was also sponsored by the K.C. Wong Magna Fund at Ningbo University.

References

1. M. Lambooij et al., “Visual discomfort and visual fatigue of stereoscopic displays: a review,” J. Imaging Sci. Technol. 53(3), 030201 (2009).
2. M. Lambooij, W. A. IJsselsteijn, and I. Heynderickx, “Visual discomfort of 3-D TV: assessment methods and modeling,” Displays 32(4), 209–218 (2011).
3. M. Urvoy et al., “How visual fatigue and discomfort impact 3D-TV quality of experience: a comprehensive review of technological, psychophysical, and psychological factors,” Ann. Telecommun. 68(11–12), 641–655 (2013).
4. M. Urvoy, M. Barkowsky, and P. Le Callet, “Stereoscopic 3D-TV: visual comfort,” IEEE Trans. Broadcast. 57(2), 335–346 (2011).
5. J. Li, M. Barkowsky, and P. Le Callet, “Visual discomfort induced by relative disparity and planar motion of stereoscopic images,” in Proc. of the First Sino French Workshop on Information and Communication Technologies, pp. 1–2 (2011).
6. J. Li, M. Barkowsky, and P. Le Callet, “Visual discomfort of stereoscopic 3D videos: influence of 3D motion,” Displays 35(1), 49–57 (2014).
7. J. Wang et al., “Study of depth bias of observers in free viewing of still stereoscopic synthetic stimuli,” J. Eye Mov. Res. 5(5), 1–11 (2012).
8. J. Choi et al., “Visual fatigue modeling and analysis for stereoscopic video,” Opt. Eng. 51(1), 017206 (2012).
9. Y. Nojiri et al., “Parallax distribution and visual comfort on stereoscopic HDTV,” in Proc. IBC, pp. 373–380 (2006).
10. G. A. Fry, “Further experiments on the accommodative convergence relationship,” Am. J. Optom. 16, 325–334 (1939).
11. H. Hofstetter, “The zone of clear single binocular vision,” Am. J. Optom. 22(7), 301–384 (1945).
12. C. Schor, I. Wood, and J. Ogawa, “Binocular sensory fusion is limited by spatial resolution,” Vis. Res. 24(7), 661–665 (1984).
13. C. Schor, T. Heckmann, and C. W. Tyler, “Binocular fusion limits are independent of contrast, luminance gradient and component phases,” Vis. Res. 29(7), 821–835 (1989).
14. T. Shibata et al., “The zone of comfort: predicting visual discomfort with stereo displays,” J. Vis. 11(8), 11 (2011).
15. D. M. Hoffman et al., “Vergence-accommodation conflicts hinder visual performance and cause visual fatigue,” J. Vis. 8(3), 33 (2008).
16. P. A. Howarth, “Potential hazards of viewing 3-D stereoscopic television, cinema and computer games: a review,” Ophthalmic Physiol. Opt. 31, 111–122 (2011).
17. S. Yang and J. Sheedy, “Effects of vergence and accommodative responses on viewer’s comfort in viewing 3D stimuli,” Proc. SPIE 7863, 78630Q (2011).
18. H. Sohn et al., “Attention model-based visual comfort assessment for stereoscopic depth perception,” in Proc. of Int. Conf. on Digital Signal Processing, pp. 1–6 (2011).
19. Y. Jung et al., “Visual comfort assessment metric based on salient object motion information in stereoscopic video,” J. Electron. Imaging 21(1), 011008 (2012).
20. D. Kim and K. Sohn, “Visual fatigue prediction for stereoscopic image,” IEEE Trans. Circuits Syst. Video Technol. 21(2), 231–236 (2011).
21. S. Lee et al., “Effect of stimulus width on the perceived visual discomfort in viewing stereoscopic 3D-TV,” IEEE Trans. Broadcast. 59(4), 580–590 (2013).
22. H. Sohn et al., “Predicting visual discomfort using object size and disparity information in stereoscopic images,” IEEE Trans. Broadcast. 59(1), 28–37 (2013).
23. J. Park, S. Lee, and A. Bovik, “3D visual discomfort prediction: vergence, foveation, and the physiological optics of accommodation,” IEEE J. Sel. Top. Signal Process. 8(3), 415–426 (2014).
24. J. Choi et al., “Visual fatigue evaluation and enhancement for 2D-plus-depth video,” in Proc. of IEEE Int. Conf. on Image Processing, pp. 2981–2984 (2010).
25. J. Yong et al., “Predicting visual discomfort of stereoscopic images using human attention model,” IEEE Trans. Circuits Syst. Video Technol. 23(12), 2077–2082 (2013).
26. H. R. Sheikh, M. F. Sabir, and A. C. Bovik, “A statistical evaluation of recent full reference image quality assessment algorithms,” IEEE Trans. Image Process. 15(11), 3440–3451 (2006).
27. D. M. Rouse et al., “Tradeoffs in subjective testing methods for image and video quality assessment,” Proc. SPIE 7527, 75270F (2010).
28. W. Xue, L. Zhang, and X. Mou, “Learning without human scores for blind image quality assessment,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 995–1002 (2013).
29. P. Ye, J. Kumar, and D. Doermann, “Beyond human opinion scores: blind image quality assessment based on synthetic scores,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 4241–4248 (2014).
30. A. Mittal et al., “Blind image quality assessment without training on human opinion scores,” Proc. SPIE 8651, 86510T (2013).
31. P. Ye and D. Doermann, “Combining preference and absolute judgments in a crowd-sourced setting,” in Proc. of Int. Conf. on Machine Learning, pp. 1–7 (2013).
32. B. Carterette et al., “Here or there,” in Proc. of 30th European Conf. on Advances in Information Retrieval, pp. 16–27 (2008).
33. R. Halonen, S. Westman, and P. Oittinen, “Naturalness and interestingness of test images for visual quality evaluation,” Proc. SPIE 7867, 78670Z (2011).
34. J. S. Lee, L. Goldmann, and T. Ebrahimi, “Paired comparison-based subjective quality assessment of stereoscopic images,” Multimed. Tools Appl. 67(1), 31–48 (2013).
35. J. Li, M. Barkowsky, and P. Le Callet, “Analysis and improvement of a paired comparison method in the application of 3DTV subjective experiment,” in Proc. of IEEE Int. Conf. on Image Processing, pp. 629–632 (2012).
36. ITU-T P.910, “Subjective video quality assessment methods for multimedia applications,” Recommendation ITU-T P.910, ITU Telecom. Sector of ITU (1999).
37. ITU-T P.911, “Subjective video quality assessment methods for multimedia applications,” Recommendation ITU-T P.911, ITU Telecom. Sector of ITU (1999).
38. R. A. Bradley and M. E. Terry, “Rank analysis of incomplete block designs: I. The method of paired comparisons,” Biometrika 39(3–4), 324–345 (1952).
39. W. Chen et al., “Exploration of quality of experience of stereoscopic images: binocular depth,” in Proc. of VPQM, pp. 1–6 (2012).
40. A. P. Pentland, “A new sense for depth of field,” IEEE Trans. Pattern Anal. Mach. Intell. 9(4), 523–531 (1987).
41. Y. Suzuki et al., “Effects of an eyeglass-free 3-D display on the human visual system,” Jpn. J. Ophthalmol. 48(1), 1–6 (2004).
42. S. Lee et al., “Visual discomfort induced by fast salient object motion in stereoscopic video,” Proc. SPIE 7863, 786305 (2011).
43. S. H. Cho and H. B. Kang, “Prediction of visual discomfort in watching 3D video using multiple features,” in Proc. of IEEE Southwest Symp. on Image Analysis and Interpretation, pp. 65–68 (2014).
44. C. Chang and C. Lin, “LIBSVM: a library for support vector machines,” 2001, http://www.csie.ntu.edu.tw/~cjlin/libsvm/
45. D. Sun, S. Roth, and M. J. Black, “Secrets of optical flow estimation and their principles,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2432–2439 (2010).
46. ITU-R BT.500-11, “Methodology for the subjective assessment of the quality of television pictures,” ITU-R BT.500-11 (2002).
47. ITU-R BT.1438, “Subjective assessment for stereoscopic television pictures,” ITU-R BT.1438 (2000).
48. A. Woods, “Understanding crosstalk in stereoscopic displays,” keynote presentation at the Three-Dimensional Systems and Applications Conf., Tokyo, Japan, pp. 19–21 (2010).
49. P. Isola et al., “What makes an image memorable?,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 145–152 (2011).
50. H. Sohn et al., “IVY LAB stereoscopic 3D image database for visual discomfort prediction,” 2013, http://ivylab.kaist.ac.kr/demo/3DVCA/3DVCA.htm
51. J. Harel, C. Koch, and P. Perona, “Graph-based visual saliency,” in Proc. of Advances in Neural Information Processing Systems, pp. 545–552 (2006).
52. T. Cover and P. Hart, “Nearest neighbor pattern classification,” IEEE Trans. Inf. Theory 13(1), 21–27 (1967).
53. V. Metsis, I. Androutsopoulos, and G. Paliouras, “Spam filtering with naive Bayes—which naive Bayes?,” in Third Conf. on Email and Anti-Spam, pp. 27–28 (2006).
54. L. Breiman, “Random forests,” Mach. Learn. 45(1), 5–32 (2001).

Qiuping Jiang received his BS degree in communications engineering from China Jiliang University, Hangzhou, China, in 2012. He is now pursuing his MS degree in electronics and communications engineering at Ningbo University, Ningbo, China. His research interests include three-dimensional (3-D) visual perception, quality assessment, and so on.

Feng Shao received his BS and PhD degrees from Zhejiang University, Hangzhou, China, in 2002 and 2007, respectively, both in electronic science and technology. He is currently a professor in the Faculty of Information Science and Engineering, Ningbo University, China. He was a visiting fellow with the School of Computer Engineering, Nanyang Technological University, Singapore, from February 2012 to August 2012. His research interests include 3-D video coding, 3-D quality assessment, image perception, and so on.

Gangyi Jiang received his MS degree from Hangzhou University in 1992 and his PhD degree from Ajou University, Korea, in 2000. He is now a professor in the Faculty of Information Science and Engineering, Ningbo University, China. His research interests mainly include digital video compression, multiview video coding, and so on.

Mei Yu received her MS degree from Hangzhou Institute of Electronics Engineering, China, in 1993; and her PhD degree from Ajou University, Korea, in 2000. She is now a professor in the Faculty of Information Science and Engineering, Ningbo University, China. Her research interests include image/video coding and video perception.

Zongju Peng received his BS degree from Sichuan Normal College, China, in 1995; his MS degree from Sichuan University, China, in 1998; and his PhD degree from the Institute of Computing Technology, Chinese Academy of Sciences, in 2010. He is now an associate professor in the Faculty of Information Science and Engineering, Ningbo University, China. His research interests mainly include image/video compression, 3-D video coding, and video perception.

© 2015 SPIE and IS&T


Figures

Graphic Jump Location
Fig. 1
F1 :

Example of two stereoscopic images with small difference in terms of visual comfort. In this situation, it is difficult and confusing for observers to label the preference. Therefore, in order to obtain the reliable preference label, a large number of observers are required, leading to the same limitation as traditional absolute category rating subjective testing.

Graphic Jump Location
Fig. 2
F2 :

Example of two stereoscopic images with large difference in terms of visual comfort. In this situation, it is easy and consistent for the observers to label the preference. Therefore, only a small number of observers are needed to generate a reliable preference label.

Graphic Jump Location
Fig. 3
F3 :

The selected representative training stereoscopic images and the corresponding mean opinion scores (MOSs).

Fig. 4: High-level diagram of the proposed visual comfort assessment (VCA) method.

Fig. 5: Geometrical illustration of the depth-of-focus and retinal defocus circle for different depth distances (different disparities).
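
As a point of reference for the geometry in Fig. 5, the standard thin-lens approximation used in depth-from-defocus analyses (cf. Ref. 40) relates the retinal defocus circle to the dioptric mismatch between the accommodation (screen) plane and the perceived depth plane. The notation below is ours, not the paper's:

$$ \beta \;\approx\; A \left| \frac{1}{d_{\mathrm{screen}}} - \frac{1}{d_{\mathrm{object}}} \right|, $$

where β is the angular diameter of the defocus circle (radians), A is the pupil diameter (meters), and the bracketed term is the dioptric defocus. For example, a 4 mm pupil with the screen at 2 m and the object perceived at 1.5 m gives β ≈ 0.004 × |0.5 − 0.667| ≈ 0.67 mrad; larger disparities increase the dioptric mismatch and hence the defocus circle, consistent with the disparity-comfort relationship discussed in the paper.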

Fig. 6: Forty selected right-view images in the NBU 3D-VCA image database.

Fig. 7: MOS distribution of all stereoscopic images in the NBU 3D-VCA database.

Fig. 8: Consistency analyses of human subjective assessment scores. (a) Spearman's rank correlation between the MOSs given by two nonoverlapping halves of the subjects, as a function of the mean number of opinion scores per image. (b) Images ranked by MOSs from subjects in group 1, plotted against the corresponding average MOSs given by subjects in group 2. Error bars represent the standard deviation over 50 random trials.
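
A minimal sketch of the split-half analysis summarized in Fig. 8(a), assuming opinion scores are stored as a subjects-by-images array; the data layout, group sizes, and the demo data are our assumptions, not the authors' code:

```python
# Split-half consistency: randomly divide subjects into two nonoverlapping
# halves, compute per-image MOSs for each half, and measure the Spearman rank
# correlation (SRCC) between the two MOS vectors; repeat over random trials.
import numpy as np
from scipy.stats import spearmanr

def split_half_srcc(scores: np.ndarray, n_trials: int = 50, seed: int = 0):
    """scores: (n_subjects, n_images) opinion-score matrix.
    Returns (mean, std) of SRCC over n_trials random half-splits."""
    rng = np.random.default_rng(seed)
    n_subjects = scores.shape[0]
    srccs = []
    for _ in range(n_trials):
        perm = rng.permutation(n_subjects)
        g1, g2 = perm[: n_subjects // 2], perm[n_subjects // 2 :]
        mos1 = scores[g1].mean(axis=0)  # per-image MOS from group 1
        mos2 = scores[g2].mean(axis=0)  # per-image MOS from group 2
        rho, _ = spearmanr(mos1, mos2)
        srccs.append(rho)
    return float(np.mean(srccs)), float(np.std(srccs))

# Hypothetical demo: 20 subjects, 200 images, true comfort levels plus noise.
rng = np.random.default_rng(1)
true_comfort = rng.uniform(1.0, 5.0, size=200)
demo = np.clip(true_comfort + rng.normal(0.0, 0.8, size=(20, 200)), 1.0, 5.0)
print(split_half_srcc(demo))  # mean/std of SRCC, analogous to Fig. 8's error bars
```

A high and stable SRCC between the two halves indicates that the MOSs are internally consistent, which is what Fig. 8(a) tracks as the number of opinion scores per image grows.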

Tables

Table 1: Selection procedure for the representative training stereoscopic images.
Table 2: Performance comparison of different visual comfort assessment (VCA) models on the NBU 3D-VCA and IVY LAB databases (bold indicates the best performance).
Table 3: Comparison of schemes with different ratios of training samples between the positive-labeled and negative-labeled subsets (bold indicates the best performance).
Table 4: Influence of different classification algorithms (performance under different ratios is given).

References

1. Lambooij M. et al., "Visual discomfort and visual fatigue of stereoscopic displays: a review," J. Imaging Sci. Technol. 53(3), 030201 (2009).
2. Lambooij M., IJsselsteijn W. A., and Heynderickx I., "Visual discomfort of 3-D TV: assessment methods and modeling," Displays 32(4), 209–218 (2011).
3. Urvoy M. et al., "How visual fatigue and discomfort impact 3D-TV quality of experience: a comprehensive review of technological, psychophysical, and psychological factors," Ann. Telecommun. 68(11–12), 641–655 (2013).
4. Urvoy M., Barkowsky M., and Le Callet P., "Stereoscopic 3D-TV: visual comfort," IEEE Trans. Broadcast. 57(2), 335–346 (2011).
5. Li J., Barkowsky M., and Le Callet P., "Visual discomfort induced by relative disparity and planar motion of stereoscopic images," in Proc. of the First Sino-French Workshop on Information and Communication Technologies, pp. 1–2 (2011).
6. Li J., Barkowsky M., and Le Callet P., "Visual discomfort of stereoscopic 3D videos: influence of 3D motion," Displays 35(1), 49–57 (2014).
7. Wang J. et al., "Study of depth bias of observers in free viewing of still stereoscopic synthetic stimuli," J. Eye Mov. Res. 5(5), 1–11 (2012).
8. Choi J. et al., "Visual fatigue modeling and analysis for stereoscopic video," Opt. Eng. 51(1), 017206 (2012).
9. Nojiri Y. et al., "Parallax distribution and visual comfort on stereoscopic HDTV," in Proc. IBC, pp. 373–380 (2006).
10. Fry G. A., "Further experiments on the accommodative convergence relationship," Am. J. Optom. 16, 325–334 (1939).
11. Hofstetter H., "The zone of clear single binocular vision," Am. J. Optom. 22(7), 301–384 (1945).
12. Schor C., Wood I., and Ogawa J., "Binocular sensory fusion is limited by spatial resolution," Vis. Res. 24(7), 661–665 (1984).
13. Schor C., Heckmann T., and Tyler C. W., "Binocular fusion limits are independent of contrast, luminance gradient and component phases," Vis. Res. 29(7), 821–835 (1989).
14. Shibata T. et al., "The zone of comfort: predicting visual discomfort with stereo displays," J. Vis. 11(8), 11 (2011).
15. Hoffman D. M. et al., "Vergence-accommodation conflicts hinder visual performance and cause visual fatigue," J. Vis. 8(3), 33 (2008).
16. Howarth P. A., "Potential hazards of viewing 3-D stereoscopic television, cinema and computer games: a review," Ophthalmic Physiol. Opt. 31, 111–122 (2011).
17. Yang S. and Sheedy J., "Effects of vergence and accommodative responses on viewer's comfort in viewing 3D stimuli," Proc. SPIE 7863, 78630Q (2011).
18. Sohn H. et al., "Attention model-based visual comfort assessment for stereoscopic depth perception," in Proc. of Int. Conf. on Digital Signal Processing, pp. 1–6 (2011).
19. Jung Y. et al., "Visual comfort assessment metric based on salient object motion information in stereoscopic video," J. Electron. Imaging 21(1), 011008 (2012).
20. Kim D. and Sohn K., "Visual fatigue prediction for stereoscopic image," IEEE Trans. Circuits Syst. Video Technol. 21(2), 231–236 (2011).
21. Lee S. et al., "Effect of stimulus width on the perceived visual discomfort in viewing stereoscopic 3D-TV," IEEE Trans. Broadcast. 59(4), 580–590 (2013).
22. Sohn H. et al., "Predicting visual discomfort using object size and disparity information in stereoscopic images," IEEE Trans. Broadcast. 59(1), 28–37 (2013).
23. Park J., Lee S., and Bovik A., "3D visual discomfort prediction: vergence, foveation, and the physiological optics of accommodation," IEEE J. Sel. Top. Signal Process. 8(3), 415–426 (2014).
24. Choi J. et al., "Visual fatigue evaluation and enhancement for 2D-plus-depth video," in Proc. of IEEE Int. Conf. on Image Processing, pp. 2981–2984 (2010).
25. Yong J. et al., "Predicting visual discomfort of stereoscopic images using human attention model," IEEE Trans. Circuits Syst. Video Technol. 23(12), 2077–2082 (2013).
26. Sheikh H. R., Sabir M. F., and Bovik A. C., "A statistical evaluation of recent full reference image quality assessment algorithms," IEEE Trans. Image Process. 15(11), 3440–3451 (2006).
27. Rouse D. M. et al., "Tradeoffs in subjective testing methods for image and video quality assessment," Proc. SPIE 7527, 75270F (2010).
28. Xue W., Zhang L., and Mou X., "Learning without human scores for blind image quality assessment," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 995–1002 (2013).
29. Ye P., Kumar J., and Doermann D., "Beyond human opinion scores: blind image quality assessment based on synthetic scores," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 4241–4248 (2014).
30. Mittal A. et al., "Blind image quality assessment without training on human opinion scores," Proc. SPIE 8651, 86510T (2013).
31. Ye P. and Doermann D., "Combining preference and absolute judgments in a crowd-sourced setting," in Proc. of Int. Conf. on Machine Learning, pp. 1–7 (2013).
32. Carterette B. et al., "Here or there," in Proc. of the 30th European Conf. on Advances in Information Retrieval, pp. 16–27 (2008).
33. Halonen R., Westman S., and Oittinen P., "Naturalness and interestingness of test images for visual quality evaluation," Proc. SPIE 7867, 78670Z (2011).
34. Lee J. S., Goldmann L., and Ebrahimi T., "Paired comparison-based subjective quality assessment of stereoscopic images," Multimed. Tools Appl. 67(1), 31–48 (2013).
35. Li J., Barkowsky M., and Le Callet P., "Analysis and improvement of a paired comparison method in the application of 3DTV subjective experiment," in Proc. of IEEE Int. Conf. on Image Processing, pp. 629–632 (2012).
36. ITU-T, "Subjective video quality assessment methods for multimedia applications," Recommendation ITU-T P.910, ITU Telecommunication Standardization Sector (1999).
37. ITU-T, "Subjective audiovisual quality assessment methods for multimedia applications," Recommendation ITU-T P.911, ITU Telecommunication Standardization Sector (1999).
38. Bradley R. A. and Terry M. E., "Rank analysis of incomplete block designs: I. The method of paired comparisons," Biometrika 39(3–4), 324–345 (1952).
39. Chen W. et al., "Exploration of quality of experience of stereoscopic images: binocular depth," in Proc. of VPQM, pp. 1–6 (2012).
40. Pentland A. P., "A new sense for depth of field," IEEE Trans. Pattern Anal. Mach. Intell. 9(4), 523–531 (1987).
41. Suzuki Y. et al., "Effects of an eyeglass-free 3-D display on the human visual system," Jpn. J. Ophthalmol. 48(1), 1–6 (2004).
42. Lee S. et al., "Visual discomfort induced by fast salient object motion in stereoscopic video," Proc. SPIE 7863, 786305 (2011).
43. Cho S. H. and Kang H. B., "Prediction of visual discomfort in watching 3D video using multiple features," in Proc. of IEEE Southwest Symp. on Image Analysis and Interpretation, pp. 65–68 (2014).
44. Chang C. and Lin C., "LIBSVM: a library for support vector machines," 2001, http://www.csie.ntu.edu.tw/~cjlin/libsvm/
45. Sun D., Roth S., and Black M. J., "Secrets of optical flow estimation and their principles," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2432–2439 (2010).
46. ITU-R, "Methodology for the subjective assessment of the quality of television pictures," Recommendation ITU-R BT.500-11 (2002).
47. ITU-R, "Subjective assessment of stereoscopic television pictures," Recommendation ITU-R BT.1438 (2000).
48. Woods A., "Understanding crosstalk in stereoscopic displays," keynote presentation at the Three-Dimensional Systems and Applications Conf., Tokyo, Japan, pp. 19–21 (2010).
49. Isola P. et al., "What makes an image memorable?," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 145–152 (2011).
50. Sohn H. et al., "IVY LAB stereoscopic 3D image database for visual discomfort prediction," 2013, http://ivylab.kaist.ac.kr/demo/3DVCA/3DVCA.htm
51. Harel J., Koch C., and Perona P., "Graph-based visual saliency," in Proc. of Advances in Neural Information Processing Systems, pp. 545–552 (2006).
52. Cover T. and Hart P., "Nearest neighbor pattern classification," IEEE Trans. Inf. Theory 13(1), 21–27 (1967).
53. Metsis V., Androutsopoulos I., and Paliouras G., "Spam filtering with naive Bayes—which naive Bayes?," in Proc. of the Third Conf. on Email and Anti-Spam, pp. 27–28 (2006).
54. Breiman L., "Random forests," Mach. Learn. 45(1), 5–32 (2001).
