From pairwise comparisons and rating to a unified quality scale

Maria Perez-Ortiz, Aliaksei Mikhailiuk, Emin Zerman, Vedad Hulusic, Giuseppe Valenzise, Rafał K. Mantiuk

The proposed method builds on the well-established fields of psychometrics and sensory evaluation and scales together the results of the two most commonly used experimental protocols: rating and pairwise comparisons. Such scaling can be used to merge existing subjective datasets and to analyze experiments in which both ratings and pairwise comparisons are collected.

The figure above compares the unified quality scale (Just-Objectionable-Differences / JOD) with two commonly used scales: difference-of-mean-opinion scores (DMOS) and vote counts (VC). The colors used in the scales correspond to the underlines below each image. A difference of 1 JOD means that 75% of observers will choose one image over the other. The main limitation of DMOS is that the scale is arbitrary and can differ from one experiment to another. The shortcoming of vote counts collected independently of image content is that the quality measure is inaccurate when two different contents are considered (it fails to recognize that the blue-labeled image is different from the green-labeled image). More details are given in the paper.
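The statement that 1 JOD corresponds to a 75% preference can be made concrete with a small sketch. It assumes a normal-distribution observer model in the style of Thurstone Case V, which this page does not spell out; the names `SIGMA`, `preference_probability` and `jod_difference` are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Assumption: preferences follow a cumulative normal of the quality difference
# (Thurstone Case V style). The page only states the 1 JOD = 75% anchor.
_STD_NORMAL = NormalDist()

# Scale factor chosen so that a difference of 1 JOD maps to a 75% preference.
SIGMA = 1.0 / _STD_NORMAL.inv_cdf(0.75)  # about 1.4826


def preference_probability(jod_diff: float) -> float:
    """Probability that an observer prefers the image that is jod_diff JOD better."""
    return _STD_NORMAL.cdf(jod_diff / SIGMA)


def jod_difference(preference: float) -> float:
    """Quality difference in JOD implied by a preference probability in (0, 1)."""
    return SIGMA * _STD_NORMAL.inv_cdf(preference)


if __name__ == "__main__":
    for d in (0.0, 0.5, 1.0, 2.0):
        print(f"{d:.1f} JOD -> {preference_probability(d):.3f}")
```

Under this assumed model, a difference of 0 JOD gives a 50% preference (observers are guessing), 1 JOD gives 75% by construction, and 2 JOD gives roughly 91%, so the scale is not linear in the proportion of votes.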

Abstract

The goal of psychometric scaling is the quantification of perceptual experiences, understanding the relationship between an external stimulus, the internal representation and the response. In this paper, we propose a probabilistic framework to fuse the outcomes of different psychophysical experimental protocols, namely rating and pairwise comparison experiments. Such a method can be used for merging existing datasets of subjective nature and for experiments in which both measurements are collected. We analyze and compare the outcomes of both types of experimental protocols in terms of time and accuracy in a set of simulations and experiments with benchmark and real-world image quality assessment datasets, showing the necessity of scaling and the advantages of each protocol and of mixing them. Although most of our examples focus on image quality assessment, our findings generalize to any other subjective quality-of-experience task.

Materials

Publication

Maria Perez-Ortiz, Aliaksei Mikhailiuk, Emin Zerman, Vedad Hulusic, Giuseppe Valenzise, Rafał K. Mantiuk. From pairwise comparisons and rating to a unified quality scale. IEEE Transactions on Image Processing (in print), 2019.