CLEVR-BT-DB: a benchmark dataset to evaluate the reasoning abilities of deep neural models in visual question answering problems
27 May 2024
Insan-Aleksandr Latipov, Andrey Borevskiy, Attila Kertesz-Farkas
Proceedings Volume 13169, Fifth International Conference on Computer Vision and Computational Intelligence (CVCI 2024); 1316909 (2024) https://doi.org/10.1117/12.3027602
Event: Fifth International Conference on Computer Vision and Computational Intelligence (CVCI 2024), 2024, Bangkok, Thailand
Abstract
Deep learning-based machine reasoning and visual question answering models achieve near-human performance on their respective datasets; however, their performance drops dramatically under domain shift, suggesting that these models fail to generalize to the level of human-like reasoning. In this paper we present a new CLEVR-like dataset consisting of image-question pairs to evaluate the visual reasoning capability of deep models. The objects in the images are arranged so that the first half of the question is ambiguous and multiple answers seem to be correct up to that point; the second half of the question then resolves the ambiguity, making the whole visual question answering (VQA) task unambiguous, so that a unique answer can be reported. Deep models therefore need to handle ambiguity within their internal representations during the reasoning process. They can do so either by traversing a graph (or tree) in the search space using backtracking, or by refining a candidate set of possibly correct answers, iteratively eliminating incorrect ones as reasoning proceeds. We call this dataset CLEVR with Back-Tracking Database (CLEVR-BT-DB). It consists of 2,500 images and 10,000 questions in the same format as the standard CLEVR, and it is available at https://huggingface.co/datasets/Aborevsky01/CLEVR-BT-DB. The code to generate additional data is available at https://github.com/AFigaro/CLEVR_BT_DB. We tested MDETR, a recent deep model for VQA from Meta Research: it achieves an accuracy of 99.7% on the standard CLEVR dataset but only 28.01% on our CLEVR-BT-DB dataset.
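The candidate-set strategy mentioned in the abstract can be illustrated with a small Python sketch. This is not the authors' method and does not use the CLEVR-BT-DB file format; the `SceneObject` class, the `answer_color` function, and the toy scene are all hypothetical, introduced only to show how a question whose first clause matches several objects becomes unambiguous once the second clause is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    # Hypothetical toy representation, not the CLEVR-BT-DB scene format.
    name: str
    shape: str
    color: str
    x: float  # horizontal position; a smaller value means further left


def answer_color(scene, clauses):
    """Apply question clauses in order, shrinking the candidate set each time.

    Returns the color of the single remaining object together with a trace
    of the candidate names that survive after each clause.
    """
    candidates = list(scene)
    trace = []
    for clause in clauses:
        candidates = [obj for obj in candidates if clause(obj)]
        trace.append([obj.name for obj in candidates])
    if len(candidates) != 1:
        raise ValueError("question does not identify a unique object")
    return candidates[0].color, trace


scene = [
    SceneObject("cube_a", "cube", "red", 1.0),
    SceneObject("cube_b", "cube", "blue", 5.0),
    SceneObject("sphere", "sphere", "green", 3.0),
]
sphere = scene[2]

# Toy question: "What color is the cube that is left of the sphere?"
clauses = [
    lambda obj: obj.shape == "cube",  # first half: two cubes still match
    lambda obj: obj.x < sphere.x,     # second half: only one cube remains
]

color, trace = answer_color(scene, clauses)
print(color)  # red
print(trace)  # [['cube_a', 'cube_b'], ['cube_a']]
```

After the first clause both cubes are still candidates, so committing to an answer at that point would be a guess; only the second clause reduces the set to one object. A backtracking variant would instead commit to one cube, detect the contradiction at the second clause, and return to try the other.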
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Insan-Aleksandr Latipov, Andrey Borevskiy, and Attila Kertesz-Farkas "CLEVR-BT-DB: a benchmark dataset to evaluate the reasoning abilities of deep neural models in visual question answering problems", Proc. SPIE 13169, Fifth International Conference on Computer Vision and Computational Intelligence (CVCI 2024), 1316909 (27 May 2024); https://doi.org/10.1117/12.3027602
KEYWORDS
Data modeling
Visualization
Performance modeling
Visual process modeling
Systems modeling
Artificial intelligence
Object detection