Training, Validation, and Test Sets
Details about each visual question are in the following format:
"image":"VQAonline_00021900.png",
"question":"What is the history of this coin? By Mandate of Heaven) Emperor of the Great Qin\", Nurhaci - 1616-1626",
"context":"I want to know the complete history of this coin. ",
"answer":"This particular coin is part of a set of commemorative tokens (aka fantasy coins) made in (modern) China, with a token for each of the Qing Dynasty emperors. This one shows Emperor Nurhaci with the dates when he was in power shown below his likeness on the \"coin\".",
"topic":"history",
"url":"https://history.stackexchange.com/questions/26573/"
Characterization of existing VQA datasets and our VQAonline dataset in terms of mean question length (Q), mean answer length (A), number of images (Nimg), number of authentic topics (# Auth. Topics), inclusion of context, inclusion of authentic visual questions (Auth. VQ), inclusion of authentic context, and inclusion of answers validated by those asking the questions (Vld. A).
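The record format listed earlier (image, question, context, answer, topic, url) can be read into typed objects with a few lines of Python. The following is a minimal sketch, not the dataset's official loader: the `VisualQuestion` class and `parse_records` function are hypothetical names, and it assumes that a split is stored as a JSON array of such records, which may not match the actual file layout. The sample record is abbreviated for illustration.

```python
import json
from dataclasses import dataclass

# Field names taken from the record format described in this document.
REQUIRED_FIELDS = ("image", "question", "context", "answer", "topic", "url")


@dataclass(frozen=True)
class VisualQuestion:
    """One visual question record (hypothetical container, not an official API)."""
    image: str
    question: str
    context: str
    answer: str
    topic: str
    url: str


def parse_records(raw_json: str) -> list[VisualQuestion]:
    """Parse a JSON array of records, rejecting any that lack a required field."""
    records = []
    for i, entry in enumerate(json.loads(raw_json)):
        missing = [k for k in REQUIRED_FIELDS if k not in entry]
        if missing:
            raise ValueError(f"record {i} is missing fields: {missing}")
        records.append(VisualQuestion(**{k: entry[k] for k in REQUIRED_FIELDS}))
    return records


# Abbreviated stand-in for a split file (assumption: the real layout may differ).
SAMPLE = json.dumps([
    {
        "image": "VQAonline_00021900.png",
        "question": "What is the history of this coin?",
        "context": "I want to know the complete history of this coin. ",
        "answer": "This particular coin is part of a set of commemorative tokens.",
        "topic": "history",
        "url": "https://history.stackexchange.com/questions/26573/",
    }
])

records = parse_records(SAMPLE)
print(records[0].image, records[0].topic)
```

Validating the field set up front makes a malformed record fail loudly at load time rather than later during training or evaluation.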
@article{chen2023vqaonline,
  title={Fully Authentic Visual Question Answering Dataset from Online Communities},
  author={Chen, Chongyan and Liu, Mengchen and Codella, Noel and Li, Yunsheng and Yuan, Lu and Gurari, Danna},
  journal={arXiv preprint arXiv:2311.15562},
  year={2023}
}
@article{liu2023evalvqaonline,
  title={An Evaluation of GPT-4V and Gemini in Online VQA},
  author={Liu, Mengchen and Chen, Chongyan and Gurari, Danna},
  journal={arXiv preprint arXiv:2312.10637},
  year={2023}
}