Annotation Instruction

Task Definition

INPUT
- premise: a 4-second clip from the video
- hypothesis: a sentence describing an event that occurs shortly after, or immediately after, the premise
- question: a question about the event that happens between the premise and the hypothesis (abductive) or after the hypothesis (predictive)
OUTPUT
- answer: the answer(s) to the question
AUXILIARY INFO
- thumbnail: the thumbnail/cover image of the video
- clip-bg: the title and description of the clip
- movie-bg: the title and background of the movie
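
For tooling purposes, the fields above could be held in a single record. The following is only a minimal sketch: the class name, the field names, the `question_type` field, and the example values are all hypothetical and are not part of the task definition.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AnnotationSample:
    """One item to annotate (hypothetical layout, for illustration only)."""

    # INPUT
    premise_clip: str   # path or ID of the 4-second clip
    hypothesis: str     # sentence describing the subsequent event
    question: str       # question about the event in between / afterwards
    question_type: str  # "abductive" or "predictive"
    # OUTPUT
    answers: List[str] = field(default_factory=list)
    # AUXILIARY INFO (may be missing)
    thumbnail: Optional[str] = None  # thumbnail/cover of the video
    clip_bg: Optional[str] = None    # title and description of the clip
    movie_bg: Optional[str] = None   # title and background of the movie

    def __post_init__(self) -> None:
        # The task definition only mentions these two question types.
        if self.question_type not in ("abductive", "predictive"):
            raise ValueError(f"unknown question type: {self.question_type!r}")


# Invented example values, not taken from the dataset.
sample = AnnotationSample(
    premise_clip="clip_0001.mp4",
    hypothesis="The man walks out of the room.",
    question="What happens before the man walks out?",
    question_type="abductive",
    answers=["He picks up his coat."],
)
```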

Annotation

Feasibility
1-1. Does the answer contain a ghost entity or typo that cannot be detected or interpreted from the premise or hypothesis?
A. yes; B. yes, but it can be predicted by inference; C. no
1-2. If you chose A for 1-1, can the ghost entity or typo be predicted or interpreted from the thumbnail, clip-bg, or movie-bg?
A. yes for thumbnail; B. yes for clip-bg; C. yes for thumbnail+clip-bg; D. yes for clip-bg+movie-bg; E. yes for thumbnail+clip-bg+movie-bg; F. no
1-3. If you chose C for 1-1 or A/B/C/D/E for 1-2, is there detail leakage from the textual information that makes the answer too easy to get? (e.g. the answer just changes 1 or 2 words of the source sentence)
A. yes; B. no
A. yes; B. no
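
As a rough aid for 1-3, the "changes 1 or 2 words" case can be flagged automatically. The sketch below is only a heuristic under stated assumptions: the function names and the `max_changed=2` threshold are hypothetical, whitespace tokenization is a simplification, and the result should not replace the annotator's judgment.

```python
import difflib


def changed_word_count(source: str, answer: str) -> int:
    """Count words that differ between two sentences (word-level diff)."""
    a = source.lower().split()
    b = answer.lower().split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replaced/inserted/deleted span counts by its longer side.
            changed += max(i2 - i1, j2 - j1)
    return changed


def looks_leaked(source: str, answer: str, max_changed: int = 2) -> bool:
    """Flag answers that differ from the source by at most max_changed words.

    The threshold of 2 is an assumption taken from the "1 or 2 words" example.
    """
    return changed_word_count(source, answer) <= max_changed
```

For instance, `looks_leaked("the man opens the door", "the man closes the door")` flags the pair, since only one word differs.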
Multimodality
2-1. If you chose B/C for 1-1 or A/B/C/D/E for 1-2, can the answer be generated from the information in only one modality (e.g. visual or textual)?
A. yes, based on visual info; B. yes, based on textual info; C. no, we need both
2-2. If you chose A/C for 2-1, what kind(s) of visual information must be distilled to answer the question? (multi-choice)
A. object-attribute; B. scene/place signals; C. human-emotion; D. motion/action; E. spatio-temporal relation; F. others
2-3. If you chose C for 2-1, how can information from the two modalities be associated? (multi-choice)
A. basic grounding/alignment; B. the two modalities can help specify the events described by each other; C. daily commonsense reasoning; D. others
Commonsense Reasoning
3-1. If you chose B/C for 1-1 or A/B/C/D/E for 1-2, is commonsense knowledge required to get the answer? If so, what kind(s) of commonsense knowledge are involved? (multi-choice)
A. no; B. object-attribute; C. basic actions/motions of people/objects; D. correlation between events; E. change in people’s mental states; F. social interactions among people & objects; G. others
3-2. Please write down the rationale for how to arrive at the answer to the question. If you think the provided information is insufficient, explain why the question is unanswerable.
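
The conditions attached to the questions above form a small skip pattern. The sketch below encodes them exactly as worded, as a reading aid. The function name and the dictionary layout are hypothetical. It also assumes that 1-1 and 3-2 are always answered and that only the single-choice answers (1-1, 1-2, 2-1) drive the branching.

```python
from typing import Dict, List

RESOLVED_BY_AUX = {"A", "B", "C", "D", "E"}  # options of 1-2 other than "F. no"


def applicable_questions(answers: Dict[str, str]) -> List[str]:
    """Return the question IDs that apply, given the answers chosen so far."""
    q11 = answers.get("1-1")
    q12 = answers.get("1-2")
    q21 = answers.get("2-1")
    # 1-2 is only asked (and only counts) when 1-1 was answered A.
    resolved = q11 == "A" and q12 in RESOLVED_BY_AUX

    questions = ["1-1"]
    if q11 == "A":
        questions.append("1-2")
    if q11 == "C" or resolved:
        questions.append("1-3")
    if q11 in {"B", "C"} or resolved:
        questions.append("2-1")
        if q21 in {"A", "C"}:
            questions.append("2-2")
        if q21 == "C":
            questions.append("2-3")
        questions.append("3-1")
    # Assumed to apply always: 3-2 also covers the unanswerable case.
    questions.append("3-2")
    return questions
```

Under these assumptions, an item answered A for 1-1 and F for 1-2 skips straight to 3-2, where the annotator explains why the question is unanswerable.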

Examples

Visual-Linguistic Commonsense Reasoning Sample 117

Visual-Linguistic Commonsense Reasoning Sample 116

Visual-Linguistic Commonsense Reasoning Sample 115

Visual-Linguistic Commonsense Reasoning Sample 114

Visual-Linguistic Commonsense Reasoning Sample 113

Visual-Linguistic Commonsense Reasoning Sample 112

Visual-Linguistic Commonsense Reasoning Sample 111

Visual-Linguistic Commonsense Reasoning Sample 110

Visual-Linguistic Commonsense Reasoning Sample 109

Visual-Linguistic Commonsense Reasoning Sample 108

Visual-Linguistic Commonsense Reasoning Sample 107

Visual-Linguistic Commonsense Reasoning Sample 106

Visual-Linguistic Commonsense Reasoning Sample 105

Visual-Linguistic Commonsense Reasoning Sample 104

Visual-Linguistic Commonsense Reasoning Sample 103

Visual-Linguistic Commonsense Reasoning Sample 102

Visual-Linguistic Commonsense Reasoning Sample 101

Visual-Linguistic Commonsense Reasoning Sample 100

Visual-Linguistic Commonsense Reasoning Sample 099

Visual-Linguistic Commonsense Reasoning Sample 098

Visual-Linguistic Commonsense Reasoning Sample 097

Visual-Linguistic Commonsense Reasoning Sample 096

Visual-Linguistic Commonsense Reasoning Sample 095

Visual-Linguistic Commonsense Reasoning Sample 094

Visual-Linguistic Commonsense Reasoning Sample 093

Visual-Linguistic Commonsense Reasoning Sample 092

Visual-Linguistic Commonsense Reasoning Sample 091

Visual-Linguistic Commonsense Reasoning Sample 090

Visual-Linguistic Commonsense Reasoning Sample 089

Visual-Linguistic Commonsense Reasoning Sample 088

Visual-Linguistic Commonsense Reasoning Sample 087

Visual-Linguistic Commonsense Reasoning Sample 086

Visual-Linguistic Commonsense Reasoning Sample 085

Visual-Linguistic Commonsense Reasoning Sample 084

Visual-Linguistic Commonsense Reasoning Sample 083

Visual-Linguistic Commonsense Reasoning Sample 082

Visual-Linguistic Commonsense Reasoning Sample 081

Visual-Linguistic Commonsense Reasoning Sample 080

Visual-Linguistic Commonsense Reasoning Sample 079

Visual-Linguistic Commonsense Reasoning Sample 078

Visual-Linguistic Commonsense Reasoning Sample 077

Visual-Linguistic Commonsense Reasoning Sample 076

Visual-Linguistic Commonsense Reasoning Sample 075

Visual-Linguistic Commonsense Reasoning Sample 074

Visual-Linguistic Commonsense Reasoning Sample 073

Visual-Linguistic Commonsense Reasoning Sample 072

Visual-Linguistic Commonsense Reasoning Sample 071

Visual-Linguistic Commonsense Reasoning Sample 070

Visual-Linguistic Commonsense Reasoning Sample 069

Visual-Linguistic Commonsense Reasoning Sample 068

Visual-Linguistic Commonsense Reasoning Sample 067

Visual-Linguistic Commonsense Reasoning Sample 066

Visual-Linguistic Commonsense Reasoning Sample 065

Visual-Linguistic Commonsense Reasoning Sample 064

Visual-Linguistic Commonsense Reasoning Sample 063

Visual-Linguistic Commonsense Reasoning Sample 062

Visual-Linguistic Commonsense Reasoning Sample 061

Visual-Linguistic Commonsense Reasoning Sample 060

Visual-Linguistic Commonsense Reasoning Sample 059

Visual-Linguistic Commonsense Reasoning Sample 058

Visual-Linguistic Commonsense Reasoning Sample 057

Visual-Linguistic Commonsense Reasoning Sample 056

Visual-Linguistic Commonsense Reasoning Sample 055

Visual-Linguistic Commonsense Reasoning Sample 054

Visual-Linguistic Commonsense Reasoning Sample 053

Visual-Linguistic Commonsense Reasoning Sample 052

Visual-Linguistic Commonsense Reasoning Sample 051

Visual-Linguistic Commonsense Reasoning Sample 050

Visual-Linguistic Commonsense Reasoning Sample 049

Visual-Linguistic Commonsense Reasoning Sample 048

Visual-Linguistic Commonsense Reasoning Sample 047

Visual-Linguistic Commonsense Reasoning Sample 046

Visual-Linguistic Commonsense Reasoning Sample 045

Visual-Linguistic Commonsense Reasoning Sample 044

Visual-Linguistic Commonsense Reasoning Sample 043

Visual-Linguistic Commonsense Reasoning Sample 042

Visual-Linguistic Commonsense Reasoning Sample 041

Visual-Linguistic Commonsense Reasoning Sample 040

Visual-Linguistic Commonsense Reasoning Sample 039

Visual-Linguistic Commonsense Reasoning Sample 038

Visual-Linguistic Commonsense Reasoning Sample 037

Visual-Linguistic Commonsense Reasoning Sample 036

Visual-Linguistic Commonsense Reasoning Sample 035

Visual-Linguistic Commonsense Reasoning Sample 034

Visual-Linguistic Commonsense Reasoning Sample 033

Visual-Linguistic Commonsense Reasoning Sample 032

Visual-Linguistic Commonsense Reasoning Sample 031

Visual-Linguistic Commonsense Reasoning Sample 030

Visual-Linguistic Commonsense Reasoning Sample 029

Visual-Linguistic Commonsense Reasoning Sample 028

Visual-Linguistic Commonsense Reasoning Sample 027

Visual-Linguistic Commonsense Reasoning Sample 026

Visual-Linguistic Commonsense Reasoning Sample 025

Visual-Linguistic Commonsense Reasoning Sample 024

Visual-Linguistic Commonsense Reasoning Sample 023

Visual-Linguistic Commonsense Reasoning Sample 022

Visual-Linguistic Commonsense Reasoning Sample 021

Visual-Linguistic Commonsense Reasoning Sample 020

Visual-Linguistic Commonsense Reasoning Sample 019

Visual-Linguistic Commonsense Reasoning Sample 018

Visual-Linguistic Commonsense Reasoning Sample 017

Visual-Linguistic Commonsense Reasoning Sample 016

Visual-Linguistic Commonsense Reasoning Sample 015

Visual-Linguistic Commonsense Reasoning Sample 014

Visual-Linguistic Commonsense Reasoning Sample 013

Visual-Linguistic Commonsense Reasoning Sample 012

Visual-Linguistic Commonsense Reasoning Sample 011

Visual-Linguistic Commonsense Reasoning Sample 010

Visual-Linguistic Commonsense Reasoning Sample 009

Visual-Linguistic Commonsense Reasoning Sample 008

Visual-Linguistic Commonsense Reasoning Sample 007

Visual-Linguistic Commonsense Reasoning Sample 006

Visual-Linguistic Commonsense Reasoning Sample 005

Visual-Linguistic Commonsense Reasoning Sample 004

Visual-Linguistic Commonsense Reasoning Sample 003

Visual-Linguistic Commonsense Reasoning Sample 002

Visual-Linguistic Commonsense Reasoning Sample 001

ActyNet Example 19

ActyNet Example 18

ActyNet Example 17

ActyNet Example 16

ActyNet Example 15

ActyNet Example 14

ActyNet Example 13

ActyNet Example 12

ActyNet Example 11

ActyNet Example 10

ActyNet Example 09

ActyNet Example 08

ActyNet Example 07

ActyNet Example 06

ActyNet Example 05

ActyNet Example 04

ActyNet Example 03

ActyNet Example 02

ActyNet Example 01