Abstract: We introduce Motion-Grounded Video Reasoning, a new motion understanding task that requires generating visual answers (video segmentation masks) according to the input question, and hence ...