Large multimodal models (LMMs) are being released at an increasing pace, but finetuning them is not always straightforward. This codebase aims to provide a unified, minimal ...
Abstract: We introduce WildVideo, an open-world benchmark dataset designed to assess hallucination in Large Multi-modal Models (LMMs) for understanding video-language interaction in the ...
You can download a source code package of the latest release from the Releases page. Visual Studio users can install the precompiled binaries from the SF2cute NuGet ...