More and more large multimodal models (LMMs) are being released from time to time, but the finetuning of these models is not always straightforward. This codebase aims to provide a unified, minimal ...
Abstract: We introduce WildVideo, an open-world benchmark dataset designed to address how to assess hallucination of Large Multi-modal Models (LMMs) for understanding video-language interaction in the ...
You can download a source code package of the latest release from Releases page. Visual Studio users can install the precompiled binaries from the SF2cute NuGet ...