LLaPa is a vision-language model (VLM) framework designed for multimodal procedural planning. It can generate executable action sequences based on textual task descriptions and visual environment ...