Sometimes you just have to stare at your gorgeous soup bowl and stunning moon-shaped wine holder the whole livelong day.
Abstract: Despite significant results achieved by Contrastive Language-Image Pretraining (CLIP) in zero-shot image recognition, limited effort has been made to explore its potential for zero-shot video ...