In this paper, we propose a new multi-modal task, termed audio-visual instance segmentation (AVIS), which aims to simultaneously identify, segment and track individual sounding object instances in ...
A new AI framework can rewrite, remove or add a person’s words in video without reshooting, in a single end-to-end system.