Abstract: Existing image caption evaluation metrics, such as BLEU, ROUGE, and CIDEr, primarily rely on high-level similarities like n-gram matching. Here, we propose TAGSim, a novel metric that ...