Abstract: The use of vision transformers (ViTs) in computer vision is increasing due to their limited inductive biases (e.g., locality and weight sharing) and greater scalability compared to other ...