Our WS resolves this problem: when used with Group Normalization and trained with 1 image/GPU, WS matches or outperforms the performance of BN trained with large batch sizes with only 2 ...
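
As a rough illustration of the idea behind WS, the sketch below standardizes each output channel's filter weights to zero mean and unit variance before they would be used in a convolution. It is not the authors' implementation. The function name `standardize_weights`, the flattened list-of-rows layout, and the placement of `eps` inside the square root are all assumptions made for this example.

```python
import math


def standardize_weights(weights, eps=1e-5):
    """Standardize each row of `weights` to zero mean and unit variance.

    `weights` is a list of rows; each row is assumed to hold the flattened
    filter (in_channels * kernel_h * kernel_w values) of one output channel.
    `eps` is a hypothetical small constant that guards against division by zero.
    """
    standardized = []
    for row in weights:
        n = len(row)
        mean = sum(row) / n
        # Population variance over the fan-in of this output channel.
        var = sum((w - mean) ** 2 for w in row) / n
        std = math.sqrt(var + eps)
        standardized.append([(w - mean) / std for w in row])
    return standardized


# Two hypothetical output channels, each with four flattened weights.
example = standardize_weights([[1.0, 2.0, 3.0, 4.0], [10.0, 0.0, -10.0, 5.0]])
```

In a real network the standardized weights would replace the raw weights in each convolution's forward pass, with Group Normalization applied to the activations afterwards.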