Abstract: Transformer-based models, such as Bidirectional Encoder Representations from Transformers (BERT), cannot process long sequences because their self-attention operation scales quadratically ...
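To make the quadratic-scaling claim concrete, here is a minimal sketch (not from the paper) of standard scaled dot-product self-attention in NumPy; the sequence length `n`, model width `d`, and weight names `wq`, `wk`, `wv` are illustrative assumptions. The point is that the score matrix has shape (n, n), so compute and memory grow with the square of the sequence length.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """x: (n, d) token embeddings; wq/wk/wv: (d, d) projection weights."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])             # (n, n) matrix -- quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ v                                  # (n, d) outputs

# Hypothetical sizes for illustration only.
n, d = 512, 64
rng = np.random.default_rng(0)
x = rng.standard_normal((n, d))
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)  # (512, 64); doubling n quadruples the size of the score matrix
```

Sparse-attention approaches such as the one this abstract introduces replace the full (n, n) score matrix with a restricted attention pattern so that cost scales linearly with sequence length.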