Transformer Math 101

We present basic math related to computation and memory usage for transformers.

April 18, 2023 · Quentin Anthony, Stella Biderman, Hailey Schoelkopf

Rotary Embeddings: A Relative Revolution

Rotary Positional Embedding (RoPE) is a new type of position encoding that unifies absolute and relative approaches. We put it to the test.