Transformer Math 101
We present basic math related to computation and memory usage for transformers.
Rotary Positional Embedding (RoPE) is a new type of position encoding that unifies absolute and relative approaches. We put it to the test.