Hover over the text to see the attention paid to each token, and details of the computation.

The first two columns show the query and key vectors. The third column shows their elementwise product. The fourth shows the attention score: their dot product (the sum of the elementwise product) divided by `sqrt(d_head)`. The fifth shows the attention probability, i.e. the softmax of the scores over the keys for that query.
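
As a rough illustration of how these columns relate, here is a minimal pure-Python sketch for a single query attending to a handful of keys. The function name `attention_row` and the toy vectors are made up for this example and are not part of the visualization's API. Causal masking is omitted for brevity.

```python
import math

def attention_row(query, keys):
    """For one query, return the per-key elementwise products,
    scaled scores, and softmax probabilities (hypothetical helper)."""
    d_head = len(query)
    # Column 3: elementwise product of the query with each key
    products = [[q * k for q, k in zip(query, key)] for key in keys]
    # Column 4: sum of the elementwise product, divided by sqrt(d_head)
    scores = [sum(p) / math.sqrt(d_head) for p in products]
    # Column 5: softmax over keys (subtract the max for numerical stability)
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return products, scores, probs

# Toy inputs with d_head = 4 (assumed values, for illustration only)
query = [1.0, 0.0, 2.0, 1.0]
keys = [[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 2.0, 1.0]]
products, scores, probs = attention_row(query, keys)
```

With these toy inputs the second key has the larger score, so it receives most of the attention probability.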

Your cache has a batch dimension, so you can use the menu below to choose which sequence in the batch to display.

Attention of name mover heads (lines mode)