
Metadata

Highlights

  • Use 3D to visualize matrix multiplication expressions, attention heads with real weights, and more. Matrix multiplications (matmuls) are the building blocks of today’s ML models. This note presents mm, a visualization tool for matmuls and compositions of matmuls.
  • Because mm uses all three spatial dimensions, it helps build intuition and spark ideas with less cognitive overhead than the usual squares-on-paper idioms, especially (though not only) for visual/spatial thinkers.
  • mm is fully interactive, runs in the browser or in notebook iframes, and keeps its complete state in the URL, so links are shareable sessions (the screenshots and videos in this note all have links that open the visualizations in the tool). This reference guide describes all of the available functionality.
  • Now the computation makes geometric sense: each location i, j in the result matrix anchors a vector running along the depth dimension k in the cube’s interior, where the horizontal plane extending from row i in L and the vertical plane extending from column j in R intersect. Along this vector, pairs of (i, k) and (k, j) elements from the left and right arguments meet and are multiplied; the resulting products are summed along k, and the sum is deposited in location i, j of the result.
  • A matmul thus takes three steps: project two orthogonal matrices into the interior of a cube; multiply the pair of values at each intersection, forming a grid of products; then sum along the third orthogonal dimension to produce a result matrix.
  • Dot product. First, the canonical algorithm: compute each result element by taking the dot product of the corresponding left row and right column. What we see in the animation is the sweep of multiplied value vectors through the cube’s interior, each delivering a summed result at the corresponding position.
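The cube picture in the highlights above (project L and R onto orthogonal faces, multiply at each interior intersection, sum along depth k) can be sketched in plain Python. This is a minimal illustration under our own assumptions: the names `product_grid`, `sum_along_depth` and `matmul` are hypothetical and are not part of mm, which is a visualization tool rather than a library. The grid is indexed `grid[i][k][j]`, so the "depth vector" anchored at result position (i, j) is `[grid[i][k][j] for k in ...]`.

```python
from typing import List

Matrix = List[List[float]]
Grid = List[List[List[float]]]


def product_grid(L: Matrix, R: Matrix) -> Grid:
    """Return grid[i][k][j] = L[i][k] * R[k][j].

    L lies on the (i, k) face of the cube and R on the (k, j) face; each
    interior point (i, k, j) is where row i of L meets column j of R.
    """
    n, m, p = len(L), len(R), len(R[0])
    if any(len(row) != m for row in L):
        raise ValueError("width of L must equal height of R")
    return [[[L[i][k] * R[k][j] for j in range(p)] for k in range(m)]
            for i in range(n)]


def sum_along_depth(grid: Grid) -> Matrix:
    """Collapse the depth axis k: result[i][j] = sum over k of grid[i][k][j]."""
    return [[sum(slab[k][j] for k in range(len(slab)))
             for j in range(len(slab[0]))]
            for slab in grid]


def matmul(L: Matrix, R: Matrix) -> Matrix:
    """Multiply-then-sum, exactly as in the cube picture."""
    return sum_along_depth(product_grid(L, R))


L = [[1, 2], [3, 4]]
R = [[5, 6], [7, 8]]
grid = product_grid(L, R)
print([grid[0][k][1] for k in range(2)])  # depth vector at (0, 1): [6, 16]
print(matmul(L, R))  # [[19, 22], [43, 50]]
```

Materializing the full i × k × j grid is wasteful compared with summing as you go, but it mirrors the visualization step by step. With NumPy, the same two steps can be written as `(L[:, :, None] * R[None, :, :]).sum(axis=1)`.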
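The canonical dot-product algorithm described above visits result positions one at a time, each value being the dot product of a left row and a right column. The sketch below yields those values as a sweep; the helper names (`dot`, `column`, `dot_product_sweep`) are hypothetical, and the row-major visiting order is an assumption, since the highlight does not state the order the animation uses.

```python
from typing import Iterator, List, Tuple

Matrix = List[List[float]]


def dot(u: List[float], v: List[float]) -> float:
    """Sum of elementwise products of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    return sum(a * b for a, b in zip(u, v))


def column(M: Matrix, j: int) -> List[float]:
    """Column j of M as a list."""
    return [row[j] for row in M]


def dot_product_sweep(L: Matrix, R: Matrix) -> Iterator[Tuple[int, int, float]]:
    """Yield (i, j, value) for each result element in row-major order
    (an assumed order); value = dot(row i of L, column j of R)."""
    for i, row in enumerate(L):
        for j in range(len(R[0])):
            yield i, j, dot(row, column(R, j))


L = [[1, 2], [3, 4]]
R = [[5, 6], [7, 8]]
for i, j, value in dot_product_sweep(L, R):
    print(f"result[{i}][{j}] = {value}")
```

Each yielded value corresponds to one "summed result" delivered at its position; collecting them row by row gives the same matrix as multiplying out the full grid of products.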

New highlights added September 26, 2023 at 10:26 AM

  • put” or “output” dimensions. (Concretely in the context of this example, this means that the width of B is greater than the widths of A or C.) As in the single matmul examples, the floating arrows point towards the result matrix, blue vane coming from the left argument