Puzzle 7: 2D Blocks

Overview

Implement a kernel that adds 10 to each element of matrix a and stores the result in out.

Note: Each block has fewer threads than a has elements in either dimension, so several blocks are needed to cover the matrix.
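Because one block cannot span the whole matrix, the grid needs enough blocks along each axis, and a ceiling division gives that count. Below is a minimal Python sketch that assumes a hypothetical 5×5 matrix and 3×3 threads per block. The names `SIZE`, `THREADS_PER_BLOCK`, `ceil_div`, and `blocks_per_grid` are illustrative and do not come from the puzzle code.

```python
# Hypothetical sizes for illustration; the puzzle's actual values may differ.
SIZE = 5                     # a is SIZE x SIZE
THREADS_PER_BLOCK = (3, 3)   # (x, y) threads in each block


def ceil_div(n: int, d: int) -> int:
    """Smallest integer k such that k * d >= n."""
    return (n + d - 1) // d


def blocks_per_grid(size: int, threads: tuple[int, int]) -> tuple[int, int]:
    """Blocks needed along x (columns) and y (rows) to cover a size x size matrix."""
    tx, ty = threads
    return (ceil_div(size, tx), ceil_div(size, ty))


grid = blocks_per_grid(SIZE, THREADS_PER_BLOCK)
total_threads = grid[0] * THREADS_PER_BLOCK[0] * grid[1] * THREADS_PER_BLOCK[1]
print(grid, total_threads)  # (2, 2) 36
```

With these numbers the launch creates 36 threads for 25 elements, so the 11 extra threads in the edge blocks must skip their work.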

(Figure: 2D blocks visualization)

Key concepts

  • Block-based processing
  • Grid-block coordination
  • Multi-block indexing
  • Memory access patterns

🔑 2D thread indexing convention

We extend the block-based indexing from puzzle 04 to 2D:

Global position calculation:

    row = block_dim.y * block_idx.y + thread_idx.y
    col = block_dim.x * block_idx.x + thread_idx.x
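The two formulas can be checked by hand with a small Python function. This is a sketch, not the Mojo API: `Dim2` is a hypothetical stand-in for the GPU's (x, y) index and dimension values, and `global_position` simply applies the formulas above, with y selecting rows and x selecting columns.

```python
from typing import NamedTuple


class Dim2(NamedTuple):
    """Hypothetical stand-in for a GPU (x, y) index or dimension pair."""
    x: int
    y: int


def global_position(block_dim: Dim2, block_idx: Dim2, thread_idx: Dim2) -> tuple[int, int]:
    """Return (row, col): y components pick the row, x components pick the column."""
    row = block_dim.y * block_idx.y + thread_idx.y
    col = block_dim.x * block_idx.x + thread_idx.x
    return row, col


# Thread (x=1, y=0) inside block (x=1, y=0) with 2x2 blocks:
print(global_position(Dim2(2, 2), Dim2(1, 0), Dim2(1, 0)))  # (0, 3)
```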

For example, with 2×2 threads per block and a 2×2 grid of blocks covering a 4×4 matrix:

Block (0,0):   Block (1,0):
[0,0  0,1]     [0,2  0,3]
[1,0  1,1]     [1,2  1,3]

Block (0,1):   Block (1,1):
[2,0  2,1]     [2,2  2,3]
[3,0  3,1]     [3,2  3,3]
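The four tiles in the diagram can be regenerated by applying the global position formulas to every thread of each block. In this Python sketch, `block_tile` is an illustrative helper that assumes square blocks of side `bdim`. Its rows follow `thread_idx.y` and its columns follow `thread_idx.x`, and blocks are visited in the same (x, y) order as the diagram labels.

```python
def block_tile(block_x: int, block_y: int, bdim: int = 2) -> list[list[tuple[int, int]]]:
    """(row, col) of every thread in block (block_x, block_y); outer list is thread_idx.y."""
    return [
        [(bdim * block_y + ty, bdim * block_x + tx) for tx in range(bdim)]
        for ty in range(bdim)
    ]


for by in range(2):
    for bx in range(2):
        print(f"Block ({bx},{by}):", block_tile(bx, by))
```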

Each entry shows the (row, col) global position computed by that thread, while block labels are written as (block_idx.x, block_idx.y). Together, the block dimensions and indices ensure:

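  • Complete coverage of the 2D space: every element maps to some thread
  • No overlap between blocks: no element is handled by two threads
  • Memory-friendly access: threads adjacent in x touch adjacent columns of the same row, which are adjacent in memory for a row-major matrix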
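The coverage and no-overlap properties can be checked mechanically: simulate every thread of the launch, count how many threads land on each in-bounds element, and confirm that each element is hit exactly once. The last lines assume a row-major flat buffer (offset `row * size + col`) and show that threads adjacent in x read adjacent addresses. `thread_positions` and `flat_offset` are illustrative helpers, not part of the puzzle code.

```python
from collections import Counter


def thread_positions(size: int, bdim: int, grid: int) -> Counter:
    """Count how many threads map to each in-bounds (row, col); out-of-range threads are skipped."""
    hits: Counter = Counter()
    for by in range(grid):
        for bx in range(grid):
            for ty in range(bdim):
                for tx in range(bdim):
                    row = bdim * by + ty
                    col = bdim * bx + tx
                    if row < size and col < size:  # guard for extra threads
                        hits[(row, col)] += 1
    return hits


hits = thread_positions(size=4, bdim=2, grid=2)
full_coverage = len(hits) == 4 * 4
no_overlap = all(count == 1 for count in hits.values())
print(full_coverage, no_overlap)  # True True


def flat_offset(row: int, col: int, size: int) -> int:
    """Row-major offset of (row, col) in a size x size buffer (assumed layout)."""
    return row * size + col


# Threads tx = 0, 1 of block (1, 0), thread row ty = 1, in the 4x4 example:
print([flat_offset(2 * 0 + 1, 2 * 1 + tx, 4) for tx in range(2)])  # [6, 7]
```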

Implementation approaches

🔰 Raw memory approach

Learn how to handle multi-block operations with manual indexing.
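With raw memory, the kernel sees flat buffers, so each thread turns its (row, col) into a row-major offset `row * size + col` and must skip positions that fall outside the matrix. The sketch below is a sequential Python simulation of that logic, not the puzzle's Mojo code. The 5×5 size, the 3×3 block shape, the row-major layout, and the names `add_ten_kernel` and `launch` are assumptions for illustration.

```python
def add_ten_kernel(out: list[float], a: list[float], size: int,
                   block_dim: tuple[int, int], block_idx: tuple[int, int],
                   thread_idx: tuple[int, int]) -> None:
    """One simulated thread: out[row, col] = a[row, col] + 10 on flat row-major buffers."""
    row = block_dim[1] * block_idx[1] + thread_idx[1]
    col = block_dim[0] * block_idx[0] + thread_idx[0]
    if row < size and col < size:  # extra threads in edge blocks do nothing
        out[row * size + col] = a[row * size + col] + 10


def launch(size: int, threads: tuple[int, int]) -> list[float]:
    """Run every thread of a 2D grid one after another (a CPU stand-in for a GPU launch)."""
    a = [float(i) for i in range(size * size)]
    out = [0.0] * (size * size)
    grid = ((size + threads[0] - 1) // threads[0], (size + threads[1] - 1) // threads[1])
    for by in range(grid[1]):
        for bx in range(grid[0]):
            for ty in range(threads[1]):
                for tx in range(threads[0]):
                    add_ten_kernel(out, a, size, threads, (bx, by), (tx, ty))
    return out


result = launch(size=5, threads=(3, 3))
print(result[:6])  # [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
```

Dropping the bounds check would make threads with row or col equal to 5 write past the rows of the matrix, which is why the guard matters once the grid overshoots the data.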

📐 LayoutTensor version

Use LayoutTensor features to elegantly handle block-based processing.

💡 Note: See how LayoutTensor simplifies block coordination and memory access patterns.
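The gain from a layout-aware tensor is that the kernel can index `out[row, col]` directly while the layout handles the offset arithmetic. As a hedged Python analogy only, the same kernel body can be written against 2D indices. The class `RowMajor2D` and the function `add_ten_2d` are hypothetical and do not reproduce LayoutTensor's actual API; the 6×6 thread sweep assumes a 2×2 grid of 3×3 blocks over a 5×5 matrix.

```python
class RowMajor2D:
    """Hypothetical 2D view over a flat list; an analogy, not Mojo's LayoutTensor."""

    def __init__(self, data: list[float], rows: int, cols: int) -> None:
        self.data, self.rows, self.cols = data, rows, cols

    def __getitem__(self, idx: tuple[int, int]) -> float:
        row, col = idx
        return self.data[row * self.cols + col]  # the layout owns the offset math

    def __setitem__(self, idx: tuple[int, int], value: float) -> None:
        row, col = idx
        self.data[row * self.cols + col] = value


def add_ten_2d(out: RowMajor2D, a: RowMajor2D, row: int, col: int) -> None:
    """Kernel body written only in terms of (row, col)."""
    if row < a.rows and col < a.cols:  # still guard the extra threads
        out[row, col] = a[row, col] + 10


size = 5
a = RowMajor2D([float(i) for i in range(size * size)], size, size)
out = RowMajor2D([0.0] * (size * size), size, size)
for row in range(6):  # hypothetical launch: 2 blocks x 3 threads per axis
    for col in range(6):
        add_ten_2d(out, a, row, col)
print(out[4, 4])  # 34.0
```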