Puzzle 3: Guards
Overview
Implement a kernel that adds 10 to each position of vector a and stores the result in vector out.
Note: You have more threads than positions (the launch configuration below uses 8 threads for 4 elements), so you need to guard against out-of-bounds memory access.
Key concepts
In this puzzle, you’ll learn about:
- Handling thread/data size mismatches
- Preventing out-of-bounds memory access
- Using conditional execution in GPU kernels
- Safe memory access patterns
Mathematical Description
For each thread \(i\): \[\Large \text{if}\ i < \text{size}: out[i] = a[i] + 10\]
Memory Safety Pattern
Thread 0 (i=0): if 0 < size: out[0] = a[0] + 10 ✓ Valid
Thread 1 (i=1): if 1 < size: out[1] = a[1] + 10 ✓ Valid
Thread 2 (i=2): if 2 < size: out[2] = a[2] + 10 ✓ Valid
Thread 3 (i=3): if 3 < size: out[3] = a[3] + 10 ✓ Valid
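Thread 4 (i=4): if 4 < size: ❌ Skip (out of bounds)
Thread 5 (i=5): if 5 < size: ❌ Skip (out of bounds)
Thread 6 (i=6): if 6 < size: ❌ Skip (out of bounds)
Thread 7 (i=7): if 7 < size: ❌ Skip (out of bounds)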
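To see which threads actually write, the pattern above can be replayed on the CPU. This is a sequential simulation, not a GPU launch: a loop stands in for the 8 threads (real GPU threads run concurrently), a List stands in for the device buffers, and the input a = [0, 1, 2, 3] is an assumption inferred from the expected output shown under "Running the code". The helper name simulate_add_10_guard is hypothetical.

```mojo
alias SIZE = 4
alias NUM_THREADS = 8  # one block with THREADS_PER_BLOCK = (8, 1)


fn simulate_add_10_guard(a: List[Float32], num_threads: Int) -> List[Float32]:
    """Runs the guarded kernel body once per thread index, sequentially on the CPU."""
    var size = len(a)
    var result = List[Float32]()
    for _ in range(size):
        result.append(0.0)  # output buffer starts zeroed
    for local_i in range(num_threads):
        # The guard: only threads that map to a real element touch memory.
        if local_i < size:
            result[local_i] = a[local_i] + 10.0
        else:
            print("thread", local_i, "skipped (out of bounds)")
    return result^


fn main():
    var a = List[Float32]()
    for i in range(SIZE):
        a.append(Float32(i))  # assumed input: a = [0, 1, 2, 3]
    var result = simulate_add_10_guard(a, NUM_THREADS)
    for i in range(len(result)):
        print("index", i, "->", result[i])
```

Threads 4 through 7 reach the else branch and never index into either buffer, which is exactly what the guard buys you on the GPU.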
💡 Note: Boundary checking becomes increasingly complex with:
- Multi-dimensional arrays
- Different array shapes
- Complex access patterns
Code to complete
from gpu import thread_idx
from memory import UnsafePointer

alias SIZE = 4
alias BLOCKS_PER_GRID = 1
alias THREADS_PER_BLOCK = (8, 1)
alias dtype = DType.float32


fn add_10_guard(
    out: UnsafePointer[Scalar[dtype]],
    a: UnsafePointer[Scalar[dtype]],
    size: Int,
):
    local_i = thread_idx.x
    # FILL ME IN (roughly 2 lines)
View full file: problems/p03/p03.mojo
Tips
- Store thread_idx.x in local_i
- Add guard: if local_i < size
- Inside guard: out[local_i] = a[local_i] + 10.0
Running the code
To test your solution, run the following command in your terminal:
magic run p03
Your output will look like this if the puzzle isn’t solved yet:
out: HostBuffer([0.0, 0.0, 0.0, 0.0])
expected: HostBuffer([10.0, 11.0, 12.0, 13.0])
Solution
fn add_10_guard(
    out: UnsafePointer[Scalar[dtype]],
    a: UnsafePointer[Scalar[dtype]],
    size: Int,
):
    local_i = thread_idx.x
    if local_i < size:
        out[local_i] = a[local_i] + 10.0
This solution:
- Gets the thread index with local_i = thread_idx.x
- Guards against out-of-bounds access with if local_i < size
- Inside the guard, adds 10 to the input value a[local_i] and writes the result to out[local_i]
Looking ahead
While simple boundary checks work here, consider these challenges:
- What about 2D/3D array boundaries?
- How to handle different shapes efficiently?
- What if we need padding or edge handling?
Example of growing complexity:
# Current: 1D bounds check
if i < size: ...
# Coming soon: 2D bounds check
if i < height and j < width: ...
# Later: 3D with padding
if (i < height and j < width and k < depth
        and i >= padding and j >= padding): ...
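To see why a 2D kernel needs a check on each coordinate, here is a sequential CPU simulation (not a GPU launch) of a hypothetical 3-by-4 array processed by a 4-by-8 grid of thread coordinates. The shapes and the helper simulate_add_10_guard_2d are illustrative assumptions, and row-major indexing (i * width + j) is one common layout choice rather than something this puzzle prescribes.

```mojo
alias HEIGHT = 3     # hypothetical data shape: 3 rows x 4 columns
alias WIDTH = 4
alias THREADS_Y = 4  # hypothetical launch shape, larger than the data
alias THREADS_X = 8


fn simulate_add_10_guard_2d(
    a: List[Float32], height: Int, width: Int, threads_y: Int, threads_x: Int
) -> List[Float32]:
    """Applies the 2D guard for every (i, j) thread coordinate, sequentially on the CPU."""
    var result = List[Float32]()
    for _ in range(height * width):
        result.append(0.0)
    for i in range(threads_y):      # plays the role of a row thread index
        for j in range(threads_x):  # plays the role of a column thread index
            # Both coordinates must be in range before touching memory.
            if i < height and j < width:
                var idx = i * width + j  # row-major offset into the flat buffer
                result[idx] = a[idx] + 10.0
    return result^


fn main():
    var a = List[Float32]()
    for k in range(HEIGHT * WIDTH):
        a.append(Float32(k))
    var result = simulate_add_10_guard_2d(a, HEIGHT, WIDTH, THREADS_Y, THREADS_X)
    print("last element:", result[len(result) - 1])
```

Both comparisons matter: a check on the flat index alone (idx < height * width) would admit thread (i=0, j=4), whose offset 0 * 4 + 4 = 4 belongs to element (1, 0), so two threads would write the same location.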
These boundary handling patterns will become more elegant when we learn about LayoutTensor in Puzzle 4, which provides built-in boundary checking and shape management.