Key concepts
In this puzzle, you’ll learn about:
- Basic GPU kernel structure
- Thread indexing with thread_idx.x
- Simple parallel operations

- Parallelism: Each thread executes independently
- Thread indexing: Access element at position i = thread_idx.x
- Memory access: Read from a[i] and write to output[i]
- Data independence: Each output depends only on its corresponding input
Code to complete
alias SIZE = 4
alias BLOCKS_PER_GRID = 1
alias THREADS_PER_BLOCK = SIZE
alias dtype = DType.float32
fn add_10(
    output: UnsafePointer[Scalar[dtype]], a: UnsafePointer[Scalar[dtype]]
):
    i = thread_idx.x
    # FILL ME IN (roughly 1 line)
View full file: problems/p01/p01.mojo
Tips
- Store thread_idx.x in i
- Add 10 to a[i]
- Store result in output[i]
Running the code
To test your solution, run one of the following commands in your terminal, depending on whether you use uv or pixi:
uv run poe p01
pixi run p01
Your output will look like this if the puzzle isn’t solved yet:
out: HostBuffer([0.0, 0.0, 0.0, 0.0])
expected: HostBuffer([10.0, 11.0, 12.0, 13.0])
Solution
fn add_10(
    output: UnsafePointer[Scalar[dtype]], a: UnsafePointer[Scalar[dtype]]
):
    i = thread_idx.x
    output[i] = a[i] + 10.0
This solution:
- Gets the thread index with i = thread_idx.x
- Adds 10 to the input value: output[i] = a[i] + 10.0
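The one-thread-per-element mapping can also be checked without a GPU. The sketch below is a serial CPU stand-in, not the puzzle's launch code: the loop variable i plays the role of thread_idx.x, add_10_element is a hypothetical helper name for the work a single thread does, and the input values 0 to 3 are an assumption inferred from the expected output shown earlier.

```mojo
alias SIZE = 4


fn add_10_element(a_i: Float32) -> Float32:
    # Hypothetical helper: the work one GPU thread performs on its own element.
    return a_i + 10.0


def main():
    # Serial stand-in for the SIZE GPU threads; i plays the role of thread_idx.x.
    for i in range(SIZE):
        # Assumption: a holds 0, 1, 2, 3, inferred from the expected output.
        a_i = Float32(i)
        print("thread", i, "writes", add_10_element(a_i))
```

No iteration reads a value produced by another iteration, so the loop order does not matter. That independence is what lets the GPU run all four threads at the same time.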