HeteroCL Compute APIs
Author: Yi-Hsiang Lai (seanlatias@github)
In this tutorial, we will show more HeteroCL compute APIs. These APIs are used to build the algorithm. Note that in HeteroCL, the compute APIs can be used along with the imperative DSL.
import heterocl as hcl
hcl.compute
We have introduced this API before. It returns a new tensor whose values are defined in an elementwise fashion. The API's prototype is shown below.
compute(shape, fcompute, name, dtype)
shape defines the shape of the output tensor. fcompute is a lambda function that describes the elementwise definition. name and dtype are optional. We show an example below.
hcl.init()
A = hcl.placeholder((10,), "A")
B = hcl.placeholder((10,), "B")
def compute_example(A, B):
    return hcl.compute(A.shape, lambda x: A[x] + B[x], "C")
s = hcl.create_schedule([A, B], compute_example)
print(hcl.lower(s))
module {
  func.func @top(%arg0: memref<10xi32>, %arg1: memref<10xi32>) -> memref<10xi32> attributes {itypes = "ss", otypes = "s"} {
    %0 = memref.alloc() {name = "C"} : memref<10xi32>
    affine.for %arg2 = 0 to 10 {
      %1 = affine.load %arg0[%arg2] {from = "A"} : memref<10xi32>
      %2 = affine.load %arg1[%arg2] {from = "B"} : memref<10xi32>
      %3 = arith.extsi %1 : i32 to i33
      %4 = arith.extsi %2 : i32 to i33
      %5 = arith.addi %3, %4 : i33
      %6 = arith.trunci %5 : i33 to i32
      affine.store %6, %0[%arg2] {to = "C"} : memref<10xi32>
    } {loop_name = "x", op_name = "C"}
    return %0 : memref<10xi32>
  }
}
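To make the printed IR easier to read, here is a minimal plain-Python sketch of what the loop computes. It is a reference model only, not HeteroCL code: `compute_ref`, `add_i32` and `wrap_signed` are hypothetical helper names, and the sketch assumes 32-bit signed elements, as the `memref<10xi32>` types in the IR indicate. The addition is done in 33 bits and then truncated back to 32 bits, so a sum that overflows wraps around.

```python
# Reference model only; these helper names are hypothetical, not HeteroCL APIs.

def wrap_signed(value, bits):
    """Keep the low `bits` bits of `value`, read as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def add_i32(a, b):
    """Mirror the extsi / addi / trunci sequence shown in the IR."""
    a33 = wrap_signed(wrap_signed(a, 32), 33)  # arith.extsi keeps the value
    b33 = wrap_signed(wrap_signed(b, 32), 33)
    total = wrap_signed(a33 + b33, 33)         # arith.addi on i33 cannot overflow
    return wrap_signed(total, 32)              # arith.trunci keeps the low 32 bits


def compute_ref(shape, fcompute):
    """One-dimensional model of an elementwise compute: returns a new list."""
    return [fcompute(x) for x in range(shape[0])]


A = list(range(10))
B = [10 * x for x in range(10)]
C = compute_ref((10,), lambda x: add_i32(A[x], B[x]))
print(C)
```

In this model, `add_i32(2**31 - 1, 1)` wraps to `-2**31`, which is the behavior the final truncation to `i32` implies.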
hcl.update
This API is similar to hcl.compute in that it defines how you update a tensor in an elementwise fashion. Note that this API does not return a new tensor. More specifically, the return value is None.
hcl.update(tensor, fupdate, name)
tensor is the tensor we want to update. fupdate is a lambda function that describes the elementwise update behavior. name is optional. The example below performs the same computation as compute_example. The difference is that instead of returning a new tensor C, we pass C in as an input and update it in place. The generated IR is almost the same.
hcl.init()
A = hcl.placeholder((10,), "A")
B = hcl.placeholder((10,), "B")
C = hcl.placeholder((10,), "C")
def update_example(A, B, C):
    hcl.update(C, lambda x: A[x] + B[x], "U")
s = hcl.create_schedule([A, B, C], update_example)
print(hcl.lower(s))
module {
  func.func @top(%arg0: memref<10xi32>, %arg1: memref<10xi32>, %arg2: memref<10xi32>) attributes {itypes = "sss", otypes = ""} {
    affine.for %arg3 = 0 to 10 {
      %0 = affine.load %arg0[%arg3] {from = "A"} : memref<10xi32>
      %1 = affine.load %arg1[%arg3] {from = "B"} : memref<10xi32>
      %2 = arith.extsi %0 : i32 to i33
      %3 = arith.extsi %1 : i32 to i33
      %4 = arith.addi %2, %3 : i33
      %5 = arith.trunci %4 : i33 to i32
      affine.store %5, %arg2[%arg3] {to = "C"} : memref<10xi32>
    } {loop_name = "x", op_name = "U"}
    return
  }
}
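The difference between the two APIs can be sketched in plain Python. This is an illustrative model under simplifying assumptions (one-dimensional tensors stored as lists), and `compute_ref` and `update_ref` are hypothetical names rather than HeteroCL functions: the first allocates and returns a new list, while the second writes into an existing list and returns None.

```python
# Reference model only; compute_ref and update_ref are hypothetical helpers.

def compute_ref(shape, fcompute):
    """Allocate a new list and fill it elementwise."""
    return [fcompute(x) for x in range(shape[0])]


def update_ref(tensor, fupdate):
    """Overwrite `tensor` elementwise in place; implicitly returns None."""
    for x in range(len(tensor)):
        tensor[x] = fupdate(x)


A = [1, 2, 3, 4]
B = [10, 20, 30, 40]

new_tensor = compute_ref((4,), lambda x: A[x] + B[x])

C = [0, 0, 0, 0]
result = update_ref(C, lambda x: A[x] + B[x])

print(new_tensor, C, result)
```

This matches the two function signatures in the IR: the compute version allocates and returns a `memref`, while the update version takes C as a third argument and returns nothing.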
hcl.mutate
This API lets users describe any loop in vector-code form, even if the loop body follows no common pattern or contains imperative DSL. It is useful when we want to apply optimizations to such a loop.
hcl.mutate(domain, fbody, name)
domain describes the iteration domain of our original for loop. fbody is the body statement of the for loop. name is optional. We can describe the same computation as in the previous two examples using this API.
hcl.init()
A = hcl.placeholder((10,), "A")
B = hcl.placeholder((10,), "B")
C = hcl.placeholder((10,), "C")
def mut_example(A, B, C):
    def loop_body(x):
        C[x] = A[x] + B[x]

    hcl.mutate((10,), lambda x: loop_body(x), "M")
s = hcl.create_schedule([A, B, C], mut_example)
print(hcl.lower(s))
module {
  func.func @top(%arg0: memref<10xi32>, %arg1: memref<10xi32>, %arg2: memref<10xi32>) attributes {itypes = "sss", otypes = ""} {
    affine.for %arg3 = 0 to 10 {
      %0 = affine.load %arg0[%arg3] {from = "A"} : memref<10xi32>
      %1 = affine.load %arg1[%arg3] {from = "B"} : memref<10xi32>
      %2 = arith.extsi %0 : i32 to i33
      %3 = arith.extsi %1 : i32 to i33
      %4 = arith.addi %2, %3 : i33
      %5 = arith.trunci %4 : i33 to i32
      affine.store %5, %arg2[%arg3] {to = "C"} : memref<10xi32>
    } {loop_name = "x", op_name = "M"}
    return
  }
}
Note that in this example we cannot write the assignment statement directly inside the lambda function. Python forbids this: a lambda body must be a single expression, and an assignment is a statement.
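The restriction on lambda bodies can be checked without HeteroCL. The sketch below uses Python's built-in `compile` only to parse a line that puts the assignment inside the lambda (nothing is executed, so `hcl` does not need to be importable), then shows the named-function workaround on plain lists. The list-based loop stands in for `hcl.mutate` and is purely illustrative.

```python
# Parsing only: the source string is never executed.
source = "hcl.mutate((10,), lambda x: C[x] = A[x] + B[x])"
try:
    compile(source, "<example>", "exec")
    lambda_assignment_ok = True
except SyntaxError:
    lambda_assignment_ok = False

print(lambda_assignment_ok)

# Workaround used in the tutorial: put the statement in a named function
# and have the lambda call that function.
A = [1, 2, 3]
B = [10, 20, 30]
C = [0, 0, 0]


def loop_body(x):
    C[x] = A[x] + B[x]


body = lambda x: loop_body(x)  # the lambda body is now a single call expression

for x in range(len(C)):  # plain-Python stand-in for iterating the domain
    body(x)

print(C)
```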
Combine Imperative DSL with Compute APIs
HeteroCL allows users to write mixed-paradigm applications. This is common when performing reduction operations. Although HeteroCL provides APIs for simple reductions such as summation and finding the maximum, more complex reductions such as sorting must be described manually. The following example finds the two largest values in a tensor.
hcl.init()
A = hcl.placeholder((10,), "A")
M = hcl.placeholder((2,), "M")
def find_max_two(A, M):
    def loop_body(x):
        # M[0] holds the second-largest value seen so far, M[1] the largest
        with hcl.if_(A[x] > M[0]):
            with hcl.if_(A[x] > M[1]):
                M[0] = M[1]
                M[1] = A[x]
            with hcl.else_():
                M[0] = A[x]

    hcl.mutate(A.shape, lambda x: loop_body(x))
s = hcl.create_schedule([A, M], find_max_two)
f = hcl.build(s)
import numpy as np
hcl_A = hcl.asarray(np.random.randint(50, size=(10,)))
hcl_M = hcl.asarray(np.array([-1, -1]))
f(hcl_A, hcl_M)
np_A = hcl_A.asnumpy()
np_M = hcl_M.asnumpy()
print(np_A)
print(np_M)
assert np.array_equal(np_M, np.sort(np_A)[-2:])
[ 6 17 29 39 28 46 17 25 4 35]
[39 46]
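The control flow of the loop body can also be traced in plain Python. The function below is a hypothetical reference model (`find_max_two_ref` is not part of HeteroCL) that mirrors the nested `hcl.if_` / `hcl.else_` structure. It assumes, as in the example above, that both slots start at -1 and that every input value is larger than that initial value.

```python
# Reference model only; find_max_two_ref is a hypothetical helper.

def find_max_two_ref(values, init=-1):
    """Return [second_largest, largest], assuming all values exceed `init`."""
    m = [init, init]
    for v in values:
        if v > m[0]:
            if v > m[1]:
                m[0] = m[1]  # the old maximum becomes the runner-up
                m[1] = v
            else:
                m[0] = v     # v falls between the runner-up and the maximum
    return m


data = [6, 17, 29, 39, 28, 46, 17, 25, 4, 35]  # the array printed above
top_two = find_max_two_ref(data)
print(top_two)
```

On the array printed above, this model yields 39 and 46, agreeing with the HeteroCL output. If the input could contain values at or below the initial value, the starting contents of M would need to be chosen differently.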