Getting Started
Author: Yi-Hsiang Lai (seanlatias@github)
In this tutorial, we demonstrate the basic usage of HeteroCL.
Import HeteroCL
We usually use hcl as the short alias for HeteroCL.
import heterocl as hcl
Initialize the Environment
We need to initialize the environment for each HeteroCL application. We can
do this by calling the API hcl.init(). We can also set the default data
type for every computation via this API. The default data type is a 32-bit
integer.
Note
For more information on the data types, please see Data Type Customization.
hcl.init()
Algorithm Definition
After we initialize, we define the algorithm by using a Python function definition, where the arguments are the input tensors. The function can optionally return tensors as outputs. In this example, the two inputs are a scalar a and a tensor A, and the output is also a tensor B. The main difference between a scalar and a tensor is that a scalar cannot be updated.
Within the algorithm definition, we use HeteroCL APIs to describe the
operations. In this example, we use a tensor-based declarative-style
operation, hcl.compute. We also show the equivalent Python code.
Note
For more information on the APIs, please see HeteroCL Compute APIs.
def simple_compute(a, A):
    B = hcl.compute(A.shape, lambda x, y: A[x, y] + a.v, "B")
    """
    The above API is equivalent to the following Python code.
    for x in range(0, 10):
        for y in range(0, 10):
            B[x, y] = A[x, y] + a
    """
    return B
Inputs/Outputs Definition
One advantage of such a modularized algorithm definition is that we
can reuse the defined function with different input settings. We use
hcl.placeholder to set the inputs, where we specify the shape, name,
and data type. The shape must be specified and should be a tuple. If it
is empty (i.e., ()), the returned object is a scalar. Otherwise, the
returned object is a tensor. The other two fields are optional. In this
example, we define a scalar input a and a two-dimensional tensor input A.
Note
For more information on the interfaces, please see heterocl.placeholder.
a = hcl.placeholder((), "a")
A = hcl.placeholder((10, 10), "A")
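One way to see why an empty shape denotes a scalar: the number of elements a shape describes is the product of its dimensions, and the empty product is 1. This matches the lowered IR shown later in this tutorial, where a appears as memref<1xi32>. The sketch below uses only the Python standard library; num_elements is a hypothetical helper for illustration and is not part of HeteroCL.

```python
import math


def num_elements(shape):
    """Hypothetical helper (not a HeteroCL API): element count of a shape tuple."""
    return math.prod(shape)


scalar_shape = ()        # the shape used for the scalar input a
tensor_shape = (10, 10)  # the shape used for the tensor input A

print(num_elements(scalar_shape))  # 1
print(num_elements(tensor_shape))  # 100
```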
Apply Hardware Customization
Usually, our next step is to apply various hardware customization techniques
to the application. In this tutorial, we skip this step, which is discussed
in later tutorials. However, we still need to build a default schedule
by using hcl.create_schedule, whose inputs are a list of inputs and
the Python function that defines the algorithm.
s = hcl.create_schedule([a, A], simple_compute)
Inspect the Intermediate Representation (IR)
A HeteroCL program will be lowered to an IR before backend code generation. HeteroCL provides an API for users to inspect the lowered IR. This could be helpful for debugging.
print(hcl.lower(s))
module {
  func.func @top(%arg0: memref<1xi32>, %arg1: memref<10x10xi32>) -> memref<10x10xi32> attributes {itypes = "ss", otypes = "s"} {
    %0 = memref.alloc() {name = "B"} : memref<10x10xi32>
    affine.for %arg2 = 0 to 10 {
      affine.for %arg3 = 0 to 10 {
        %1 = affine.load %arg1[%arg2, %arg3] {from = "A"} : memref<10x10xi32>
        %2 = affine.load %arg0[0] {from = "a"} : memref<1xi32>
        %3 = arith.extsi %1 : i32 to i33
        %4 = arith.extsi %2 : i32 to i33
        %5 = arith.addi %3, %4 : i33
        %6 = arith.trunci %5 : i33 to i32
        affine.store %6, %0[%arg2, %arg3] {to = "B"} : memref<10x10xi32>
      } {loop_name = "y"}
    } {loop_name = "x", op_name = "B"}
    return %0 : memref<10x10xi32>
  }
}
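The arith.extsi, arith.addi, and arith.trunci operations in the IR above widen both 32-bit operands to 33 bits, add them, and truncate the sum back to 32 bits. The following is a minimal pure-Python sketch of that arithmetic, assuming two's-complement wrapping on truncation. The helper names are hypothetical, and the sketch only models the arithmetic; it is not how HeteroCL implements the operation.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a signed two's-complement integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def add_i32_via_i33(lhs, rhs):
    """Hypothetical model of the extsi / addi / trunci sequence."""
    wide_lhs = sign_extend(lhs, 32)   # arith.extsi: i32 -> i33, value unchanged
    wide_rhs = sign_extend(rhs, 32)
    wide_sum = wide_lhs + wide_rhs    # arith.addi on i33: the sum always fits
    return sign_extend(wide_sum, 32)  # arith.trunci: keep only the low 32 bits


print(add_i32_via_i33(29, 10))         # 39
print(add_i32_via_i33(2**31 - 1, 1))   # -2147483648
```

The second call shows the consequence of the final truncation: a sum that exceeds the 32-bit range wraps around instead of raising an error.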
Create the Executable
The next step is to build the executable by using hcl.build. You can
define the target of the executable; the default target is llvm, which
means the executable runs on the CPU. The input to this API is the
schedule we just created.
f = hcl.build(s)
Prepare the Inputs/Outputs for the Executable
To run the generated executable, we can feed it NumPy arrays by using
hcl.asarray. This API transforms a NumPy array into a HeteroCL container
that is used as an input or output of the executable. In this tutorial, we
randomly generate the values for our input tensor A. Note that since we
return a new tensor at the end of our algorithm, we also need to prepare
an array for the output tensor B.
import numpy as np
hcl_a = 10
np_A = np.random.randint(100, size=A.shape)
hcl_A = hcl.asarray(np_A)
hcl_B = hcl.asarray(np.zeros(A.shape))
Run the Executable
With the prepared inputs/outputs, we can finally feed them to our executable.
f(hcl_a, hcl_A, hcl_B)
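The call above passes the output container hcl_B alongside the inputs, and the executable writes its result into that container rather than returning a new array. Here is a pure-Python sketch of this calling convention, with nested lists standing in for HeteroCL containers. run_simple_compute is a hypothetical stand-in for illustration, not the generated executable.

```python
def run_simple_compute(a, A, B_out):
    """Hypothetical stand-in for the executable: fills B_out in place."""
    for x in range(len(A)):
        for y in range(len(A[x])):
            B_out[x][y] = A[x][y] + a


A = [[1, 2], [3, 4]]
B = [[0, 0], [0, 0]]  # preallocated output buffer, playing the role of hcl_B
result = run_simple_compute(10, A, B)

print(result)  # None: nothing is returned
print(B)       # [[11, 12], [13, 14]]
```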
View the Results
To view the results, we can transform the HeteroCL tensors back to NumPy
arrays by using asnumpy(). The output below shows the scalar a, the input
tensor A, and the output tensor B, in that order.
10
[[29 86 16 51 66 41 65 12 78 1]
[44 51 80 25 32 44 24 2 64 32]
[81 72 51 78 37 26 20 27 57 23]
[87 62 54 26 18 18 86 9 68 68]
[ 6 33 65 30 71 56 33 91 76 46]
[ 3 31 64 58 23 12 14 59 67 61]
[60 19 90 10 22 50 32 46 62 52]
[47 61 62 26 10 39 96 7 24 50]
[16 72 39 69 23 10 57 77 23 59]
[17 44 46 21 39 10 43 43 2 62]]
[[ 39 96 26 61 76 51 75 22 88 11]
[ 54 61 90 35 42 54 34 12 74 42]
[ 91 82 61 88 47 36 30 37 67 33]
[ 97 72 64 36 28 28 96 19 78 78]
[ 16 43 75 40 81 66 43 101 86 56]
[ 13 41 74 68 33 22 24 69 77 71]
[ 70 29 100 20 32 60 42 56 72 62]
[ 57 71 72 36 20 49 106 17 34 60]
[ 26 82 49 79 33 20 67 87 33 69]
[ 27 54 56 31 49 20 53 53 12 72]]
Let's run a test.
np_B = hcl_B.asnumpy()
assert np.array_equal(np_B, np_A + 10)