Getting Started

Author: Yi-Hsiang Lai (seanlatias@github)

In this tutorial, we demonstrate the basic usage of HeteroCL.

Import HeteroCL

We conventionally import HeteroCL under the alias hcl.

import heterocl as hcl

Initialize the Environment

Every HeteroCL application must first initialize the environment by calling hcl.init(). This API can also set the default data type used by every computation. If no data type is given, the default is a 32-bit integer.

Note

For more information on the data types, please see Data Type Customization.

hcl.init()

Algorithm Definition

After initialization, we define the algorithm as a Python function whose arguments are the inputs (scalars or tensors). The function can optionally return tensors as outputs. In this example, the two inputs are a scalar a and a tensor A, and the output is a tensor B. The main difference between a scalar and a tensor is that a scalar cannot be updated. Inside the function, the value of the scalar is read through a.v.

Within the algorithm definition, we use HeteroCL APIs to describe the operations. In this example, we use a tensor-based declarative-style operation hcl.compute. We also show the equivalent Python code.

Note

For more information on the APIs, please see HeteroCL Compute APIs.

def simple_compute(a, A):
    # Declaratively define B[x, y] = A[x, y] + a for every index (x, y) of A
    B = hcl.compute(A.shape, lambda x, y: A[x, y] + a.v, "B")
    # The above API is equivalent to the following Python code:
    #
    # for x in range(0, 10):
    #     for y in range(0, 10):
    #         B[x, y] = A[x, y] + a

    return B
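To make the declarative style concrete, here is a toy, pure-Python model (no HeteroCL involved) of what hcl.compute describes: the output at each index is the value of the lambda evaluated at that index. The helpers compute_2d, simple_compute_ref, and simple_compute_loops are hypothetical names that only cover the 2-D case. They show that the declarative form and the equivalent loop nest written out in simple_compute produce the same result.

```python
def compute_2d(shape, fcompute):
    """Hypothetical helper (not a HeteroCL API): build a 2-D nested list
    whose element at (x, y) is fcompute(x, y)."""
    rows, cols = shape
    return [[fcompute(x, y) for y in range(cols)] for x in range(rows)]


def simple_compute_ref(a, A):
    # Declarative form: describe each output element, not the loops.
    shape = (len(A), len(A[0]))
    return compute_2d(shape, lambda x, y: A[x][y] + a)


def simple_compute_loops(a, A):
    # Imperative form: the explicit loop nest, using Python lists.
    rows, cols = len(A), len(A[0])
    B = [[0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            B[x][y] = A[x][y] + a
    return B


print(simple_compute_ref(10, [[29, 86], [44, 51]]))  # [[39, 96], [54, 61]]
```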

Inputs/Outputs Definition

One advantage of such a modular algorithm definition is that we can reuse the same function with different input settings. We use hcl.placeholder to define the inputs, specifying the shape, name, and data type. The shape is required and must be a tuple: if it is empty (i.e., ()), the returned object is a scalar; otherwise, it is a tensor. The other two fields are optional, and when the data type is omitted, the default set by hcl.init() is used. In this example, we define a scalar input a and a two-dimensional tensor input A.

Note

For more information on the interfaces, please see heterocl.placeholder.

a = hcl.placeholder((), "a")
A = hcl.placeholder((10, 10), "A")
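The shape rule above can be sketched in plain Python. The helper names num_elements and kind are hypothetical and are not HeteroCL APIs. An empty shape is a product of zero dimensions, which equals 1, so a scalar holds exactly one element; this is consistent with the scalar a appearing as memref<1xi32> in the lowered IR later in this tutorial.

```python
import math


def num_elements(shape):
    """Number of elements described by a placeholder-style shape tuple.
    An empty shape () describes a scalar, i.e. exactly one element."""
    if not isinstance(shape, tuple):
        raise TypeError("shape must be a tuple, e.g. () or (10, 10)")
    return math.prod(shape)  # math.prod(()) == 1


def kind(shape):
    # Mirrors the rule in the text: () -> scalar, anything else -> tensor.
    return "scalar" if shape == () else f"{len(shape)}-D tensor"


print(num_elements(()), kind(()))              # 1 scalar
print(num_elements((10, 10)), kind((10, 10)))  # 100 2-D tensor
```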

Apply Hardware Customization

Usually, the next step is to apply various hardware customization techniques to the application. This tutorial skips that step, which is covered in later tutorials. However, we still need to build a default schedule with hcl.create_schedule, which takes the list of inputs and the Python function that defines the algorithm.

s = hcl.create_schedule([a, A], simple_compute)

Inspect the Intermediate Representation (IR)

A HeteroCL program is lowered to an IR before backend code generation. HeteroCL provides hcl.lower to print the lowered IR, which can be helpful for debugging.

print(hcl.lower(s))
module {
  func.func @top(%arg0: memref<1xi32>, %arg1: memref<10x10xi32>) -> memref<10x10xi32> attributes {itypes = "ss", otypes = "s"} {
    %0 = memref.alloc() {name = "B"} : memref<10x10xi32>
    affine.for %arg2 = 0 to 10 {
      affine.for %arg3 = 0 to 10 {
        %1 = affine.load %arg1[%arg2, %arg3] {from = "A"} : memref<10x10xi32>
        %2 = affine.load %arg0[0] {from = "a"} : memref<1xi32>
        %3 = arith.extsi %1 : i32 to i33
        %4 = arith.extsi %2 : i32 to i33
        %5 = arith.addi %3, %4 : i33
        %6 = arith.trunci %5 : i33 to i32
        affine.store %6, %0[%arg2, %arg3] {to = "B"} : memref<10x10xi32>
      } {loop_name = "y"}
    } {loop_name = "x", op_name = "B"}
    return %0 : memref<10x10xi32>
  }
}
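In the IR above, both i32 operands are sign-extended to i33 (arith.extsi), added in 33 bits (arith.addi), and the sum is truncated back to i32 (arith.trunci). Our reading of this sequence, offered as an interpretation rather than documented behavior, is that the 33-bit intermediate holds the exact sum of any two i32 values, and the final truncation yields the same wrap-around result as two's-complement i32 addition. The sketch below models the sequence in plain Python; to_signed and add_i32_via_i33 are hypothetical helper names.

```python
def to_signed(value, bits):
    """Wrap an integer to a `bits`-wide two's-complement value."""
    value &= (1 << bits) - 1          # keep the low `bits` bits
    if value >> (bits - 1):           # sign bit set -> negative number
        value -= 1 << bits
    return value


def add_i32_via_i33(x, y):
    """Model of the extsi / addi / trunci sequence in the IR above."""
    # arith.extsi: sign extension keeps the value of an in-range i32 operand.
    x33, y33 = to_signed(x, 33), to_signed(y, 33)
    # arith.addi on i33: the sum of two i32 values always fits in 33 bits.
    s33 = to_signed(x33 + y33, 33)
    # arith.trunci: keep the low 32 bits, reinterpreted as signed i32.
    return to_signed(s33, 32)


print(add_i32_via_i33(29, 10))        # 39
print(add_i32_via_i33(2**31 - 1, 1))  # -2147483648 (wraps around)
```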

Create the Executable

The next step is to build the executable with hcl.build, whose input is the schedule we just created. You can also specify the target of the executable. The default target is llvm, meaning the executable runs on the CPU.

f = hcl.build(s)

Prepare the Inputs/Outputs for the Executable

To run the generated executable, we feed it NumPy arrays wrapped with hcl.asarray, which converts a NumPy array into a HeteroCL container that can serve as an input or output of the executable. In this tutorial, we randomly generate the values of the input tensor A, while the scalar a is passed directly as the Python integer 10. Since the algorithm returns a new tensor B, we also need to prepare an array to hold that output.

import numpy as np

hcl_a = 10
np_A = np.random.randint(100, size=A.shape)
hcl_A = hcl.asarray(np_A)
hcl_B = hcl.asarray(np.zeros(A.shape))

Run the Executable

With the prepared inputs/outputs, we can finally feed them to our executable.

f(hcl_a, hcl_A, hcl_B)
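In the call above, hcl_B is passed in as the last argument and the results are read back from it afterwards, so the output is written into a buffer the caller provides. The toy function below mimics that calling convention in plain Python. run_simple_compute is a hypothetical name, and nothing here calls HeteroCL.

```python
def run_simple_compute(a, A, B):
    """Hypothetical pure-Python stand-in for the built executable:
    reads the scalar `a` and tensor `A`, and writes the result into the
    caller-provided output buffer `B` in place (returns None)."""
    for x in range(len(A)):
        for y in range(len(A[0])):
            B[x][y] = A[x][y] + a


A = [[1, 2], [3, 4]]
B = [[0, 0], [0, 0]]  # pre-allocated output, playing the role of hcl_B
run_simple_compute(10, A, B)
print(B)  # [[11, 12], [13, 14]]
```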

View the Results

To view the results, we convert the HeteroCL containers back to NumPy arrays with asnumpy().

np_A = hcl_A.asnumpy()
np_B = hcl_B.asnumpy()

print(hcl_a)
print(np_A)
print(np_B)
10
[[29 86 16 51 66 41 65 12 78  1]
 [44 51 80 25 32 44 24  2 64 32]
 [81 72 51 78 37 26 20 27 57 23]
 [87 62 54 26 18 18 86  9 68 68]
 [ 6 33 65 30 71 56 33 91 76 46]
 [ 3 31 64 58 23 12 14 59 67 61]
 [60 19 90 10 22 50 32 46 62 52]
 [47 61 62 26 10 39 96  7 24 50]
 [16 72 39 69 23 10 57 77 23 59]
 [17 44 46 21 39 10 43 43  2 62]]
[[ 39  96  26  61  76  51  75  22  88  11]
 [ 54  61  90  35  42  54  34  12  74  42]
 [ 91  82  61  88  47  36  30  37  67  33]
 [ 97  72  64  36  28  28  96  19  78  78]
 [ 16  43  75  40  81  66  43 101  86  56]
 [ 13  41  74  68  33  22  24  69  77  71]
 [ 70  29 100  20  32  60  42  56  72  62]
 [ 57  71  72  36  20  49 106  17  34  60]
 [ 26  82  49  79  33  20  67  87  33  69]
 [ 27  54  56  31  49  20  53  53  12  72]]

Finally, let’s verify that every element of B equals the corresponding element of A plus 10.

assert np.array_equal(np_B, np_A + 10)
