Timing-based Profiling

Allo provides timing-based profiling to measure the execution time of a built module.

To enable profiling, pass profile=True to the build function in allo/dataflow.py, whose signature is:

def build(
    func,
    target="vitis_hls",
    mode="csim",
    project="top.prj",
    configs=None,
    wrap_io=True,
    opt_default=True,
    enable_tensor=False,
    mapping_primitives: list[tuple[str, list]] = [],
    profile=False,
    warmup=20,
    num_iters=100,
    trace: list[tuple[str, tuple[int, ...]]] = None,
    trace_size: int = 4096,
    device_type: str = None,
)

Related Parameters:

  • profile (bool): Set to True to enable profiling. Default is False. When enabled, executing the module also runs extra warm-up and timed iterations, controlled by warmup and num_iters.

  • warmup (int): Number of initial iterations to warm up the system. These iterations are excluded from the timing measurements. Default is 20.

  • num_iters (int): Number of timed iterations used to compute execution time. Default is 100.
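The warm-up/timed-iteration scheme described by these parameters can be illustrated with a plain-Python sketch. The helper time_iterations below is hypothetical and not part of Allo; it assumes the workload can be invoked as a zero-argument callable and reports per-iteration statistics in microseconds, which may differ from what Allo itself reports.

```python
import statistics
import time
from typing import Callable, Dict


def time_iterations(fn: Callable[[], object], warmup: int = 20, num_iters: int = 100) -> Dict[str, float]:
    """Hypothetical helper: run `warmup` untimed calls, then time `num_iters` calls.

    Illustrates the warm-up/timed scheme only; this is not Allo's implementation.
    Returns per-iteration statistics in microseconds.
    """
    if num_iters < 1:
        raise ValueError("num_iters must be at least 1")
    # Warm-up phase: results and timings are discarded.
    for _ in range(warmup):
        fn()
    # Timed phase: measure each call with a monotonic high-resolution clock.
    samples_us = []
    for _ in range(num_iters):
        start = time.perf_counter()
        fn()
        samples_us.append((time.perf_counter() - start) * 1e6)
    return {
        "avg_us": statistics.mean(samples_us),
        "min_us": min(samples_us),
        "max_us": max(samples_us),
    }


if __name__ == "__main__":
    # Time a small pure-Python workload with the default parameters.
    print(time_iterations(lambda: sum(range(10_000))))
```

Excluding the warm-up calls keeps one-time costs (caching, lazy initialization) out of the statistics, which is the same motivation as the warmup parameter.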

Example

import allo
from allo.ir.types import int16, int32
import allo.dataflow as df
import numpy as np
from allo.memory import Layout

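Ty = int16                              # input element type
M, N, K = 128, 128, 32                  # GEMM sizes: C[M, N] = A[M, K] @ B[K, N]
Pm, Pn, Pk = 4, 4, 1                    # number of partitions along M, N and K
Mt, Nt, Kt = M // Pm, N // Pn, K // Pk  # per-kernel tile sizes

# Layouts describing how A, B and C are distributed across the kernel mapping
LyA = Layout("S1S2")
LyB = Layout("S2S0")
LyC = Layout("S1S0")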

@df.region()
def top1():
    @df.kernel(mapping=[Pk, Pm, Pn])
    def gemm(A: Ty[M, K] @ LyA, B: Ty[K, N] @ LyB, C: int32[M, N] @ LyC):
        C[:, :] = allo.matmul(A, B)

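# Build for the AIE target with profiling enabled:
# 200 untimed warm-up iterations, then 1000 timed iterations.
mod = df.build(
    top1,
    target="aie",
    profile=True,
    warmup=200,
    num_iters=1000,
)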
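# Random int16 inputs and a zero-initialized int32 output buffer
A = np.random.randint(0, 32, (M, K)).astype(np.int16)
B = np.random.randint(0, 32, (K, N)).astype(np.int16)
C = np.zeros((M, N)).astype(np.int32)
mod(A, B, C)

# Check the result against a NumPy reference computed in int32
ref_C = A.astype(np.int32) @ B.astype(np.int32)
np.testing.assert_allclose(C, ref_C)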
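To turn a per-iteration latency into a throughput figure for this example, note that an M x N x K GEMM is conventionally counted as 2*M*N*K arithmetic operations (one multiply and one add per term). The sketch below assumes you already have an average per-iteration latency in microseconds from the profiling run; the gemm_gops helper is hypothetical, and the 100 µs latency in the usage line is an illustrative value, not a measurement.

```python
def gemm_gops(m: int, n: int, k: int, latency_us: float) -> float:
    """Hypothetical helper: effective GEMM throughput in GOPS (1e9 ops/s).

    Counts 2*M*N*K operations: M*N*K multiplies plus M*N*K adds.
    """
    if latency_us <= 0:
        raise ValueError("latency_us must be positive")
    ops = 2 * m * n * k
    # ops / (latency_us * 1e-6 s) / 1e9 simplifies to ops / (latency_us * 1e3)
    return ops / (latency_us * 1e3)


if __name__ == "__main__":
    # 128 x 128 x 32 GEMM from the example, with an illustrative 100 us latency
    print(gemm_gops(128, 128, 32, latency_us=100.0))
```

For the sizes in this example, 2*M*N*K = 1,048,576 operations per iteration.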