Timing-based Profiling¶
A new timing-based profiling feature has been added to help measure the performance of the module during execution.
To enable profiling, use the profile flag in the build method in
allo/dataflow.py:
def build(
func,
target="vitis_hls",
mode="csim",
project="top.prj",
configs=None,
wrap_io=True,
opt_default=True,
enable_tensor=False,
mapping_primitives: list[tuple[str, list]] = [],
profile=False,
warmup=20,
num_iters=100,
trace: list[tuple[str, tuple[int, ...]]] = None,
trace_size: int = 4096,
device_type: str = None,
)
Related Parameters:
profile(bool): Set toTrueto enable profiling. When enabled, the module performs extra warm-up and test iterations.warmup(int): Number of initial iterations to warm up the system. These iterations are excluded from the timing measurements. Default is20.num_iters(int): Number of timed iterations used to compute execution time. Default is100.
Example¶
import allo
from allo.ir.types import int16, int32, float32
import allo.dataflow as df
import numpy as np
from allo.memory import Layout
Ty = int16
M, N, K = 128, 128, 32
Pm, Pn, Pk = 4, 4, 1
Mt, Nt, Kt = M // Pm, N // Pn, K // Pk
LyA = Layout("S1S2")
LyB = Layout("S2S0")
LyC = Layout("S1S0")
@df.region()
def top1():
@df.kernel(mapping=[Pk, Pm, Pn])
def gemm(A: Ty[M, K] @ LyA, B: Ty[K, N] @ LyB, C: int32[M, N] @ LyC):
C[:, :] = allo.matmul(A, B)
mod = df.build(
top1,
target="aie",
profile=True,
warmup=200,
num_iters=1000,
)
A = np.random.randint(0, 32, (M, K)).astype(np.int16)
B = np.random.randint(0, 32, (K, N)).astype(np.int16)
C = np.zeros((M, N)).astype(np.int32)
tmp_C = np.zeros((M, N)).astype(np.int32)
mod(A, B, C)