Multi-Threaded Simulator (CPU)
This document demonstrates how to simulate a dataflow design using the simulator backend in Allo. The simulator backend provides a fast and flexible environment for verifying the behavior of dataflow kernels before deploying them to hardware. In this example, a simple producer-consumer model is implemented, where data is produced from an input matrix, sent through a pipe, and then consumed with a basic arithmetic operation.
Dataflow Kernel Definition
The design consists of a top-level region that contains two kernels: a producer and a consumer. The producer reads data from an input matrix and sends each element through a pipe, while the consumer receives the data, increments it by one, and writes the result to an output matrix. The following code illustrates the kernel definitions using Allo’s dataflow API:
import allo
from allo.ir.types import float32
import allo.dataflow as df
import numpy as np
Ty = float32
M, N, K = 16, 16, 16
@df.region()
def top():
    # Create a pipe with a depth of 4
    pipe = df.pipe(dtype=Ty, shape=(), depth=4)

    @df.kernel(mapping=[1])
    def producer(A: Ty[M, N]):
        for i, j in allo.grid(M, N):
            # Load data from the input matrix
            out: Ty = A[i, j]
            # Send data to the pipe
            pipe.put(out)

    @df.kernel(mapping=[1])
    def consumer(B: Ty[M, N]):
        for i, j in allo.grid(M, N):
            # Receive data from the pipe
            data = pipe.get()
            # Perform a simple computation (increment by 1)
            B[i, j] = data + 1
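For intuition about the communication pattern, the producer-consumer structure can be mimicked with two Python threads and a bounded queue. This is only an analogy built on the standard library, not how Allo implements pipes: it assumes the pipe behaves like a FIFO whose `put` blocks when full and whose `get` blocks when empty, and the name `run_producer_consumer` is hypothetical.

```python
import queue
import random
import threading

M, N = 16, 16
DEPTH = 4  # mirrors depth=4 of the pipe in the kernel definition


def run_producer_consumer(A):
    """Stream every element of A through a bounded FIFO and return A + 1."""
    rows, cols = len(A), len(A[0])
    # Assumption: the pipe acts like a bounded FIFO with blocking put/get.
    fifo = queue.Queue(maxsize=DEPTH)
    B = [[0.0] * cols for _ in range(rows)]

    def producer():
        for i in range(rows):
            for j in range(cols):
                fifo.put(A[i][j])  # blocks while the FIFO holds DEPTH items

    def consumer():
        for i in range(rows):
            for j in range(cols):
                B[i][j] = fifo.get() + 1  # blocks while the FIFO is empty

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return B


A = [[random.random() for _ in range(N)] for _ in range(M)]
B = run_producer_consumer(A)
assert all(B[i][j] == A[i][j] + 1 for i in range(M) for j in range(N))
```

Because both loops visit the elements in the same row-major order, the consumer writes each result to the position the producer read it from, and the bounded depth keeps the producer at most four elements ahead of the consumer.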
Simulation and Testing
To verify the correctness of the dataflow design, a simulation is executed using the simulator backend. The test initializes an input matrix with random values and an output matrix filled with zeros, then builds the module with the target set to "simulator" and runs it on both matrices. After the simulation, the output is compared against the expected result, A + 1, using NumPy's testing utilities.
A = np.random.rand(M, N).astype(np.float32)
B = np.zeros((M, N), dtype=np.float32)
sim_mod = df.build(top, target="simulator")
sim_mod(A, B)
np.testing.assert_allclose(B, A + 1, rtol=1e-5, atol=1e-5)
print("Dataflow Simulator Passed!")
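The comparison above uses tolerances rather than exact equality. As a minimal sketch, the element-wise criterion documented for `np.testing.assert_allclose`, `|actual - desired| <= atol + rtol * |desired|`, can be written out in plain Python; the helper name `within_tolerance` is hypothetical and is not part of NumPy or Allo.

```python
def within_tolerance(actual, desired, rtol=1e-5, atol=1e-5):
    """Return True if actual matches desired under the combined tolerance."""
    return abs(actual - desired) <= atol + rtol * abs(desired)


desired = 1.5
# Allowed error here is 1e-5 + 1e-5 * 1.5 = 2.5e-5.
print(within_tolerance(desired + 1e-5, desired))  # small deviation: accepted
print(within_tolerance(desired + 1e-4, desired))  # larger deviation: rejected
```

The absolute term `atol` keeps the check meaningful for values near zero, where a purely relative bound would shrink to almost nothing.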
The simulator is implemented using the OpenMP (OMP) dialect in MLIR, so it natively supports multi-threaded execution on the CPU. This makes functional testing fast, so designs can be verified early, before they are deployed to hardware.