Frontend Syntax Guide¶
This document provides a comprehensive reference for the Allo frontend syntax and semantics. Allo uses a Python-based domain-specific language (DSL) that requires strict type annotations to enable hardware synthesis and optimization.
Function Definition¶
Basic Function Signature¶
Allo kernels are defined as Python functions with explicit type annotations for all arguments and return types. Arguments and return types can be scalar or tensor types. The syntax follows Python’s type hint notation:
def kernel(arg1: Type1[Shape1], arg2: ScalarType) -> ReturnType[Shape]:
    # function body
    return result
Example: Scalar Arguments
def kernel(A: int32) -> int32:
    return A + 1
Example: Matrix Multiplication
import allo
from allo.ir.types import int32

def gemm(A: int32[32, 32], B: int32[32, 32]) -> int32[32, 32]:
    C: int32[32, 32] = 0
    for i, j, k in allo.grid(32, 32, 32):
        C[i, j] += A[i, k] * B[k, j]
    return C
Multiple Return Values¶
Functions can return multiple values as a tuple:
def kernel(A: int32[M], B: int32[M]) -> (int32[M], int32[M]):
    res0: int32[M] = 0
    res1: int32[M] = 0
    for i in range(M):
        res0[i] = A[i] + 1
        res1[i] = B[i] + 1
    return res0, res1
The caller can unpack the returned tuple:
C, D = callee(A[i], B[i])
To ignore certain return values, use underscore:
C, _ = callee(A[0], B[0]) # Ignore second return value
No Return Value¶
Functions that don’t return a value can omit the return type annotation, use -> None,
or have an empty return:
def kernel(A: int32[32]):
    pass  # No return

def kernel(A: int32[32]) -> None:
    return

def kernel(A: int32[32]):
    return None
Variable Declaration and Assignment¶
Scalar Variables¶
Scalar variables are declared using Python’s type annotation syntax:
# Declaration with initialization
x: int32 = 0
y: float32 = 3.14
# Declaration without initialization
z: int32
# Assignment after declaration
z = x + y
Tensor Variables¶
Tensors are declared with their shape in the type annotation:
# 1D tensor
A: int32[10] = 0
# 2D tensor initialized to zero
B: int32[32, 32] = 0
# 4D tensor
C: float32[M, M, M, M] = 0
Initialization from Lists and NumPy Arrays¶
Tensors can be initialized from Python lists or NumPy arrays:
# From nested list (compile-time constant)
tmp: int32[2, 2] = [[1, 2], [3, 4]]
# From NumPy array (global constant)
arr = np.array([[1, 2], [3, 4]])
def kernel() -> int32:
    tmp: int32[2, 2] = arr
    return tmp[0, 0]
# Constant tensor slicing
np_A = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.int32)
def kernel() -> int32[4]:
    A: int32[4] = np_A[1]  # Load second row as constant
    return A
Copy Semantics¶
Tensors can be copied by assignment:
temp: int32[M, N] = 0
outp: int32[M, N] = temp # Copy temp to outp
# Copy from argument
def kernel(inp: int32[M, N]) -> int32[M, N]:
    outp: int32[M, N] = inp
    return outp
Loop Constructs¶
Range Loops¶
Standard Python range loops are supported with one, two, or three arguments:
# range(end)
for i in range(10):
    A[i] = i

# range(start, end)
for i in range(10, 20):
    A[i] = i

# range(start, end, step)
for i in range(0, 20, 2):
    A[i] = i * 2
Note
break and continue are not supported in Allo.
Variable Loop Bounds¶
Loop bounds can be runtime variables:
def kernel(A: int32[10]):
    for i in range(10):
        for j in range(i + 1, 10):  # Variable lower bound
            for k in range(j * 2, 10):  # Variable lower bound
                A[k] += i - j

# Bounds from array elements
def kernel(A: int32[10], B: int32[10]):
    for i in range(10):
        for j in range(A[i], 10, A[i]):  # Bounds from array
            B[j] += i
Grid Loops¶
allo.grid provides a shorthand for nested loops:
# Equivalent to three nested for loops
for i, j, k in allo.grid(32, 32, 32):
    C[i, j] += A[i, k] * B[k, j]

# 2D grid
for i, j in allo.grid(M, M):
    res[i, j] = C[i, j] + 1
Named grids are useful for applying schedule optimizations:
for i, j, k in allo.grid(32, 32, 32, name="C"):
    C[i, j] += A[i, k] * B[k, j]
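Conceptually, `allo.grid` visits the same index tuples as the equivalent nested `range` loops, outermost dimension first. A plain-Python sketch of that equivalence (this is ordinary Python, not Allo code):

```python
from itertools import product

# Index order produced by iterating a 2x3 "grid" (outer dimension first),
# compared against two explicitly nested range loops.
grid_order = list(product(range(2), range(3)))

nested_order = []
for i in range(2):
    for j in range(3):
        nested_order.append((i, j))

assert grid_order == nested_order  # same tuples, same order
```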
While Loops¶
While loops with runtime conditions:
from allo.ir.types import index

def kernel(A: int32[10]):
    i: index = 0
    while i < 10:
        A[i] = i
        i += 1
Conditional Statements¶
If-Elif-Else¶
Standard Python conditional syntax:
def kernel(a: int32, b: int32) -> int32:
    r: int32 = 0
    if a == 0:
        r = 1
    elif a == 1:
        r = 2
        if b == 2:  # Nested conditional
            r = 3
    else:
        r = 4
    return r
Logical Operators¶
Conditions can use and, or, and not:
if A[0] > 0 and b < 0:
    r = 1
elif A[1] * 2 <= 1 or b + 1 >= 1:
    r = 2
elif not flag:
    r = 3
Multiple conditions can be chained:
if A[0] > 0 and A[1] > 0 and A[2] > 0 and b > 0 and c > 0:
    r = 1
Select Expression (Ternary)¶
Python’s ternary expression for conditional assignment:
B[i] = 1 if A[i] % 2 == 0 else 0
# With type casting
B[i] = (i * 2) if A[i] % 2 == 0 else 0
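The select expression maps each element independently, like NumPy's `where`. A NumPy reference sketch of the first expression above (this is NumPy, not Allo code):

```python
import numpy as np

# Reference semantics for: B[i] = 1 if A[i] % 2 == 0 else 0
A = np.array([4, 7, 10, 3], dtype=np.int32)
B = np.where(A % 2 == 0, 1, 0)
# B is [1, 0, 1, 0]: even elements map to 1, odd elements to 0
```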
Operators¶
Arithmetic Operators¶
| Operator | Description | Example |
|---|---|---|
| `+` | Addition | `a + b` |
| `-` | Subtraction | `a - b` |
| `*` | Multiplication | `a * b` |
| `/` | Division (float) | `a / b` |
| `//` | Floor division | `a // b` |
| `%` | Modulo | `a % b` |
Unary Operators¶
vi: int32 = -(v + 1) # Negation
result = +(vi + vf) # Unary plus
Comparison Operators¶
All standard comparison operators are supported: ==, !=, <, <=, >, >=
Bitwise Operators¶
| Operator | Description | Example |
|---|---|---|
| `<<` | Left shift | `a << b` |
| `>>` | Right shift | `a >> b` |
| `&` | Bitwise AND | `a & b` |
| `\|` | Bitwise OR | `a \| b` |
| `^` | Bitwise XOR | `a ^ b` |
Augmented Assignment¶
All augmented assignment operators work on both scalars and tensor elements:
C[i, j] += A[i, k] * B[k, j]
A[i] *= 2
A[i] -= 1
Array and Tensor Operations¶
Indexing¶
Standard multi-dimensional indexing:
value = A[i, j, k]
A[i, j] = value
Subviews¶
Accessing a sub-array by partial indexing:
def kernel(A: int32[10, 10]) -> int32[10]:
    return A[5]  # Returns row 5 as 1D array

def kernel(A: float32[5, 10, 15]) -> float32[15]:
    return A[3, 2]  # Returns a 1D slice
Dynamic subviews with variable indices:
def kernel(A: float32[5, 10, 15], i: index, j: index) -> float32[15]:
    return A[i, j]
Slicing¶
Sub-tensor assignment using slices:
def slice_copy(A: int32[6, 6]) -> int32[6, 6]:
    B: int32[2, 3] = 0
    B[0, 0] = 1
    A[0:2, 0:3] = B  # Copy B into a slice of A
    return A
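The slice assignment copies the whole sub-tensor in one statement. A NumPy sketch of the same semantics (NumPy code, not Allo):

```python
import numpy as np

# Mirror of slice_copy: write a 2x3 tensor into the top-left corner of a 6x6 one.
A = np.zeros((6, 6), dtype=np.int32)
B = np.zeros((2, 3), dtype=np.int32)
B[0, 0] = 1
A[0:2, 0:3] = B  # the slice now holds a copy of B; the rest of A is unchanged
```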
Bit Operations on Integers¶
Access individual bits or bit ranges:
B[i] = A[i][0] # Access bit 0
B[i][0:2] = A[i] # Assign to bits 0-1 (upper bound exclusive)
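The bit-indexing behavior can be described with ordinary integer arithmetic. A plain-Python sketch of the underlying semantics (helper names `get_bit` and `set_bits` are illustrative, not Allo API):

```python
# Plain-Python model of integer bit access (bit 0 is the least significant).
def get_bit(value: int, k: int) -> int:
    """Return bit k of value, i.e. the semantics of value[k]."""
    return (value >> k) & 1

def set_bits(value: int, lo: int, hi: int, bits: int) -> int:
    """Write `bits` into value[lo:hi] (hi exclusive) and return the result."""
    width = hi - lo
    mask = ((1 << width) - 1) << lo
    return (value & ~mask) | ((bits << lo) & mask)

assert get_bit(0b1010, 1) == 1          # bit 1 of 0b1010 is 1
assert set_bits(0b0000, 0, 2, 0b11) == 0b0011  # fill bits 0-1
```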
Dynamic Shapes¶
For functions that accept tensors of unknown size at compile time:
def kernel(A: float32[...], B: float32[...], size: int32):
    for i in range(size):
        B[i] = A[i]
Nested Functions¶
Functions can be defined inside kernels:
def kernel(A: int32[10]) -> int32[10]:
    B: int32[10] = 0

    def foo(x: int32) -> int32:
        return x + 1

    for i in range(10):
        B[i] = foo(A[i])
    return B
Index Arguments¶
Use the index type for loop indices passed to functions:
from allo.ir.types import index

def kernel(A: int32[10]) -> int32[10]:
    B: int32[10] = 0

    def foo(A_: int32[10], x: index) -> int32:
        C: int32[10] = 0
        for i in range(10):
            C[i] = A_[i] + 1
        return C[x]

    for i in range(10):
        B[i] = foo(A, i)
    return B
Built-in Functions¶
Min and Max¶
Element-wise minimum and maximum:
min_val = min(min_val, A[i])
max_val = max(max_val, A[i])
Type promotion is handled automatically:
res[0] = min(A[0], 0) # int8 with int
res[1] = max(A[1], 0.0) # int8 with float -> float comparison
Broadcast Binary Operations¶
Apply element-wise operations across tensors:
# Chained broadcast operations
result = allo.div(allo.mul(allo.sub(allo.add(A, 3), 1), 2), 2)
# Nested operations
result = allo.sub(50, allo.mul(2, allo.add(3, allo.div(10, A))))
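A NumPy sketch of what the first chained expression computes element-wise (NumPy code, not Allo; integer division is assumed for integer operands):

```python
import numpy as np

# Reference for: allo.div(allo.mul(allo.sub(allo.add(A, 3), 1), 2), 2)
# i.e. ((A + 3 - 1) * 2) / 2, applied element-wise.
A = np.array([1, 2, 3], dtype=np.int32)
result = ((A + 3) - 1) * 2 // 2   # // assumes integer division semantics
# result is [3, 4, 5]
```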
ConstExpr (Compile-Time Constants)¶
ConstExpr declares compile-time constant values that can be used in loop bounds:
from allo.ir.types import ConstExpr, int32

M = 10

def kernel(A: int32[10]) -> int32[10]:
    limit: ConstExpr[int32] = M // 2
    B: int32[10]
    for i in range(limit):  # Loop bound is constant 5
        B[i] = A[i] + 1
    for i in range(limit, 10):
        B[i] = A[i]
    return B
ConstExpr Arithmetic¶
ConstExpr values can be computed from other ConstExpr values:
base: ConstExpr[int32] = 2
mult: ConstExpr[int32] = 3
offset: ConstExpr[int32] = base * mult # Computed at compile time: 6
Dependent ConstExpr¶
ConstExpr can depend on previously defined ConstExpr:
N: ConstExpr[int32] = 4
M: ConstExpr[int32] = N + 2 # 6
K: ConstExpr[int32] = M + 2 # 8
Using Helper Functions¶
Python helper functions can compute ConstExpr values at compile time:
import math

def compute_coefficient(i):
    return math.cos(2.0 * math.pi * i / 8)

def kernel(A: float32[8], B: float32[8]):
    with allo.meta_for(8) as i:
        coef: ConstExpr[float32] = compute_coefficient(i)
        B[i] = A[i] * coef
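The helper runs as ordinary Python during compilation, so each `meta_for` iteration is specialized with one precomputed constant. Evaluating the helper in plain Python shows the values that get baked in:

```python
import math

def compute_coefficient(i):
    return math.cos(2.0 * math.pi * i / 8)

# The eight constants the unrolled loop body would embed.
coeffs = [compute_coefficient(i) for i in range(8)]

assert abs(coeffs[0] - 1.0) < 1e-12  # cos(0) = 1
assert abs(coeffs[2]) < 1e-12        # cos(pi/2) = 0
assert abs(coeffs[4] + 1.0) < 1e-12  # cos(pi) = -1
```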
Note
ConstExpr variables must be initialized at declaration time. Uninitialized ConstExpr will raise an error.
Scoping Rules¶
Allo enforces C++-style Block Scoping rules, which differs from standard Python.
Scope Boundaries: if, elif, else, for, while, meta_if, meta_for, meta_else.
Rule: A variable declared for the first time inside a block is local to that block. It is not visible after the block exits.
Access: Inner blocks can read/write variables defined in outer blocks.
Reassignment Validity¶
A variable can be reassigned.
The new value must match the declared type of the variable.
Immutable Constants: ConstExpr variables and values returned by df.get_pid() are compile-time constants and cannot be reassigned.
Valid Scoping¶
Variables should be declared in the scope where they are used:
def kernel(a: int32) -> int32:
    r: int32 = 0  # Declare outside conditional
    if a == 0:
        r = 1
    else:
        r = 4
    return r
Local variables within a branch are allowed:
def kernel(a: int32) -> int32:
    r: int32 = 0
    if a > 0:
        t: int32 = 1  # Local to if-branch
        r = r + t
    return r
Invalid Scoping¶
The following patterns will raise errors:
Declaring the same variable in multiple branches:
# ERROR: r is not accessible outside branches
def kernel(a: int32) -> int32:
    if a == 0:
        r: int32 = 1
    else:
        r: int32 = 4
    return r  # Error: r not in scope
Using loop-local variables outside the loop:
# ERROR: tmp is not accessible outside loop
def kernel(n: int32) -> int32:
    for i in range(n):
        tmp: int32 = i
    return tmp  # Error: tmp not in scope
Redefining loop variables in nested loops:
# ERROR: Cannot redefine i in nested loop
def kernel(n: int32) -> int32:
    s: int32 = 0
    for i in range(n):
        for i in range(n):  # Error: i already defined
            s = s + i
    return s
Meta-Programming Constructs¶
Allo provides compile-time meta-programming constructs that are evaluated during compilation, enabling conditional code generation and advanced optimizations.
Meta If/Elif/Else¶
Compile-time conditionals that generate different code based on conditions known at compile time. The conditions must be compile-time constants:
with allo.meta_if(condition1):
    # Code generated only when condition1 is true
    pass
with allo.meta_elif(condition2):
    # Code generated only when condition1 is false and condition2 is true
    pass
with allo.meta_else():
    # Code generated when all previous conditions are false
    pass
These are useful for:
- Selecting different implementations based on compile-time parameters
- Specializing kernels for specific data types or array sizes
- Eliminating dead code at compile time
Meta For (Compile-Time Loop Unrolling)¶
allo.meta_for supports multiple argument formats similar to Python’s range. The loop bounds and step must be compile-time constants:
# Single argument: meta_for(upper)
with allo.meta_for(10) as i:
    A[i] = i

# Two arguments: meta_for(lower, upper)
with allo.meta_for(5, 10) as i:
    A[i] = i

# Three arguments: meta_for(lower, upper, step)
with allo.meta_for(0, 10, 2) as i:
    A[i] = i * 2
Tensor Attributes and Methods¶
Allo provides several built-in attributes and methods for tensor manipulation.
Transpose (.T)¶
Transpose a tensor by reversing its dimensions:
def kernel(A: float32[3, 4]) -> float32[4, 3]:
    return A.T  # Transpose: shape becomes [4, 3]
Copy (.copy)¶
Create a copy of a tensor:
B = A.copy()
Bit Reverse (.reverse)¶
Reverse the bits of an integer value (useful for FFT algorithms):
reversed_bits = x.reverse
Type Conversion Functions¶
Explicit Type Casting¶
Use Python built-in functions for explicit type casting:
# Cast to float32
b: float32 = float(a)
# Cast to int32
c: int32 = int(b)
Fixed-Point Type Attributes¶
Access type metadata for fixed-point types:
from allo.ir.types import Fixed
def kernel(A: Fixed[16, 8]) -> int32:
    return A.bits  # Returns 16 (total bitwidth); A.fracs would return 8 (fractional bits)
Bitcast¶
Reinterpret the bit pattern of a value as a different type (preserves bits, changes interpretation):
# Reinterpret float32 bits as int32
int_bits = float_val.bitcast()
Note
bitcast preserves the bit pattern but changes the type interpretation.
This is different from type casting which preserves the value but may change the bits.
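The distinction can be illustrated in NumPy (this is NumPy code, not Allo): `view` reinterprets the stored bytes, while `astype` converts the value.

```python
import numpy as np

x = np.array([1.0], dtype=np.float32)

bits = int(x.view(np.int32)[0])    # bitcast: reinterpret the 4 bytes of 1.0f
value = int(x.astype(np.int32)[0]) # value cast: numeric conversion to 1

assert bits == 0x3F800000  # IEEE-754 single-precision encoding of 1.0
assert value == 1          # same value, different bits
```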
Library Operations¶
Allo provides high-level library operations that map to optimized implementations.
Matrix Operations¶
# Matrix multiplication
C = allo.matmul(A, B)
# Batch matrix multiplication
C = allo.bmm(A, B)
# Linear layer: X @ A.T + B
Y = allo.linear(X, A, B)
Tensor Manipulation¶
# Transpose with custom permutation
B = allo.transpose(A, permutation=(1, 0, 2))
# Reshape/view tensor
B = allo.view(A, new_shape)
# Concatenate tensors along an axis
C = allo.concat(A, B, axis=0)
Element-wise Operations¶
# Exponential
B = allo.exp(A)
# Logarithm
B = allo.log(A)
# Absolute value
B = allo.abs(A)
Neural Network Operations¶
# 2D Convolution (NCHW format)
output = allo.conv2d(input, kernel)
# Max pooling
output = allo.maxpool(input, kernel)
# Sum pooling
output = allo.sumpool(input, kernel)
# ReLU activation
output = allo.relu(input)
# Softmax
output = allo.softmax(input)
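For reference, the element-wise semantics of two of these ops can be sketched in NumPy (NumPy code, not Allo; the stability shift in softmax is a common implementation choice, not necessarily what Allo does internally):

```python
import numpy as np

def relu(x):
    # max(x, 0) element-wise
    return np.maximum(x, 0)

def softmax(x):
    # exponentiate (shifted for numerical stability) and normalize to sum to 1
    e = np.exp(x - np.max(x))
    return e / e.sum()

out = relu(np.array([-1.0, 2.0]))
probs = softmax(np.array([0.0, 0.0]))

assert out.tolist() == [0.0, 2.0]
assert abs(probs.sum() - 1.0) < 1e-12
```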
Templates and Type Parameters¶
Allo supports parameterized kernels using type parameters:
from allo.ir.types import int8

def kernel[Ty](flag: bool) -> "Ty":
    X: Ty
    if not flag:
        X = 1
    else:
        X = 0
    return X

s = allo.customize(kernel, instantiate=[int8])
For more details on templates, see the Template Kernels documentation.
Building and Execution¶
After defining a kernel, create a schedule and build the executable:
import allo
import numpy as np
s = allo.customize(gemm)
mod = s.build() # Default: LLVM backend
# Prepare inputs
np_A = np.random.randint(0, 10, size=(32, 32)).astype(np.int32)
np_B = np.random.randint(0, 10, size=(32, 32)).astype(np.int32)
# Execute
np_C = mod(np_A, np_B)
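A quick way to sanity-check the result is to compare against NumPy's matmul. The NumPy part below runs standalone; the final comparison (commented out) assumes `np_C` from the `mod(...)` call above:

```python
import numpy as np

np_A = np.random.randint(0, 10, size=(32, 32)).astype(np.int32)
np_B = np.random.randint(0, 10, size=(32, 32)).astype(np.int32)

# NumPy reference result for the gemm kernel
expected = np_A @ np_B

# With np_C = mod(np_A, np_B) from the Allo module:
# np.testing.assert_array_equal(np_C, expected)
```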
For HLS code generation:
mod = s.build(target="vhls")
print(mod.hls_code)