Frontend Syntax Guide

This document provides a comprehensive reference for the Allo frontend syntax and semantics. Allo uses a Python-based domain-specific language (DSL) that requires strict type annotations to enable hardware synthesis and optimization.

Function Definition

Basic Function Signature

Allo kernels are defined as Python functions with explicit type annotations for all arguments and return types. Arguments and return types can be scalar or tensor types. The syntax follows Python’s type hint notation:

def kernel(arg1: Type1[Shape1], arg2: ScalarType) -> ReturnType[Shape]:
    # function body
    return result

Example: Scalar Arguments

def kernel(A: int32) -> int32:
    return A + 1

Example: Matrix Multiplication

from allo.ir.types import int32

def gemm(A: int32[32, 32], B: int32[32, 32]) -> int32[32, 32]:
    C: int32[32, 32] = 0
    for i, j, k in allo.grid(32, 32, 32):
        C[i, j] += A[i, k] * B[k, j]
    return C

Multiple Return Values

Functions can return multiple values as a tuple:

def kernel(A: int32[M], B: int32[M]) -> (int32[M], int32[M]):
    res0: int32[M] = 0
    res1: int32[M] = 0
    for i in range(M):
        res0[i] = A[i] + 1
        res1[i] = B[i] + 1
    return res0, res1

The caller can unpack the returned tuple:

C, D = callee(A[i], B[i])

To ignore certain return values, use underscore:

C, _ = callee(A[0], B[0])  # Ignore second return value

No Return Value

Functions that don’t return a value can omit the return type annotation, use -> None, or have an empty return:

def kernel(A: int32[32]):
    pass  # No return

def kernel(A: int32[32]) -> None:
    return

def kernel(A: int32[32]):
    return None

Variable Declaration and Assignment

Scalar Variables

Scalar variables are declared using Python’s type annotation syntax:

# Declaration with initialization
x: int32 = 0
y: float32 = 3.14

# Declaration without initialization
z: int32

# Assignment after declaration
z = x + y

Tensor Variables

Tensors are declared with their shape in the type annotation:

# 1D tensor
A: int32[10] = 0

# 2D tensor initialized to zero
B: int32[32, 32] = 0

# 4D tensor
C: float32[M, M, M, M] = 0

Initialization from Lists and NumPy Arrays

Tensors can be initialized from Python lists or NumPy arrays:

# From nested list (compile-time constant)
tmp: int32[2, 2] = [[1, 2], [3, 4]]

# From NumPy array (global constant)
arr = np.array([[1, 2], [3, 4]])
def kernel() -> int32:
    tmp: int32[2, 2] = arr
    return tmp[0, 0]

# Constant tensor slicing
np_A = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.int32)
def kernel() -> int32[4]:
    A: int32[4] = np_A[1]  # Load second row as constant
    return A

Copy Semantics

Tensors can be copied by assignment:

temp: int32[M, N] = 0
outp: int32[M, N] = temp  # Copy temp to outp

# Copy from argument
def kernel(inp: int32[M, N]) -> int32[M, N]:
    outp: int32[M, N] = inp
    return outp

Loop Constructs

Range Loops

Standard Python range loops are supported with one, two, or three arguments:

# range(end)
for i in range(10):
    A[i] = i

# range(start, end)
for i in range(10, 20):
    A[i] = i

# range(start, end, step)
for i in range(0, 20, 2):
    A[i] = i * 2

Note

break and continue are not supported in Allo.

Variable Loop Bounds

Loop bounds can be runtime variables:

def kernel(A: int32[10]):
    for i in range(10):
        for j in range(i + 1, 10):  # Variable lower bound
            for k in range(j * 2, 10):  # Variable lower bound
                A[k] += i - j

# Bounds from array elements
def kernel(A: int32[10], B: int32[10]):
    for i in range(10):
        for j in range(A[i], 10, A[i]):  # Bounds from array
            B[j] += i

Grid Loops

allo.grid provides a shorthand for nested loops:

# Equivalent to three nested for loops
for i, j, k in allo.grid(32, 32, 32):
    C[i, j] += A[i, k] * B[k, j]

# 2D grid
for i, j in allo.grid(M, M):
    res[i, j] = C[i, j] + 1

Named grids are useful for applying schedule optimizations:

for i, j, k in allo.grid(32, 32, 32, name="C"):
    C[i, j] += A[i, k] * B[k, j]

While Loops

While loops with runtime conditions:

from allo.ir.types import index

def kernel(A: int32[10]):
    i: index = 0
    while i < 10:
        A[i] = i
        i += 1

Conditional Statements

If-Elif-Else

Standard Python conditional syntax:

def kernel(a: int32, b: int32) -> int32:
    r: int32 = 0
    if a == 0:
        r = 1
    elif a == 1:
        r = 2
        if b == 2:  # Nested conditional
            r = 3
    else:
        r = 4
    return r

Logical Operators

Conditions can use and, or, and not:

if A[0] > 0 and b < 0:
    r = 1
elif A[1] * 2 <= 1 or b + 1 >= 1:
    r = 2
elif not flag:
    r = 3

Multiple conditions can be chained:

if A[0] > 0 and A[1] > 0 and A[2] > 0 and b > 0 and c > 0:
    r = 1

Select Expression (Ternary)

Python’s ternary expression for conditional assignment:

B[i] = 1 if A[i] % 2 == 0 else 0

# With type casting
B[i] = (i * 2) if A[i] % 2 == 0 else 0

Operators

Arithmetic Operators

Operator

Description

Example

+

Addition

a + b

-

Subtraction

a - b

*

Multiplication

a * b

/

Division (float)

a / b

//

Floor division

a // b

%

Modulo

a % b

Unary Operators

vi: int32 = -(v + 1)  # Negation
result = +(vi + vf)   # Unary plus

Comparison Operators

All standard comparison operators are supported: ==, !=, <, <=, >, >=

Bitwise Operators

Operator

Description

Example

<<

Left shift

1 << v

>>

Right shift

64 >> v

&

Bitwise AND

1 & v

|

Bitwise OR

1 | v

^

Bitwise XOR

a ^ b

Augmented Assignment

All augmented assignment operators work on both scalars and tensor elements:

C[i, j] += A[i, k] * B[k, j]
A[i] *= 2
A[i] -= 1

Array and Tensor Operations

Indexing

Standard multi-dimensional indexing:

value = A[i, j, k]
A[i, j] = value

Subviews

Accessing a sub-array by partial indexing:

def kernel(A: int32[10, 10]) -> int32[10]:
    return A[5]  # Returns row 5 as 1D array

def kernel(A: float32[5, 10, 15]) -> float32[15]:
    return A[3, 2]  # Returns a 1D slice

Dynamic subviews with variable indices:

def kernel(A: float32[5, 10, 15], i: index, j: index) -> float32[15]:
    return A[i, j]

Slicing

Sub-tensor assignment using slices:

def slice_copy(A: int32[6, 6]) -> int32[6, 6]:
    B: int32[2, 3] = 0
    B[0, 0] = 1
    A[0:2, 0:3] = B  # Copy B into a slice of A
    return A

Bit Operations on Integers

Access individual bits or bit ranges:

B[i] = A[i][0]      # Access bit 0
B[i][0:2] = A[i]    # Assign to bits 0-1 (upper bound exclusive)

Dynamic Shapes

For functions that accept tensors of unknown size at compile time:

def kernel(A: float32[...], B: float32[...], size: int32):
    for i in range(size):
        B[i] = A[i]

Nested Functions

Functions can be defined inside kernels:

def kernel(A: int32[10]) -> int32[10]:
    B: int32[10] = 0

    def foo(x: int32) -> int32:
        return x + 1

    for i in range(10):
        B[i] = foo(A[i])
    return B

Index Arguments

Use the index type for loop indices passed to functions:

from allo.ir.types import index

def kernel(A: int32[10]) -> int32[10]:
    B: int32[10] = 0

    def foo(A_: int32[10], x: index) -> int32:
        C: int32[10] = 0
        for i in range(10):
            C[i] = A_[i] + 1
        return C[x]

    for i in range(10):
        B[i] = foo(A, i)
    return B

Built-in Functions

Min and Max

Element-wise minimum and maximum:

min_val = min(min_val, A[i])
max_val = max(max_val, A[i])

Type promotion is handled automatically:

res[0] = min(A[0], 0)      # int8 with int
res[1] = max(A[1], 0.0)    # int8 with float -> float comparison

Broadcast Binary Operations

Apply element-wise operations across tensors:

# Chained broadcast operations
result = allo.div(allo.mul(allo.sub(allo.add(A, 3), 1), 2), 2)

# Nested operations
result = allo.sub(50, allo.mul(2, allo.add(3, allo.div(10, A))))

ConstExpr (Compile-Time Constants)

ConstExpr declares compile-time constant values that can be used in loop bounds:

from allo.ir.types import ConstExpr, int32

M = 10

def kernel(A: int32[10]) -> int32[10]:
    limit: ConstExpr[int32] = M // 2
    B: int32[10]
    for i in range(limit):  # Loop bound is constant 5
        B[i] = A[i] + 1
    for i in range(limit, 10):
        B[i] = A[i]
    return B

ConstExpr Arithmetic

ConstExpr values can be computed from other ConstExpr values:

base: ConstExpr[int32] = 2
mult: ConstExpr[int32] = 3
offset: ConstExpr[int32] = base * mult  # Computed at compile time: 6

Dependent ConstExpr

ConstExpr can depend on previously defined ConstExpr:

N: ConstExpr[int32] = 4
M: ConstExpr[int32] = N + 2  # 6
K: ConstExpr[int32] = M + 2  # 8

Using Helper Functions

Python helper functions can compute ConstExpr values at compile time:

import math

def compute_coefficient(i):
    return math.cos(2.0 * math.pi * i / 8)

def kernel(A: float32[8], B: float32[8]):
    with allo.meta_for(8) as i:
        coef: ConstExpr[float32] = compute_coefficient(i)
        B[i] = A[i] * coef

Note

ConstExpr variables must be initialized at declaration time. Uninitialized ConstExpr will raise an error.

Scoping Rules

Allo enforces C++-style Block Scoping rules, which differs from standard Python.

  • Scope Boundaries: if, elif, else, for, while, meta_if, meta_for, meta_else.

  • Rule: A variable declared for the first time inside a block is local to that block. It is not visible after the block exits.

  • Access: Inner blocks can read/write variables defined in outer blocks.

Reassignment Validity

  • A variable can be reassigned.

  • The new value must match the declared type of the variable.

  • Immutable Constants: ConstExpr variables and values returned by df.get_pid() are compile-time constants and cannot be reassigned.

Valid Scoping

Variables should be declared in the scope where they are used:

def kernel(a: int32) -> int32:
    r: int32 = 0  # Declare outside conditional
    if a == 0:
        r = 1
    else:
        r = 4
    return r

Local variables within a branch are allowed:

def kernel(a: int32) -> int32:
    r: int32 = 0
    if a > 0:
        t: int32 = 1  # Local to if-branch
        r = r + t
    return r

Invalid Scoping

The following patterns will raise errors:

Declaring the same variable in multiple branches:

# ERROR: r is not accessible outside branches
def kernel(a: int32) -> int32:
    if a == 0:
        r: int32 = 1
    else:
        r: int32 = 4
    return r  # Error: r not in scope

Using loop-local variables outside the loop:

# ERROR: tmp is not accessible outside loop
def kernel(n: int32) -> int32:
    for i in range(n):
        tmp: int32 = i
    return tmp  # Error: tmp not in scope

Redefining loop variables in nested loops:

# ERROR: Cannot redefine i in nested loop
def kernel(n: int32) -> int32:
    s: int32 = 0
    for i in range(n):
        for i in range(n):  # Error: i already defined
            s = s + i
    return s

Meta-Programming Constructs

Allo provides compile-time meta-programming constructs that are evaluated during compilation, enabling conditional code generation and advanced optimizations.

Meta If/Elif/Else

Compile-time conditionals that generate different code based on conditions known at compile time. The conditions must be compile-time constants:

with allo.meta_if(condition1):
    # Code generated only when condition1 is true
    pass

with allo.meta_elif(condition2):
    # Code generated only when condition1 is false and condition2 is true
    pass

with allo.meta_else():
    # Code generated when all previous conditions are false
    pass

These are useful for:

  • Selecting different implementations based on compile-time parameters

  • Specializing kernels for specific data types or array sizes

  • Eliminating dead code at compile time

Meta For (Compile-Time Loop Unrolling)

allo.meta_for supports multiple argument formats similar to Python’s range. The loop bounds and step must be compile-time constants:

# Single argument: meta_for(upper)
with allo.meta_for(10) as i:
    A[i] = i

# Two arguments: meta_for(lower, upper)
with allo.meta_for(5, 10) as i:
    A[i] = i

# Three arguments: meta_for(lower, upper, step)
with allo.meta_for(0, 10, 2) as i:
    A[i] = i * 2

Tensor Attributes and Methods

Allo provides several built-in attributes and methods for tensor manipulation.

Transpose (.T)

Transpose a tensor by reversing its dimensions:

def kernel(A: float32[3, 4]) -> float32[4, 3]:
    return A.T  # Transpose: shape becomes [4, 3]

Copy (.copy)

Create a copy of a tensor:

B = A.copy()

Bit Reverse (.reverse)

Reverse the bits of an integer value (useful for FFT algorithms):

reversed_bits = x.reverse

Type Conversion Functions

Explicit Type Casting

Use Python built-in functions for explicit type casting:

# Cast to float32
b: float32 = float(a)

# Cast to int32
c: int32 = int(b)

Fixed-Point Type Attributes

Access type metadata for fixed-point types:

from allo.ir.types import Fixed

def kernel(A: Fixed[16, 8]) -> int32:
    return A.bits   # Returns 16 (total bitwidth)
    # A.fracs would return 8 (fractional bits)

Bitcast

Reinterpret the bit pattern of a value as a different type (preserves bits, changes interpretation):

# Reinterpret float32 bits as int32
int_bits = float_val.bitcast()

Note

bitcast preserves the bit pattern but changes the type interpretation. This is different from type casting which preserves the value but may change the bits.

Library Operations

Allo provides high-level library operations that map to optimized implementations.

Matrix Operations

# Matrix multiplication
C = allo.matmul(A, B)

# Batch matrix multiplication
C = allo.bmm(A, B)

# Linear layer: X @ A.T + B
Y = allo.linear(X, A, B)

Tensor Manipulation

# Transpose with custom permutation
B = allo.transpose(A, permutation=(1, 0, 2))

# Reshape/view tensor
B = allo.view(A, new_shape)

# Concatenate tensors along an axis
C = allo.concat(A, B, axis=0)

Element-wise Operations

# Exponential
B = allo.exp(A)

# Logarithm
B = allo.log(A)

# Absolute value
B = allo.abs(A)

Neural Network Operations

# 2D Convolution (NCHW format)
output = allo.conv2d(input, kernel)

# Max pooling
output = allo.maxpool(input, kernel)

# Sum pooling
output = allo.sumpool(input, kernel)

# ReLU activation
output = allo.relu(input)

# Softmax
output = allo.softmax(input)

Templates and Type Parameters

Allo supports parameterized kernels using type parameters:

def kernel[Ty](flag: bool) -> "Ty":
    X: Ty
    if not flag:
        X = 1
    else:
        X = 0
    return X

s = allo.customize(kernel, instantiate=[int8])

For more details on templates, see the Template Kernels documentation.

Building and Execution

After defining a kernel, create a schedule and build the executable:

import allo
import numpy as np

s = allo.customize(gemm)
mod = s.build()  # Default: LLVM backend

# Prepare inputs
np_A = np.random.randint(0, 10, size=(32, 32)).astype(np.int32)
np_B = np.random.randint(0, 10, size=(32, 32)).astype(np.int32)

# Execute
np_C = mod(np_A, np_B)

For HLS code generation:

mod = s.build(target="vhls")
print(mod.hls_code)