Mamba SSM: Selective State Space Model
Mamba SSM is a novel selective state space model compiled for Hailo silicon. It combines gated depthwise convolutions with a parallel selective scan, a hardware-friendly alternative to transformer attention that processes sequences in linear time.
Abstract
Background
Mamba SSM is a selective state space model designed for the sequence modeling domain.
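The summary credits the model with a parallel selective scan that keeps sequence processing linear in time. This document does not specify how the scan is parameterized in the compiled network, so the following is only a generic, scalar-state sketch of a Mamba-style selective recurrence h_t = a_t * h_(t-1) + b_t, where a_t and b_t depend on the input x_t. The helper names and the simplified discretization (`a = exp(-delta * A)`, `b = delta * B * x`) are assumptions. The sketch runs the plain sequential loop (O(n) steps) and a log-depth Hillis-Steele scan over the associative composition of affine maps, and the two produce the same states.

```python
import math


def softplus(v: float) -> float:
    # Numerically stable log(1 + exp(v)).
    return math.log1p(math.exp(-abs(v))) + max(v, 0.0)


def selective_params(xs, w_delta=1.0, A=1.0, B=1.0):
    """Hypothetical selection rule: each input x_t sets its own step size delta_t."""
    pairs = []
    for x in xs:
        delta = softplus(w_delta * x)  # input-dependent step size (> 0)
        a = math.exp(-delta * A)       # per-step decay in (0, 1) for A > 0
        b = delta * B * x              # per-step input contribution
        pairs.append((a, b))
    return pairs


def sequential_scan(pairs):
    # Reference recurrence h_t = a_t * h_(t-1) + b_t with h_(-1) = 0: one pass, O(n).
    h, states = 0.0, []
    for a, b in pairs:
        h = a * h + b
        states.append(h)
    return states


def combine(left, right):
    # Compose affine map "left" followed by "right"; associativity enables a parallel scan.
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def parallel_scan(pairs):
    # Hillis-Steele inclusive scan: ceil(log2 n) rounds; updates within a round are independent.
    acc = list(pairs)
    step = 1
    while step < len(acc):
        acc = [acc[i] if i < step else combine(acc[i - step], acc[i]) for i in range(len(acc))]
        step *= 2
    # Applying each accumulated map to h_(-1) = 0 leaves only its offset term.
    return [b for _, b in acc]


if __name__ == "__main__":
    xs = [0.5, -1.0, 2.0, 0.0, 1.5, -0.3, 0.7]
    pairs = selective_params(xs)
    print(sequential_scan(pairs))
    print(parallel_scan(pairs))
```

Hillis-Steele was chosen here for brevity; it performs O(n log n) total work, whereas a work-efficient (Blelloch-style) scan keeps total work linear while retaining logarithmic depth.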
Approach
The model was compiled from a trained AxonML checkpoint through the Hailo Dataflow Compiler (DFC) targeting Hailo-10H silicon. Post-training INT8 quantization was applied during the DFC compilation pass. The resulting Hailo Executable Format (HEF) binary executes on Hailo's fixed-function dataflow architecture with deterministic latency and no host-side deep-learning framework in the inference path. No runtime model conversion, graph optimization, or JIT compilation is required at the edge.
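The DFC's calibration procedure is not described here and may be considerably more elaborate. As a minimal illustration of what post-training 8-bit affine quantization does to a tensor, the sketch below derives a scale and zero point from the min/max of calibration samples, using the unsigned UINT8 range that this network's I/O streams use, and round-trips values through them. The function names and the plain min/max range choice are assumptions.

```python
def calibrate_uint8(samples):
    """Min/max calibration: map the observed range onto the 256 UINT8 levels."""
    lo = min(min(samples), 0.0)  # include 0.0 so that zero stays exactly representable
    hi = max(max(samples), 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    return scale, zero_point


def quantize(x, scale, zero_point):
    q = int(round(x / scale)) + zero_point
    return max(0, min(255, q))  # saturate to the UINT8 range


def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale


if __name__ == "__main__":
    calib = [-1.0, 0.0, 0.5, 3.0]
    scale, zp = calibrate_uint8(calib)
    for x in calib:
        q = quantize(x, scale, zp)
        print(f"{x:+.3f} -> {q:3d} -> {dequantize(q, scale, zp):+.4f}")
```

Values inside the calibrated range come back within about one quantization step; values outside it saturate, which is why the choice of calibration data matters.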
Results
On production Hailo-10H hardware, Mamba SSM achieves 2,614 FPS throughput with 0.627 ms hardware latency and 1.342 ms end-to-end latency. Average die temperature during sustained inference was 58.6 °C.
Conclusion
These measurements show sub-millisecond on-die latency (0.627 ms) and sustained throughput above 2,600 FPS, indicating that Mamba SSM is suitable for real-time edge deployment in sequence modeling applications. The model is production-ready as a single HEF binary with no external dependencies beyond the HailoRT runtime.
| Property | Value |
|---|---|
| Model | Mamba SSM |
| Domain | Sequence Modeling |
| Architecture | Selective State Space Model |
| Target silicon | Hailo-10H |
| DFC compiler | 5.3.0 |
| Framework | AxonML (pure-Rust deep-learning framework) |
| Author | Andrew Jewell Sr. · ORCID 0009-0005-2158-7060 |
| Organization | AutomataNexus LLC · Fort Wayne, Indiana |
| Date | April 29, 2026 |
Executive overview
Mamba SSM is a sequence model compiled to Hailo-10H silicon via the AxonML framework and the Hailo Dataflow Compiler. The model runs as a single Hailo Executable Format (HEF) binary with deterministic latency on Hailo's fixed-function dataflow architecture.
Architecture
The network stacks four gated depthwise Conv1D layers. Each layer follows an expand-project pattern (expansion ratio 2×), uses ReLU gating in place of SiLU to avoid contention for the softmax unit, and adds a residual connection around the block.
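The layer description maps onto a small forward pass. The pure-Python sketch below is an illustration under stated assumptions, not the AxonML implementation: the weight names (`w_value`, `w_gate`, `k`, `w_out`), the helpers `init_block` and `forward`, the causal depthwise kernel width of 4, the split into a convolved value branch and a ReLU gate branch, and the random initialization are all hypothetical. It shows the expand step (D to 2D), the depthwise convolution over time, ReLU gating, projection back to D, and the residual add, stacked four times.

```python
import random


def relu(v):
    return v if v > 0.0 else 0.0


def linear(x, w):
    """Pointwise projection: x is T x d_in, w is d_in x d_out, result is T x d_out."""
    d_out = len(w[0])
    return [[sum(row[i] * w[i][j] for i in range(len(row))) for j in range(d_out)] for row in x]


def depthwise_conv1d(x, k):
    """Causal depthwise conv: channel c uses only its own kernel k[c] over current and past steps."""
    steps, channels, width = len(x), len(x[0]), len(k[0])
    out = [[0.0] * channels for _ in range(steps)]
    for t in range(steps):
        for c in range(channels):
            out[t][c] = sum(k[c][j] * x[t - j][c] for j in range(width) if t - j >= 0)
    return out


def gated_block(x, p):
    value = depthwise_conv1d(linear(x, p["w_value"]), p["k"])  # expand D -> 2D, mix over time
    gate = linear(x, p["w_gate"])                              # expand D -> 2D
    mixed = [[v * relu(g) for v, g in zip(vr, gr)] for vr, gr in zip(value, gate)]  # ReLU gating
    proj = linear(mixed, p["w_out"])                           # project 2D -> D
    return [[a + b for a, b in zip(xr, pr)] for xr, pr in zip(x, proj)]  # residual connection


def init_block(d_model, expand=2, kernel=4, seed=0):
    rng = random.Random(seed)
    inner = expand * d_model

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"w_value": mat(d_model, inner), "w_gate": mat(d_model, inner),
            "k": mat(inner, kernel), "w_out": mat(inner, d_model)}


def forward(x, blocks):
    for p in blocks:
        x = gated_block(x, p)
    return x


if __name__ == "__main__":
    D, T = 8, 16
    blocks = [init_block(D, seed=i) for i in range(4)]  # four stacked layers
    x = [[float((t + c) % 3) for c in range(D)] for t in range(T)]
    y = forward(x, blocks)
    print(len(y), len(y[0]))
```

Every operation here is either per time step or a causal convolution over past steps, so output t never depends on later inputs, and all shapes are fixed at build time, which is the kind of graph a fixed-function dataflow compiler can map.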
Architecture note
All AxonML models targeting Hailo silicon are subject to the constraints of the fixed-function dataflow compiler: no dynamic control flow, no variable-length dimensions, and all activations must be representable in INT8 after calibration. Operations that require softmax hardware (e.g., standard transformer attention) are replaced with hardware-friendly equivalents (depthwise convolution, ReLU gating) where necessary.
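Because the compiler admits no variable-length dimensions, sequences of arbitrary length have to be brought to the fixed input size on the host before dispatch. A minimal sketch, assuming the 128-row axis of the 128x1x128 input is time and the 128-wide axis is features (the I/O table does not label them), and assuming that keeping the most recent frames and left-padding with zeros suits the application; `to_fixed_length` is a hypothetical helper:

```python
SEQ_LEN = 128   # assumed time axis of the 128x1x128 input
FEATURES = 128  # assumed feature axis


def to_fixed_length(frames, length=SEQ_LEN, features=FEATURES, pad_value=0):
    """Return exactly `length` rows: keep the newest frames, left-pad the remainder."""
    if length <= 0:
        raise ValueError("length must be positive")
    for f in frames:
        if len(f) != features:
            raise ValueError(f"expected {features} features per frame, got {len(f)}")
    kept = [list(f) for f in frames[max(0, len(frames) - length):]]
    padding = [[pad_value] * features for _ in range(length - len(kept))]
    return padding + kept
```

Whether padding belongs at the front or the back, and whether truncation should keep the oldest or newest frames, depends on how the model was trained; treat both choices as placeholders.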
Network I/O specification
| Direction | Stream name | Data type | Shape |
|---|---|---|---|
| INPUT | input_layer1 | UINT8 | NHWC(128x1x128) |
| OUTPUT | dense_conv17 | UINT8 | NC(101) |
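The I/O table fixes the per-frame buffer sizes: the input stream carries 128 × 1 × 128 = 16,384 bytes and the output stream 101 bytes. The sketch below shows the NHWC offset arithmetic and a host-side packing step. The helper names are hypothetical, and the final argmax assumes the 101 outputs are per-class scores dequantized with a positive scale, so ranking the raw UINT8 values picks the same winner as ranking the dequantized ones.

```python
INPUT_SHAPE = (128, 1, 128)  # (H, W, C) of input_layer1, batch N = 1
OUTPUT_SIZE = 101            # C of dense_conv17, batch N = 1


def nhwc_offset(h, w, c, shape=INPUT_SHAPE):
    """Byte offset of element (h, w, c) within a single NHWC frame (N = 1)."""
    _, width, channels = shape
    return (h * width + w) * channels + c


def pack_input(rows, shape=INPUT_SHAPE):
    """Pack rows[h][c] (UINT8 ints, W == 1) into the NHWC byte layout."""
    height, width, channels = shape
    buf = bytearray(height * width * channels)
    for h in range(height):
        for c in range(channels):
            buf[nhwc_offset(h, 0, c, shape)] = rows[h][c]
    return bytes(buf)


def top_class(output):
    """Index of the largest raw UINT8 score in the 101-element output."""
    if len(output) != OUTPUT_SIZE:
        raise ValueError(f"expected {OUTPUT_SIZE} bytes, got {len(output)}")
    return max(range(OUTPUT_SIZE), key=lambda i: output[i])
```

With W = 1 the NHWC layout reduces to row-major (H, C), so a contiguous 128 × 128 UINT8 array already has the right byte order.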
Silicon performance
All measurements were taken on production Hailo-10H hardware using hailortcli run with real silicon profiling enabled. Throughput, latency, and thermal figures represent sustained inference under controlled conditions.
| Metric | Measured value |
|---|---|
| Throughput | 2,614.06 FPS |
| HW Latency (on-die) | 0.627 ms |
| Overall Latency (end-to-end) | 1.342 ms |
| Chip Temperature (min) | 53.8 °C |
| Chip Temperature (avg) | 58.6 °C |
| Chip Temperature (max) | 60.0 °C |
| Target Silicon | Hailo-10H |
| DFC Compiler Version | 5.3.0 |
| HEF Compatibility | HAILO15H, HAILO10H |
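The table's figures can be related through Little's law (average items in flight = throughput × latency). Assuming the throughput and latency numbers come from the same steady-state run, which the table does not state, the arithmetic suggests that about 1.6 frames overlap on-die, about 3.5 frames are in flight end to end, and roughly 0.715 ms of each frame's end-to-end latency is spent outside on-die execution (host dispatch and transfers). `pipeline_stats` is a hypothetical helper.

```python
def pipeline_stats(fps, hw_latency_ms, e2e_latency_ms):
    """Per-frame period and average frames in flight via Little's law (L = lambda * W)."""
    return {
        "period_ms": 1000.0 / fps,                      # time between completed frames
        "in_flight_hw": fps * hw_latency_ms / 1000.0,    # frames overlapping on-die
        "in_flight_e2e": fps * e2e_latency_ms / 1000.0,  # frames in flight end to end
        "off_chip_ms": e2e_latency_ms - hw_latency_ms,   # latency outside on-die execution
    }


if __name__ == "__main__":
    stats = pipeline_stats(2614.06, 0.627, 1.342)
    for name, value in stats.items():
        print(f"{name:>14}: {value:.3f}")
```

A per-frame period (about 0.38 ms) shorter than the on-die latency (0.627 ms) is consistent with a pipelined dataflow design in which a new frame enters before the previous one has finished.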
Deployment
The compiled HEF binary is deployed to edge devices equipped with Hailo-10H accelerator modules. No runtime model conversion, graph optimization, or JIT compilation is required. The HEF executes directly on the dataflow architecture with deterministic latency guarantees.
| Property | Value |
|---|---|
| Target silicon | Hailo-10H |
| HEF compatibility | HAILO15H, HAILO10H |
| DFC compiler version | 5.3.0 |
| Quantization | INT8 (post-training, DFC-optimized calibration) |
| Runtime dependency | HailoRT (hailort) — vendor runtime for PCIe / M.2 dispatch |
| Edge platforms | Raspberry Pi 5 + Hailo AI HAT+ · any host with an M.2 M-key slot fitted with a Hailo-10H module |
Deployment procedure
Copy the .hef binary to the target device. The AxonML inference runtime (or the standalone hailortcli run harness) loads the HEF directly into the Hailo-10H dataflow engine over PCIe. No ONNX runtime, TensorFlow Lite interpreter, or Python inference stack is required. Inference begins immediately after HEF load with deterministic per-frame latency.
References
- Jewell, A. (2026). AxonML — Pure-Rust Deep Learning Framework. AutomataNexus LLC. Technical whitepaper.
- Jewell, A. (2026). The Nexus Stack — A Pure-Rust Industrial Technology Stack. AutomataNexus LLC. Technical whitepaper.
- Hailo Technologies Ltd. (2024). Hailo Dataflow Compiler User Guide. DFC v3.x / v5.x documentation.
- Hailo Technologies Ltd. (2024). HailoRT — Hailo Runtime API Reference. HailoRT v4.x documentation.
- Hailo Technologies Ltd. (2024). Hailo-10H Datasheet. Product specification.
Generated from real Hailo-10H silicon measurements · AxonML Framework · April 29, 2026
DISCLAIMER: This document and its contents are proprietary to AutomataNexus LLC. Performance figures are measured on production silicon under controlled conditions. Reproduction or redistribution without written permission is prohibited.