AutomataNexus LLC Whitepaper · April 29, 2026
Sequence Modeling

Mamba SSM
Selective State Space Model

Mamba SSM is a novel selective state space model compiled for Hailo silicon. It implements gated depthwise convolutions with parallel selective scan — a hardware-friendly alternative to transformer attention that maintains linear-time sequence processing.

Author: Andrew Jewell Sr.
Organization: AutomataNexus LLC
Framework: AxonML
Silicon: Hailo-10H

Abstract

Background

Mamba SSM is a selective state space model for sequence modeling, compiled for Hailo silicon. It implements gated depthwise convolutions with a parallel selective scan, a hardware-friendly alternative to transformer attention that processes sequences in linear time.
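To make the phrase "parallel selective scan" concrete, the following is a minimal sketch of the general technique, not the compiled model. It assumes a scalar state and the first-order recurrence h_t = a_t * h_(t-1) + b_t, where a_t and b_t depend on the input. The coefficient rule in selective_coeffs is hypothetical, and the source does not describe how the scan is realized inside the HEF. The sketch shows that composing steps is associative, so the left and right halves of a sequence can be scanned independently and then merged.

```python
def sequential_scan(a, b, h0=0.0):
    """Evaluate h_t = a_t * h_{t-1} + b_t left to right: one pass, O(T)."""
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def combine(first, second):
    """Compose step `first` followed by step `second` into one affine step.

    A step (a, b) maps h to a * h + b, so applying (a1, b1) and then (a2, b2)
    gives a2 * (a1 * h + b1) + b2 = (a1 * a2) * h + (a2 * b1 + b2).
    """
    a1, b1 = first
    a2, b2 = second
    return (a1 * a2, a2 * b1 + b2)


def tree_scan(steps):
    """Inclusive scan by divide and conquer.

    The two recursive calls touch disjoint halves, so they could run in
    parallel. This simple form is not work-optimal; it is for illustration.
    """
    if len(steps) <= 1:
        return list(steps)
    mid = len(steps) // 2
    left = tree_scan(steps[:mid])
    right = tree_scan(steps[mid:])
    carry = left[-1]  # composition of every step in the left half
    return left + [combine(carry, r) for r in right]


def selective_coeffs(xs):
    """Hypothetical input-dependent coefficients (an assumption, not the model's)."""
    a = [0.5 if x > 0 else 0.9 for x in xs]
    b = [(1.0 - a_t) * x for a_t, x in zip(a, xs)]
    return a, b


if __name__ == "__main__":
    xs = [1.0, -2.0, 0.5, 3.0, -1.0]
    a, b = selective_coeffs(xs)
    # With h0 = 0 the state equals the offset term of each composed step.
    states = [offset for _, offset in tree_scan(list(zip(a, b)))]
    print(states)
    print(sequential_scan(a, b))
```

Both functions return the same states up to floating-point rounding; the sequential form makes the linear-time cost visible, and the tree form shows where the parallelism comes from.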

Approach

The model was compiled from a trained AxonML checkpoint through the Hailo Dataflow Compiler (DFC) targeting Hailo-10H silicon. Post-training INT8 quantization was applied during the DFC compilation pass. The resulting Hailo Executable Format (HEF) binary executes on Hailo's fixed-function dataflow architecture with deterministic latency and zero framework overhead. No runtime model conversion, graph optimization, or JIT compilation is required at the edge.

Results

On production Hailo-10H hardware, Mamba SSM achieves 2,614 FPS throughput with 0.627 ms hardware latency and 1.342 ms end-to-end latency. Average die temperature during sustained inference was 58.6 °C.

Conclusion

These measurements indicate that Mamba SSM sustains more than 2,600 FPS at 1.342 ms end-to-end latency on Hailo-10H, which supports real-time edge deployment in sequence modeling applications. The model is production-ready as a single HEF binary with no external dependencies beyond the HailoRT runtime.

Model: Mamba SSM
Domain: Sequence Modeling
Architecture: Selective State Space Model
Target silicon: Hailo-10H
DFC compiler: 5.3.0
Framework: AxonML (pure-Rust deep-learning framework)
Author: Andrew Jewell Sr. · ORCID 0009-0005-2158-7060
Organization: AutomataNexus LLC · Fort Wayne, Indiana
Date: April 29, 2026

Executive overview

Mamba SSM is a sequence model compiled to Hailo-10H silicon via the AxonML framework and the Hailo Dataflow Compiler. It runs as a single Hailo Executable Format (HEF) binary with deterministic latency on Hailo's fixed-function dataflow architecture.

Throughput: 2,614 FPS
HW latency: 0.627 ms
E2E latency: 1.342 ms
Chip temp (avg): 58.6 °C

Architecture

The network is a 4-layer gated depthwise Conv1D stack with an expand-project pattern (expand ratio 2×), ReLU gating (used in place of SiLU to avoid softmax unit contention), and residual connections.
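The layer pattern described above can be sketched in plain Python. This is an illustrative reading of the description, not the AxonML implementation: the kernel size, the causal left padding, the split of the expanded tensor into a value branch and a gate branch, and all weight layouts are assumptions that the source does not specify.

```python
def relu(v):
    return v if v > 0.0 else 0.0


def pointwise(x, w):
    """1x1 projection. x is [T][C_in], w is [C_in][C_out]."""
    c_in, c_out = len(w), len(w[0])
    return [[sum(row[i] * w[i][o] for i in range(c_in)) for o in range(c_out)]
            for row in x]


def depthwise_causal_conv1d(x, k):
    """Per-channel 1D convolution. x is [T][C], k is [C][K].

    Left zero padding keeps the output length at T and makes step t depend
    only on steps <= t (the causal padding is an assumption).
    """
    t_len, channels, k_len = len(x), len(x[0]), len(k[0])
    out = [[0.0] * channels for _ in range(t_len)]
    for t in range(t_len):
        for c in range(channels):
            acc = 0.0
            for j in range(k_len):
                src = t - (k_len - 1) + j
                if src >= 0:
                    acc += k[c][j] * x[src][c]
            out[t][c] = acc
    return out


def gated_block(x, w_expand, k_dw, w_project):
    """Expand, depthwise conv, ReLU gate, project, then add the residual.

    w_expand is [C][2*E] with E = 2*C (expand ratio 2); the first E columns
    feed the value branch and the last E columns feed the gate branch.
    """
    wide = pointwise(x, w_expand)
    e = len(wide[0]) // 2
    value = depthwise_causal_conv1d([row[:e] for row in wide], k_dw)
    gate = [row[e:] for row in wide]
    gated = [[v * relu(g) for v, g in zip(v_row, g_row)]
             for v_row, g_row in zip(value, gate)]
    projected = pointwise(gated, w_project)  # back to [T][C]
    return [[xi + pi for xi, pi in zip(x_row, p_row)]
            for x_row, p_row in zip(x, projected)]


def stack(x, layers):
    """Apply the blocks in order; the source describes four of them."""
    for w_expand, k_dw, w_project in layers:
        x = gated_block(x, w_expand, k_dw, w_project)
    return x
```

Every operation here has a fixed shape and no data-dependent branching, which is the property that matters for a fixed-function compiler target.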

Architecture note

All AxonML models targeting Hailo silicon are subject to the constraints of the fixed-function dataflow compiler: no dynamic control flow, no variable-length dimensions, and all activations must be representable in INT8 after calibration. Operations that require softmax hardware (e.g., standard transformer attention) are replaced with hardware-friendly equivalents (depthwise convolution, ReLU gating) where necessary.
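The calibration requirement can be illustrated with a generic affine quantization scheme. The sketch below is not the DFC's calibration algorithm, which the source does not describe; it assumes a simple min/max calibration and an unsigned 8-bit range, chosen because Table 02-1 reports UINT8 streams. The function names are hypothetical.

```python
def calibrate(samples):
    """Derive (scale, zero_point) from observed activation values.

    The range is widened to include 0.0 so that zero is exactly representable.
    """
    lo = min(min(samples), 0.0)
    hi = max(max(samples), 0.0)
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # degenerate case: every sample was zero
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Map a real value to an integer in [0, 255], clamping out-of-range input."""
    return max(0, min(255, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    """Map an 8-bit code back to the real value it represents."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    scale, zp = calibrate([-1.0, 0.0, 3.0])
    for value in (-1.0, 0.0, 1.5, 3.0, 10.0):
        q = quantize(value, scale, zp)
        print(value, q, dequantize(q, scale, zp))
```

Values outside the calibrated range saturate at 0 or 255, which is why an activation whose range cannot be bounded during calibration cannot be represented faithfully.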

Network I/O specification

Direction   Stream name     Data type   Shape
INPUT       input_layer1    UINT8       NHWC(128x1x128)
OUTPUT      dense_conv17    UINT8       NC(101)
Table 02-1 · HEF network input and output tensors as reported by hailortcli parse-hef.
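The shapes in Table 02-1 fix the size of each host-side buffer. The sketch below computes those sizes and flattens a nested frame into bytes. It assumes that 128x1x128 is ordered height, width, channels with the batch dimension omitted, and that buffers are laid out row-major; neither is stated in the source, and pack_hwc is a hypothetical helper rather than a HailoRT call.

```python
from math import prod

INPUT_SHAPE = (128, 1, 128)   # assumed H, W, C for stream input_layer1
OUTPUT_SHAPE = (101,)         # stream dense_conv17


def nbytes_uint8(shape):
    """UINT8 uses one byte per element, so the byte count is the element count."""
    return prod(shape)


def pack_hwc(frame):
    """Flatten frame[h][w][c] (ints in 0..255) into row-major bytes."""
    return bytes(v for row in frame for pixel in row for v in pixel)


def flat_index(h, w, c, shape=INPUT_SHAPE):
    """Offset of element (h, w, c) inside the packed buffer."""
    _, width, channels = shape
    return (h * width + w) * channels + c


if __name__ == "__main__":
    height, width, channels = INPUT_SHAPE
    frame = [[[0] * channels for _ in range(width)] for _ in range(height)]
    frame[3][0][7] = 200
    buf = pack_hwc(frame)
    print(len(buf), nbytes_uint8(OUTPUT_SHAPE), buf[flat_index(3, 0, 7)])
```

Under these assumptions each inference consumes 16,384 input bytes and produces 101 output bytes.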

Silicon performance

All measurements were taken on production Hailo-10H hardware using hailortcli run with real silicon profiling enabled. Throughput, latency, and thermal figures represent sustained inference under controlled conditions.

Metric                          Measured value
Throughput                      2,614.06 FPS
HW Latency (on-die)             0.627 ms
Overall Latency (end-to-end)    1.342 ms
Chip Temperature (min)          53.8 °C
Chip Temperature (avg)          58.6 °C
Chip Temperature (max)          60.0 °C
Target Silicon                  Hailo-10H
DFC Compiler Version            5.3.0
HEF Compatibility               HAILO15H, HAILO10H
Table 03-1 · Silicon benchmark results for Mamba SSM on Hailo-10H.
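A few derived quantities help relate the figures in Table 03-1 to each other. The arithmetic below uses only the measured values. The interpretation is an assumption: because the frame period (about 0.3825 ms) is shorter than the on-die latency (0.627 ms), more than one frame appears to be in flight at once, which is consistent with pipelined execution but is not stated in the source.

```python
def derive(throughput_fps, hw_latency_ms, e2e_latency_ms):
    """Derive secondary figures from the three measured values."""
    frame_period_ms = 1000.0 / throughput_fps
    return {
        # Average time between completed frames.
        "frame_period_ms": frame_period_ms,
        # Rate a strictly one-frame-at-a-time device with this latency would reach.
        "serial_fps_bound": 1000.0 / hw_latency_ms,
        # Little's law: average frames in flight = throughput * latency.
        "frames_in_flight_on_die": hw_latency_ms / frame_period_ms,
        "frames_in_flight_e2e": e2e_latency_ms / frame_period_ms,
        # Portion of end-to-end latency spent outside the on-die measurement.
        "off_die_ms": e2e_latency_ms - hw_latency_ms,
    }


if __name__ == "__main__":
    for name, value in derive(2614.06, 0.627, 1.342).items():
        print(f"{name}: {value:.4f}")
```

With the measured inputs, the serial bound is roughly 1,595 FPS, below the measured 2,614.06 FPS, and about 1.64 frames are in flight on the die on average. The remaining 0.715 ms of end-to-end latency is spent outside the on-die measurement; the source does not break it down further.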

Deployment

The compiled HEF binary is deployed to edge devices equipped with Hailo-10H accelerator modules. No runtime model conversion, graph optimization, or JIT compilation is required. The HEF executes directly on the dataflow architecture with deterministic latency guarantees.

Target silicon: Hailo-10H
HEF compatibility: HAILO15H, HAILO10H
DFC compiler version: 5.3.0
Quantization: INT8 (post-training, DFC-optimized calibration)
Runtime dependency: HailoRT (hailort), the vendor runtime for PCIe / M.2 dispatch
Edge platforms: Raspberry Pi 5 + Hailo AI HAT+ · any M.2 key M host with Hailo-10H

Deployment procedure

Copy the .hef binary to the target device. The AxonML inference runtime (or the standalone hailortcli run harness) loads the HEF directly into the Hailo-10H dataflow engine over PCIe. No ONNX runtime, TensorFlow Lite interpreter, or Python inference stack is required. Inference begins immediately after HEF load with deterministic per-frame latency.
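For the standalone harness path, a device-side smoke test can be scripted around the two hailortcli subcommands this document names, parse-hef and run. The sketch below is a thin wrapper under stated assumptions: the file name mamba_ssm.hef is hypothetical, no additional flags are passed, and the command is skipped when hailortcli is not on the PATH. Python is used only to keep the examples in one language; it is not needed for inference.

```python
import shutil
import subprocess

ALLOWED_SUBCOMMANDS = ("parse-hef", "run")  # the two named in this document


def hailortcli_cmd(subcommand, hef_path):
    """Build the argument list for one hailortcli invocation."""
    if subcommand not in ALLOWED_SUBCOMMANDS:
        raise ValueError(f"unsupported subcommand: {subcommand}")
    return ["hailortcli", subcommand, hef_path]


def run_if_available(cmd):
    """Run the command if the executable exists; return None otherwise."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    hef = "mamba_ssm.hef"  # hypothetical file name
    for sub in ALLOWED_SUBCOMMANDS:
        result = run_if_available(hailortcli_cmd(sub, hef))
        if result is None:
            print("hailortcli not found; skipping", sub)
        else:
            print(result.stdout)
```

Running parse-hef first confirms that the stream names and shapes match Table 02-1 before the run subcommand is used to benchmark the device.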

Generated from real Hailo-10H silicon measurements · AxonML Framework · April 29, 2026

DISCLAIMER: This document and its contents are proprietary to AutomataNexus LLC. Performance figures are measured on production silicon under controlled conditions. Reproduction or redistribution without written permission is prohibited.