Hydra: SSM + Depthwise Attention Hybrid
Hydra is a hybrid architecture combining selective state space modeling with depthwise attention. It demonstrates that SSM and attention mechanisms can be unified on fixed-function accelerators when attention is reformulated as depthwise convolution over query-key products.
Abstract
Background
Hydra is an SSM + depthwise attention hybrid that combines selective state space modeling with depthwise attention. Standard transformer attention depends on softmax, which fixed-function accelerators do not handle well. Hydra demonstrates that SSM and attention mechanisms can be unified on such hardware when attention is reformulated as a depthwise convolution over query-key products.
Approach
The model was compiled from a trained AxonML checkpoint through the Hailo Dataflow Compiler (DFC) targeting Hailo-10H silicon. Post-training INT8 quantization was applied during the DFC compilation pass. The resulting Hailo Executable Format (HEF) binary executes on Hailo's fixed-function dataflow architecture with deterministic latency and zero framework overhead. No runtime model conversion, graph optimization, or JIT compilation is required at the edge.
Results
On production Hailo-10H hardware, Hydra achieves about 856 FPS throughput (855.62 FPS measured) with 0.906 ms hardware latency and 1.615 ms end-to-end latency. Average die temperature during sustained inference was 54.2 °C.
Conclusion
These measurements indicate that Hydra has the latency and throughput headroom for real-time edge deployment in hybrid architecture applications: it sustains 855.62 FPS with 1.615 ms end-to-end latency. The model is production-ready as a single HEF binary with no external dependencies beyond the HailoRT runtime.
| Field | Value |
|---|---|
| Model | Hydra |
| Domain | Hybrid Architecture |
| Architecture | SSM + Depthwise Attention Hybrid |
| Target silicon | Hailo-10H |
| DFC compiler | 5.3.0 |
| Framework | AxonML (pure-Rust deep-learning framework) |
| Author | Andrew Jewell Sr. · ORCID 0009-0005-2158-7060 |
| Organization | AutomataNexus LLC · Fort Wayne, Indiana |
| Date | April 29, 2026 |
Executive overview
Hydra is a hybrid architecture model compiled to Hailo-10H silicon via the AxonML framework and the Hailo Dataflow Compiler. The model runs as a single Hailo Executable Format (HEF) binary with deterministic latency on Hailo's fixed-function dataflow architecture.
Architecture
The network alternates SSM blocks (gated depthwise Conv1D) with depthwise attention blocks (Conv1D Q/K/V projections followed by depthwise mixing), for 4 layers in total, with residual connections throughout.
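The block structure described above can be illustrated with a minimal pure-Python sketch. It is not the AxonML implementation: the function names, the causal padding, the ReLU gate, the exact way the query-key product is mixed, and the layer order (SSM, attention, SSM, attention) are all assumptions made for illustration. The only points taken from the text are that the SSM block is a gated depthwise Conv1D, that attention is a depthwise convolution over query-key products with no softmax, and that every block is residual.

```python
from typing import List

Seq = List[List[float]]  # activations laid out as [channel][time step]


def relu(x: Seq) -> Seq:
    return [[max(0.0, v) for v in ch] for ch in x]


def mul(a: Seq, b: Seq) -> Seq:
    return [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a: Seq, b: Seq) -> Seq:
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def depthwise_conv1d(x: Seq, kernels: Seq) -> Seq:
    """Causal depthwise Conv1D: channel c is filtered only by kernels[c]."""
    out = []
    for ch, k in zip(x, kernels):
        # k[0] weights the current step, k[j] the step j positions earlier;
        # positions before the start of the sequence count as zero.
        out.append([
            sum(k[j] * ch[t - j] for j in range(len(k)) if t - j >= 0)
            for t in range(len(ch))
        ])
    return out


def pointwise(x: Seq, w: Seq) -> Seq:
    """Kernel-size-1 Conv1D: mixes channels separately at each time step."""
    steps = len(x[0])
    return [
        [sum(w_row[c] * x[c][t] for c in range(len(x))) for t in range(steps)]
        for w_row in w
    ]


def ssm_block(x: Seq, kernels: Seq, w_gate: Seq) -> Seq:
    """Gated depthwise Conv1D with a residual connection (assumed form)."""
    mixed = depthwise_conv1d(x, kernels)
    gate = relu(pointwise(x, w_gate))
    return add(x, mul(mixed, gate))


def attention_block(x: Seq, w_q: Seq, w_k: Seq, w_v: Seq, mix_kernels: Seq) -> Seq:
    """Softmax-free attention: depthwise conv over q*k products (assumed form)."""
    q, k, v = pointwise(x, w_q), pointwise(x, w_k), pointwise(x, w_v)
    scores = relu(depthwise_conv1d(mul(q, k), mix_kernels))
    return add(x, mul(scores, v))


def hydra_stack(x: Seq, ssm_params, attn_params) -> Seq:
    """Four layers, assumed order: SSM, attention, SSM, attention."""
    for i in range(2):
        x = ssm_block(x, *ssm_params[i])
        x = attention_block(x, *attn_params[i])
    return x
```

Every operation here is a convolution, an elementwise product, an addition, or a ReLU, and the tensor shape never changes from layer to layer, which is the property that makes this kind of block a plausible fit for a fixed-function dataflow target.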
Architecture note
All AxonML models targeting Hailo silicon are subject to the constraints of the fixed-function dataflow compiler: no dynamic control flow, no variable-length dimensions, and all activations must be representable in INT8 after calibration. Operations that require softmax hardware (e.g., standard transformer attention) are replaced with hardware-friendly equivalents (depthwise convolution, ReLU gating) where necessary.
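To make the calibration constraint concrete, the following sketch shows a generic 8-bit affine (scale and zero-point) quantizer driven by the min/max of a calibration set. It is an illustration of the general technique only, not the algorithm the Hailo Dataflow Compiler uses. The function names are hypothetical, and the unsigned 0..255 range is an assumption chosen to match the UINT8 streams in the I/O specification.

```python
from typing import List, Tuple

QMIN, QMAX = 0, 255  # unsigned 8-bit range (assumption, matching UINT8 I/O)


def calibrate(samples: List[float]) -> Tuple[float, int]:
    """Derive scale and zero point from the observed activation range."""
    lo = min(min(samples), 0.0)  # always include 0 so it maps to an exact code
    hi = max(max(samples), 0.0)
    scale = (hi - lo) / (QMAX - QMIN) if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(values: List[float], scale: float, zero_point: int) -> List[int]:
    """Map floats to 8-bit codes, clamping anything outside the range."""
    return [
        min(QMAX, max(QMIN, round(v / scale) + zero_point))
        for v in values
    ]


def dequantize(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate floats from 8-bit codes."""
    return [(c - zero_point) * scale for c in codes]


calibration_set = [-1.0, 0.0, 0.5, 3.0]  # made-up activations for illustration
scale, zero_point = calibrate(calibration_set)
codes = quantize(calibration_set, scale, zero_point)
restored = dequantize(codes, scale, zero_point)
```

Values inside the calibrated range come back with an error of at most half a quantization step. Values outside it are clamped, which is why an activation must be representable within the calibrated range to survive quantization.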
Network I/O specification
| Direction | Stream name | Data type | Shape |
|---|---|---|---|
| INPUT | input_layer1 | UINT8 | NHWC(128x1x128) |
| OUTPUT | dense_conv29 | UINT8 | NC(101) |
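Because the shapes are fixed at compile time, the host-side buffer sizes follow directly from the table. The sketch below works them out and adds a length check for input frames. The helper names are hypothetical, and the bandwidth figure simply multiplies the frame size by the measured 855.62 FPS from this report. It is an estimate of payload traffic only, not a measured bus figure.

```python
from math import prod

INPUT_SHAPE = (128, 1, 128)  # input_layer1, listed as NHWC(128x1x128)
OUTPUT_SHAPE = (101,)        # dense_conv29, listed as NC(101)
BYTES_PER_ELEMENT = 1        # both streams are UINT8
THROUGHPUT_FPS = 855.62      # measured throughput from this report


def frame_bytes(shape) -> int:
    """Size of one frame in bytes for a UINT8 stream of the given shape."""
    return prod(shape) * BYTES_PER_ELEMENT


def check_input_frame(buf: bytes) -> bytes:
    """Reject buffers whose length does not match the static input shape."""
    expected = frame_bytes(INPUT_SHAPE)
    if len(buf) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(buf)}")
    return buf


def stream_bandwidth_mb_s(shape, fps: float = THROUGHPUT_FPS) -> float:
    """Payload bandwidth in MB/s (10^6 bytes) at the given frame rate."""
    return frame_bytes(shape) * fps / 1e6


print(frame_bytes(INPUT_SHAPE), frame_bytes(OUTPUT_SHAPE))
print(round(stream_bandwidth_mb_s(INPUT_SHAPE), 2), "MB/s input payload")
```

One input frame is 16,384 bytes and one output frame is 101 bytes, so at the measured frame rate the input payload is roughly 14 MB/s.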
Silicon performance
All measurements were taken on production Hailo-10H hardware using hailortcli run with real silicon profiling enabled. Throughput, latency, and thermal figures represent sustained inference under controlled conditions.
| Metric | Measured value |
|---|---|
| Throughput | 855.62 FPS |
| HW Latency (on-die) | 0.906 ms |
| Overall Latency (end-to-end) | 1.615 ms |
| Chip Temperature (min) | 53.5 °C |
| Chip Temperature (avg) | 54.2 °C |
| Chip Temperature (max) | 54.6 °C |
| Target Silicon | Hailo-10H |
| DFC Compiler Version | 5.3.0 |
| HEF Compatibility | HAILO15H, HAILO10H |
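The throughput and latency rows can be cross-checked with a little arithmetic. The sketch below derives the per-frame period from the throughput, the gap between end-to-end and on-die latency, and the average number of frames in flight via Little's law (frames in flight = throughput × latency). The interpretation that frames overlap in a pipeline is an inference from the numbers, not something the measurement harness reports.

```python
THROUGHPUT_FPS = 855.62     # measured
HW_LATENCY_MS = 0.906       # measured, on-die
OVERALL_LATENCY_MS = 1.615  # measured, end-to-end

# Time between completed frames at the measured throughput.
frame_period_ms = 1000.0 / THROUGHPUT_FPS

# Portion of end-to-end latency spent outside the die (transfer, host, queues).
off_die_ms = OVERALL_LATENCY_MS - HW_LATENCY_MS

# Little's law: average frames in flight = arrival rate * time in system.
frames_in_flight = THROUGHPUT_FPS * OVERALL_LATENCY_MS / 1000.0

print(f"frame period     : {frame_period_ms:.3f} ms")
print(f"off-die latency  : {off_die_ms:.3f} ms")
print(f"frames in flight : {frames_in_flight:.2f}")
```

The frame period (about 1.169 ms) is shorter than the end-to-end latency (1.615 ms), so on average more than one frame is in the pipeline at a time. That is consistent with throughput being measured under streaming rather than one-frame-at-a-time operation.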
Deployment
The compiled HEF binary is deployed to edge devices equipped with Hailo-10H accelerator modules. No runtime model conversion, graph optimization, or JIT compilation is required. The HEF executes directly on the dataflow architecture with deterministic latency guarantees.
| Item | Value |
|---|---|
| Target silicon | Hailo-10H |
| HEF compatibility | HAILO15H, HAILO10H |
| DFC compiler version | 5.3.0 |
| Quantization | INT8 (post-training, DFC-optimized calibration) |
| Runtime dependency | HailoRT (hailort) — vendor runtime for PCIe / M.2 dispatch |
| Edge platforms | Raspberry Pi 5 + Hailo AI HAT+ · any M.2 key M host with Hailo-10H |
Deployment procedure
Copy the .hef binary to the target device. The AxonML inference runtime (or the standalone hailortcli run harness) loads the HEF directly into the Hailo-10H dataflow engine over PCIe. No ONNX runtime, TensorFlow Lite interpreter, or Python inference stack is required. Inference begins immediately after HEF load with deterministic per-frame latency.
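As a rough illustration of the standalone path, the sketch below wraps the hailortcli run harness mentioned above in a small Python helper. The wrapper itself, its function names, and the file name hydra.hef are hypothetical. It passes only the HEF path to the CLI and adds no other flags, and it is a convenience for smoke testing rather than part of the deployment, which needs no Python stack.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_run_command(hef_path: str) -> List[str]:
    """Command line for the standalone harness: hailortcli run <hef>."""
    return ["hailortcli", "run", str(hef_path)]


def run_hef(hef_path: str) -> str:
    """Run the HEF through hailortcli and return its text output."""
    hef = Path(hef_path)
    if hef.suffix != ".hef":
        raise ValueError(f"not a HEF file: {hef}")
    if not hef.is_file():
        raise FileNotFoundError(hef)
    if shutil.which("hailortcli") is None:
        raise RuntimeError("hailortcli not found on PATH; is HailoRT installed?")
    result = subprocess.run(
        build_run_command(str(hef)),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the harness exits non-zero
    )
    return result.stdout


if __name__ == "__main__":
    # "hydra.hef" is a placeholder name for the compiled binary.
    print(run_hef("hydra.hef"))
```

The checks run before the subprocess call so that a missing file or a missing runtime fails with a clear message instead of an opaque CLI error.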
Generated from real Hailo-10H silicon measurements · AxonML Framework · April 29, 2026
DISCLAIMER: This document and its contents are proprietary to AutomataNexus LLC. Performance figures are measured on production silicon under controlled conditions. Reproduction or redistribution without written permission is prohibited.