AutomataNexus LLC Whitepaper · April 29, 2026

Hybrid Architecture

Hydra: SSM + Depthwise Attention Hybrid

Hydra is a hybrid architecture combining selective state space modeling with depthwise attention. It demonstrates that SSM and attention mechanisms can be unified on fixed-function accelerators when attention is reformulated as depthwise convolution over query-key products.

Author	Andrew Jewell Sr.
Organization	AutomataNexus LLC
Framework	AxonML
Silicon	Hailo-10H

Front matter

Abstract

Background

Hydra is a hybrid architecture that pairs selective state space modeling (SSM) with depthwise attention. It demonstrates that SSM and attention mechanisms can be unified on fixed-function accelerators when attention is reformulated as depthwise convolution over query-key products.

Approach

The model was compiled from a trained AxonML checkpoint through the Hailo Dataflow Compiler (DFC) targeting Hailo-10H silicon. Post-training INT8 quantization was applied during the DFC compilation pass. The resulting Hailo Executable Format (HEF) binary executes on Hailo's fixed-function dataflow architecture with deterministic latency and zero framework overhead. No runtime model conversion, graph optimization, or JIT compilation is required at the edge.

Results

On production Hailo-10H hardware, Hydra sustains 855.62 FPS (about 856 FPS) with 0.906 ms on-die hardware latency and 1.615 ms end-to-end latency. Die temperature during sustained inference averaged 54.2 °C, ranging from 53.5 °C to 54.6 °C.

Conclusion

These measurements show that Hydra sustains more than 850 FPS at under 2 ms end-to-end latency on Hailo-10H, which supports its use in real-time edge deployment. The model is production-ready as a single HEF binary with no external dependencies beyond the HailoRT runtime.

Model	Hydra
Domain	Hybrid Architecture
Architecture	SSM + Depthwise Attention Hybrid
Target silicon	Hailo-10H
DFC compiler	5.3.0
Framework	AxonML (pure-Rust deep-learning framework)
Author	Andrew Jewell Sr. · ORCID 0009-0005-2158-7060
Organization	AutomataNexus LLC · Fort Wayne, Indiana
Date	April 29, 2026

Part I · Model · 01

Executive overview

Hydra is a hybrid architecture model compiled to Hailo-10H silicon via the AxonML framework and the Hailo Dataflow Compiler. The model runs as a single Hailo Executable Format (HEF) binary with deterministic latency on Hailo's fixed-function dataflow architecture.

Throughput	856 FPS
HW latency	0.906 ms
E2E latency	1.615 ms
Chip temp (avg)	54.2 °C

Part I · Model · 02

Architecture

Hydra alternates SSM blocks (gated depthwise Conv1D) with depthwise attention blocks (Conv1D Q/K/V projections followed by depthwise mixing), for 4 layers in total, with residual connections throughout.

Architecture note

All AxonML models targeting Hailo silicon are subject to the constraints of the fixed-function dataflow compiler: no dynamic control flow, no variable-length dimensions, and all activations must be representable in INT8 after calibration. Operations that require softmax hardware (e.g., standard transformer attention) are replaced with hardware-friendly equivalents (depthwise convolution, ReLU gating) where necessary.
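The block structure described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the AxonML implementation: the whitepaper does not give the block internals, so the exact gating form, the use of kernel-size-1 projections for Q/K/V, the ReLU on the mixed query-key products, the SSM-first layer order, and every function name here are hypothetical. The point is only that both block types reduce to depthwise convolution, pointwise convolution, ReLU, and elementwise arithmetic, with no softmax.

```python
# Illustrative sketch only: block internals, weight shapes, and layer order are
# assumptions made for this example, not the AxonML implementation.

def depthwise_conv1d(x, kernels):
    """Per-channel 1D convolution with zero 'same' padding.

    x is a list of channels (each a list of floats); kernels holds one
    odd-length kernel per channel.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        pad = len(kernel) // 2
        padded = [0.0] * pad + list(channel) + [0.0] * pad
        out.append([
            sum(w * padded[t + j] for j, w in enumerate(kernel))
            for t in range(len(channel))
        ])
    return out


def pointwise_conv1d(x, weight):
    """Kernel-size-1 Conv1D: mixes channels independently at each position."""
    length = len(x[0])
    return [
        [sum(w * x[c][t] for c, w in enumerate(row)) for t in range(length)]
        for row in weight
    ]


def relu(x):
    return [[max(0.0, v) for v in channel] for channel in x]


def add(a, b):
    return [[u + v for u, v in zip(ac, bc)] for ac, bc in zip(a, b)]


def mul(a, b):
    return [[u * v for u, v in zip(ac, bc)] for ac, bc in zip(a, b)]


def ssm_block(x, dw_kernels, gate_weight):
    """Gated depthwise Conv1D with a residual connection (assumed form)."""
    mixed = depthwise_conv1d(x, dw_kernels)
    gate = relu(pointwise_conv1d(x, gate_weight))
    return add(x, mul(mixed, gate))


def depthwise_attention_block(x, wq, wk, wv, dw_kernels):
    """Softmax-free attention: depthwise conv over elementwise q*k products."""
    q, k, v = (pointwise_conv1d(x, w) for w in (wq, wk, wv))
    scores = relu(depthwise_conv1d(mul(q, k), dw_kernels))
    return add(x, mul(scores, v))


def hydra_stack(x, params):
    """Alternate SSM and attention blocks; SSM-first order is an assumption."""
    for i, p in enumerate(params):
        if i % 2 == 0:
            x = ssm_block(x, *p)
        else:
            x = depthwise_attention_block(x, *p)
    return x
```

Every operation in the sketch has a fixed shape and no data-dependent control flow, which is the property the compiler constraints above require. A four-layer stack would pass `hydra_stack` a list of four parameter tuples, alternating `(dw_kernels, gate_weight)` and `(wq, wk, wv, dw_kernels)`.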

Network I/O specification

Table 02-1 · HEF network input and output tensors as reported by `hailortcli parse-hef`.
Direction	Stream name	Data type	Shape
`INPUT`	`input_layer1`	`UINT8`	`NHWC(128x1x128)`
`OUTPUT`	`dense_conv29`	`UINT8`	`NC(101)`
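As a quick check on what Table 02-1 implies for host-side buffers, the per-frame payload can be computed directly from the reported shapes. The sketch below assumes the shapes are listed without a batch axis and that each UINT8 element occupies one byte; the helper names are hypothetical, and the frame rate passed in is the throughput reported in Table 03-1.

```python
from math import prod

# Shapes and data types are taken from Table 02-1.
INPUT_SHAPE = (128, 1, 128)  # NHWC, assumed to be listed without the batch axis
OUTPUT_SHAPE = (101,)        # NC, assumed to be listed without the batch axis
BYTES_PER_ELEMENT = 1        # UINT8


def frame_bytes(shape, bytes_per_element=BYTES_PER_ELEMENT):
    """Bytes moved per frame for a tensor of the given shape."""
    return prod(shape) * bytes_per_element


def stream_rate_mb_per_s(shape, fps):
    """Sustained stream bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return frame_bytes(shape) * fps / 1e6


print(frame_bytes(INPUT_SHAPE))   # 16384
print(frame_bytes(OUTPUT_SHAPE))  # 101
print(round(stream_rate_mb_per_s(INPUT_SHAPE, 855.62), 2))  # 14.02
```

At these sizes the input stream is roughly 14 MB/s at the measured frame rate, so the payload itself is small compared with typical PCIe link capacity.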

Part II · Silicon · 03

Silicon performance

All measurements were taken on production Hailo-10H hardware using `hailortcli run` with real silicon profiling enabled. Throughput, latency, and thermal figures represent sustained inference under controlled conditions.

Table 03-1 · Silicon benchmark results for Hydra on Hailo-10H.
Metric	Measured value
Throughput	855.62 FPS
HW Latency (on-die)	0.906 ms
Overall Latency (end-to-end)	1.615 ms
Chip Temperature (min)	53.5 °C
Chip Temperature (avg)	54.2 °C
Chip Temperature (max)	54.6 °C
Target Silicon	Hailo-10H
DFC Compiler Version	5.3.0
HEF Compatibility	HAILO15H, HAILO10H
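The figures in Table 03-1 can be related to one another with simple arithmetic. The sketch below derives the frame period from the throughput and applies Little's law (items in flight = throughput × latency) to estimate how many frames are in the pipeline on average. The interpretation is an assumption, not something the measurements state directly: because the end-to-end latency exceeds the frame period, host transfer and on-die execution of consecutive frames presumably overlap. Function names are hypothetical.

```python
# Values copied from Table 03-1.
THROUGHPUT_FPS = 855.62
HW_LATENCY_MS = 0.906
E2E_LATENCY_MS = 1.615
TEMP_MIN_C, TEMP_MAX_C = 53.5, 54.6


def frame_period_ms(fps):
    """Average time between completed frames."""
    return 1000.0 / fps


def frames_in_flight(fps, latency_ms):
    """Little's law: average number of frames inside the measured span."""
    return fps * latency_ms / 1000.0


period = frame_period_ms(THROUGHPUT_FPS)
print(f"frame period:            {period:.3f} ms")  # 1.169 ms
print(f"host-side overhead:      {E2E_LATENCY_MS - HW_LATENCY_MS:.3f} ms")  # 0.709 ms
print(f"frames in flight (E2E):  {frames_in_flight(THROUGHPUT_FPS, E2E_LATENCY_MS):.2f}")  # 1.38
print(f"frames in flight (HW):   {frames_in_flight(THROUGHPUT_FPS, HW_LATENCY_MS):.2f}")  # 0.78
print(f"temperature spread:      {TEMP_MAX_C - TEMP_MIN_C:.1f} °C")  # 1.1 °C
```

Read this way, the on-die stage alone finishes each frame within one frame period, while the full host-to-host path holds about 1.4 frames on average.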

Part II · Silicon · 04

Deployment

The compiled HEF binary is deployed to edge devices equipped with Hailo-10H accelerator modules. No runtime model conversion, graph optimization, or JIT compilation is required. The HEF executes directly on the dataflow architecture with deterministic latency guarantees.

Target silicon	Hailo-10H
HEF compatibility	HAILO15H, HAILO10H
DFC compiler version	5.3.0
Quantization	INT8 (post-training, DFC-optimized calibration)
Runtime dependency	HailoRT (hailort) — vendor runtime for PCIe / M.2 dispatch
Edge platforms	Raspberry Pi 5 + Hailo AI HAT+ · any M.2 key M host with Hailo-10H
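For readers unfamiliar with post-training INT8 quantization, the general idea can be sketched as affine quantization with a scale and zero point derived from calibration data. This is a generic min/max scheme shown for illustration only; the calibration algorithm actually used by the Hailo Dataflow Compiler is not described in this whitepaper, and the function names and sample values below are hypothetical.

```python
# Generic affine (scale + zero-point) quantization to an unsigned 8-bit range.
# Illustration only: this is not the DFC calibration algorithm.

def calibrate(samples, qmin=0, qmax=255):
    """Derive scale and zero point from the min/max of calibration samples."""
    lo = min(min(samples), 0.0)  # keep 0.0 representable
    hi = max(max(samples), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate case: all samples are zero
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a real value to the nearest representable integer code."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    """Map an integer code back to its real-valued approximation."""
    return (q - zero_point) * scale


samples = [-1.0, 0.0, 0.5, 3.0]  # hypothetical calibration activations
scale, zero_point = calibrate(samples)
for x in samples:
    q = quantize(x, scale, zero_point)
    print(x, q, round(dequantize(q, scale, zero_point), 4))
```

For values inside the calibrated range, the round-trip error in this scheme is bounded by half the scale, which is why the calibration data needs to cover the activation range seen in deployment.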

Deployment procedure

Copy the `.hef` binary to the target device. The AxonML inference runtime (or the standalone `hailortcli run` harness) loads the HEF directly into the Hailo-10H dataflow engine over PCIe. No ONNX runtime, TensorFlow Lite interpreter, or Python inference stack is required. Inference begins immediately after HEF load with deterministic per-frame latency.

Back matter · 05

References

Jewell, A. (2026). AxonML — Pure-Rust Deep Learning Framework. AutomataNexus LLC. Technical whitepaper.
Jewell, A. (2026). The Nexus Stack — A Pure-Rust Industrial Technology Stack. AutomataNexus LLC. Technical whitepaper.
Hailo Technologies Ltd. (2024). Hailo Dataflow Compiler User Guide. DFC v3.x / v5.x documentation.
Hailo Technologies Ltd. (2024). HailoRT — Hailo Runtime API Reference. HailoRT v4.x documentation.
Hailo Technologies Ltd. (2024). Hailo-10H Datasheet. Product specification.

AutomataNexus LLC · Fort Wayne, Indiana
Andrew Jewell Sr. · ORCID 0009-0005-2158-7060
Generated from real Hailo-10H silicon measurements · AxonML Framework · April 29, 2026

DISCLAIMER: This document and its contents are proprietary to AutomataNexus LLC. Performance figures are measured on production silicon under controlled conditions. Reproduction or redistribution without written permission is prohibited.
