AutomataNexus LLC · Whitepaper · May 2026

HVAC Prediction

Enki —
Unrolled 8-Step GRU Controller

Enki is the GRU controller for the FCOG facility mechroom, named for the Sumerian god of water, knowledge, and crafts. It is a lighter-weight alternative to Enlil that uses GRU gates (reset and update) instead of LSTM gates (input, forget, cell, and output), achieving 50,582 FPS, over 120x the throughput of the LSTM variant, at the cost of slightly reduced temporal memory depth.

Author

Andrew Jewell Sr.

Organization

AutomataNexus LLC

Framework

AxonML (Rust)

Silicon

Hailo-8


Abstract


Approach

The model was trained in AxonML (a pure-Rust deep learning framework) and compiled through the Hailo Dataflow Compiler (DFC 3.33.1) targeting Hailo-8 silicon. Post-training INT8 quantization was applied during the DFC compilation pass, using production telemetry as calibration data. The resulting Hailo Executable Format (HEF) binary executes on Hailo’s fixed-function dataflow architecture with deterministic latency and no framework overhead at the edge.

Results

On production hardware (a Hailo-8 M.2 module, P/N HM218B1C2FAE, S/N HLDDM2A234600289), Enki achieves 50,582 FPS (hw_only) with 0.057 ms hardware latency at 0.96 W average power draw.

Conclusion

Enki is production-ready as a single HEF binary deployed to edge devices with no external dependencies beyond the HailoRT vendor runtime. At 0.057 ms hardware latency, the model meets the real-time latency requirements of its target HVAC prediction application.

Model	Enki
Domain	HVAC Prediction
Architecture	Unrolled 8-Step GRU Controller
Target silicon	Hailo-8
Measured on	Hailo-8 M.2 (P/N: HM218B1C2FAE, S/N: HLDDM2A234600289)
DFC compiler	3.33.1
Framework	AxonML v0.6 (pure-Rust, CUDA + CPU backends)
Author	Andrew Jewell Sr. · ORCID 0009-0005-2158-7060
Organization	AutomataNexus LLC · Fort Wayne, Indiana


Part I · Model · 01

Executive overview


Throughput	50,582 FPS
HW Latency	0.057 ms
Power (avg)	0.96 W
Target	8

Network I/O

Input: an 8-step sensor sequence of shape [1, 1, 8, 57] (8 timesteps of 57 features each). Output: a scalar next-state prediction.
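To make the input layout concrete, the sketch below reads one timestep's 57 features out of a flat input buffer. It assumes the [1, 1, 8, 57] tensor is stored densely in row-major order with features as the innermost axis, so timestep t occupies elements t*57 through t*57 + 56. The actual byte layout and quantized format that HailoRT exposes for this HEF are not specified here, and the helper name `timestep` is hypothetical.

```rust
/// Input shape as listed for Enki: [1, 1, 8, 57].
const SHAPE: [usize; 4] = [1, 1, 8, 57];
const STEPS: usize = SHAPE[2]; // timesteps in the unrolled sequence
const FEATURES: usize = SHAPE[3]; // sensor features per timestep (input_size)

/// Hypothetical helper: return the feature vector for timestep `t`,
/// assuming a dense row-major buffer with features as the innermost axis.
fn timestep(buf: &[f32], t: usize) -> &[f32] {
    assert_eq!(buf.len(), SHAPE.iter().product::<usize>(), "unexpected buffer length");
    assert!(t < STEPS, "timestep out of range");
    &buf[t * FEATURES..(t + 1) * FEATURES]
}
```

Under this assumed layout, the whole input is 1 × 1 × 8 × 57 = 456 values, and the GRU consumes one 57-element slice per unrolled step.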


Part I · Model · 02

Architecture

Unrolled 8-Step GRU Controller

An 8-timestep unrolled GRU with input_size=57 and hidden_size=57. Each timestep computes a reset gate (sigmoid), an update gate (sigmoid), a candidate hidden state (tanh), and the gated update, using six weight matrices (three input projections and three recurrent projections). The GRU recurrence `h = (1-z)*n + z*h_prev` is fully unrolled into element-wise multiply and add operations that map directly to the NPU's element-wise units.
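The recurrence above can be sketched in plain, dependency-free Rust. This is an illustration, not AxonML's implementation: it assumes the PyTorch-style GRU formulation (reset gate applied to the recurrent part of the candidate), omits bias vectors for brevity, and reuses one set of six matrices at every step as a standard GRU does. Whether Enki's unrolled graph shares weights across timesteps is not stated here. `Mat`, `GruWeights`, `gru_step`, and `gru_unrolled` are hypothetical names.

```rust
/// Row-major dense matrix with `rows` x `cols` entries.
struct Mat {
    rows: usize,
    cols: usize,
    data: Vec<f32>,
}

impl Mat {
    fn filled(rows: usize, cols: usize, v: f32) -> Self {
        Mat { rows, cols, data: vec![v; rows * cols] }
    }

    /// y = M x
    fn matvec(&self, x: &[f32]) -> Vec<f32> {
        assert_eq!(x.len(), self.cols);
        (0..self.rows)
            .map(|r| {
                self.data[r * self.cols..(r + 1) * self.cols]
                    .iter()
                    .zip(x)
                    .map(|(w, xi)| w * xi)
                    .sum::<f32>()
            })
            .collect()
    }
}

/// The six projections: three on the input x, three on the previous hidden state.
struct GruWeights {
    w_ir: Mat, w_iz: Mat, w_in: Mat, // hidden x input
    w_hr: Mat, w_hz: Mat, w_hn: Mat, // hidden x hidden
}

impl GruWeights {
    fn filled(input: usize, hidden: usize, v: f32) -> Self {
        let wi = || Mat::filled(hidden, input, v);
        let wh = || Mat::filled(hidden, hidden, v);
        GruWeights { w_ir: wi(), w_iz: wi(), w_in: wi(), w_hr: wh(), w_hz: wh(), w_hn: wh() }
    }

    /// Total weights across the six matrices (biases are not modeled here).
    fn param_count(&self) -> usize {
        [&self.w_ir, &self.w_iz, &self.w_in, &self.w_hr, &self.w_hz, &self.w_hn]
            .iter()
            .map(|m| m.data.len())
            .sum()
    }
}

fn sigmoid(v: f32) -> f32 {
    1.0 / (1.0 + (-v).exp())
}

/// One GRU timestep: gates r and z, tanh candidate n, then h = (1 - z) * n + z * h_prev.
fn gru_step(w: &GruWeights, x: &[f32], h_prev: &[f32]) -> Vec<f32> {
    let (xr, xz, xn) = (w.w_ir.matvec(x), w.w_iz.matvec(x), w.w_in.matvec(x));
    let (hr, hz, hn) = (w.w_hr.matvec(h_prev), w.w_hz.matvec(h_prev), w.w_hn.matvec(h_prev));
    (0..h_prev.len())
        .map(|i| {
            let r = sigmoid(xr[i] + hr[i]); // reset gate
            let z = sigmoid(xz[i] + hz[i]); // update gate
            let n = (xn[i] + r * hn[i]).tanh(); // candidate hidden state
            (1.0 - z) * n + z * h_prev[i] // gated update, element-wise
        })
        .collect()
}

const STEPS: usize = 8;

/// Runs exactly STEPS timesteps from a zero initial state. The trip count is
/// fixed, so there is no data-dependent control flow and a graph compiler can
/// lay the loop out as STEPS copies of `gru_step`.
fn gru_unrolled(w: &GruWeights, seq: &[Vec<f32>]) -> Vec<f32> {
    assert_eq!(seq.len(), STEPS);
    let mut h = vec![0.0f32; w.w_hr.rows];
    for x in seq {
        h = gru_step(w, x, &h);
    }
    h
}
```

With input_size = hidden_size = 57, one set of six matrices holds 6 × 57 × 57 = 19,494 weights (biases excluded), which is what `param_count` returns for this sketch.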

Compilation constraints

All AxonML models targeting Hailo silicon are compiled under fixed-function dataflow constraints: no dynamic control flow, no variable-length dimensions, all activations representable in INT8 after calibration, and no operations that require dedicated softmax hardware (these are replaced with ReLU gating or depthwise-convolution equivalents where necessary).
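The INT8 constraint is comparatively easy for a GRU to meet because its gate activations are bounded: sigmoid outputs lie in (0, 1) and tanh outputs in (−1, 1). The sketch below shows generic symmetric, per-tensor post-training quantization, in which a scale is derived from the largest magnitude seen in calibration data and values are rounded and clamped to [−127, 127]. It illustrates the general technique only and is not the Hailo DFC's calibration algorithm, which may use other schemes (for example asymmetric or per-channel); `QParams`, `calibrate`, `quantize`, and `dequantize` are hypothetical names.

```rust
/// Per-tensor symmetric INT8 quantization parameters.
#[derive(Debug, Clone, Copy)]
struct QParams {
    scale: f32, // real value represented by one INT8 step
}

/// Derive a scale from calibration samples so that the largest observed
/// magnitude maps to 127.
fn calibrate(samples: &[f32]) -> QParams {
    let max_abs = samples.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    // Fall back to scale 1.0 if calibration saw only zeros.
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    QParams { scale }
}

/// Round to the nearest INT8 step, clamping values outside the calibrated range.
fn quantize(v: f32, q: QParams) -> i8 {
    (v / q.scale).round().clamp(-127.0, 127.0) as i8
}

/// Map an INT8 code back to its approximate real value.
fn dequantize(code: i8, q: QParams) -> f32 {
    code as f32 * q.scale
}
```

Because the calibration set fixes the scale, calibrating on production telemetry (as was done for Enki) keeps the INT8 range matched to the values the model sees in the field; inputs within that range round-trip with an error of at most half a step.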


Part II · Silicon · 03

Silicon performance

Measured on production hardware with `hailortcli benchmark` over 5 seconds of sustained inference. Device: Hailo-8 M.2 (P/N: HM218B1C2FAE, S/N: HLDDM2A234600289).

Table 03-1 — Production silicon measurements, 5s sustained inference.
Metric	Measured Value
FPS (hw_only)	50,582.50
FPS (streaming)	35,657.40
HW Latency	0.057000 ms
Power (streaming avg)	0.96500 W
Power (streaming max)	0.98300 W
Power (idle)	0.74979 W
Quantization	INT8 (post-training, DFC calibration)
DFC Compiler	3.33.1
HailoRT	4.20.0
Measured On	Hailo-8 M.2 (P/N: HM218B1C2FAE, S/N: HLDDM2A234600289)
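A few secondary figures follow from Table 03-1 by plain arithmetic; they are derived, not measured. The sketch computes the frame period at the hw_only rate, the energy per frame at the streaming rate (total, and above the idle baseline), and the average number of frames in flight implied by Little's law (latency × throughput). That last figure is only meaningful if the hardware latency and the hw_only throughput describe the same steady-state workload, which the table does not state. The function names are hypothetical.

```rust
/// Time between frames, in microseconds, at a given throughput.
fn frame_period_us(fps: f64) -> f64 {
    1e6 / fps
}

/// Energy per frame, in microjoules, at a given average power and throughput.
fn energy_per_frame_uj(power_w: f64, fps: f64) -> f64 {
    power_w / fps * 1e6
}

/// Little's law: average frames in flight = latency x throughput.
fn frames_in_flight(latency_ms: f64, fps: f64) -> f64 {
    latency_ms * 1e-3 * fps
}

// Values copied from Table 03-1.
const FPS_HW_ONLY: f64 = 50_582.50;
const FPS_STREAMING: f64 = 35_657.40;
const HW_LATENCY_MS: f64 = 0.057;
const POWER_AVG_W: f64 = 0.965;
const POWER_IDLE_W: f64 = 0.74979;
```

At these values the hw_only frame period is about 19.8 µs, shorter than the 57 µs hardware latency, which suggests roughly three frames are in flight in the dataflow pipeline at once. At the streaming rate, energy works out to about 27 µJ per frame, of which about 6 µJ is above the idle baseline.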


Part II · Silicon · 04

Deployment

Deployed as a single HEF binary. No ONNX runtime, TensorFlow Lite, or Python inference stack required at the edge.

Target silicon	Hailo-8
Measured on	Hailo-8 M.2 (P/N: HM218B1C2FAE, S/N: HLDDM2A234600289)
DFC compiler	3.33.1
Quantization	INT8 (post-training, production telemetry calibration)
Runtime	HailoRT (vendor runtime)
Edge platform	Raspberry Pi 5 + Hailo AI HAT+ (M.2 Key M)

Deployment procedure

Copy the .hef binary to the target device. Running `hailortcli run` on it loads the HEF directly into the Hailo-8 dataflow engine over PCIe, and inference begins immediately with deterministic per-frame latency. No model conversion, graph optimization, or warmup phase is required.
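For scripted deployment on the edge host, the `hailortcli run` step can be wrapped with Rust's standard library. This minimal sketch only assembles the command (so it can be inspected or logged before it is spawned) and rejects paths without a .hef extension; it passes no flags beyond the HEF path, and both the helper name `hailortcli_run` and the path /opt/models/enki.hef used in testing are hypothetical.

```rust
use std::ffi::OsStr;
use std::path::Path;
use std::process::Command;

/// Assemble (without spawning) `hailortcli run <hef>` for a compiled model.
fn hailortcli_run(hef: &Path) -> Result<Command, String> {
    // Only accept compiled Hailo Executable Format binaries.
    if hef.extension() != Some(OsStr::new("hef")) {
        return Err(format!("expected a .hef file, got {}", hef.display()));
    }
    let mut cmd = Command::new("hailortcli");
    cmd.arg("run").arg(hef);
    Ok(cmd)
}
```

Calling `.status()` on the returned `Command` runs it and waits for it to exit; that step requires HailoRT to be installed on the host and the Hailo-8 to be visible over PCIe.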


