BirdCLEF SedNet: Sound Event Detection Convolutional Network
Abstract
Background
BirdCLEF SedNet is a 234-species avian sound event detection model trained on the BirdCLEF+ 2026 competition dataset. It maps mel-spectrogram audio frames to per-species probabilities, enabling automated biodiversity monitoring at remote edge sensor stations with no internet connectivity. Running at 3,507 FPS on Hailo-10H, it processes audio more than 100 times faster than real time.
Approach
The model was trained in AxonML (a pure-Rust deep learning framework) and compiled through the Hailo Dataflow Compiler (DFC 5.3.0) targeting Hailo-10H silicon. Post-training INT8 quantization was applied during the DFC compilation pass with production telemetry calibration data. The resulting Hailo Executable Format (HEF) binary executes on Hailo’s fixed-function dataflow architecture with deterministic latency and zero framework overhead at the edge.
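To make the post-training quantization step concrete, the following is a minimal sketch of affine INT8 quantization calibrated from sample activations. It is an illustration under stated assumptions, not the DFC's actual algorithm: the min/max calibration rule, the function names (`calibrate`, `quantize`, `dequantize`), and the sample values are all hypothetical.

```python
from typing import List, Tuple

QMIN, QMAX = -128, 127  # signed INT8 range


def calibrate(samples: List[float]) -> Tuple[float, int]:
    """Derive (scale, zero_point) from the min/max of calibration samples.

    Assumption: simple min/max calibration; real compilers may use
    percentile clipping or other statistics.
    """
    lo = min(min(samples), 0.0)  # force the range to include 0.0
    hi = max(max(samples), 0.0)
    if hi == lo:
        return 1.0, 0
    scale = (hi - lo) / (QMAX - QMIN)
    zero_point = int(round(QMIN - lo / scale))
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int) -> int:
    """Map a float to INT8, clamping values outside the calibrated range."""
    q = int(round(x / scale)) + zero_point
    return max(QMIN, min(QMAX, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Recover the approximate float value of an INT8 code."""
    return (q - zero_point) * scale


# Hypothetical activation samples standing in for real calibration data.
calibration_samples = [-1.0, 0.0, 0.5, 3.0]
scale, zero_point = calibrate(calibration_samples)
codes = [quantize(x, scale, zero_point) for x in calibration_samples]
restored = [dequantize(q, scale, zero_point) for q in codes]
```

Values inside the calibrated range round-trip with an error of at most one quantization step (`scale`), while values outside it are clamped. This is why the choice of calibration data affects accuracy after quantization.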
Results
On production hardware, a Hailo-10H (NexusWatch, FW 5.3.0), BirdCLEF SedNet achieves 3,507 FPS at an average die temperature of 54.8 °C (max 55.7 °C).
Conclusion
BirdCLEF SedNet is production-ready as a single HEF binary deployed to edge devices with no external dependencies beyond the HailoRT vendor runtime. The model meets real-time latency requirements for its target environmental audio application.
| Field | Value |
|---|---|
| Model | BirdCLEF SedNet |
| Domain | Environmental Audio |
| Architecture | Sound Event Detection Convolutional Network |
| Target silicon | Hailo-10H |
| Measured on | Hailo-10H (NexusWatch, FW 5.3.0) |
| DFC compiler | 5.3.0 |
| Framework | AxonML v0.6 (pure-Rust, CUDA + CPU backends) |
| Author | Andrew Jewell Sr. · ORCID 0009-0005-2158-7060 |
| Organization | AutomataNexus LLC · Fort Wayne, Indiana |
Network I/O
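Input: mel-spectrogram tensor [1, 1, 128, time_frames] (batch, channel, mel bins, time frames); time_frames is fixed at compile time, since the target allows no variable-length dimensions. Output: 234-element vector of per-species probabilities.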
Architecture
Sound Event Detection Convolutional Network
The encoder is a frequency-domain Conv2D stack (7 layers) that processes 128-bin mel-spectrograms. An input Conv2D(1→32, 3x3) is followed by 6 residual blocks with progressive channel expansion (32→64→128→256) and frequency-axis downsampling via stride-2 convolutions. Temporal average pooling over the time axis feeds a 234-class sigmoid output; the head is multi-label because multiple species may vocalize simultaneously. The model was trained on 150+ hours of annotated field recordings.
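The description above can be sketched as a shape trace plus a multi-label head. This is a sketch under stated assumptions rather than the model's actual definition. The text does not say which of the six residual blocks widen channels or downsample, so the `BLOCKS` layout is hypothetical, as are the padding of 1, the helper names, and the simplification that per-frame class logits are averaged over time before the sigmoid.

```python
import math
from typing import List, Tuple


def conv_out(size: int, kernel: int = 3, stride: int = 1, pad: int = 1) -> int:
    """Standard convolution output-size formula along one axis."""
    return (size + 2 * pad - kernel) // stride + 1


# (out_channels, frequency_stride) per residual block. Hypothetical layout
# chosen only to match the stated 32->64->128->256 channel expansion.
BLOCKS = [(32, 1), (64, 2), (64, 1), (128, 2), (128, 1), (256, 2)]


def trace_shapes(time_frames: int, mel_bins: int = 128) -> List[Tuple[int, int, int]]:
    """Return (channels, freq, time) after the stem conv and each block."""
    freq = conv_out(mel_bins)  # stem Conv2D(1->32, 3x3), stride 1
    shapes = [(32, freq, time_frames)]
    for channels, f_stride in BLOCKS:
        freq = conv_out(freq, stride=f_stride)  # stride on frequency only
        shapes.append((channels, freq, time_frames))
    return shapes


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def multilabel_head(frame_logits: List[List[float]], threshold: float = 0.5):
    """Average logits over time, then apply an independent sigmoid per class."""
    n_frames = len(frame_logits)
    n_classes = len(frame_logits[0])
    pooled = [sum(f[c] for f in frame_logits) / n_frames for c in range(n_classes)]
    probs = [sigmoid(v) for v in pooled]
    detected = [c for c, p in enumerate(probs) if p > threshold]
    return probs, detected


shapes = trace_shapes(time_frames=100)
# Three classes instead of 234 to keep the example readable.
probs, detected = multilabel_head([[2.0, -2.0, 1.0], [4.0, -4.0, 1.0]])
```

Because each class gets its own sigmoid, the scores are independent and need not sum to 1, so two species can be flagged in the same clip. The 0.5 threshold is an illustrative choice, not a value from the model.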
Compilation constraints
All AxonML models targeting Hailo silicon are compiled under the following fixed-function dataflow constraints: no dynamic control flow, no variable-length dimensions, all activations representable in INT8 after calibration, and no operations requiring dedicated softmax hardware (these are replaced with ReLU gating or depthwise convolution equivalents where necessary).
Silicon performance
Measured on production hardware via hailortcli benchmark with 5-second sustained inference. Device: Hailo-10H (NexusWatch, FW 5.3.0).
| Metric | Measured Value |
|---|---|
| FPS (streaming) | 3,506.85 |
| Die Temperature (mean) | 54.75 °C |
| Die Temperature (min) | 51.33 °C |
| Die Temperature (max) | 55.73 °C |
| Quantization | INT8 (post-training, DFC calibration) |
| DFC Compiler | 5.3.0 |
| HailoRT | 5.3.0 |
| Measured On | Hailo-10H (NexusWatch, FW 5.3.0) |
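The throughput figure can be turned into a few derived quantities with simple arithmetic. The sketch below is illustrative: the audio duration covered by one frame is not stated in this document, so `realtime_factor` takes it as a parameter, and the variable and function names are hypothetical.

```python
FPS = 3506.85          # measured streaming throughput (from the table above)
BENCH_SECONDS = 5.0    # sustained benchmark duration

# Reciprocal of throughput: average time budget per frame in milliseconds.
# With pipelined execution this is not necessarily the end-to-end latency.
per_frame_ms = 1000.0 / FPS

# Approximate number of frames processed during the benchmark run.
frames_in_run = FPS * BENCH_SECONDS


def realtime_factor(audio_seconds_per_frame: float) -> float:
    """Seconds of audio processed per wall-clock second.

    audio_seconds_per_frame is an assumption supplied by the caller;
    the document does not specify how much audio one frame covers.
    """
    return FPS * audio_seconds_per_frame


# Smallest frame duration (in seconds) for which throughput exceeds 100x
# real time.
min_frame_seconds_for_100x = 100.0 / FPS
```

Under these numbers each frame has a budget of roughly 0.285 ms, and any input frame covering more than about 28.5 ms of audio yields a real-time factor above 100.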
Deployment
Deployed as a single HEF binary. No ONNX runtime, TensorFlow Lite, or Python inference stack required at the edge.
| Field | Value |
|---|---|
| Target silicon | Hailo-10H |
| Measured on | Hailo-10H (NexusWatch, FW 5.3.0) |
| DFC compiler | 5.3.0 |
| Quantization | INT8 (post-training, production telemetry calibration) |
| Runtime | HailoRT (vendor runtime) |
| Edge platform | Raspberry Pi 5 + Hailo AI HAT+ (M.2 Key M) |
Deployment procedure
Copy the .hef binary to the target device. hailortcli run loads the HEF directly into the Hailo-10H dataflow engine over PCIe. Inference begins immediately with deterministic per-frame latency. No model conversion, graph optimization, or warmup phase required.
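For scripted deployments, the procedure above can be wrapped in a small helper that validates the file name and assembles the `hailortcli run` invocation. This is a minimal sketch: the helper name and the file name `birdclef_sednet.hef` are hypothetical, and no flags beyond the HEF path are assumed.

```python
from pathlib import Path
from typing import List


def hailortcli_run_command(hef_path: str) -> List[str]:
    """Build the argument list for `hailortcli run <hef>`.

    Only checks the file extension; whether the file exists and whether
    the device is present are left to the caller and to hailortcli.
    """
    path = Path(hef_path)
    if path.suffix != ".hef":
        raise ValueError(f"expected a .hef file, got: {hef_path}")
    return ["hailortcli", "run", str(path)]


# Hypothetical file name used for illustration only.
command = hailortcli_run_command("birdclef_sednet.hef")
```

On the target device the resulting list could be passed to `subprocess.run(command, check=True)`, which raises an exception if the CLI exits with a non-zero status.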
AutomataNexus LLC · Fort Wayne, Indiana · andrew.jewellsr@automatanexus.com
Andrew Jewell Sr. · ORCID 0009-0005-2158-7060
May 2026 · All rights reserved.