We validate a new smartphone-based breath-phase detector against clinical ground truth — and use the same study to establish it as a qualified label source for a follow-up ML project.
Why this is scientifically new
Every microphone-based breath-detection system so far runs into two hard limits:
- Only three classes (inhale / exhale / silence). Breath holds and pauses are both acoustically silent, so existing systems do not tell them apart.
- Either wearable-dependent (in-ear microphones, F1 > 95%) or weak (smartphone-only, 69% balanced accuracy in the real world).
Our algorithm combines five cooperating pillars — two of which, to our knowledge, are new in the literature.
- Protocol-Synchronized Prep Anchor
  - A 3-second scripted exhale prompt before free breathing measures the individual spectral signature of the user. This per-session calibration replaces population training and makes label polarity deterministic. (New, to our knowledge.)
- Technique-Aware Ratio Matching
  - The known target ratio of the technique (e.g. 1:1.75:2 for 4-7-8) resolves the hold/pause ambiguity deterministically. (New, to our knowledge.)
- Adaptive amplitude-based state machine
  - Tracks breath state from amplitude dynamics rather than fixed thresholds.
- Multi-signal spectral classification (Peak + Centroid)
  - Combines spectral peak and centroid features for phase classification.
- Valley Gate (3-AND) and six-rule transient-artifact rejection
  - A three-condition AND gate on valleys, plus six rules that reject transient acoustic artifacts.
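To make the ratio-matching idea concrete, here is a minimal sketch. It assumes the silent segment and the preceding inhale have already been segmented and timed. The names `Technique` and `classify_silence` and the nearest-ratio decision rule are illustrative assumptions, not the production algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Technique:
    """Target phase durations in seconds (hypothetical container)."""
    name: str
    inhale: float
    hold: float
    exhale: float
    pause: float  # 0.0 if the technique scripts no pause


# 4-7-8 breathing: inhale 4 s, hold 7 s, exhale 8 s, i.e. ratio 1:1.75:2.
FOUR_SEVEN_EIGHT = Technique("4-7-8", inhale=4.0, hold=7.0, exhale=8.0, pause=0.0)


def classify_silence(silence_s: float, inhale_s: float, technique: Technique) -> str:
    """Label a silent segment as 'hold' or 'pause'.

    The silence duration is normalized by the measured inhale duration and
    compared with the technique's target hold and pause ratios; the closer
    target wins. This nearest-ratio rule is an assumption for illustration.
    """
    if inhale_s <= 0:
        raise ValueError("inhale duration must be positive")
    observed = silence_s / inhale_s
    hold_ratio = technique.hold / technique.inhale
    pause_ratio = technique.pause / technique.inhale
    if abs(observed - hold_ratio) <= abs(observed - pause_ratio):
        return "hold"
    return "pause"


if __name__ == "__main__":
    # A 6.5 s silence after a 4 s inhale is close to the 1.75 target ratio.
    print(classify_silence(6.5, 4.0, FOUR_SEVEN_EIGHT))
    # A 0.6 s gap is much closer to the (absent) pause than to the hold.
    print(classify_silence(0.6, 4.0, FOUR_SEVEN_EIGHT))
```

Normalizing by the measured inhale, rather than using absolute seconds, keeps the rule usable when a user breathes faster or slower than the script while preserving the ratio.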
The two-phase story (the actual point)
Phase 1 — Detector validation (this paper)
Prospective single-arm study, N ≈ 30, in Switzerland (ethics submission in preparation, expected HRA ClinO category A). Ground truth: Polar H10 + Vernier Go Direct Respiration Belt at 50 Hz, with dual-rater segmentation (κ ≥ 0.80). Three nested settings (quiet / normal / challenging, the last with scripted acoustic challenges). Primary endpoint: balanced accuracy, tested for non-inferiority against Breeze 2 (69%, the state of the art for smartphone-only).
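For readers less familiar with the primary endpoint, the sketch below shows how balanced accuracy (the mean of per-class recalls) differs from plain accuracy, and what a non-inferiority decision looks like in code. The class labels, the margin value and the function names are assumptions for illustration. The study's actual margin and confidence-interval method are not stated here.

```python
from collections import defaultdict


def balanced_accuracy(reference, predicted):
    """Mean per-class recall over the classes present in the reference."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = defaultdict(int)
    correct = defaultdict(int)
    for ref, pred in zip(reference, predicted):
        total[ref] += 1
        if ref == pred:
            correct[ref] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def is_non_inferior(ci_lower, comparator, margin):
    """True if the lower confidence bound exceeds comparator minus margin.

    `margin` is a hypothetical parameter; the source does not specify it.
    """
    return ci_lower > comparator - margin


if __name__ == "__main__":
    reference = ["inhale"] * 4 + ["exhale"] * 4 + ["hold"] * 2
    predicted = (["inhale"] * 3 + ["exhale"]      # one inhale missed
                 + ["exhale"] * 4                 # all exhales correct
                 + ["hold", "exhale"])            # one hold missed
    # Plain accuracy is 8/10, but the rare 'hold' class pulls the balanced
    # figure down to the mean of recalls 0.75, 1.0 and 0.5.
    print(balanced_accuracy(reference, predicted))
    # Comparator 0.69 is the Breeze 2 figure; margin 0.05 is made up.
    print(is_non_inferior(ci_lower=0.66, comparator=0.69, margin=0.05))
```

Balanced accuracy matters here because hold and pause frames are typically rarer than inhale and exhale frames, so plain accuracy would reward a detector that ignores them.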
Phase 2 — Substrate qualification (the novel part)
The same study also yields a label-quality measure: Cohen's κ between detector labels and the chest-belt reference, computed per technique × setting cell. Cells in which the lower bound of the 95% CI for κ is ≥ 0.85 are declared "ML-ready" and released as the training base for a follow-up ML paper. Cells that miss the cutoff are explicitly excluded.
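The qualification rule can be sketched as follows. The sketch computes Cohen's κ for one cell and applies the 0.85 cutoff to the lower bound of an approximate 95% interval. The simple large-sample standard error and the function names are assumptions. The study's actual interval method is not specified here.

```python
import math
from collections import Counter


def cohens_kappa_with_lower_bound(labels_a, labels_b, z=1.96):
    """Return (kappa, lower bound of an approximate 95% CI).

    Uses the simple large-sample standard error
    sqrt(po * (1 - po) / (n * (1 - pe)^2)), an assumption for illustration.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("inputs must be non-empty and of equal length")
    po = sum(a == b for a, b in zip(labels_a, labels_b)) / n  # observed agreement
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    pe = sum(count_a[k] * count_b[k] for k in count_a) / n ** 2  # chance agreement
    if pe == 1.0:
        raise ValueError("kappa is undefined when both sources use a single label")
    kappa = (po - pe) / (1 - pe)
    se = math.sqrt(po * (1 - po) / (n * (1 - pe) ** 2))
    return kappa, kappa - z * se


def is_ml_ready(detector_labels, belt_labels, cutoff=0.85):
    """Apply the cell-level cutoff to the CI lower bound."""
    _, lower = cohens_kappa_with_lower_bound(detector_labels, belt_labels)
    return lower >= cutoff


if __name__ == "__main__":
    belt = ["inhale"] * 50 + ["exhale"] * 50
    detector = (["inhale"] * 45 + ["exhale"] * 5
                + ["exhale"] * 45 + ["inhale"] * 5)
    kappa, lower = cohens_kappa_with_lower_bound(detector, belt)
    print(round(kappa, 4), round(lower, 4), is_ml_ready(detector, belt))
```

In this toy cell, 90% raw agreement gives κ = 0.80 and a lower bound well below the cutoff, so the cell would be excluded. Frames within one recording are correlated, so a real analysis would likely need an interval that accounts for clustering, for example a participant-level bootstrap.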
Why this is the point: mobile-health ML often fails on "unvalidated labels in, unvalidated model out." We reverse the order: quantify label quality first, then train. To our knowledge, no one has formalized this in the mobile-audio context.
Two papers from one study
| Paper | Focus | Data basis | Horizon |
|---|---|---|---|
| Paper 1 (now) | Validated rule-based detector + substrate qualification | N ≈ 30 study | 6–9 months |
| Paper 2 (follow-up) | Personalized self-supervised ML on qualified labels | Dataset released from Paper 1 + ongoing app users | 12–18 months |
Paper 1 automatically produces the de-identified qualified-labels dataset (YAMNet embedding, detector label, chest-belt label), published as an OSF benchmark — not just for us, but as a resource for the community.
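One possible shape for a single record of that dataset is sketched below. Only the three fields named above (YAMNet embedding, detector label, chest-belt label) come from the study description. The `technique` and `setting` fields, the class name and the JSON Lines serialization are assumptions added for illustration.

```python
import json
from dataclasses import asdict, dataclass

# YAMNet produces one 1024-dimensional embedding per audio frame.
EMBEDDING_DIM = 1024


@dataclass(frozen=True)
class QualifiedLabelRecord:
    """One de-identified frame of the benchmark (hypothetical schema)."""
    technique: str        # assumed field, e.g. "4-7-8"
    setting: str          # assumed field: "quiet", "normal" or "challenging"
    embedding: tuple      # YAMNet embedding, no raw audio
    detector_label: str   # label from the rule-based detector
    belt_label: str       # label from the chest-belt reference

    def __post_init__(self):
        if len(self.embedding) != EMBEDDING_DIM:
            raise ValueError(f"expected {EMBEDDING_DIM} embedding values")


def to_json_line(record: QualifiedLabelRecord) -> str:
    """Serialize one record as a single JSON Lines row."""
    return json.dumps(asdict(record))


if __name__ == "__main__":
    record = QualifiedLabelRecord(
        technique="4-7-8",
        setting="quiet",
        embedding=tuple([0.0] * EMBEDDING_DIM),  # placeholder values
        detector_label="hold",
        belt_label="hold",
    )
    print(len(to_json_line(record)) > 0)
```

Storing embeddings instead of waveforms keeps the released records free of raw audio while still allowing downstream models to be trained on them.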
Why this is ready now
The algorithm has been in production since April 2026 (shii·haa v1.7.4, iOS + Android). The label-collection infrastructure is already live (milestone M1): confirmed (waveform, phase-label) pairs are collected per user on-device, and no raw data leaves the device. Once the validation study runs, it provides both the accuracy evidence for Paper 1 and the qualified training data for Paper 2 — without any extra recruitment.
Contact & collaboration
We are looking for clinical co-authors, ethics-committee sparring, and feedback from methodology peers (sample-size strategy, target-journal fit, registration strategy). The full methodology draft (v2.1) exists as a supplement and is shared on request.
We measure. You decide.