Coverage for integrations/agent_engine/demonstrability/__init__.py: 0.0%
7 statements
coverage.py v7.14.0, created at 2026-05-12 04:49 +0000
"""DemonstrationProbe framework: Package B of the ml_intern brief.

Every seeded agent claims to be best at its goal_type. This package
turns the claim into a measurable, continuously audited delta:

    our_score vs (trivial_prompt, previous_version, cloud_api)

A probe runs after each agent dispatch, computes a headline score,
and records it (a) in the existing _Leaderboard under benchmark key
`goal:{goal_type}`, which HiveConsensus._vote_local_probe already
reads, (b) as a per-goal append-only JSONL history for the
ContinualImprovementProver, and (c) as TensorBoard scalars under the
`demonstrability/{goal_type}/*` category. No parallel storage.

Regressions beyond a configured threshold auto-trigger a
weight_tracker rollback request (when available), closing the loop
the brief describes in §3.3.

Public API:
  - register_probe(goal_type) decorator
  - get_probe(goal_type) -> DemonstrationProbe | None
  - record_result(result: ProbeResult) -> None
  - run_post_dispatch(goal_type, ctx) -> ProbeResult | None (hook
    that agent_daemon calls after a goal dispatch completes)
  - get_dashboard_snapshot() -> dict (surface for
    /api/agent-engine/demonstrability)
"""
from __future__ import annotations

from .base import (
    DemonstrationProbe,
    ProbeContext,
    ProbeResult,
    record_result,
    get_dashboard_snapshot,
)
from .registry import (
    register_probe,
    get_probe,
    list_probes,
)
from .scheduler import run_post_dispatch

# Importing probes/* registers them via @register_probe. The side effect
# is intentional and must NOT be lazy; otherwise the first dispatch would
# find no probe registered.
from .probes import llm_judge  # noqa: F401
from .probes import speech_therapy  # noqa: F401

__all__ = [
    'DemonstrationProbe',
    'ProbeContext',
    'ProbeResult',
    'record_result',
    'get_dashboard_snapshot',
    'register_probe',
    'get_probe',
    'list_probes',
    'run_post_dispatch',
]
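
The registration-by-import pattern described in the module comments can be illustrated with a minimal, self-contained sketch. This is not the package's actual code: the contents of `.base`, `.registry`, and `.scheduler` are not shown in this report, so the `ProbeResult` fields, the `run` method, the `_PROBES` dict, and the choice to have `get_probe` return a fresh instance are all assumptions made for illustration. Only the public names and the `-> ... | None` return shapes come from the docstring above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Type


@dataclass
class ProbeResult:
    """Hypothetical result shape; the real fields live in .base."""
    goal_type: str
    score: float


class DemonstrationProbe:
    """Hypothetical base class: subclasses compute a headline score."""

    def run(self, ctx: dict) -> ProbeResult:
        raise NotImplementedError


# Assumed storage: a module-level dict mapping goal_type -> probe class.
_PROBES: Dict[str, Type[DemonstrationProbe]] = {}


def register_probe(
    goal_type: str,
) -> Callable[[Type[DemonstrationProbe]], Type[DemonstrationProbe]]:
    """Class decorator that records the probe class under goal_type."""

    def decorator(cls: Type[DemonstrationProbe]) -> Type[DemonstrationProbe]:
        _PROBES[goal_type] = cls
        return cls

    return decorator


def get_probe(goal_type: str) -> Optional[DemonstrationProbe]:
    """Return a probe instance, or None if nothing is registered."""
    cls = _PROBES.get(goal_type)
    return cls() if cls is not None else None


def list_probes() -> List[str]:
    """Return the registered goal types in sorted order."""
    return sorted(_PROBES)


def run_post_dispatch(goal_type: str, ctx: dict) -> Optional[ProbeResult]:
    """Run the probe for goal_type, or return None when none exists."""
    probe = get_probe(goal_type)
    if probe is None:
        return None
    return probe.run(ctx)


# Defining (or importing) a decorated class is what performs registration,
# which is why the package imports its probe modules eagerly.
@register_probe("speech_therapy")
class SpeechTherapyProbe(DemonstrationProbe):
    def run(self, ctx: dict) -> ProbeResult:
        # Placeholder scoring: read a precomputed score from the context.
        return ProbeResult("speech_therapy", float(ctx.get("score", 0.0)))
```

In this sketch, calling `run_post_dispatch("speech_therapy", {"score": 0.75})` returns a `ProbeResult`, while an unregistered goal type returns `None`. If the decorated class lived in a module that was never imported, the dict would stay empty and every dispatch would take the `None` path, which is the failure the eager imports guard against.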