Coverage for integrations/agent_engine/demonstrability/__init__.py: 0.0%

7 statements  

coverage.py v7.14.0, created at 2026-05-12 04:49 +0000

"""DemonstrationProbe framework: Package B of the ml_intern brief.

Every seeded agent claims to be best at its goal_type. This package
turns the claim into a measurable, continuously audited delta:

    our_score vs (trivial_prompt, previous_version, cloud_api)

A probe runs after each agent dispatch, computes a headline score, and
records it (a) in the existing _Leaderboard under benchmark key
`goal:{goal_type}`, which HiveConsensus._vote_local_probe already
reads, (b) as a per-goal append-only JSONL history for the
ContinualImprovementProver, and (c) as tensorboard scalars under the
`demonstrability/{goal_type}/*` category. No parallel storage.

Regressions beyond a configured threshold auto-trigger a
weight_tracker rollback request (when available), closing the loop
the brief describes in §3.3.

Public API:
  - register_probe(goal_type) decorator
  - get_probe(goal_type) -> DemonstrationProbe | None
  - record_result(result: ProbeResult) -> None
  - run_post_dispatch(goal_type, ctx) -> ProbeResult | None (hook
    that agent_daemon calls after a goal dispatch completes)
  - get_dashboard_snapshot() -> dict (surface for /api/agent-engine/
    demonstrability)
"""

from __future__ import annotations

from .base import (
    DemonstrationProbe,
    ProbeContext,
    ProbeResult,
    record_result,
    get_dashboard_snapshot,
)
from .registry import (
    register_probe,
    get_probe,
    list_probes,
)
from .scheduler import run_post_dispatch

# Importing probes/* registers them via @register_probe. The side effect
# is intentional and must NOT be lazy; otherwise the first dispatch would
# find no probe registered.
from .probes import llm_judge  # noqa: F401
from .probes import speech_therapy  # noqa: F401

__all__ = [
    'DemonstrationProbe',
    'ProbeContext',
    'ProbeResult',
    'record_result',
    'get_dashboard_snapshot',
    'register_probe',
    'get_probe',
    'list_probes',
    'run_post_dispatch',
]
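
The module docstring describes a decorator-based probe registry, an append-only JSONL history, and a threshold check on score regressions. The sketch below is a toy, self-contained illustration of those three ideas and is not the package's actual implementation. The `ProbeResult` fields, the `run` method, and the helpers `append_history`, `score_deltas`, and `is_regression` are all assumptions made for illustration; only the names `register_probe`, `get_probe`, `DemonstrationProbe`, and `ProbeResult` come from the listing above, and their real definitions in `.registry` and `.base` may differ.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ProbeResult:
    # Hypothetical fields; the real ProbeResult lives in .base and may differ.
    goal_type: str
    our_score: float
    baseline_scores: dict[str, float]


class DemonstrationProbe:
    """Stand-in base class; the real one is defined in .base."""

    def run(self, ctx: dict) -> ProbeResult:  # hypothetical method name
        raise NotImplementedError


# Toy registry: maps goal_type to the probe class registered for it.
_PROBES: dict = {}


def register_probe(goal_type: str):
    """Class decorator. Registration runs when the defining module is imported,
    which is why a lazy import would leave the registry empty."""
    def decorator(cls):
        _PROBES[goal_type] = cls
        return cls
    return decorator


def get_probe(goal_type: str) -> DemonstrationProbe | None:
    """Return a probe instance for goal_type, or None if nothing is registered."""
    cls = _PROBES.get(goal_type)
    return cls() if cls is not None else None


def append_history(result: ProbeResult, history_dir: Path) -> None:
    """Hypothetical helper: append one JSON object per line, never rewriting old lines."""
    path = history_dir / f"{result.goal_type}.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(result)) + "\n")


def score_deltas(result: ProbeResult) -> dict[str, float]:
    """Hypothetical helper: our_score minus each baseline score."""
    return {name: result.our_score - base
            for name, base in result.baseline_scores.items()}


def is_regression(previous: float, current: float, threshold: float) -> bool:
    """Hypothetical helper: True when the score dropped by more than threshold."""
    return (previous - current) > threshold


@register_probe("demo_goal")  # "demo_goal" is an invented goal_type
class DemoProbe(DemonstrationProbe):
    def run(self, ctx: dict) -> ProbeResult:
        # The baseline value is made up purely for the example.
        return ProbeResult("demo_goal", ctx["score"], {"trivial_prompt": 0.5})
```

In this sketch, `get_probe("demo_goal")` finds a probe only because the `@register_probe` decorator has already run at import time, which mirrors why the package imports `probes/*` eagerly. What a real rollback request to `weight_tracker` looks like is not shown in the listing, so the sketch stops at the boolean check.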