Testing Guide

Running Tests

All Tests

```bash
pytest tests/ -v
```

Unit Tests Only

```bash
pytest tests/unit/ -v
```

Integration Tests Only

```bash
pytest tests/integration/ -v
```

Specific Test Files

```bash
pytest tests/unit/test_agent_creation.py -v       # Agent creation
pytest tests/unit/test_recipe_generation.py -v     # Recipe generation
pytest tests/unit/test_reuse_mode.py -v            # Reuse execution
pytest tests/unit/test_federation_upgrade.py -v    # Federation upgrade
pytest tests/unit/test_model_lifecycle.py -v       # Model lifecycle
```

Standalone Suites

```bash
python tests/standalone/test_master_suite.py           # Comprehensive suite
python tests/standalone/test_autonomous_agent_suite.py  # Autonomous agents
```

Important Flags

--noconftest

Use --noconftest for most test runs. This flag tells pytest not to load any conftest.py files. The TestMediaAgent fixture in test_social_models.py can corrupt pytest's tempfile handle and cause cascading failures across 724+ tests.

```bash
pytest tests/unit/ -v --noconftest
```

-p no:capture

Required for the federation tests. The -p no:capture option disables pytest's output-capture plugin, which otherwise conflicts with these tests:

```bash
pytest tests/unit/test_federation_upgrade.py -v -p no:capture
```

Test Environment Notes

- Python 3.10 is required for full compatibility (pydantic 1.10.9).
- Python 3.11 works, but autogen is not installed there, so 9 test files are skipped.
- Known pre-existing failures: about 70 across 27 files. These are not caused by recent changes.
- All 266 tests from the 6-workstream plan pass (41 new + 225 regression).
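You can check the environment requirements above before a run. The sketch below is a hypothetical pre-flight script; it is not part of the repository. It reports the Python version, the installed pydantic version, and whether autogen is importable, using only the standard library. The 3.10 and 1.10.9 values are copied from the notes, and the script name and structure are illustrative assumptions.

```python
import importlib.metadata
import importlib.util
import sys

# Values copied from the environment notes; update them if the project's pins change.
EXPECTED_PYTHON = (3, 10)
EXPECTED_PYDANTIC = "1.10.9"


def installed_version(dist_name):
    """Return the installed distribution version, or None if it is absent."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


def check_test_environment():
    """Collect the facts that decide whether the full suite can run."""
    return {
        "python": sys.version_info[:2],
        "python_ok": sys.version_info[:2] == EXPECTED_PYTHON,
        "pydantic": installed_version("pydantic"),
        # Import-name check only; per the notes, autogen-dependent files skip without it.
        "autogen_importable": importlib.util.find_spec("autogen") is not None,
    }


if __name__ == "__main__":
    report = check_test_environment()
    for key, value in report.items():
        print(f"{key}: {value}")
    if not report["python_ok"]:
        print("warning: not Python 3.10; expect skipped or failing tests")
    if report["pydantic"] != EXPECTED_PYDANTIC:
        print(f"warning: pydantic {EXPECTED_PYDANTIC} expected")
```

Running the script before `pytest` makes it easy to tell environment-related skips apart from real regressions.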

Key Test Files

| File | Coverage |
|------|----------|
| `test_agent_creation.py` | CREATE mode, action decomposition |
| `test_recipe_generation.py` | Recipe save/load, JSON format |
| `test_reuse_mode.py` | REUSE mode, recipe replay |
| `test_federation_upgrade.py` | Federation protocol, peer sync |
| `test_model_lifecycle.py` | Model load/unload/offload |
| `test_social_models.py` | ORM models, db_session() |
| `test_master_suite.py` | Comprehensive end-to-end |

Writing Tests

Use db_session() for Database Tests

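```python
# Assumes User is exported from the same models module as db_session.
from integrations.social.models import User, db_session

def test_create_user():
    with db_session() as db:
        user = User(username='test')
        db.add(user)
        db.commit()
        # The primary key is populated once the row has been written.
        assert user.id is not None
```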
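This guide doesn't show how db_session() is implemented. The sketch below illustrates the commit-on-success, rollback-on-error pattern that such context-manager helpers commonly follow. It uses the standard-library sqlite3 module instead of the project's ORM. sketch_db_session is a hypothetical name, and the real helper may behave differently; for example, it may not commit automatically, which is why the test above calls db.commit() explicitly.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager


@contextmanager
def sketch_db_session(path):
    """Hypothetical stand-in for db_session(): commit on success, roll back on error, always close."""
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    db_file = os.path.join(tmp, "sketch.db")
    with sketch_db_session(db_file) as db:
        db.execute("CREATE TABLE users (username TEXT)")
        db.execute("INSERT INTO users VALUES ('kept')")
    try:
        with sketch_db_session(db_file) as db:
            db.execute("INSERT INTO users VALUES ('discarded')")
            raise RuntimeError("simulated test failure")
    except RuntimeError:
        pass  # the session rolled back the uncommitted insert
    with sketch_db_session(db_file) as db:
        rows = db.execute("SELECT username FROM users").fetchall()

print(rows)  # → [('kept',)]
```

The point of the pattern is that a failing test leaves no half-written rows behind for the next test to trip over.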

In-Memory Database

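Set HEVOLVE_DB_PATH=:memory: so tests use a throwaway in-memory database instead of a file on disk. Set it before the database module is imported, in case the path is read only once at import time: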

```python
import os
os.environ['HEVOLVE_DB_PATH'] = ':memory:'
```
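The isolation comes from SQLite itself: every connection opened on ":memory:" gets its own private, empty database. The sketch below demonstrates this with a hypothetical helper, open_configured_db(), that reads HEVOLVE_DB_PATH the way a database module might. The helper and its fallback filename are assumptions, not the project's code. It also uses unittest.mock.patch.dict, which restores os.environ on exit so the override cannot leak into later tests. Inside pytest tests, the built-in monkeypatch fixture's setenv() serves the same purpose.

```python
import os
import sqlite3
from unittest.mock import patch


def open_configured_db():
    # Hypothetical helper: reads HEVOLVE_DB_PATH the way a DB module might.
    # The fallback filename is an assumption for illustration only.
    return sqlite3.connect(os.environ.get("HEVOLVE_DB_PATH", "hevolve_test.db"))


def tables_in(conn):
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [name for (name,) in rows]


# patch.dict restores os.environ when the block exits.
with patch.dict(os.environ, {"HEVOLVE_DB_PATH": ":memory:"}):
    first = open_configured_db()
    first.execute("CREATE TABLE users (username TEXT)")
    second = open_configured_db()
    # Each ":memory:" connection is a separate, empty database.
    print(tables_in(first), tables_in(second))  # → ['users'] []
    first.close()
    second.close()
```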

Mocking External Services

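Mock external API calls, not internal functions. Use unittest.mock.patch on the HTTP client call (for example requests.post), and patch the name the code under test actually looks up. patch('requests.post') works for code that calls requests.post(...), but code that does "from requests import post" needs the patch target to be the post name in that module instead: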

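```python
import requests
from unittest.mock import patch

@patch('requests.post')
def test_external_call(mock_post):
    mock_post.return_value.json.return_value = {'result': 'ok'}
    # Replace this placeholder call with the code under test that posts to the API.
    response = requests.post('https://api.example.com/endpoint', json={})
    assert response.json() == {'result': 'ok'}
    mock_post.assert_called_once()
```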

Functional Tests

The tests/functional/ suite validates core subsystems end-to-end with real logic (no mocks on the code under test). It contains 233 tests and runs in about 30 seconds.

```bash
pytest tests/functional/ --noconftest -q
```

| File | Tests | What it covers |
|------|------:|----------------|
| `test_message_bus_functional.py` | 8 | Pub/sub delivery, wildcard topics, unsubscribe, thread safety |
| `test_federation_functional.py` | 23 | 3-node convergence, HMAC sign/verify, stale-delta rejection, recipe channel, guardrail hash enforcement |
| `test_revenue_functional.py` | 6 | 90/9/1 split math, real SQLite Spark settlements, dashboard keys, env overrides |
| `test_vlm_loop_functional.py` | 17 | VLM control flow, action parsing, bbox handling, iteration budget, safety-gate stubs |
| `test_pipeline_lifecycle_functional.py` | 22 | ActionState machine transitions, recipe save/load round-trip, path-traversal guards, thread-safe state changes |
| `test_security_modules_functional.py` | 141 | Input sanitization (SQL, HTML, path), audit-log hash chain and tamper detection, action classifier (safe/destructive), DLP scan/redact (PII, credit card, IP), rate limiter, tool allowlist |
| `test_device_control_functional.py` | 16 | Channel-to-device routing via PeerLink, SAME_USER privacy gate, fleet-command fallback, embedded handler GPIO/serial detection |
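The federation tests cover HMAC sign/verify and stale-delta rejection. The sketch below shows one way those two checks can fit together, using the standard-library hmac module. The payload shape, the shared test key, and the integer "version" field used to detect staleness are all assumptions for illustration. They are not the project's actual federation format.

```python
import hashlib
import hmac
import json

SHARED_KEY = b"test-only-key"  # hypothetical; real key management is not described here


def sign_delta(delta, key=SHARED_KEY):
    # Canonical JSON (sorted keys, no spaces) so both peers sign identical bytes.
    body = json.dumps(delta, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def accept_delta(delta, signature, last_seen_version, key=SHARED_KEY):
    """Accept only correctly signed deltas that are newer than what we already have."""
    expected = sign_delta(delta, key)
    if not hmac.compare_digest(expected, signature):
        return False  # tampered payload or a different signing key
    return delta["version"] > last_seen_version  # reject stale or replayed deltas


delta = {"peer": "node-a", "version": 5, "recipe": "r1"}
signature = sign_delta(delta)
print(accept_delta(delta, signature, last_seen_version=4))  # → True
print(accept_delta(delta, signature, last_seen_version=5))  # → False (stale)
```

hmac.compare_digest is used instead of == so the comparison time does not reveal how many leading characters of a forged signature matched.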

These tests exercise the actual production code paths. External services (LLM APIs, network peers) are stubbed at the boundary, but all internal logic runs unmodified.
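The revenue tests check 90/9/1 split math. The sketch below shows one way to keep that arithmetic exact by working in integer minor units (cents), so no amount is lost or created by rounding. The use of cents, the tuple order, and the rule that the rounding remainder goes to the 90% share are assumptions for illustration. The production settlement code may round differently, and the recipient of each share is not stated in this guide.

```python
SPLIT_PERCENTAGES = (90, 9, 1)  # from the revenue row; recipient roles not specified here


def split_revenue(total_cents):
    """Split an integer amount 90/9/1 without losing or creating cents."""
    shares = [total_cents * pct // 100 for pct in SPLIT_PERCENTAGES]
    # Assumption: any rounding remainder goes to the largest (90%) share.
    shares[0] += total_cents - sum(shares)
    return tuple(shares)


print(split_revenue(1000))  # → (900, 90, 10)
print(split_revenue(1001))  # → (901, 90, 10)
```

The invariant worth asserting in a test like this is that the shares always sum to the original total.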