Hooks System

Hooks are safety guardrails that enforce governance and best practices in SignalPilot. They let you define boundaries for AI behavior while still enabling autonomous investigation, so that agentic systems operate within your organization’s constraints.

Why Hooks Matter

Agentic AI systems can execute code and run queries autonomously. This power needs boundaries:
❌ Without Hooks:
- AI queries production database during peak hours
- Slow query runs without LIMIT, scans 100M rows
- Analysis uses iterrows() instead of vectorized ops
- No audit trail of what data was accessed
✅ With Hooks:
- Pre-hook blocks prod queries outside business hours
- Pre-hook adds LIMIT clause to exploratory queries
- Post-hook validates code follows performance standards
- All data access logged for compliance

Hook Types

SignalPilot supports four types of hooks:

1. Pre-Execution Hooks

Run before code or queries execute. Can block, modify, or approve.
Use Case | Example Rule
Time-based access | “Only query prod between 9am-5pm”
Query safety | “Add LIMIT 1000 to SELECT * queries”
Resource protection | “Block queries without WHERE clause on large tables”
Approval gates | “Require approval for DELETE/UPDATE operations”
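
The approval-gate rule is the only one in this list without a worked example elsewhere on this page. A minimal sketch of the detection step, assuming a plain string query, might look like the following. The names `DESTRUCTIVE_KEYWORDS`, `leading_keyword`, and `requires_approval` are hypothetical helpers, not part of SignalPilot, and how the approval itself is requested is left out.

```python
import re

# Statement types that should not run without sign-off.
# This set is an assumption taken from the "DELETE/UPDATE" rule above.
DESTRUCTIVE_KEYWORDS = {"DELETE", "UPDATE"}


def leading_keyword(query: str) -> str:
    """Return the first SQL keyword, upper-cased, skipping leading comments."""
    # Remove /* ... */ block comments, then -- line comments.
    without_block = re.sub(r"/\*.*?\*/", " ", query, flags=re.DOTALL)
    without_line = re.sub(r"--[^\n]*", " ", without_block)
    match = re.match(r"\s*([A-Za-z]+)", without_line)
    return match.group(1).upper() if match else ""


def requires_approval(query: str) -> bool:
    """True when the statement starts with a destructive keyword."""
    return leading_keyword(query) in DESTRUCTIVE_KEYWORDS
```

This is a heuristic: it only inspects the first keyword, so a destructive statement wrapped in a CTE (`WITH ... DELETE`) would need a real SQL parser to catch.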

2. Post-Execution Hooks

Run after code executes. Validate results and flag issues.
Use Case | Example Rule
Data quality | “Warn if null percentage > 5%”
Anomaly detection | “Flag if row count changed > 20%”
Result validation | “Check that dates are within expected range”
Consistency checks | “Verify sum matches expected total”
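
The anomaly-detection rule boils down to a relative-change comparison. The sketch below shows that arithmetic in isolation. The function names and the choice to treat "empty before, non-empty now" as an infinite change are assumptions for illustration, not SignalPilot behavior.

```python
def row_count_change(previous: int, current: int) -> float:
    """Relative change in row count compared with the previous run."""
    if previous == 0:
        # No baseline: unchanged if still empty, otherwise treat as unbounded.
        return 0.0 if current == 0 else float("inf")
    return abs(current - previous) / previous


def is_anomalous(previous: int, current: int, threshold: float = 0.20) -> bool:
    """True when the row count moved by more than `threshold` (20% by default)."""
    return row_count_change(previous, current) > threshold
```

For example, going from 1,000 to 1,250 rows is a 25% change and would be flagged, while 1,000 to 1,100 (10%) would not.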

3. Code Quality Hooks

Validate generated code against team standards.
Use Case | Example Rule
Performance | “Use vectorized pandas, never iterrows()”
Style | “All functions must have docstrings”
Security | “Never hardcode credentials”
Best practices | “Use parameterized queries, not string concat”
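
As an illustration of the style rule, the docstring check can be done on the syntax tree of the generated code using only the standard library. This is a sketch of one possible check. `functions_missing_docstrings` is a hypothetical helper name, and wiring it into a code quality hook is not shown.

```python
import ast


def functions_missing_docstrings(code: str) -> list[str]:
    """Return the names of functions in `code` that have no docstring."""
    tree = ast.parse(code)
    missing = []
    for node in ast.walk(tree):
        # Cover both regular and async function definitions.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return missing
```

A hook could reject or warn on the code whenever the returned list is non-empty, and include the function names in its message.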

4. Audit Hooks

Log all operations for compliance and debugging.
What’s Logged | Details
Data access | Tables queried, columns accessed
Code execution | What ran, when, by whom
Context resolution | What MCP sources were queried
Results | Summary of outputs (not raw data)
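
To make the table concrete, here is a sketch of what a single audit entry could contain, serialized as one JSON line. The record layout, the field names, and `build_audit_record` are assumptions for illustration. The actual SignalPilot log format is not described on this page.

```python
import json
from datetime import datetime, timezone


def build_audit_record(user, code, tables, columns, context_sources, row_count):
    """Assemble one audit entry; all field names are illustrative assumptions."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # when it ran
        "user": user,                                         # by whom
        "code": code,                                         # what ran
        "tables": sorted(tables),                             # data access
        "columns": sorted(columns),
        "context_sources": sorted(context_sources),           # MCP sources consulted
        "result_summary": {"row_count": row_count},           # summary only, no raw rows
    }


record = build_audit_record(
    user="analyst@example.com",
    code="SELECT user_id, amount FROM orders WHERE amount > 100",
    tables={"orders"},
    columns={"user_id", "amount"},
    context_sources={"schema_catalog"},
    row_count=42,
)
audit_line = json.dumps(record, sort_keys=True)  # one JSON object per log line
```

Storing only a result summary, as the table specifies, keeps the audit log itself from becoming a copy of sensitive data.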

Configuring Hooks

Basic Configuration

// signalpilot.config.json
{
  "hooks": {
    "pre_execution": [
      {
        "name": "limit_exploratory_queries",
        "trigger": "query",
        "condition": "SELECT *",
        "action": "modify",
        "modification": "ADD LIMIT 1000"
      }
    ],
    "post_execution": [
      {
        "name": "check_data_quality",
        "trigger": "dataframe_created",
        "action": "validate",
        "rules": ["null_percentage < 0.05", "row_count > 0"]
      }
    ]
  }
}
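
The `rules` entries above are simple `metric operator threshold` strings. How SignalPilot evaluates them internally is not documented here. The sketch below only illustrates one way such strings could be checked against precomputed metrics, and every name in it (`OPERATORS`, `RULE_PATTERN`, `evaluate_rule`, `failed_rules`) is hypothetical.

```python
import operator
import re

# Comparison operators supported by this sketch.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}

# Matches "<metric> <op> <number>", e.g. "null_percentage < 0.05".
RULE_PATTERN = re.compile(r"^\s*(\w+)\s*(<=|>=|==|<|>)\s*(-?\d+(?:\.\d+)?)\s*$")


def evaluate_rule(rule: str, metrics: dict) -> bool:
    """Return True when the metric named in `rule` satisfies the comparison."""
    match = RULE_PATTERN.match(rule)
    if match is None:
        raise ValueError(f"Unsupported rule syntax: {rule!r}")
    name, op, threshold = match.groups()
    return OPERATORS[op](metrics[name], float(threshold))


def failed_rules(rules: list[str], metrics: dict) -> list[str]:
    """Return the rules that the given metrics do not satisfy."""
    return [rule for rule in rules if not evaluate_rule(rule, metrics)]
```

Parsing the rule with a fixed pattern, rather than passing it to `eval`, keeps configuration files from executing arbitrary code.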

Python Configuration

import pandas as pd

from signalpilot import hooks

# Pre-execution hook
@hooks.pre_execute
def limit_prod_queries(query, context):
    if context.database == "production":
        if "LIMIT" not in query.upper():
            # Strip a trailing semicolon so the appended clause stays valid SQL
            return query.rstrip().rstrip(";") + " LIMIT 10000"
    return query

# Post-execution hook
@hooks.post_execute
def validate_results(result, context):
    if isinstance(result, pd.DataFrame):
        null_pct = result.isnull().mean().max()
        if null_pct > 0.05:
            hooks.warn(f"High null percentage: {null_pct:.1%}")
    return result

# Code quality hook
@hooks.code_quality
def enforce_vectorization(code):
    if ".iterrows()" in code:
        raise hooks.CodeQualityError(
            "Use vectorized operations instead of iterrows()"
        )
    return code
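
A substring test such as `".iterrows()" in code` also fires when the text appears only in a comment or a string literal. One alternative, sketched here with the standard-library `ast` module, is to look for an actual method call in the syntax tree. `calls_method` is a hypothetical helper, not a SignalPilot function.

```python
import ast


def calls_method(code: str, method_name: str) -> bool:
    """Return True if `code` contains a call of the form obj.<method_name>(...)."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == method_name
        ):
            return True
    return False
```

A code quality hook could call `calls_method(code, "iterrows")` in place of the substring test. Note that `ast.parse` raises `SyntaxError` on invalid code, which the hook would need to handle.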

Common Hook Patterns

Production Database Protection

@hooks.pre_execute
def protect_production(query, context):
    from datetime import datetime

    if context.database == "production":
        hour = datetime.now().hour
        if hour < 9 or hour >= 17:  # allowed window is 09:00 through 16:59
            raise hooks.BlockedError(
                "Production queries only allowed 9am-5pm. "
                "Use staging database for after-hours work."
            )
    return query
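
Time-window checks are easy to get wrong at the boundaries, and they are hard to test when the current time is read inside the hook. A small sketch of the same 9am-5pm rule as a pure function that takes the time as an argument follows. `within_business_hours` and the two constants are hypothetical names introduced for illustration.

```python
from datetime import datetime, time

BUSINESS_START = time(9, 0)   # inclusive
BUSINESS_END = time(17, 0)    # exclusive


def within_business_hours(moment: datetime) -> bool:
    """True from 09:00:00 up to, but not including, 17:00:00."""
    return BUSINESS_START <= moment.time() < BUSINESS_END
```

A hook could then call `within_business_hours(datetime.now())`, while tests pass fixed timestamps. This sketch uses the server's local time, as the hook above does, and does not account for time zones or weekends.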

Query Cost Control

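import re

@hooks.pre_execute
def control_query_cost(query, context):
    # Block full table scans on large tables
    large_tables = ["events", "pageviews", "logs"]
    # Match WHERE as a whole word so identifiers like "somewhere" do not count
    has_where = re.search(r"\bWHERE\b", query, re.IGNORECASE) is not None

    for table in large_tables:
        # Whole-word match avoids false hits such as "logs" inside "catalogs"
        if re.search(rf"\b{re.escape(table)}\b", query, re.IGNORECASE):
            if not has_where:
                raise hooks.BlockedError(
                    f"Query on {table} requires WHERE clause. "
                    f"Table has 100M+ rows."
                )
    return query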

Data Quality Validation

import pandas as pd

@hooks.post_execute
def validate_data_quality(result, context):
    if not isinstance(result, pd.DataFrame):
        return result

    issues = []

    # Check for nulls
    null_pct = result.isnull().mean()
    high_null_cols = null_pct[null_pct > 0.05].index.tolist()
    if high_null_cols:
        issues.append(f"High null columns: {high_null_cols}")

    # Check for duplicates (compute the mask once)
    dup_count = int(result.duplicated().sum())
    if dup_count:
        issues.append(f"Found {dup_count} duplicate rows")

    if issues:
        hooks.warn("\n".join(issues))

    return result

Best Practices

1. Start Permissive, Tighten Over Time

Begin with warnings rather than blocks. Review warnings to understand patterns before enforcing strict rules.

2. Explain Blocks Clearly

When blocking an operation, provide a clear explanation and suggest alternatives.

3. Test Hooks in Development

Use a staging database and test notebooks to verify that hooks work as expected before enabling them in production.

4. Review Audit Logs

Regularly review audit logs to identify patterns and adjust hooks accordingly.
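
The first two practices can be combined by giving each rule an enforcement mode that starts at "warn" and is later switched to "block". The sketch below shows the idea with plain Python. `Mode`, `RuleViolation`, `apply_rule`, and `missing_where` are hypothetical names, and in a real hook the warning and blocking steps would presumably go through `hooks.warn` and `hooks.BlockedError` as shown earlier on this page.

```python
from enum import Enum


class Mode(Enum):
    WARN = "warn"    # record the problem, let the query through
    BLOCK = "block"  # stop the query


class RuleViolation(Exception):
    """Raised when a rule in BLOCK mode rejects a query."""


def missing_where(query: str):
    """Return an explanatory message if the query has no WHERE clause."""
    if "where" not in query.lower():
        return "Query has no WHERE clause. Add a filter or use a sampled table."
    return None


def apply_rule(query, check, mode, warnings):
    """Run `check`; warn or block depending on `mode`, then return the query."""
    message = check(query)
    if message is None:
        return query
    if mode is Mode.BLOCK:
        raise RuleViolation(message)
    warnings.append(message)
    return query
```

Because the mode is data rather than code, a rule can be promoted from `Mode.WARN` to `Mode.BLOCK` once the collected warnings show it is not producing false positives. The message names an alternative, in line with the second practice.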