
Testing Overview

Mimic automatically generates test scenarios grounded in the data it creates. Every testable fact — an overdue invoice, a spending anomaly, a missing record — becomes a scenario with natural-language input and concrete assertions. No prompt templates, no hand-written test data.

The pipeline has three stages:

  1. Facts — During mimic run, the LLM generates a set of testable facts alongside the persona data. These are written to .mimic/fact-manifest.json.
  2. Scenarios — During mimic test, a single LLM call converts facts into test scenarios with natural questions and data-specific assertions.
  3. Export — Scenarios can be exported to external eval platforms (PromptFoo, Braintrust, LangSmith, Inspect AI) or Mimic’s own format.

Facts & the Fact Manifest

A fact is a structured, testable statement about the generated data. Facts are created by the LLM during blueprint generation and describe anomalies, trends, risks, and integrity issues that an AI agent should be able to reason about.

json .mimic/fact-manifest.json (excerpt)
{
  "persona": "growth-saas",
  "domain": "Multi-platform SaaS billing",
  "facts": [
    {
      "id": "fact_001",
      "type": "overdue",
      "platform": "chargebee",
      "severity": "critical",
      "detail": "3 overdue invoices totalling £12,400. Oldest is 34 days overdue.",
      "data": {
        "count": 3,
        "total_gbp": 12400,
        "oldest_days_overdue": 34
      }
    }
  ]
}
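
For reference, the manifest shape can be described with a TypeScript interface. This is an illustrative sketch inferred from the excerpt above, not a type definition published by Mimic; the fact types and severity levels it references are listed below.

typescript
// Illustrative types inferred from the fact-manifest.json excerpt above;
// Mimic does not necessarily publish these definitions.
type FactType = "anomaly" | "overdue" | "pending" | "integrity" | "growth" | "risk";
type Severity = "info" | "warn" | "critical";

interface Fact {
  id: string;                              // e.g. "fact_001"
  type: FactType;
  platform: string;                        // source adapter, e.g. "chargebee"
  severity: Severity;                      // maps to a scenario tier (see below)
  detail: string;                          // natural-language summary the LLM reads
  data: Record<string, number | string>;   // concrete figures backing the detail
}

interface FactManifest {
  persona: string;
  domain: string;
  facts: Fact[];
}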

Fact types

| Type | Description | Example |
| --- | --- | --- |
| anomaly | Unexpected deviation from normal patterns | Mobile MRR down 23% due to App Store outage |
| overdue | Items past their due date | 3 invoices totalling £12,400 overdue |
| pending | Items awaiting settlement or completion | £8,400 direct debit pending bank settlement |
| integrity | Data consistency issues across systems | 34 users with paid flags but no billing record |
| growth | Notable growth trends or patterns | EU segment up 31% MoM driven by German market |
| risk | Churn risk or other business risks | 14 Pro customers inactive for 30+ days |
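
As an illustration, an integrity fact from the table above might be emitted like this (hypothetical values; the platform name is assumed, not taken from a real manifest):

json (illustrative)
{
  "id": "fact_007",
  "type": "integrity",
  "platform": "postgres",
  "severity": "warn",
  "detail": "34 users have a paid flag set but no corresponding billing record.",
  "data": { "count": 34 }
}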

Severity levels

Each fact has a severity that maps to a scenario tier:

| Severity | Scenario Tier | Max Latency | Purpose |
| --- | --- | --- | --- |
| info | smoke | 10s | Agent surfaces basic information correctly |
| warn | functional | 20s | Agent handles nuanced or multi-step queries |
| critical | adversarial | 15s | Agent handles tricky edge cases without hallucinating |
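
In code form this is a straightforward lookup. A minimal sketch, assuming the table above is the complete mapping:

typescript
// The severity → tier mapping from the table above, as a constant lookup.
const TIER_BY_SEVERITY = {
  info:     { tier: "smoke",       maxLatencyMs: 10_000 },
  warn:     { tier: "functional",  maxLatencyMs: 20_000 },
  critical: { tier: "adversarial", maxLatencyMs: 15_000 },
} as const;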

Auto-Scenario Generation

When auto_scenarios: true is set in mimic.json, mimic test reads the fact manifest and sends all facts to the LLM in a single batched call. The LLM generates one scenario per fact, each with:

  - a natural-language input phrased the way a user would ask it
  - concrete assertions derived from the fact's data (contains, excludes, numeric ranges)
  - a tier and latency budget derived from the fact's severity

This is adapter-agnostic — the LLM reads each fact’s detail field and generates appropriate questions regardless of whether the data comes from Stripe, a Postgres database, or a future adapter.

bash
# Enable in mimic.json:
# "test": { "agent": "...", "auto_scenarios": true }# Then run:
$ mimic run          # generates data + fact manifest
$ mimic test         # generates scenarios from facts, then runs them

Filtering by tier

Use --tier to limit which scenarios are generated:

bash
# Only smoke tests (info-severity facts)
$ mimic test --tier smoke

# Smoke + functional (skip adversarial)
$ mimic test --tier smoke functional

Or set it in the config with scenario_tiers:

json
"test": {
  "agent": "http://localhost:3000/chat",
  "auto_scenarios": true,
  "scenario_tiers": ["smoke", "functional"]
}

Exporting Scenarios

Auto-generated scenarios can be exported to external eval platforms or Mimic’s own format using --export:

bash
$ mimic test --export promptfoo    # PromptFoo YAML config
$ mimic test --export braintrust   # Braintrust dataset + scorer
$ mimic test --export langsmith    # LangSmith dataset + evaluator
$ mimic test --export mimic        # Mimic native JSON
$ mimic test --inspect             # Inspect AI Python task

All exported files are written to .mimic/exports/. If manual scenarios are defined in mimic.json, they are also run after the export.

mimic (native format)

Exports scenarios as a JSON array matching the test.scenarios shape in mimic.json. You can paste these directly into your config or load them as a standalone file.

json .mimic/exports/mimic-scenarios.json (excerpt)
[
  {
    "name": "chargebee-overdue-critical-invoices",
    "persona": "growth-saas",
    "goal": "Agent surfaces the 34-day overdue invoice as highest priority",
    "input": "What overdue invoices do we have in Chargebee?",
    "expect": {
      "response_contains": ["£12,400", "34 days", "inv_p1_cb_overdue_001"],
      "response_excludes": ["no overdue invoices", "all paid"],
      "numeric_range": { "field": "total_overdue_gbp", "min": 11160, "max": 13640 },
      "max_latency_ms": 15000
    },
    "metadata": {
      "tier": "adversarial",
      "source_fact": "fact_001",
      "platform": "chargebee"
    }
  }
]
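
The expect block maps directly onto simple checks. Below is a minimal sketch of how a runner might evaluate it, assuming only the fields shown in the excerpt (response_contains, response_excludes, numeric_range, max_latency_ms); this is illustrative, not Mimic's actual runner.

typescript
// Illustrative evaluation of a scenario's "expect" block.
interface Expect {
  response_contains?: string[];
  response_excludes?: string[];
  numeric_range?: { field: string; min: number; max: number };
  max_latency_ms?: number;
}

function passes(expect: Expect, response: string, latencyMs: number): boolean {
  if (expect.response_contains?.some(s => !response.includes(s))) return false;
  if (expect.response_excludes?.some(s => response.includes(s))) return false;
  if (expect.numeric_range) {
    // Same approach as the generated PromptFoo javascript assertion below:
    // any number in the response may satisfy the range.
    const nums = (response.match(/[\d,]+\.?\d*/g) ?? [])
      .map(n => parseFloat(n.replace(/,/g, "")));
    const { min, max } = expect.numeric_range;
    if (!nums.some(v => v >= min && v <= max)) return false;
  }
  if (expect.max_latency_ms !== undefined && latencyMs > expect.max_latency_ms) return false;
  return true;
}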

PromptFoo

Generates a promptfooconfig.yaml with contains, not-contains, and javascript assertions. Ready to run with npx promptfoo eval.

yaml .mimic/exports/promptfooconfig.yaml (excerpt)
tests:
  - description: "chargebee-overdue-critical-invoices [adversarial]"
    vars:
      question: "What overdue invoices do we have in Chargebee?"
    assert:
      - type: contains
        value: "£12,400"
      - type: not-contains
        value: "no overdue invoices"
      - type: javascript
        value: |
          const nums = output.match(/[\d,]+\.?\d*/g) || [];
          return nums.some(n => {
            const v = parseFloat(n.replace(/,/g, ''));
            return v >= 11160 && v <= 13640;
          });

Braintrust

Generates a braintrust-dataset.jsonl (one JSON object per line) and a braintrust-scorer.ts TypeScript scorer file for use with the Braintrust eval framework.
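
A Braintrust scorer is essentially a function from input/output/expected to a named score. As a rough, illustrative sketch of the kind of code such a scorer file contains (not the actual generated braintrust-scorer.ts):

typescript
// Illustrative scorer shape; the generated braintrust-scorer.ts may differ.
interface ScorerArgs {
  input: string;
  output: string;
  expected?: { contains?: string[] };
}

export function containsExpected({ output, expected }: ScorerArgs) {
  const required = expected?.contains ?? [];
  const hits = required.filter(s => output.includes(s)).length;
  // Score is the fraction of required strings found in the agent's output.
  return { name: "contains_expected", score: required.length ? hits / required.length : 1 };
}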

LangSmith

Generates three files that together provide a LangSmith dataset and a matching evaluator.

Inspect AI

Generates a self-contained inspect_task.py Python file with an inline dataset and custom scorer. Run it with inspect eval inspect_task.py.


Configuration Reference

All auto-scenario settings live in the test block of mimic.json:

json
"test": {
  "agent": "http://localhost:3000/chat",
  "auto_scenarios": true,
  "scenario_tiers": ["smoke", "functional", "adversarial"],
  "export": "promptfoo",
  "scenarios": [
    // manual scenarios are merged with auto-generated ones
  ]
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| auto_scenarios | boolean | false | Enable auto-scenario generation from the fact manifest |
| scenario_tiers | array | all tiers | Limit to "smoke", "functional", and/or "adversarial" |
| export | string | (none) | Default export format: "mimic", "promptfoo", "braintrust", "langsmith", "inspect" |

CLI flags (--tier, --export, --inspect) override the config values.
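
For example, with "export": "promptfoo" set in the config, passing a different flag on the command line takes precedence:

bash
# Config says "export": "promptfoo", but the flag wins:
$ mimic test --export braintrust --tier smoke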


End-to-End Example

A complete workflow using the CFO Agent example:

bash
# 1. Generate data with facts
$ mimic run
#   → .mimic/data/growth-saas.json
#   → .mimic/fact-manifest.json (11 facts)

# 2. Seed databases
$ mimic seed

# 3. Start mock servers
$ mimic host

# 4. Export auto-generated scenarios to PromptFoo
$ mimic test --export promptfoo
#   → .mimic/exports/promptfooconfig.yaml

# 5. Or run scenarios directly against the agent
$ mimic test --ci
#   → runs 11 auto + 2 manual scenarios
#   → exit code 1 if any fail

Combine with CI: Use mimic test --export mimic --ci in your pipeline to both export scenarios for review and fail the build if the agent doesn't pass.
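
As a sketch, a GitHub Actions step wiring this together might look like the following; it assumes mimic is already installed in the job and that mimic host can run in the background.

yaml
# Illustrative CI step; adapt to your pipeline.
- name: Run agent evals
  run: |
    mimic run                        # generate data + fact manifest
    mimic seed                       # seed databases
    mimic host &                     # start mock servers in the background
    mimic test --export mimic --ci   # export scenarios and fail the build on any failure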