Runners Overview

Runners execute test datasets against AI systems and collect responses for evaluation.

Why Use Runners?

Automated Testing

Execute tests against any AI system automatically

Multiple Backends

Support for Alquimia, LLMs, and custom systems

Async Execution

Efficient asynchronous batch processing

Detailed Metrics

Track execution times and success rates

Installation

uv pip install "alquimia-fair-forge[runners]"

Quick Start

from fair_forge.runners import AlquimiaRunner
from fair_forge.schemas import Dataset, Batch

# Create runner
runner = AlquimiaRunner(
    base_url="https://api.alquimia.ai",
    api_key="your-api-key",
    agent_id="your-agent-id",
    channel_id="your-channel-id",
)

# Create test dataset
dataset = Dataset(
    session_id="test-001",
    assistant_id="my-assistant",
    language="english",
    context="",
    conversation=[
        Batch(
            qa_id="q1",
            query="What is the capital of France?",
            assistant="",  # Will be filled by runner
            ground_truth_assistant="Paris",
        ),
    ],
)

# Execute tests (await must run inside an async function or event loop)
updated_dataset, summary = await runner.run_dataset(dataset)

print(f"Success rate: {summary['successes']}/{summary['total_batches']}")
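Since `run_dataset` is a coroutine, the Quick Start needs an event loop to run. A minimal pattern with `asyncio.run`, using a stub in place of the real runner call (the actual `AlquimiaRunner` needs live credentials; the stub's summary fields follow the shape shown later on this page):

```python
import asyncio

# Hypothetical stub standing in for runner.run_dataset; the real call
# requires a reachable Alquimia API and valid credentials.
async def run_dataset_stub(dataset):
    summary = {
        "session_id": dataset["session_id"],
        "total_batches": 1,
        "successes": 1,
        "failures": 0,
    }
    return dataset, summary

async def main():
    dataset = {"session_id": "test-001"}
    updated_dataset, summary = await run_dataset_stub(dataset)
    print(f"Success rate: {summary['successes']}/{summary['total_batches']}")
    return summary

# asyncio.run creates the event loop and drives the coroutine to completion
summary = asyncio.run(main())
```

In a real script, replace the stub call with `await runner.run_dataset(dataset)` inside `main()`.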

Available Runners

| Runner         | Use Case           | Requirements                        |
|----------------|--------------------|-------------------------------------|
| AlquimiaRunner | Alquimia AI agents | Alquimia API credentials            |
| Custom runners | Any AI system      | Implement the BaseRunner interface  |
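A custom runner targets any AI system by implementing the BaseRunner interface. The real interface lives in `fair_forge.runners` and may differ; the sketch below is a hypothetical reconstruction based on the dataset fields and summary shape shown on this page, with a toy echo backend standing in for a real system:

```python
import asyncio
from abc import ABC, abstractmethod

# Hypothetical sketch of the BaseRunner contract -- the real interface in
# fair_forge.runners may differ in method names and signatures.
class BaseRunner(ABC):
    @abstractmethod
    async def run_batch(self, query: str) -> str:
        """Send one query to the target AI system and return its response."""

    async def run_dataset(self, dataset: dict) -> tuple[dict, dict]:
        successes, failures = 0, 0
        for batch in dataset["conversation"]:
            try:
                # Fill the empty "assistant" field with the system's response
                batch["assistant"] = await self.run_batch(batch["query"])
                successes += 1
            except Exception:
                failures += 1
        summary = {
            "session_id": dataset["session_id"],
            "total_batches": len(dataset["conversation"]),
            "successes": successes,
            "failures": failures,
        }
        return dataset, summary

class EchoRunner(BaseRunner):
    """Toy runner that echoes the query -- stands in for a real backend."""
    async def run_batch(self, query: str) -> str:
        return f"echo: {query}"

dataset = {
    "session_id": "test-001",
    "conversation": [{"qa_id": "q1", "query": "hello", "assistant": ""}],
}
updated, summary = asyncio.run(EchoRunner().run_dataset(dataset))
```

Swapping `EchoRunner.run_batch` for a call to your own model or API is all a custom backend needs.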

Execution Modes

LLM Mode (Lambda)

Execute tests directly against any LangChain-compatible LLM:
curl -X POST "https://your-lambda-url/run" \
  -H "Content-Type: application/json" \
  -d '{
    "connector": {
      "class_path": "langchain_groq.chat_models.ChatGroq",
      "params": {"model": "llama-3.1-8b-instant", "api_key": "..."}
    },
    "datasets": [...]
  }'
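The same request can be issued from Python. The sketch below only builds and serializes the JSON body from the curl example (the Lambda URL, API key, and dataset payloads are placeholders); POST it with any HTTP client, keeping the `Content-Type: application/json` header:

```python
import json

# Request body matching the curl example above; api_key and datasets are
# placeholders, not real values.
payload = {
    "connector": {
        "class_path": "langchain_groq.chat_models.ChatGroq",
        "params": {"model": "llama-3.1-8b-instant", "api_key": "your-api-key"},
    },
    "datasets": [],  # fill with serialized Dataset objects, as in Quick Start
}
body = json.dumps(payload)
```

With `httpx`, for instance: `httpx.post("https://your-lambda-url/run", content=body, headers={"Content-Type": "application/json"})`.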

Alquimia Mode

Execute tests against Alquimia AI agents:
runner = AlquimiaRunner(
    base_url="https://api.alquimia.ai",
    api_key="your-api-key",
    agent_id="your-agent-id",
    channel_id="your-channel-id",
)

Workflow

Execution Summary

Each run returns a summary with metrics:
summary = {
    "session_id": "test-001",
    "total_batches": 10,
    "successes": 9,
    "failures": 1,
    "total_execution_time_ms": 5432.1,
    "avg_batch_time_ms": 543.2,
}
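The summary fields compose directly: `avg_batch_time_ms` is `total_execution_time_ms` divided by `total_batches`, and a success rate falls out of `successes` over `total_batches`. A quick check against the example values above:

```python
summary = {
    "session_id": "test-001",
    "total_batches": 10,
    "successes": 9,
    "failures": 1,
    "total_execution_time_ms": 5432.1,
    "avg_batch_time_ms": 543.2,
}

# Derived metrics from the raw counters
success_rate = summary["successes"] / summary["total_batches"]  # 0.9

# avg_batch_time_ms is the total time spread across batches (rounded)
computed_avg = summary["total_execution_time_ms"] / summary["total_batches"]
assert abs(computed_avg - summary["avg_batch_time_ms"]) < 0.1
```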

Next Steps