
AWS Lambda Deployment

Deploy Fair Forge generators, runners, and metrics as AWS Lambda functions for serverless execution.

Available Lambda Functions

| Function | Purpose | Endpoint |
|----------|---------|----------|
| BestOf | Tournament-style AI assistant comparison | `POST /run` |
| Generators | Generate test datasets from context | `POST /run` |
| Runners | Execute tests against AI systems | `POST /run` |

BestOf Metric Lambda

Run tournament-style comparisons between multiple AI assistants to determine which performs best.

How It Works

  1. Submit datasets from multiple assistants (same questions, different responses)
  2. The LLM judge evaluates head-to-head matchups
  3. Winners advance through elimination rounds
  4. Returns the tournament winner with detailed contest results
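The elimination flow above can be sketched as a plain single-elimination bracket. This is an illustration of the tournament logic only, not the lambda's actual implementation; `judge` stands in for the LLM judge, which in the real service compares the two assistants' full conversations.

```python
from typing import Callable

def run_tournament(contestants: list[str],
                   judge: Callable[[str, str], str]) -> tuple[str, list[dict]]:
    """Single-elimination bracket: pair contestants, judge each matchup,
    and advance winners until one remains."""
    contests = []
    round_num = 1
    remaining = list(contestants)
    while len(remaining) > 1:
        next_round = []
        # An odd contestant out gets a bye into the next round
        if len(remaining) % 2 == 1:
            next_round.append(remaining.pop())
        for left, right in zip(remaining[::2], remaining[1::2]):
            winner = judge(left, right)
            contests.append({"round": round_num, "left": left,
                             "right": right, "winner": winner})
            next_round.append(winner)
        remaining = next_round
        round_num += 1
    return remaining[0], contests

# Toy judge for demonstration: prefers the lexicographically smaller id
winner, contests = run_tournament(["assistant_b", "assistant_a"], judge=min)
```

With three or more contestants the bracket runs multiple rounds, which is what the `total_rounds` field in the response reflects.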

Supported LLM Providers

| Provider | `class_path` |
|----------|--------------|
| Groq | `langchain_groq.chat_models.ChatGroq` |
| OpenAI | `langchain_openai.chat_models.ChatOpenAI` |
| Google Gemini | `langchain_google_genai.chat_models.ChatGoogleGenerativeAI` |
| Ollama | `langchain_ollama.chat_models.ChatOllama` |

Request Format

{
  "connector": {
    "class_path": "langchain_groq.chat_models.ChatGroq",
    "params": {
      "model": "qwen/qwen3-32b",
      "api_key": "your-api-key",
      "temperature": 0.0
    }
  },
  "datasets": [
    {
      "session_id": "comparison_session",
      "assistant_id": "assistant_a",
      "language": "english",
      "context": "System context...",
      "conversation": [
        {
          "qa_id": "q1",
          "query": "User question",
          "assistant": "Assistant A response",
          "ground_truth_assistant": "Expected response (optional)"
        }
      ]
    },
    {
      "session_id": "comparison_session",
      "assistant_id": "assistant_b",
      "language": "english",
      "context": "System context...",
      "conversation": [
        {
          "qa_id": "q1",
          "query": "User question",
          "assistant": "Assistant B response",
          "ground_truth_assistant": "Expected response (optional)"
        }
      ]
    }
  ],
  "config": {
    "criteria": "Overall response quality",
    "use_structured_output": true
  }
}

Configuration Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `criteria` | str | `"Overall response quality"` | Evaluation criteria for judging |
| `use_structured_output` | bool | `true` | Use LangChain structured output |
| `verbose` | bool | `false` | Enable verbose logging |

Example: Compare Two Assistants

curl -s -X POST "https://your-lambda-url/run" \
  -H "Content-Type: application/json" \
  -d '{
    "connector": {
      "class_path": "langchain_groq.chat_models.ChatGroq",
      "params": {
        "model": "qwen/qwen3-32b",
        "api_key": "your-groq-api-key",
        "temperature": 0.0
      }
    },
    "datasets": [
      {
        "session_id": "comparison",
        "assistant_id": "gpt4_responses",
        "language": "english",
        "context": "",
        "conversation": [
          {
            "qa_id": "q1",
            "query": "What are the benefits of renewable energy?",
            "assistant": "Renewable energy offers numerous benefits including reduced greenhouse gas emissions, energy independence, job creation, and long-term cost savings."
          },
          {
            "qa_id": "q2",
            "query": "Explain machine learning simply.",
            "assistant": "Machine learning is a type of AI where computers learn patterns from data rather than following explicit programming rules."
          }
        ]
      },
      {
        "session_id": "comparison",
        "assistant_id": "claude_responses",
        "language": "english",
        "context": "",
        "conversation": [
          {
            "qa_id": "q1",
            "query": "What are the benefits of renewable energy?",
            "assistant": "Clean energy good. Sun power help planet."
          },
          {
            "qa_id": "q2",
            "query": "Explain machine learning simply.",
            "assistant": "Computer learns things from data."
          }
        ]
      }
    ],
    "config": {
      "criteria": "Response quality, clarity, completeness, and accuracy"
    }
  }'

Response Format

{
  "success": true,
  "winner": "gpt4_responses",
  "contestants": ["gpt4_responses", "claude_responses"],
  "total_rounds": 1,
  "contests": [
    {
      "round": 1,
      "left": "gpt4_responses",
      "right": "claude_responses",
      "winner": "gpt4_responses",
      "confidence": 0.95,
      "verdict": "Assistant A provides more comprehensive and well-structured responses",
      "reasoning": "Detailed analysis of the comparison..."
    }
  ]
}

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| `winner` | str | The `assistant_id` of the tournament winner |
| `contestants` | list | All `assistant_id`s that participated |
| `total_rounds` | int | Number of tournament rounds |
| `contests` | list | Details of each head-to-head matchup |
| `contests[].confidence` | float | Judge's confidence in the decision (0-1) |
| `contests[].verdict` | str | Brief summary of the decision |
| `contests[].reasoning` | str | Detailed reasoning for the decision |
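Given a response shaped like the JSON above, the tournament outcome can be summarized client-side. A minimal sketch, using the field names from the Response Fields table and the sample values shown earlier:

```python
import json

# Sample response body, abbreviated from the Response Format section above
response_body = """{
  "success": true,
  "winner": "gpt4_responses",
  "contestants": ["gpt4_responses", "claude_responses"],
  "total_rounds": 1,
  "contests": [
    {"round": 1, "left": "gpt4_responses", "right": "claude_responses",
     "winner": "gpt4_responses", "confidence": 0.95,
     "verdict": "Assistant A provides more comprehensive responses",
     "reasoning": "..."}
  ]
}"""

result = json.loads(response_body)
if result["success"]:
    # Count wins per contestant across all matchups
    wins = {c: 0 for c in result["contestants"]}
    for contest in result["contests"]:
        wins[contest["winner"]] += 1
    print(f"Winner: {result['winner']} "
          f"({wins[result['winner']]} win(s) over {result['total_rounds']} round(s))")
```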

Generators Lambda

Generate synthetic test datasets from markdown content using any LLM.

Supported LLM Providers

| Provider | `class_path` |
|----------|--------------|
| Groq | `langchain_groq.chat_models.ChatGroq` |
| OpenAI | `langchain_openai.chat_models.ChatOpenAI` |
| Google Gemini | `langchain_google_genai.chat_models.ChatGoogleGenerativeAI` |
| Ollama | `langchain_ollama.chat_models.ChatOllama` |

Request Format

{
  "connector": {
    "class_path": "langchain_groq.chat_models.ChatGroq",
    "params": {
      "model": "qwen/qwen3-32b",
      "api_key": "your-api-key",
      "temperature": 0.7
    }
  },
  "context": "# Your Markdown Content\n\nContent to generate questions from...",
  "config": {
    "assistant_id": "my-assistant",
    "num_queries": 3,
    "language": "english",
    "conversation_mode": false,
    "max_chunk_size": 2000,
    "min_chunk_size": 200,
    "seed_examples": ["Example question 1?", "Example question 2?"]
  }
}

Configuration Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `assistant_id` | str | Required | ID for the generated dataset |
| `num_queries` | int | `3` | Questions per chunk |
| `language` | str | `"english"` | Language for generation |
| `conversation_mode` | bool | `false` | Generate conversations |
| `max_chunk_size` | int | `2000` | Max chars per chunk |
| `min_chunk_size` | int | `200` | Min chars per chunk |
| `seed_examples` | list[str] | `null` | Example questions for style |
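The chunking parameters control how the markdown context is split before question generation. One plausible strategy is greedy paragraph packing between `min_chunk_size` and `max_chunk_size`; the sketch below is an illustration of that idea, not the generator's actual splitter:

```python
def chunk_markdown(text: str,
                   max_chunk_size: int = 2000,
                   min_chunk_size: int = 200) -> list[str]:
    """Greedily pack paragraphs into chunks no larger than max_chunk_size;
    a trailing fragment shorter than min_chunk_size is dropped."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para[:max_chunk_size]  # hard-split oversized paragraphs
    if len(current) >= min_chunk_size:
        chunks.append(current)
    return chunks
```

Each resulting chunk then gets `num_queries` generated questions, which is why the response reports a `total_batches` count separate from `total_datasets`.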

Example: Using Groq

curl -s -X POST "https://your-lambda-url/run" \
  -H "Content-Type: application/json" \
  -d '{
    "connector": {
      "class_path": "langchain_groq.chat_models.ChatGroq",
      "params": {
        "model": "qwen/qwen3-32b",
        "api_key": "your-groq-api-key",
        "temperature": 0.7
      }
    },
    "context": "# Product Documentation\n\nOur product helps users manage tasks.\n\n## Features\n\n- Task creation\n- Reminders\n- Collaboration",
    "config": {
      "assistant_id": "docs-assistant",
      "num_queries": 3,
      "language": "english"
    }
  }'

Example: Using OpenAI

curl -s -X POST "https://your-lambda-url/run" \
  -H "Content-Type: application/json" \
  -d '{
    "connector": {
      "class_path": "langchain_openai.chat_models.ChatOpenAI",
      "params": {
        "model": "gpt-4o-mini",
        "api_key": "your-openai-api-key",
        "temperature": 0.7
      }
    },
    "context": "Your markdown content...",
    "config": {
      "assistant_id": "my-assistant",
      "num_queries": 3
    }
  }'

Response Format

{
  "success": true,
  "datasets": [
    {
      "session_id": "uuid-generated",
      "assistant_id": "my-assistant",
      "language": "english",
      "context": "Combined chunk content...",
      "conversation": [
        {
          "qa_id": "chunk-1_q1",
          "query": "Generated question?",
          "assistant": "",
          "ground_truth_assistant": ""
        }
      ]
    }
  ],
  "total_datasets": 1,
  "total_batches": 3
}

Runners Lambda

Execute test datasets against AI systems.

Modes

  • LLM Mode: Direct execution against any LangChain-compatible LLM
  • Alquimia Mode: Execution against Alquimia AI agents
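The two modes are distinguishable from the request payload itself: an LLM-mode request carries a `connector`, while an Alquimia-mode request carries Alquimia credentials in `config`. A hypothetical dispatch sketch (for illustration; this is not the lambda's actual routing code):

```python
def detect_mode(payload: dict) -> str:
    """Route a runner request based on which fields the payload carries."""
    if "connector" in payload:
        return "llm"
    config = payload.get("config", {})
    # Alquimia mode needs the credentials shown in the request format below
    if {"base_url", "api_key", "agent_id"} <= config.keys():
        return "alquimia"
    raise ValueError("Request must include a 'connector' or Alquimia 'config'")
```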

LLM Mode Request

{
  "connector": {
    "class_path": "langchain_groq.chat_models.ChatGroq",
    "params": {
      "model": "qwen/qwen3-32b",
      "api_key": "your-api-key"
    }
  },
  "datasets": [
    {
      "session_id": "test-session-1",
      "assistant_id": "groq-assistant",
      "language": "english",
      "context": "",
      "conversation": [
        {
          "qa_id": "q1",
          "query": "What is the capital of France?",
          "assistant": "",
          "ground_truth_assistant": "Paris"
        }
      ]
    }
  ]
}

Alquimia Mode Request

{
  "datasets": [
    {
      "session_id": "test-session-1",
      "assistant_id": "target-assistant",
      "language": "english",
      "context": "",
      "conversation": [
        {
          "qa_id": "q1",
          "query": "What is the capital of France?",
          "assistant": "",
          "ground_truth_assistant": "Paris"
        }
      ]
    }
  ],
  "config": {
    "base_url": "https://api.alquimia.ai",
    "api_key": "your-alquimia-api-key",
    "agent_id": "your-agent-id",
    "channel_id": "your-channel-id"
  }
}

Example: LLM Mode

curl -s -X POST "https://your-lambda-url/run" \
  -H "Content-Type: application/json" \
  -d '{
    "connector": {
      "class_path": "langchain_groq.chat_models.ChatGroq",
      "params": {
        "model": "llama-3.1-8b-instant",
        "api_key": "your-groq-api-key"
      }
    },
    "datasets": [
      {
        "session_id": "test-1",
        "assistant_id": "test-assistant",
        "language": "english",
        "context": "",
        "conversation": [
          {
            "qa_id": "q1",
            "query": "What is machine learning?",
            "assistant": "",
            "ground_truth_assistant": "Machine learning is..."
          }
        ]
      }
    ]
  }'

Response Format

{
  "success": true,
  "datasets": [
    {
      "session_id": "test-session-1",
      "assistant_id": "assistant-id",
      "language": "english",
      "context": "",
      "conversation": [
        {
          "qa_id": "q1",
          "query": "What is the capital of France?",
          "assistant": "The capital of France is Paris.",
          "ground_truth_assistant": "Paris"
        }
      ]
    }
  ],
  "summaries": [
    {
      "session_id": "test-session-1",
      "total_batches": 1,
      "successes": 1,
      "failures": 0,
      "total_execution_time_ms": 1234.5,
      "avg_batch_time_ms": 1234.5
    }
  ],
  "total_datasets": 1
}
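The per-session `summaries` list makes it straightforward to compute aggregate pass rates client-side. A small helper, assuming the response shape shown above:

```python
def aggregate_summaries(summaries: list[dict]) -> dict:
    """Combine per-session runner summaries into overall totals."""
    total = sum(s["total_batches"] for s in summaries)
    successes = sum(s["successes"] for s in summaries)
    failures = sum(s["failures"] for s in summaries)
    return {
        "total_batches": total,
        "successes": successes,
        "failures": failures,
        "success_rate": successes / total if total else 0.0,
    }
```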

Deployment

Prerequisites

  • AWS CLI configured
  • Docker installed
  • AWS ECR repository access

Deploy BestOf Metric

cd examples/bestof/aws-lambda

# Deploy
./scripts/deploy.sh bestof us-east-2

# Update (rebuild and redeploy)
./scripts/update.sh bestof us-east-2

# Cleanup (remove all resources)
./scripts/cleanup.sh bestof us-east-2

Deploy Generators

cd examples/generators/aws-lambda

# Deploy
./scripts/deploy.sh generators us-east-2

# Update (rebuild and redeploy)
./scripts/update.sh generators us-east-2

# Cleanup (remove all resources)
./scripts/cleanup.sh generators us-east-2

Deploy Runners

cd examples/runners/aws-lambda

# Deploy
./scripts/deploy.sh runners us-east-2

# Update
./scripts/update.sh runners us-east-2

# Cleanup
./scripts/cleanup.sh runners us-east-2

View Logs

# BestOf logs
aws logs tail "/aws/lambda/fair-forge-bestof" --follow --region us-east-2

# Generators logs
aws logs tail "/aws/lambda/fair-forge-generators" --follow --region us-east-2

# Runners logs
aws logs tail "/aws/lambda/fair-forge-runners" --follow --region us-east-2

Integration Example

Combine generators and runners:
import httpx
import json

# 1. Generate test dataset
generator_response = httpx.post(
    "https://generators-lambda-url/run",
    json={
        "connector": {
            "class_path": "langchain_groq.chat_models.ChatGroq",
            "params": {"model": "llama-3.1-8b-instant", "api_key": "..."}
        },
        "context": "# Documentation\n\nContent here...",
        "config": {"assistant_id": "test-bot", "num_queries": 5}
    }
)
datasets = generator_response.json()["datasets"]

# 2. Execute tests
runner_response = httpx.post(
    "https://runners-lambda-url/run",
    json={
        "connector": {
            "class_path": "langchain_openai.chat_models.ChatOpenAI",
            "params": {"model": "gpt-4o-mini", "api_key": "..."}
        },
        "datasets": datasets
    }
)
results = runner_response.json()

# 3. Analyze results
for summary in results["summaries"]:
    print(f"Session: {summary['session_id']}")
    print(f"Success rate: {summary['successes']}/{summary['total_batches']}")

Next Steps

  • BestOf Metric: learn about tournament-style evaluation
  • Generators: learn about test generation
  • Runners: learn about test execution