
Dataset & Batch

Fair Forge uses two primary data structures to represent conversation data: Dataset and Batch.

Dataset

A Dataset represents a complete conversation session with an AI assistant.
from fair_forge.schemas.common import Dataset

dataset = Dataset(
    session_id="session-123",
    assistant_id="my-assistant-v1",
    language="english",
    context="You are a helpful customer service assistant.",
    conversation=[...],  # list[Batch]
)

Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `session_id` | `str` | Yes | Unique identifier for the conversation session |
| `assistant_id` | `str` | Yes | Identifier of the assistant being evaluated |
| `language` | `str \| None` | No | Language of the conversation (e.g., "english", "spanish") |
| `context` | `str` | Yes | System context/instructions provided to the assistant |
| `conversation` | `list[Batch]` | Yes | List of Q&A interactions in the conversation |

Example

from fair_forge.schemas.common import Dataset, Batch

dataset = Dataset(
    session_id="customer-support-001",
    assistant_id="support-bot-v2",
    language="english",
    context="""You are a helpful customer service assistant for TechStore.
    Be polite, concise, and always offer to help further.""",
    conversation=[
        Batch(
            qa_id="q1",
            query="Hi, I need help with my order",
            assistant="Hello! I'd be happy to help with your order. Could you please provide your order number?",
            ground_truth_assistant="Greet the customer and ask for order number.",
        ),
        Batch(
            qa_id="q2",
            query="It's ORDER-12345",
            assistant="Thank you! I found your order. It was shipped yesterday and should arrive by Friday. Is there anything else I can help with?",
            ground_truth_assistant="Look up the order and provide shipping status.",
        ),
    ],
)

Batch

A Batch represents a single question-answer interaction within a conversation.
from fair_forge.schemas.common import Batch

batch = Batch(
    qa_id="q1",
    query="What is the capital of France?",
    assistant="The capital of France is Paris.",
    ground_truth_assistant="Paris is the capital of France.",
)

Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `qa_id` | `str` | Yes | Unique identifier for this interaction |
| `query` | `str` | Yes | User's question or message |
| `assistant` | `str` | Yes | Assistant's response |
| `ground_truth_assistant` | `str \| None` | No | Expected/ideal response for comparison |
| `observation` | `str \| None` | No | Additional notes or observations |
| `agentic` | `dict \| None` | No | Metadata about the interaction |
| `ground_truth_agentic` | `dict \| None` | No | Expected metadata |
| `logprobs` | `dict \| None` | No | Log probabilities from the model |

Field Details

qa_id

A unique identifier for the interaction within the conversation. Use a consistent naming scheme:

qa_id="q1"                 # Simple numbering
qa_id="order-inquiry-001"  # Descriptive naming
qa_id="session123_turn5"   # Session-based

query

The user's input message or question. This is what the assistant is responding to:

query="What are your business hours?"
query="Can you help me debug this Python code?"
query="Tell me a joke about programming"

assistant

The assistant's actual response. This is what gets evaluated:

assistant="Our business hours are Monday to Friday, 9 AM to 5 PM EST."

ground_truth_assistant

The expected or ideal response. Used by some metrics for comparison:

ground_truth_assistant="Mon-Fri 9-5 EST"

This field is optional but recommended for metrics such as Context and Conversational.

observation

Additional context or notes about the interaction:

observation="Customer seems frustrated in this exchange"
observation="This is a follow-up to the previous question"

agentic

Metadata dictionary for storing additional information. Used by generators to store query metadata:

agentic={
    "difficulty": "medium",
    "query_type": "factual",
    "chunk_id": "doc_section_3",
    "turn_number": 2,
}

logprobs

Log probabilities from the model (if available):

logprobs={
    "tokens": ["The", "capital", "is", "Paris"],
    "token_logprobs": [-0.1, -0.05, -0.02, -0.01],
}
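When token-level log probabilities are present, they can be turned into a simple confidence signal. A minimal sketch, using only plain Python (not a Fair Forge API), that computes the mean log probability and perplexity from the logprobs dict above:

```python
import math

logprobs = {
    "tokens": ["The", "capital", "is", "Paris"],
    "token_logprobs": [-0.1, -0.05, -0.02, -0.01],
}

# Mean log probability across tokens; closer to 0 means higher confidence.
avg_logprob = sum(logprobs["token_logprobs"]) / len(logprobs["token_logprobs"])

# Perplexity: exp of the negative mean log probability (always >= 1.0).
perplexity = math.exp(-avg_logprob)
print(round(perplexity, 3))  # 1.046
```

Lower perplexity indicates the model assigned higher probability to its own output.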

JSON Format

Both Dataset and Batch serialize to and from JSON:

Dataset JSON

{
  "session_id": "session-123",
  "assistant_id": "my-assistant",
  "language": "english",
  "context": "You are a helpful assistant.",
  "conversation": [
    {
      "qa_id": "q1",
      "query": "What is AI?",
      "assistant": "AI stands for Artificial Intelligence...",
      "ground_truth_assistant": "Artificial Intelligence is...",
      "observation": null,
      "agentic": null,
      "ground_truth_agentic": null,
      "logprobs": null
    }
  ]
}

Loading from JSON

import json
from fair_forge.schemas.common import Dataset

# Load from file
with open('data.json') as f:
    data = json.load(f)

# Validate and create Dataset
dataset = Dataset.model_validate(data)

# Or load multiple datasets
datasets = [Dataset.model_validate(d) for d in data_list]
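For many sessions, a JSON Lines file (one serialized Dataset per line) is often convenient. A sketch using only the standard library; the records here are hypothetical, and each parsed dict would then be passed to Dataset.model_validate as shown above:

```python
import json

# Hypothetical JSONL content: one serialized Dataset per line.
raw = """\
{"session_id": "s1", "assistant_id": "bot", "context": "...", "conversation": []}
{"session_id": "s2", "assistant_id": "bot", "context": "...", "conversation": []}
"""

# Parse each non-empty line into a dict.
records = [json.loads(line) for line in raw.splitlines() if line.strip()]

# Each dict can then be validated:
# datasets = [Dataset.model_validate(r) for r in records]
print([r["session_id"] for r in records])  # ['s1', 's2']
```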

Saving to JSON

import json
from fair_forge.schemas.common import Dataset

# Convert to dict
data = dataset.model_dump()

# Save to file
with open('output.json', 'w') as f:
    json.dump(data, f, indent=2)

Pydantic Validation

Both Dataset and Batch are Pydantic models with built-in validation:
from pydantic import ValidationError
from fair_forge.schemas.common import Dataset, Batch

# This will raise ValidationError - missing required field
try:
    batch = Batch(query="Hello")  # Missing qa_id and assistant
except ValidationError as e:
    print(e)

# This works - minimal required fields
batch = Batch(
    qa_id="q1",
    query="Hello",
    assistant="Hi there!",
)

# Optional fields default to None
print(batch.ground_truth_assistant)  # None
print(batch.observation)  # None

Usage with Metrics

Different metrics use different fields:
| Metric | Key Fields Used |
| --- | --- |
| Toxicity | `assistant`, `session_id`, `assistant_id` |
| Bias | `query`, `assistant`, `context` |
| Context | `query`, `assistant`, `context`, `ground_truth_assistant` |
| Conversational | `query`, `assistant`, `observation`, `ground_truth_assistant` |
| Humanity | `assistant`, `ground_truth_assistant` |
| BestOf | `query`, `assistant` (across multiple datasets) |
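Before running a metric, it can help to verify that each batch actually carries the fields that metric reads. A minimal pre-flight sketch over plain dicts; the field lists simply mirror the table above and are not a Fair Forge API:

```python
# Per-metric required batch fields, mirroring the table above.
# (Assumption: session-level fields like context are checked elsewhere.)
METRIC_BATCH_FIELDS = {
    "toxicity": ["assistant"],
    "context": ["query", "assistant", "ground_truth_assistant"],
    "conversational": ["query", "assistant", "ground_truth_assistant"],
}

def missing_fields(batch: dict, metric: str) -> list[str]:
    """Return the fields the metric needs that are absent or None."""
    return [f for f in METRIC_BATCH_FIELDS[metric] if batch.get(f) is None]

batch = {"qa_id": "q1", "query": "Hello", "assistant": "Hi there!"}
print(missing_fields(batch, "context"))   # ['ground_truth_assistant']
print(missing_fields(batch, "toxicity"))  # []
```

Running a check like this up front surfaces missing ground truth before an evaluation silently degrades.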

Best Practices

Use Descriptive IDs

Choose meaningful qa_id values that help identify issues:
qa_id="billing-refund-q3"

Include Ground Truth

When possible, include ground_truth_assistant for better evaluation:
ground_truth_assistant="Expected response..."

Set Context Properly

The context field should contain system instructions:
context="You are a helpful, harmless assistant..."

Use Metadata

Store useful metadata in agentic for analysis:
agentic={"category": "support", "priority": "high"}
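Metadata stored this way supports downstream slicing of results. A small sketch over plain dicts (the batches are hypothetical) counting interactions per category:

```python
from collections import Counter

# Hypothetical batches with agentic metadata attached.
batches = [
    {"qa_id": "q1", "agentic": {"category": "support", "priority": "high"}},
    {"qa_id": "q2", "agentic": {"category": "billing", "priority": "low"}},
    {"qa_id": "q3", "agentic": {"category": "support", "priority": "low"}},
]

# Count interactions per category, skipping batches without metadata.
by_category = Counter(
    b["agentic"]["category"] for b in batches if b.get("agentic")
)
print(by_category)  # Counter({'support': 2, 'billing': 1})
```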

Next Steps