Dataset & Batch
Fair Forge uses two primary data structures to represent conversation data: Dataset and Batch.
Dataset
A Dataset represents a complete conversation session with an AI assistant.
Fields
| Field | Type | Required | Description |
|---|---|---|---|
| session_id | str | Yes | Unique identifier for the conversation session |
| assistant_id | str | Yes | Identifier of the assistant being evaluated |
| language | str \| None | No | Language of the conversation (e.g., “english”, “spanish”) |
| context | str | Yes | System context/instructions provided to the assistant |
| conversation | list[Batch] | Yes | List of Q&A interactions in the conversation |
Example
Batch
A Batch represents a single question-answer interaction within a conversation.
Fields
| Field | Type | Required | Description |
|---|---|---|---|
| qa_id | str | Yes | Unique identifier for this interaction |
| query | str | Yes | User’s question or message |
| assistant | str | Yes | Assistant’s response |
| ground_truth_assistant | str \| None | No | Expected/ideal response for comparison |
| observation | str \| None | No | Additional notes or observations |
| agentic | dict \| None | No | Metadata about the interaction |
| ground_truth_agentic | dict \| None | No | Expected metadata |
| logprobs | dict \| None | No | Log probabilities from the model |
Field Details
qa_id
A unique identifier for the interaction within the conversation. Use a consistent naming scheme.
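For instance, a hypothetical scheme (not mandated by Fair Forge) that zero-pads sequence numbers keeps interactions sortable as strings:

```python
# Hypothetical naming scheme: zero-padded sequence numbers sort correctly as strings.
qa_ids = [f"qa_{i:03d}" for i in range(1, 4)]
print(qa_ids)  # -> ['qa_001', 'qa_002', 'qa_003']
```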
query
The user’s input message or question. This is what the assistant is responding to.
assistant
The assistant’s actual response. This is what gets evaluated.
ground_truth_assistant
The expected or ideal response, used by some metrics for comparison. This is optional but recommended for metrics like Context and Conversational.
observation
Additional context or notes about the interaction.
agentic
Metadata dictionary for storing additional information. Used by generators to store query metadata.
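The exact keys are up to you; a sketch with purely hypothetical metadata keys might look like:

```python
# Hypothetical agentic metadata; the keys shown are illustrative, not a fixed schema.
agentic = {
    "tools_used": ["search", "calculator"],
    "retrieval_hits": 3,
}
print(sorted(agentic))
```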
logprobs
Log probabilities from the model, if available.
JSON Format
The data structures can be easily serialized to and from JSON.
Dataset JSON
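A serialized Dataset might look like the following; the field values are illustrative, and optional fields serialize as null when unset:

```json
{
  "session_id": "session_001",
  "assistant_id": "support_bot_v2",
  "language": "english",
  "context": "You are a helpful customer-support assistant.",
  "conversation": [
    {
      "qa_id": "qa_001",
      "query": "How do I reset my password?",
      "assistant": "Go to Settings > Security and click 'Reset password'.",
      "ground_truth_assistant": null,
      "observation": null,
      "agentic": null,
      "ground_truth_agentic": null,
      "logprobs": null
    }
  ]
}
```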
Loading from JSON
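A minimal loading sketch using only the standard library; with the real Pydantic models you would pass the parsed dict to the model (e.g. Dataset(**data)) to get validation, which this example only hints at in a comment:

```python
import json

# JSON for a minimal Dataset (illustrative values).
raw = """
{
  "session_id": "session_001",
  "assistant_id": "support_bot_v2",
  "context": "You are a helpful assistant.",
  "conversation": [
    {"qa_id": "qa_001", "query": "Hi", "assistant": "Hello!"}
  ]
}
"""

data = json.loads(raw)
# With the real models you would validate here, e.g. Dataset(**data);
# this sketch only inspects the parsed structure.
print(data["conversation"][0]["qa_id"])
```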
Saving to JSON
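Going the other way with the standard library (if you are on Pydantic v2, the models also expose model_dump_json directly):

```python
import json

# A minimal Batch as a plain dict (illustrative values).
batch = {"qa_id": "qa_001", "query": "Hi", "assistant": "Hello!"}

# Serialize with indentation for readability.
serialized = json.dumps(batch, indent=2)
print(serialized)
```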
Pydantic Validation
Both Dataset and Batch are Pydantic models with built-in validation:
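A sketch of what that validation catches, using a stand-in Batch model rather than the library's own class:

```python
from pydantic import BaseModel, ValidationError

class Batch(BaseModel):
    # Stand-in with only the required fields from the table above.
    qa_id: str
    query: str
    assistant: str

try:
    Batch(qa_id="qa_001", query="Hi")  # missing required 'assistant'
except ValidationError as exc:
    # Collect which fields failed validation.
    error_fields = [e["loc"][0] for e in exc.errors()]
    print(error_fields)  # includes 'assistant'
```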
Usage with Metrics
Different metrics use different fields:
| Metric | Key Fields Used |
|---|---|
| Toxicity | assistant, session_id, assistant_id |
| Bias | query, assistant, context |
| Context | query, assistant, context, ground_truth_assistant |
| Conversational | query, assistant, observation, ground_truth_assistant |
| Humanity | assistant, ground_truth_assistant |
| BestOf | query, assistant (across multiple datasets) |
Best Practices
Use Descriptive IDs
Choose meaningful qa_id values that help identify issues.
Include Ground Truth
When possible, include ground_truth_assistant for better evaluation.
Set Context Properly
The context field should contain system instructions.
Use Metadata
Store useful metadata in agentic for analysis.
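Putting these practices together, a well-populated interaction might look like this plain-dict sketch; the IDs, values, and keys inside agentic are all hypothetical:

```python
# A well-populated interaction following the practices above.
# The agentic keys ("source", "tools_used") are hypothetical examples.
batch = {
    "qa_id": "billing_faq_003",  # descriptive, sortable ID
    "query": "Why was I charged twice?",
    "assistant": "Duplicate charges are usually temporary authorization holds.",
    "ground_truth_assistant": "Explain pending authorization holds and when they drop off.",
    "agentic": {"source": "billing_faq", "tools_used": ["account_lookup"]},
}
print(batch["qa_id"])
```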