Streaming Retrievers
Fair Forge supports lazy iteration over datasets through two streaming modes. Instead of returning a full list from load_dataset(), a streaming retriever returns a generator. The framework detects this automatically and routes processing through the appropriate strategy.
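The detection step can be pictured as a type check on whatever load_dataset() returns. The sketch below is not Fair Forge's internal code: detect_mode and the two loader functions are hypothetical, and the sketch only distinguishes a list from a generator. How the framework then chooses between the two streaming modes is not shown.

```python
import inspect
from typing import Any, Iterator


def detect_mode(result: Any) -> str:
    """Classify a load_dataset() return value (hypothetical helper, not Fair Forge's API)."""
    if isinstance(result, list):
        # Fully materialised list: every session is already in memory.
        return "full_dataset"
    if inspect.isgenerator(result):
        # Lazy generator: items are produced on demand, so route to a streaming strategy.
        return "streaming"
    raise TypeError(f"unsupported load_dataset() return type: {type(result).__name__}")


def eager_load() -> list:
    # Returns everything at once.
    return [{"session_id": "s1"}, {"session_id": "s2"}]


def lazy_load() -> Iterator[dict]:
    # Produces one session per iteration.
    for session_id in ("s1", "s2"):
        yield {"session_id": session_id}


print(detect_mode(eager_load()))  # full_dataset
print(detect_mode(lazy_load()))   # streaming
```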
Schemas
Streaming introduces two new types from fair_forge.schemas.common:
StreamedBatch is the unit yielded in STREAM_BATCHES mode. It carries the session context (metadata) alongside a single interaction (batch), because each QA pair no longer belongs to a parent Dataset container.
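To make the two fields concrete, here is a stand-in built with dataclasses. The real StreamedBatch lives in fair_forge.schemas.common and its exact definition is not reproduced here; the SessionContext and Interaction classes, and every field other than metadata and batch, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class SessionContext:
    # Hypothetical stand-in for the session context (fields are assumptions).
    session_id: str
    assistant_id: str
    context: str = ""


@dataclass
class Interaction:
    # Hypothetical stand-in for a single QA pair (fields are assumptions).
    qa_id: str
    query: str
    assistant: str


@dataclass
class StreamedBatch:
    # Mirrors the two documented fields: session context plus one interaction.
    metadata: SessionContext
    batch: Interaction


item = StreamedBatch(
    metadata=SessionContext(session_id="s1", assistant_id="bot-a"),
    batch=Interaction(qa_id="q1", query="What is 2 + 2?", assistant="4"),
)
print(item.metadata.session_id, item.batch.qa_id)  # s1 q1
```

Because the session context travels with every item, a consumer can evaluate a QA pair without ever seeing the rest of its session.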
stream_sessions
Yield one complete Dataset session at a time. Memory usage is bounded by the size of a single session. The processing logic is identical to FULL_DATASET: each session’s full conversation is available in batch().
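A stream_sessions retriever only needs load_dataset() to return a generator. The sketch below reads sessions from a JSON Lines file (one session per line), so only the current session is parsed and held. Plain dicts stand in for Dataset objects; the iter_sessions_jsonl helper and the field names are hypothetical, and a real retriever would build Dataset instances from each record.

```python
import io
import json
from typing import Iterator, TextIO


def iter_sessions_jsonl(lines: TextIO) -> Iterator[dict]:
    """Yield one session per JSON line; only the current line is held in memory."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        yield json.loads(raw)


# In-memory file for the example; a real retriever might pass open("sessions.jsonl").
data = io.StringIO(
    '{"session_id": "s1", "conversation": [{"qa_id": "q1"}, {"qa_id": "q2"}]}\n'
    '{"session_id": "s2", "conversation": [{"qa_id": "q3"}]}\n'
)
for session in iter_sessions_jsonl(data):
    # The whole conversation of this session is available at once.
    print(session["session_id"], len(session["conversation"]))
# s1 2
# s2 1
```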
When to use stream_sessions
- You have many sessions but each session fits easily in memory
- Sessions are independent and require no cross-session state
- You want a simple drop-in replacement for FULL_DATASET with lower memory usage
- Your data source is a file, database cursor, or any iterable you cannot fully materialise
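For a database cursor, rows usually arrive one QA pair at a time, so they must be regrouped into sessions. The sketch below uses sqlite3 and itertools.groupby to yield one session per group. The table layout and the iter_sessions_from_rows helper are hypothetical; the query must be ordered by session, because groupby only merges consecutive rows with the same key.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator


def iter_sessions_from_rows(rows: Iterable[tuple]) -> Iterator[dict]:
    """Group (session_id, qa_id, query, assistant) rows, ordered by session_id, into sessions."""
    for session_id, group in groupby(rows, key=itemgetter(0)):
        # Only this session's rows are materialised into a list.
        conversation = [
            {"qa_id": qa_id, "query": query, "assistant": answer}
            for _, qa_id, query, answer in group
        ]
        yield {"session_id": session_id, "conversation": conversation}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE qa (session_id TEXT, qa_id TEXT, query TEXT, assistant TEXT)")
conn.executemany(
    "INSERT INTO qa VALUES (?, ?, ?, ?)",
    [("s1", "q1", "Hi", "Hello"), ("s1", "q2", "Bye", "Goodbye"), ("s2", "q3", "2+2?", "4")],
)
# The cursor is iterated lazily; ORDER BY keeps each session's rows contiguous.
cursor = conn.execute(
    "SELECT session_id, qa_id, query, assistant FROM qa ORDER BY session_id, qa_id"
)
for session in iter_sessions_from_rows(cursor):
    print(session["session_id"], [qa["qa_id"] for qa in session["conversation"]])
# s1 ['q1', 'q2']
# s2 ['q3']
```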
stream_batches
Yield one StreamedBatch (a single QA pair) at a time. The metric’s batch() method receives a one-item list on each call. This is the lowest-memory and most granular mode.
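The per-pair behaviour can be sketched by flattening sessions into items and calling a metric once per item. StreamedBatch here is a simplified dataclass stand-in, and the CallSizeMetric class and its batch() signature are hypothetical; the point is only that each call sees a list of length one.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class StreamedBatch:
    # Hypothetical stand-in: session context plus exactly one QA pair.
    metadata: dict
    batch: dict


def iter_batches(sessions: Iterable[dict]) -> Iterator[StreamedBatch]:
    """Yield one StreamedBatch per QA pair, attaching its session's context."""
    for session in sessions:
        metadata = {key: value for key, value in session.items() if key != "conversation"}
        for qa in session["conversation"]:
            yield StreamedBatch(metadata=metadata, batch=qa)


class CallSizeMetric:
    """Hypothetical metric that records how many QA pairs each batch() call receives."""

    def __init__(self) -> None:
        self.call_sizes: list = []

    def batch(self, session_id: str, qa_pairs: list) -> None:
        self.call_sizes.append(len(qa_pairs))


sessions = [
    {"session_id": "s1", "conversation": [{"qa_id": "q1"}, {"qa_id": "q2"}]},
    {"session_id": "s2", "conversation": [{"qa_id": "q3"}]},
]
metric = CallSizeMetric()
for item in iter_batches(sessions):
    # Per-pair streaming: every call receives a one-item list.
    metric.batch(item.metadata["session_id"], [item.batch])
print(metric.call_sizes)  # [1, 1, 1]
```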
Streaming from a message queue
stream_batches is ideal when QA pairs arrive one at a time from a live source, such as a message queue or a streaming API.
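A queue-backed source can be sketched with the standard library. A producer thread stands in for a live feed, and iter_from_queue yields each message as it arrives until a sentinel signals the end. The message shape, the sentinel convention, and the 5-second timeout are assumptions; in a retriever, each message would be wrapped in a StreamedBatch before being yielded.

```python
import queue
import threading
from typing import Iterator

_SENTINEL = object()  # signals "no more messages"


def iter_from_queue(q: "queue.Queue[object]", timeout: float = 5.0) -> Iterator[dict]:
    """Yield QA messages as they arrive until the sentinel is received."""
    while True:
        msg = q.get(timeout=timeout)  # raises queue.Empty if the producer stalls
        if msg is _SENTINEL:
            return
        yield msg


def producer(q: "queue.Queue[object]") -> None:
    # Stand-in for a live source such as a broker consumer or streaming API.
    for i in range(3):
        q.put({"session_id": "live", "qa_id": f"q{i}", "query": f"question {i}"})
    q.put(_SENTINEL)


q: "queue.Queue[object]" = queue.Queue()
threading.Thread(target=producer, args=(q,), daemon=True).start()
received = [msg["qa_id"] for msg in iter_from_queue(q)]
print(received)  # ['q0', 'q1', 'q2']
```

The timeout keeps a stalled producer from blocking the evaluation forever; a production consumer would likely decide whether queue.Empty means "retry" or "stop".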
When to use stream_batches
- QA pairs arrive from a queue or streaming API (no session container)
- You want the minimum possible memory footprint
- Metrics don’t require access to adjacent QA pairs within the same session
- You’re building a real-time evaluation pipeline
BestOf works with all three modes, but the granularity follows how batches are formed. With full_dataset or stream_sessions, each batch contains all QA pairs in a session, so BestOf compares the entire conversation as a single block (one tournament per session / block of qa_ids). With stream_batches, each batch is a single QA pair, so BestOf runs a single king-of-the-hill comparison per question and produces exactly one result per qa_id instead of per session.
Choosing the Right Mode
| | full_dataset | stream_sessions | stream_batches |
|---|---|---|---|
| Memory | All sessions loaded | One session at a time | One QA pair at a time |
| Data source | In-memory list | File / DB cursor / API | Queue / stream |
| Cross-session state | ✅ | ❌ | ❌ |
| Cross-QA state in session | ✅ | ✅ | ❌ |
| Compatible with all metrics | ✅ | ✅ | Most metrics ✅ |
| Real-time processing | ❌ | ❌ | ✅ |
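The table can also be read as a decision rule. The hypothetical choose_mode helper below encodes it, preferring the lowest-memory mode that still satisfies the stated needs. The parameter names are illustrative and not part of Fair Forge, and metric_supports_single_qa reflects the table's note that only most metrics work with stream_batches.

```python
def choose_mode(
    cross_session_state: bool,
    cross_qa_state: bool,
    real_time: bool,
    metric_supports_single_qa: bool = True,
) -> str:
    """Pick the lowest-memory mode that satisfies the requirements in the table above."""
    # A metric that needs the whole conversation cannot run on single QA pairs.
    needs_session_view = cross_qa_state or not metric_supports_single_qa
    if real_time:
        # Only stream_batches processes data in real time.
        if cross_session_state or needs_session_view:
            raise ValueError("stream_batches cannot provide the requested state")
        return "stream_batches"
    if cross_session_state:
        return "full_dataset"  # the only mode that sees every session at once
    if needs_session_view:
        return "stream_sessions"  # whole conversation, one session in memory
    return "stream_batches"  # one QA pair in memory


print(choose_mode(cross_session_state=False, cross_qa_state=True, real_time=False))  # stream_sessions
```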
Next Steps
- Retriever Overview: full dataset mode examples and retriever interface
- Dataset & Batch: understand Dataset, Batch, and SessionMetadata structures