Generators Overview
Fair Forge generators create synthetic test datasets from your documentation, enabling automated testing of AI assistants without manual dataset creation.
Why Use Generators?
Save Time: Automatically create test cases from existing documentation
Better Coverage: Generate diverse questions across all your content
Consistent Quality: Structured question generation with difficulty levels
Easy Updates: Regenerate tests when documentation changes
Installation
uv add "alquimia-fair-forge[generators]"
uv add langchain-groq # Or your preferred LLM provider
Quick Start
from fair_forge.generators import BaseGenerator, create_markdown_loader
from langchain_groq import ChatGroq

# Create a context loader
loader = create_markdown_loader(
    max_chunk_size=2000,
    header_levels=[1, 2, 3],
)

# Create generator with an LLM
model = ChatGroq(model="llama-3.1-8b-instant", temperature=0.4)
generator = BaseGenerator(model=model, use_structured_output=True)

# Generate test dataset
datasets = await generator.generate_dataset(
    context_loader=loader,
    source="./documentation.md",
    assistant_id="my-assistant",
    num_queries_per_chunk=3,
    language="english",
)

# Use with metrics
for dataset in datasets:
    print(f"Generated {len(dataset.conversation)} test queries")
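The Quick Start uses a bare `await`, which works in a notebook or async REPL but is a syntax error at the top level of a regular script. A minimal sketch of the usual `asyncio.run` wrapper follows. `fake_generate_dataset` is a hypothetical stand-in for `generator.generate_dataset(...)`, used only so the snippet runs without the library or an API key.

```python
import asyncio


async def fake_generate_dataset(source: str, num_queries_per_chunk: int) -> list[dict]:
    """Hypothetical stand-in for generator.generate_dataset(...)."""
    await asyncio.sleep(0)  # yield to the event loop once, as a real LLM call would
    queries = [f"Question {i + 1} about {source}?" for i in range(num_queries_per_chunk)]
    return [{"source": source, "conversation": queries}]


async def main() -> list[dict]:
    # In real code, build the loader, model, and generator here, then
    # await generator.generate_dataset(...) instead of the stand-in.
    return await fake_generate_dataset("./documentation.md", num_queries_per_chunk=3)


# asyncio.run creates an event loop, runs main() to completion, and closes the loop.
datasets = asyncio.run(main())
for dataset in datasets:
    print(f"Generated {len(dataset['conversation'])} test queries")
```

In a real script, only the body of `main()` changes; the `asyncio.run(main())` entry point stays the same.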
Key Components
BaseGenerator
The main class for generating test datasets:
from fair_forge.generators import BaseGenerator

generator = BaseGenerator(
    model=your_langchain_model,
    use_structured_output=True,
)
Context Loaders
Load and chunk your documentation:
from fair_forge.generators import create_markdown_loader

loader = create_markdown_loader(
    max_chunk_size=2000,
    header_levels=[1, 2, 3],
)
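To make the two loader parameters concrete, here is a rough sketch of header-based chunking in plain Python. It is not Fair Forge's implementation: `split_markdown` and its splitting rules are assumptions, chosen to illustrate what `header_levels` and `max_chunk_size` plausibly control.

```python
import re


def split_markdown(text: str, max_chunk_size: int = 2000, header_levels=(1, 2, 3)) -> list[str]:
    """Illustrative only: start a new section at selected header levels, then cap chunk size."""
    marks = "|".join("#" * n for n in header_levels)
    header = re.compile(rf"^(?:{marks})\s")  # e.g. matches "# ", "## ", "### " but not "#### "

    sections, current = [], []
    for line in text.splitlines():
        if header.match(line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))

    # Hard-split any section that is still longer than max_chunk_size characters.
    chunks = []
    for section in sections:
        for start in range(0, len(section), max_chunk_size):
            chunks.append(section[start:start + max_chunk_size])
    return chunks


doc = "# Intro\nHello.\n## Setup\nInstall it.\n#### Deep note\nStill part of Setup."
chunks = split_markdown(doc, max_chunk_size=2000, header_levels=[1, 2, 3])
print(len(chunks))
```

In this sketch the level-4 header does not start a new chunk because 4 is not in `header_levels`, so its text stays with the enclosing "Setup" section.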
Selection Strategies
Control how chunks are selected:
from fair_forge.generators import SequentialStrategy, RandomSamplingStrategy

# Process all chunks sequentially (default)
strategy = SequentialStrategy()

# Sample random chunks multiple times
strategy = RandomSamplingStrategy(
    num_samples=3,
    chunks_per_sample=5,
)
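The difference between the two strategies can be sketched in plain Python. The helper functions below are hypothetical and only mirror the behaviour described above (one pass over all chunks versus `num_samples` random groups of `chunks_per_sample` chunks). The library's classes may differ, for example in whether a chunk can repeat within a sample.

```python
import random


def sequential_groups(chunks: list[str]) -> list[list[str]]:
    """One group containing every chunk, in document order."""
    return [list(chunks)]


def random_sampling_groups(chunks, num_samples=3, chunks_per_sample=5, seed=None):
    """num_samples groups, each a random draw of up to chunks_per_sample distinct chunks."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    size = min(chunks_per_sample, len(chunks))
    return [rng.sample(chunks, size) for _ in range(num_samples)]


chunks = [f"chunk-{i}" for i in range(8)]
print(sequential_groups(chunks))
for group in random_sampling_groups(chunks, num_samples=3, chunks_per_sample=5, seed=0):
    print(group)
```

Sequential processing covers every chunk exactly once, while random sampling trades full coverage for varied combinations of chunks, which matters when several chunks are combined into one context.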
Generation Modes
Independent Queries
Generate standalone questions:
datasets = await generator.generate_dataset(
    context_loader=loader,
    source="./docs",
    num_queries_per_chunk=3,
    conversation_mode=False,  # Default
)
Conversation Mode
Generate coherent multi-turn conversations:
datasets = await generator.generate_dataset(
    context_loader=loader,
    source="./docs",
    num_queries_per_chunk=3,
    conversation_mode=True,  # Each turn builds on previous
)
Generated datasets follow the standard Fair Forge schema:
Dataset(
    session_id="generated-uuid",
    assistant_id="my-assistant",
    language="english",
    context="Combined chunk content...",
    conversation=[
        Batch(
            qa_id="chunk-1_q1",
            query="Generated question?",
            assistant="",  # Empty - to be filled by runner
            agentic={
                "difficulty": "medium",
                "query_type": "factual",
                "chunk_id": "doc_section_1",
            },
        ),
        ...
    ]
)
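As a sketch of how the generated metadata might be inspected, the snippet below tallies queries by difficulty and lists the entries still waiting for an assistant response. The `Dataset` and `Batch` dataclasses here are simplified stand-ins that copy the field names shown above, not the Fair Forge classes, and the `"easy"` difficulty value is an assumption made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Batch:  # stand-in, not the Fair Forge class
    qa_id: str
    query: str
    assistant: str = ""
    agentic: dict = field(default_factory=dict)


@dataclass
class Dataset:  # stand-in, not the Fair Forge class
    session_id: str
    assistant_id: str
    language: str
    context: str
    conversation: list = field(default_factory=list)


dataset = Dataset(
    session_id="generated-uuid",
    assistant_id="my-assistant",
    language="english",
    context="Combined chunk content...",
    conversation=[
        Batch("chunk-1_q1", "Generated question?",
              agentic={"difficulty": "medium", "query_type": "factual"}),
        Batch("chunk-1_q2", "Another question?",
              agentic={"difficulty": "easy", "query_type": "factual"}),  # "easy" is assumed
    ],
)

# Count queries per difficulty level recorded in the agentic metadata.
by_difficulty = Counter(b.agentic.get("difficulty", "unknown") for b in dataset.conversation)
# An empty assistant field means the runner has not filled in a response yet.
unanswered = [b.qa_id for b in dataset.conversation if not b.assistant]

print(dict(by_difficulty))
print(unanswered)
```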
Supported LLM Providers
| Provider | Import | Notes |
| --- | --- | --- |
| Groq | langchain_groq.ChatGroq | Fast, free tier available |
| OpenAI | langchain_openai.ChatOpenAI | GPT-4, GPT-3.5 |
| Google | langchain_google_genai.ChatGoogleGenerativeAI | Gemini models |
| Anthropic | langchain_anthropic.ChatAnthropic | Claude models |
| Ollama | langchain_ollama.ChatOllama | Local models |
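If the provider is chosen at runtime, one option is to import the chat model class lazily so that only the selected integration package needs to be installed. The sketch below uses the module and class names from the provider list above. The `PROVIDERS` mapping and the `load_chat_model_class` helper are hypothetical and not part of Fair Forge.

```python
import importlib

# Module and class names copied from the provider list above.
PROVIDERS = {
    "groq": ("langchain_groq", "ChatGroq"),
    "openai": ("langchain_openai", "ChatOpenAI"),
    "google": ("langchain_google_genai", "ChatGoogleGenerativeAI"),
    "anthropic": ("langchain_anthropic", "ChatAnthropic"),
    "ollama": ("langchain_ollama", "ChatOllama"),
}


def load_chat_model_class(provider: str):
    """Import and return the chat model class for the given provider name."""
    try:
        module_name, class_name = PROVIDERS[provider.lower()]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}") from None
    # Raises ImportError if the matching integration package is not installed.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

With `langchain-groq` installed, `load_chat_model_class("groq")` returns the `ChatGroq` class, which can then be instantiated and passed to `BaseGenerator` as in the Quick Start.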
Next Steps
BaseGenerator: Learn about the generator class
Context Loaders: Learn about loading documentation
Strategies: Learn about chunk selection
AWS Lambda Example: Deploy as serverless function