BaseGenerator
The BaseGenerator class is the core component for generating synthetic test datasets from your documentation.
Overview
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | BaseChatModel | Required | LangChain-compatible chat model |
| use_structured_output | bool | False | Use structured output parsing |
generate_dataset Method
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| context_loader | ContextLoader | Required | Loader for documentation |
| source | str | Required | Path to documentation |
| assistant_id | str | Required | ID for the generated dataset |
| num_queries_per_chunk | int | 3 | Questions per chunk |
| language | str | "english" | Language for generation |
| conversation_mode | bool | False | Generate conversations |
| selection_strategy | Strategy | SequentialStrategy() | Chunk selection strategy |
| seed_examples | list[str] | None | Example questions to guide style |
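To keep the defaults from the table in one place, the following is a minimal sketch that mirrors the documented parameters in a plain dataclass. `DatasetRequest` and `to_kwargs` are hypothetical names introduced here for illustration and are not part of the library. The loader and strategy are left as opaque placeholders because their constructors are not shown in this section, and the path and ID values are made up.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class DatasetRequest:
    """Hypothetical container mirroring the generate_dataset parameters above."""

    # Required parameters (no defaults in the table).
    context_loader: Any
    source: str
    assistant_id: str
    # Optional parameters, with the defaults listed in the table.
    num_queries_per_chunk: int = 3
    language: str = "english"
    conversation_mode: bool = False
    # None stands in for "use the library default" (SequentialStrategy() per the table).
    selection_strategy: Optional[Any] = None
    seed_examples: Optional[list[str]] = None

    def to_kwargs(self) -> dict[str, Any]:
        """Return keyword arguments, dropping entries left at None."""
        values = {f.name: getattr(self, f.name) for f in fields(self)}
        return {name: value for name, value in values.items() if value is not None}


# Placeholder values: object() stands in for a real ContextLoader instance.
request = DatasetRequest(
    context_loader=object(),
    source="docs/",
    assistant_id="support-bot",
    seed_examples=["How do I reset my password?"],
)
kwargs = request.to_kwargs()
# Presumably these would then be passed as generator.generate_dataset(**kwargs).
```

Leaving `selection_strategy` out of the keyword arguments when it is None lets the documented default apply instead of overriding it.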
Return Value
Basic Example
With Seed Examples
Guide the style of generated questions:
Conversation Mode
Generate coherent multi-turn conversations:
With Selection Strategy
Random Sampling
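As a rough picture of the difference between sequential and random chunk selection, the sketch below implements both ideas on a plain list. It is not the library's implementation: `select_sequential` and `select_random` are hypothetical helpers, and the seeding behaviour shown is an assumption about how reproducible sampling is typically done.

```python
import random
from typing import Sequence


def select_sequential(chunks: Sequence[str], limit: int) -> list[str]:
    """Take the first `limit` chunks in document order."""
    return list(chunks[:limit])


def select_random(chunks: Sequence[str], limit: int, seed: int) -> list[str]:
    """Draw up to `limit` distinct chunks; the same seed gives the same draw."""
    rng = random.Random(seed)
    return rng.sample(list(chunks), k=min(limit, len(chunks)))


chunks = [f"chunk-{i}" for i in range(10)]
first_three = select_sequential(chunks, 3)
sample_a = select_random(chunks, 3, seed=1)
sample_b = select_random(chunks, 3, seed=1)  # identical to sample_a
```

Changing the seed between runs is one way to obtain different subsets of the same documentation while keeping each run reproducible.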
Generate multiple diverse datasets:
Generated Query Metadata
Each generated batch includes metadata in the agentic field:
Complete Example
Error Handling
Best Practices
Choose Appropriate Chunk Size
Match chunk size to your content:
- 500-1000: Short, focused sections
- 1000-2000: Standard documentation
- 2000-4000: Long-form content
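The ranges above can be captured in a small lookup so the choice is explicit in code. This is only a sketch: `suggest_chunk_size_range` and the labels "short", "standard" and "long" are hypothetical, and the unit of the numbers (characters or tokens) is not stated in this section, so the helper returns them as given.

```python
# Ranges copied from the list above; units are whatever the loader expects.
CHUNK_SIZE_RANGES: dict[str, tuple[int, int]] = {
    "short": (500, 1000),      # short, focused sections
    "standard": (1000, 2000),  # standard documentation
    "long": (2000, 4000),      # long-form content
}


def suggest_chunk_size_range(content_kind: str) -> tuple[int, int]:
    """Return the (min, max) chunk size suggested for a kind of content."""
    try:
        return CHUNK_SIZE_RANGES[content_kind]
    except KeyError:
        known = ", ".join(sorted(CHUNK_SIZE_RANGES))
        raise ValueError(f"unknown content kind {content_kind!r}; expected one of: {known}")


low, high = suggest_chunk_size_range("standard")
```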
Use Seed Examples
Provide seed examples to guide question style:
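A minimal sketch of preparing a seed list before passing it as `seed_examples` follows. The questions are invented placeholders, and `prepare_seed_examples` is a hypothetical helper; the library may not require this cleanup, but blank or duplicated seeds add nothing to the style guidance.

```python
def prepare_seed_examples(examples: list[str]) -> list[str]:
    """Strip whitespace, then drop blanks and exact duplicates, keeping order."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for example in examples:
        text = example.strip()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


# Placeholder questions written in the style the generated questions should follow.
raw_seeds = [
    "How do I rotate an API key without downtime?",
    "  What happens to queued jobs when a worker restarts?  ",
    "",
    "How do I rotate an API key without downtime?",
]
seed_examples = prepare_seed_examples(raw_seeds)
```

The resulting list matches the `list[str]` type documented for the `seed_examples` parameter.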
Adjust Temperature
- Low (0.0-0.3): More deterministic, focused questions
- Medium (0.4-0.7): Balanced creativity
- High (0.8-1.0): More varied, creative questions
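The bands above can be expressed as a small classifier, which is handy for logging which regime a run used. Note that temperature is not a BaseGenerator parameter in the tables here, so it is presumably set on the chat model passed as `model`. `temperature_band` is a hypothetical helper, and assigning the values that fall between the listed bands (for example 0.35 or 0.75) to the higher band is an assumption.

```python
def temperature_band(temperature: float) -> str:
    """Classify a temperature in [0.0, 1.0] as "low", "medium" or "high"."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError(f"temperature must be between 0.0 and 1.0, got {temperature}")
    if temperature <= 0.3:
        return "low"     # more deterministic, focused questions
    if temperature <= 0.7:
        return "medium"  # balanced creativity (assumption: includes 0.3-0.4)
    return "high"        # more varied, creative questions (assumption: includes 0.7-0.8)


band = temperature_band(0.2)
```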
Use Conversation Mode for Context Testing
Enable conversation mode when testing context retention: