Prompt Optimizer

The Prompt Optimizer module closes the evaluation loop: once Fair Forge measures where your agent fails, these tools automatically find the system prompt that fixes those failures.

How it fits into Fair Forge

Fair Forge measures metrics (Context, Conversational, etc.)
    ↓ failing examples become your dataset
GEPAOptimizer or MIPROv2Optimizer finds the best prompt
    ↓ optimized prompt deployed to your agent
Fair Forge measures again to confirm improvement
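The loop above can be sketched in plain Python. All names here are illustrative stand-ins, not Fair Forge APIs: the "metric" flags an example as failing when the prompt lacks its keyword, and the "optimizer" folds failing keywords back into the prompt.

```python
def measure(prompt: str, dataset: list[dict]) -> list[dict]:
    """Stand-in metric: an example 'fails' if its keyword is absent from the prompt."""
    return [ex for ex in dataset if ex["keyword"] not in prompt]

def optimize(prompt: str, failures: list[dict]) -> str:
    """Stand-in optimizer: fold each failure's keyword into the prompt."""
    for ex in failures:
        prompt += f" Always mention {ex['keyword']}."
    return prompt

dataset = [{"keyword": "context"}, {"keyword": "concise"}]
prompt = "You are a helpful assistant."

failures = measure(prompt, dataset)   # 1. measure: both examples fail
prompt = optimize(prompt, failures)   # 2. optimize on the failing examples
assert measure(prompt, dataset) == [] # 3. measure again to confirm improvement
```

The real optimizers are far more sophisticated, but the control flow is the same: failures in, improved prompt out, re-measure to verify.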

Available Optimizers

GEPA

Iteratively reads failures and generates improved prompt candidates. Best when the prompt itself is clearly wrong.

MIPROv2

Optimizes instruction AND few-shot examples simultaneously using Bayesian search. Best when format and tone matter as much as content.

When to use each

                  GEPA                        MIPROv2
Optimizes         Prompt instruction          Instruction + few-shot examples
Search strategy   Iterative, reads failures   Bayesian (Optuna/TPE)
Best for          Clearly bad prompts         Format-sensitive tasks
Speed             Fast (few iterations)       Slower (20+ trials)
Examples needed   No                          Yes — key differentiator
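The table reduces to a simple rule of thumb, sketched below as a helper function (purely illustrative, not part of the library):

```python
def pick_optimizer(have_examples: bool, format_sensitive: bool) -> str:
    """Rule of thumb from the comparison table (illustrative only).

    MIPROv2 requires few-shot examples and pays off on format-sensitive
    tasks; otherwise GEPA's fast, failure-driven iteration is the default.
    """
    if have_examples and format_sensitive:
        return "MIPROv2"
    return "GEPA"

pick_optimizer(have_examples=True, format_sensitive=True)   # "MIPROv2"
pick_optimizer(have_examples=False, format_sensitive=True)  # "GEPA"
```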

Installation

uv add "alquimia-fair-forge"
uv add langchain-groq  # or your preferred LLM provider

Common Pattern

Both optimizers follow the same Fair Forge pattern:
from fair_forge import Retriever
from fair_forge.schemas import Dataset

class MyRetriever(Retriever):
    def load_dataset(self) -> list[Dataset]:
        # Return your evaluation dataset
        ...

result = AnyOptimizer.run(  # stands in for GEPAOptimizer or MIPROv2Optimizer
    retriever=MyRetriever,
    model=model,
    seed_prompt="Your current (bad) system prompt.",
    objective="Plain language description of what a good response looks like.",
)

print(result.optimized_prompt)
The objective is the most important parameter. Describe what a good response looks like — the optimizer uses it to evaluate candidates and guide generation.

Custom Evaluator

By default both optimizers use an LLM judge based on the objective. For structured or deterministic tasks, pass a custom evaluator for sharper signal:
from fair_forge.prompt_optimizer import LLMEvaluator

# Default — describe criteria in natural language
evaluator = LLMEvaluator(
    model=model,
    criteria="Response must use only the provided context and be concise.",
)

# Custom — deterministic logic
def my_evaluator(actual: str, expected: str, query: str, context: str) -> float:
    # Return a float between 0.0 and 1.0, e.g. exact match:
    return 1.0 if actual.strip() == expected.strip() else 0.0
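Exact matching is all-or-nothing; for free-text answers a partial-credit evaluator often gives a smoother signal. A toy sketch (the four-argument signature mirrors the one above; the scoring logic is purely illustrative):

```python
def keyword_coverage(actual: str, expected: str, query: str, context: str) -> float:
    """Toy graded evaluator: fraction of expected tokens present in the response."""
    expected_tokens = set(expected.lower().split())
    if not expected_tokens:
        return 1.0  # nothing required, trivially satisfied
    actual_tokens = set(actual.lower().split())
    return len(expected_tokens & actual_tokens) / len(expected_tokens)

keyword_coverage("paris is the capital of france", "the capital is paris", "", "")  # 1.0
keyword_coverage("rome", "the capital is paris", "", "")                            # 0.0
```

Any callable with this signature returning a score in [0.0, 1.0] can serve as an evaluator; deterministic scoring like this avoids the cost and variance of an LLM judge on structured tasks.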

Output

print(f"Score: {result.initial_score:.2f} → {result.final_score:.2f}  ({result.n_examples} examples)")
print(result.optimized_prompt)