Jupyter Notebooks

Interactive notebooks demonstrating each Fair Forge metric with visualizations.

Setup

# Clone the repository
git clone https://github.com/Alquimia-ai/fair-forge
cd fair-forge

# Install dependencies
uv sync

# Build the wheel that the notebooks install from dist/
uv build

# Navigate to examples
cd examples

Available Notebooks

Toxicity

Location: examples/toxicity/jupyter/toxicity.ipynb
Demonstrates toxicity detection with the DIDT framework:
  • Cluster profiling with HDBSCAN + UMAP
  • Group profiling with demographic detection
  • Frequentist vs Bayesian statistical modes
  • UMAP visualization of response clusters
cd toxicity/jupyter
uv pip install "$(ls ../../dist/*.whl)[toxicity]" matplotlib
jupyter lab toxicity.ipynb
Key Features:
# Define group prototypes
group_prototypes = {
    "gender": ["women", "men", "female", "male"],
    "race": ["Asian", "African", "European", "Hispanic"],
    "religion": ["Christian", "Muslim", "Jewish", "Hindu"],
}

# Run with Frequentist mode
freq_metrics = Toxicity.run(
    LocalRetriever,
    group_prototypes=group_prototypes,
    statistical_mode=FrequentistMode(),
)

# Run with Bayesian mode
bayesian = BayesianMode(mc_samples=5000, ci_level=0.95)
bayes_metrics = Toxicity.run(
    LocalRetriever,
    group_prototypes=group_prototypes,
    statistical_mode=bayesian,
)
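
The cluster profiling happens inside the metric, but the underlying HDBSCAN + UMAP step looks roughly like the sketch below. This is illustrative, not Fair Forge internals: the embeddings array is a stand-in for whatever response embeddings the notebook computes.
import hdbscan
import matplotlib.pyplot as plt
import numpy as np
import umap

embeddings = np.random.rand(200, 384)  # placeholder for real response embeddings

# Cluster in the original embedding space, then project to 2-D for plotting
labels = hdbscan.HDBSCAN(min_cluster_size=5).fit_predict(embeddings)
coords = umap.UMAP(n_components=2, random_state=42).fit_transform(embeddings)

plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="tab10", s=12)
plt.title("Response clusters (UMAP projection)")
plt.show()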

Bias

Location: examples/bias/jupyter/bias.ipynb
Demonstrates bias detection across protected attributes:
  • LlamaGuard guardian configuration
  • Clopper-Pearson confidence intervals
  • Per-attribute bias analysis
  • Error bar visualization
cd bias/jupyter
uv pip install "$(ls ../../dist/*.whl)[bias]" matplotlib
jupyter lab bias.ipynb
Key Features:
# Configure guardian
guardian_config = GuardianLLMConfig(
    model="meta-llama/llama-guard-4-12b",
    api_key=GUARDIAN_API_KEY,
    url="https://api.groq.com/openai",
    provider=OpenAIGuardianProvider,
)

# Run bias detection
metrics = Bias.run(
    LocalRetriever,
    guardian=LLamaGuard,
    config=guardian_config,
    confidence_level=0.95,
)
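
The Clopper-Pearson intervals the notebook plots can be reproduced directly. A minimal sketch using scipy (not Fair Forge internals): k flagged responses out of n, at the confidence level passed to Bias.run.
from scipy.stats import beta

def clopper_pearson(k: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    # Exact binomial confidence interval for a proportion k/n
    alpha = 1 - confidence
    lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lower, upper

print(clopper_pearson(3, 50))  # approximately (0.013, 0.165)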

Context

Location: examples/context/jupyter/context.ipynb
Demonstrates context alignment evaluation:
  • LLM judge configuration
  • Per-interaction scoring
  • Insight analysis
  • Average score calculation
cd context/jupyter
uv pip install "$(ls ../../dist/*.whl)[context]" langchain-groq
jupyter lab context.ipynb
Key Features:
# Initialize judge
judge_model = ChatGroq(
    model="llama-3.3-70b-versatile",
    api_key=GROQ_API_KEY,
    temperature=0.0,
)

# Run context evaluation
metrics = Context.run(
    LocalRetriever,
    model=judge_model,
    use_structured_output=True,
)
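
To get the average score mentioned above, aggregate the per-interaction results. The attribute name below is hypothetical, so check the Context metric's schema in your installation:
# Attribute name is illustrative; inspect a metric object for the real field
scores = [m.context_score for m in metrics]
print(f"Average context score: {sum(scores) / len(scores):.3f}")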

Conversational

Location: examples/conversational/jupyter/conversational.ipynb
Demonstrates dialogue quality evaluation:
  • Grice’s Maxims scoring
  • Memory and language assessment
  • Radar chart visualization
  • Per-maxim distribution analysis
cd conversational/jupyter
uv pip install "$(ls ../../dist/*.whl)[conversational]" langchain-groq matplotlib
jupyter lab conversational.ipynb
Key Features:
# Run conversational evaluation (judge_model is configured as in the Context example)
metrics = Conversational.run(
    LocalRetriever,
    model=judge_model,
    use_structured_output=True,
)

# Visualize with radar chart
categories = ['Quality', 'Quantity', 'Relation', 'Manner', 'Memory', 'Language', 'Sensibleness']
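
One way to render the radar chart from those categories (the scores below are illustrative; substitute the per-maxim values reported in metrics):
import numpy as np
import matplotlib.pyplot as plt

scores = [4.5, 4.0, 4.2, 3.8, 4.1, 4.6, 4.3]  # illustrative, one per category

angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
angles += angles[:1]          # repeat the first angle to close the polygon
values = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles, values)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories)
plt.show()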

Humanity

Location: examples/humanity/jupyter/humanity.ipynb
Demonstrates emotional analysis:
  • NRC Emotion Lexicon analysis
  • Emotional entropy calculation
  • Spearman correlation with ground truth
  • Emotion distribution visualization
cd humanity/jupyter
uv pip install "$(ls ../../dist/*.whl)[humanity]" matplotlib
jupyter lab humanity.ipynb
Key Features:
# Run humanity evaluation (no LLM required)
metrics = Humanity.run(
    LocalRetriever,
    verbose=True,
)

# Analyze 8 emotions
emotions = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

BestOf

Location: examples/bestof/jupyter/bestof.ipynb
Demonstrates tournament comparison:
  • Multi-assistant dataset setup
  • Elimination rounds
  • Winner determination
  • Tournament bracket visualization
cd bestof/jupyter
uv pip install "$(ls ../../dist/*.whl)[bestof]" langchain-groq matplotlib
jupyter lab bestof.ipynb
Key Features:
# Create multi-assistant retriever
class BestOfRetriever(Retriever):
    def load_dataset(self) -> list[Dataset]:
        # Return one dataset per assistant; the file layout is illustrative
        datasets = []
        for path in Path("../data").glob("assistant_*.json"):
            data = json.loads(path.read_text())
            datasets.extend(Dataset.model_validate(d) for d in data)
        return datasets

# Run tournament
metrics = BestOf.run(
    BestOfRetriever,
    model=judge_model,
    criteria="Overall response quality, helpfulness, and clarity",
)

print(f"Tournament Winner: {metrics[0].bestof_winner_id}")

Generators

Location: examples/generators/jupyter/generators_groq.ipynb
Demonstrates synthetic dataset generation:
  • Markdown loading and chunking
  • Sequential and random sampling strategies
  • Conversation mode generation
  • Seed examples for guided generation
cd generators/jupyter
uv pip install "$(ls ../../dist/*.whl)[generators]" langchain-groq
jupyter lab generators_groq.ipynb
Key Features:
# Create generator (`model` is the chat model configured earlier in the notebook)
generator = BaseGenerator(model=model, use_structured_output=True)

# Generate with random sampling
strategy = RandomSamplingStrategy(num_samples=3, chunks_per_sample=5)
datasets = await generator.generate_dataset(
    context_loader=loader,  # markdown context loader created earlier in the notebook
    source="./docs.md",
    selection_strategy=strategy,
    conversation_mode=True,
)

Runners

Location: examples/runners/jupyter/runners.ipynb
Demonstrates test execution:
  • AlquimiaRunner configuration
  • Single batch and full dataset execution
  • Local storage integration
  • Complete pipeline example
cd runners/jupyter
uv pip install "$(ls ../../dist/*.whl)[runners]"
jupyter lab runners.ipynb
Key Features:
# Setup runner
runner = AlquimiaRunner(
    base_url=os.getenv("ALQUIMIA_URL"),
    api_key=os.getenv("ALQUIMIA_API_KEY"),
    agent_id=os.getenv("AGENT_ID"),
    channel_id=os.getenv("CHANNEL_ID"),
)

# Execute dataset
updated_dataset, summary = await runner.run_dataset(dataset)
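
To integrate with local storage, one option is writing the executed dataset to the JSON file that the retriever pattern below reads. The path and the single-item list are illustrative; model_dump() is assumed available because Dataset is validated with pydantic's model_validate:
import json
from pathlib import Path

# Store the run next to the other notebooks' data (illustrative path)
Path("../data/dataset.json").write_text(
    json.dumps([updated_dataset.model_dump()], default=str)
)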

Creating a Custom Retriever

All notebooks use a local retriever pattern:
import json
from pathlib import Path
from fair_forge.core.retriever import Retriever
from fair_forge.schemas.common import Dataset

class LocalRetriever(Retriever):
    def load_dataset(self) -> list[Dataset]:
        dataset_path = Path("../data/dataset.json")
        with open(dataset_path) as f:
            data = json.load(f)
        return [Dataset.model_validate(d) for d in data]
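
As in the examples above, pass the class itself rather than an instance to each metric's run method, e.g. Humanity.run(LocalRetriever).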

Next Steps