by Alquimia AI

A comprehensive evaluation library for measuring the fairness, toxicity, bias, and conversational quality of AI systems.

Toxicity Analysis
Detect toxic language patterns with demographic profiling and group fairness scoring.
Bias Detection
Measure bias across protected attributes using IBM Granite and LlamaGuard guardians.
Conversational Quality
Evaluate dialogue using Grice’s Maxims for relevance, clarity, and truthfulness.
Test Generation
Generate synthetic test datasets from your documentation with pluggable strategies.
Model Comparison
Run tournament-style evaluations between multiple assistants with LLM-as-judge.
Explainability
Understand model decisions with token-level attribution and attention analysis.

Simple by design

Evaluate any AI system in a few lines of code.

from fair_forge.metrics.humanity import Humanity

# MyRetriever wraps your own system so the metric can query it.
metrics = Humanity.run(MyRetriever)
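As a rough illustration of what a retriever passed to `Humanity.run` might look like, here is a minimal self-contained sketch. The class name `MyRetriever`, the `retrieve` method, and its signature are assumptions for illustration only, not Fair Forge's actual interface; consult the library's documentation for the real protocol.

```python
# Hypothetical sketch of a retriever class you might plug into an
# evaluation run. Only the general shape is meaningful here.
class MyRetriever:
    """Wraps a document corpus so an evaluator can query it."""

    def __init__(self, corpus: list[str]):
        self.corpus = corpus

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Naive keyword-overlap scoring; replace with your real
        # retrieval logic (embeddings, BM25, an API call, ...).
        words = query.lower().split()
        scored = [
            (sum(w in doc.lower() for w in words), doc)
            for doc in self.corpus
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Keep only documents that matched at least one query word.
        return [doc for score, doc in scored[:k] if score > 0]
```

Any object exposing a similar query-to-documents method can stand in here; the evaluation harness only needs a way to ask your system questions and collect its answers.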

Ready to evaluate your AI?

Install Fair Forge and start measuring in minutes.

Install Fair Forge →