The Bias metric detects bias in AI responses across protected attributes using guardian models such as LlamaGuard or IBM Granite. It supports pluggable statistical modes: frequentist mode returns a point estimate per attribute, while Bayesian mode returns a full posterior distribution with credible intervals.
Returns a point estimate for each attribute's bias rate: simply `k_biased / n_samples`.
```python
from fair_forge.statistical import FrequentistMode

metrics = Bias.run(
    MyRetriever,
    guardian=LlamaGuard,
    config=guardian_config,
    statistical_mode=FrequentistMode(),  # default
)
for rate in metrics[0].attribute_rates:
    print(f"{rate.protected_attribute}: {rate.rate:.3f}")
    # rate.ci_low and rate.ci_high are None
```
Best for large datasets where a point estimate is sufficient.
Uses a Beta-Binomial posterior to model the true bias rate. With a Beta(1,1) prior (uninformative), each observation shifts the posterior toward the observed rate. The result includes a credible interval expressing how confident we are about the true rate.
Best for small datasets where understanding uncertainty matters — a wide CI signals that more data is needed before drawing conclusions.
Why Bayesian matters for bias auditing: With 10 samples and 2 biased interactions, the frequentist estimate is 0.20. The Bayesian CI might be [0.03, 0.52] — which tells you the true bias rate could be anywhere in a wide range, and you shouldn’t make decisions based on this data alone. With 200 samples and 40 biased, the CI narrows to [0.15, 0.26], giving much stronger evidence.
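The Beta-Binomial update behind these numbers can be sketched in plain Python. This is a minimal illustration, not fair_forge's implementation: the grid-based quantile inversion and the function name `beta_binomial_ci` are assumptions made for the example.

```python
import math

def beta_binomial_ci(k, n, alpha_prior=1.0, beta_prior=1.0, level=0.95, grid=10_000):
    """Posterior mean and credible interval for a Beta-Binomial model.

    With a Beta(alpha_prior, beta_prior) prior and k biased out of n samples,
    the posterior is Beta(alpha_prior + k, beta_prior + n - k). The interval
    is found by numerically inverting the posterior CDF on a grid.
    """
    a, b = alpha_prior + k, beta_prior + n - k
    # Unnormalized log-density evaluated at grid midpoints over (0, 1)
    xs = [(i + 0.5) / grid for i in range(grid)]
    logpdf = [(a - 1) * math.log(x) + (b - 1) * math.log(1 - x) for x in xs]
    m = max(logpdf)
    pdf = [math.exp(lp - m) for lp in logpdf]
    total = sum(pdf)
    cdf, acc = [], 0.0
    for p in pdf:
        acc += p
        cdf.append(acc / total)
    lo_q, hi_q = (1 - level) / 2, 1 - (1 - level) / 2
    ci_low = next(x for x, c in zip(xs, cdf) if c >= lo_q)
    ci_high = next(x for x, c in zip(xs, cdf) if c >= hi_q)
    return a / (a + b), ci_low, ci_high

# Small sample: posterior Beta(3, 9), wide interval
mean, lo, hi = beta_binomial_ci(k=2, n=10)
# Larger sample: posterior Beta(41, 161), much narrower interval
mean2, lo2, hi2 = beta_binomial_ci(k=40, n=200)
```

Running both cases shows the effect described above: the 10-sample interval spans most of the plausible range, while the 200-sample interval tightens around the observed proportion.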
```python
class AttributeBiasRate(BaseModel):
    protected_attribute: str  # "gender", "race", etc.
    n_samples: int            # Total interactions evaluated
    k_biased: int             # Interactions flagged as biased
    rate: float               # Bias rate (posterior mean for Bayesian, proportion for frequentist)
    ci_low: float | None      # Lower credible bound; only set in Bayesian mode
    ci_high: float | None     # Upper credible bound; only set in Bayesian mode
```
```python
class GuardianInteraction(BaseModel):
    is_biased: bool  # Whether bias was detected
    attribute: str   # Which attribute was checked
    certainty: float # Confidence in the assessment
    qa_id: str       # ID of the Q&A interaction
```
```python
for attribute, interactions in metric.guardian_interactions.items():
    biased = [i for i in interactions if i.is_biased]
    print(f"{attribute}: {len(biased)}/{len(interactions)} flagged")
    for i in biased:
        print(f"  - QA {i.qa_id} (certainty={i.certainty:.2f})")
```