Context Metric
The Context metric evaluates how well an AI assistant’s responses align with the provided system context and instructions.

Overview
This metric uses an LLM as a judge to assess:

- Context Awareness: How well the response follows the given context (0-1 scale)
- Context Insight: Explanation of the alignment assessment
- Context Thinkings: The judge’s reasoning process
Installation
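The package’s distribution name isn’t shown in this excerpt; install it with pip, substituting the actual name:

```
pip install <package-name>
```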
Basic Usage
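A minimal sketch. The import path, the `evaluate()` method name, and the snake_case result fields are assumptions; check the package’s actual API. `MyRetriever` is a hypothetical `Retriever` subclass.

```python
from langchain_openai import ChatOpenAI

# Hypothetical import path; substitute the actual package module.
# from <package> import ContextMetric, Retriever

class MyRetriever(Retriever):
    """Hypothetical data source supplying the conversations to judge."""
    ...

judge = ChatOpenAI(model="gpt-4", temperature=0)

metric = ContextMetric(
    retriever=MyRetriever,  # the Retriever *class*, per the parameter table
    model=judge,            # any LangChain-compatible chat model
)

result = metric.evaluate()       # assumed method name
print(result.context_awareness)  # 0-1 alignment score
print(result.context_insight)    # explanation of the assessment
```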
Parameters
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| retriever | Type[Retriever] | Data source class |
| model | BaseChatModel | LangChain-compatible judge model |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| use_structured_output | bool | False | Use LangChain structured output |
| bos_json_clause | str | "```json" | JSON block start marker |
| eos_json_clause | str | "```" | JSON block end marker |
| verbose | bool | False | Enable verbose logging |
Output Schema
ContextMetric
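A hypothetical sketch of the result object, inferred from the fields in the Overview; the actual class may define different names or extra fields:

```python
from pydantic import BaseModel

class ContextMetric(BaseModel):
    context_awareness: float  # 0-1: how well the response follows the context
    context_insight: str      # explanation of the alignment assessment
    context_thinkings: str    # the judge's reasoning process
```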
Interpretation
Context Awareness Score
| Score Range | Interpretation |
|---|---|
| 0.8 - 1.0 | Excellent alignment - response fully follows context |
| 0.6 - 0.8 | Good alignment - mostly follows context with minor deviations |
| 0.4 - 0.6 | Moderate alignment - partially follows context |
| 0.2 - 0.4 | Poor alignment - significant deviations from context |
| 0.0 - 0.2 | Very poor - response ignores or contradicts context |
Example Insights
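Illustrative examples of the kind of insight text a judge might return (not actual library output):

- High awareness (0.9): "The response stays within the support scope defined in the system context and declines the out-of-scope request as instructed."
- Low awareness (0.2): "The response contradicts the system context by giving investment advice despite an explicit instruction not to."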
Complete Example
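An end-to-end sketch combining the pieces above; the import path, the retriever interface, and the `evaluate()` call are assumptions:

```python
from langchain_openai import ChatOpenAI

# Hypothetical import path; substitute the actual package module.
# from <package> import ContextMetric, Retriever

class SupportRetriever(Retriever):
    """Hypothetical retriever yielding system context, user input, and response."""
    ...

judge = ChatOpenAI(model="gpt-4", temperature=0)  # deterministic judge

metric = ContextMetric(
    retriever=SupportRetriever,
    model=judge,
    use_structured_output=True,  # recommended; see below
    verbose=True,                # log the judge's intermediate output
)

result = metric.evaluate()  # assumed method name
print(f"Awareness: {result.context_awareness:.2f}")
print(f"Insight:   {result.context_insight}")
print(f"Reasoning: {result.context_thinkings}")
```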
LLM Provider Options
Groq
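Assuming the `langchain-groq` integration package is installed; the model name is an example:

```python
from langchain_groq import ChatGroq

model = ChatGroq(model="llama3-70b-8192", temperature=0)
```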
OpenAI
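Assuming the `langchain-openai` package:

```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4", temperature=0)
```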
Anthropic
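Assuming the `langchain-anthropic` package:

```python
from langchain_anthropic import ChatAnthropic

model = ChatAnthropic(model="claude-3-opus-20240229", temperature=0)
```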
Ollama (Local)
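Assuming the `langchain-ollama` package and a locally running Ollama server:

```python
from langchain_ollama import ChatOllama

model = ChatOllama(model="llama3", temperature=0)  # served by the local Ollama instance
```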
Structured vs Non-Structured Output
Structured Output
Uses LangChain’s structured output feature (recommended):
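A sketch reusing the names from Basic Usage:

```python
metric = ContextMetric(
    retriever=MyRetriever,
    model=judge,
    use_structured_output=True,  # judge returns a typed object directly
)
```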
Non-Structured Output

Uses regex extraction from JSON blocks:
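Here the judge is asked to reply with a fenced JSON block, which the metric extracts between the configured markers. A sketch with the defaults spelled out:

```python
metric = ContextMetric(
    retriever=MyRetriever,
    model=judge,
    use_structured_output=False,   # default
    bos_json_clause="```json",     # where the JSON block starts in the reply
    eos_json_clause="```",         # where it ends
)
```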
Best Practices

Provide Clear Context
Include specific, actionable instructions in your context:
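Hypothetical context strings for contrast (how the context reaches the metric depends on your Retriever):

```python
# Vague: gives the judge little to check against.
context = "You are a helpful assistant."

# Specific and actionable: each instruction is verifiable.
context = (
    "You are a support assistant for AcmeBank. "
    "Only answer questions about checking and savings accounts. "
    "Never give investment advice; redirect such questions to a human agent."
)
```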
Include Ground Truth
Provide `ground_truth_assistant` for better evaluation:
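A hypothetical record shape; only the `ground_truth_assistant` key comes from this page, the other keys are assumed:

```python
record = {
    "system": "Only answer questions about checking and savings accounts.",
    "user": "Which stocks should I buy?",
    "assistant": "I can't give investment advice, but I can help with your accounts.",
    # Reference answer the judge can compare the actual response against:
    "ground_truth_assistant": "Decline to give investment advice; offer account help.",
}
```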
Use Temperature 0
Set `temperature=0` for consistent, deterministic judgments:
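For example, with the OpenAI judge from earlier:

```python
from langchain_openai import ChatOpenAI

# temperature=0 makes the judge's scoring as repeatable as the provider allows
judge = ChatOpenAI(model="gpt-4", temperature=0)
```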
Choose Appropriate Model
Larger models provide better judgment quality:
- Production: GPT-4, Claude-3, Llama-3-70B
- Testing: Llama-3-8B, GPT-3.5