Conversational Metric
The Conversational metric evaluates dialogue quality using Grice’s Maxims - principles of cooperative conversation that define effective communication.Overview
The metric assesses seven dimensions:| Dimension | Description | Scale |
|---|---|---|
| Quality Maxim | Truthfulness and evidence-based responses | 0-10 |
| Quantity Maxim | Appropriate amount of information | 0-10 |
| Relation Maxim | Relevance to the conversation | 0-10 |
| Manner Maxim | Clarity and organization | 0-10 |
| Memory | Ability to recall previous context | 0-10 |
| Language | Appropriateness of language style | 0-10 |
| Sensibleness | Overall coherence and logic | 0-10 |
Installation
Basic Usage
Parameters
Required Parameters
| Parameter | Type | Description |
|---|---|---|
retriever | Type[Retriever] | Data source class |
model | BaseChatModel | LangChain-compatible judge model |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
use_structured_output | bool | False | Use LangChain structured output |
bos_json_clause | str | "```json" | JSON block start marker |
eos_json_clause | str | "```" | JSON block end marker |
verbose | bool | False | Enable verbose logging |
Output Schema
ConversationalMetric
Understanding Grice’s Maxims
Quality Maxim
Be truthful: Don’t say what you believe to be false or lack evidence for.Quantity Maxim
Be informative: Provide enough information, but not more than required.Relation Maxim
Be relevant: Make your contribution relevant to the conversation.Manner Maxim
Be clear: Avoid obscurity and ambiguity. Be brief and orderly.Memory
Recall context: Reference previous parts of the conversation appropriately.Language
Appropriate style: Match language register to the context.Sensibleness
Overall coherence: Response makes logical sense in context.Complete Example
Visualization
Radar Chart
Score Interpretation
| Score Range | Interpretation |
|---|---|
| 8-10 | Excellent - High-quality dialogue |
| 6-8 | Good - Meets expectations with minor issues |
| 4-6 | Moderate - Noticeable quality issues |
| 2-4 | Poor - Significant problems |
| 0-2 | Very Poor - Fails basic criteria |
Best Practices
Include Observations
Include Observations
Add
observation field to guide evaluation:Test Multi-Turn Conversations
Test Multi-Turn Conversations
Include sequences that test memory:
Vary Conversation Types
Vary Conversation Types
Include different interaction styles:
- Factual questions
- Clarification requests
- Complex multi-part queries
- Follow-up questions