Auditing AI response quality

Updated June 14, 2026

Where to find it

Navigate to Reports → Quality to audit the quality of AI responses.

Confidence score histogram

A bar chart groups all AI messages into 0.1-wide confidence buckets (0.0–0.1, 0.1–0.2, … 0.9–1.0). A healthy agent should have most messages in the 0.7–1.0 range. A spike in the lower buckets indicates that the agent is frequently uncertain; consider adding more knowledge or revising the system prompt guidance.
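
To make the bucketing concrete, here is a minimal Python sketch of how scores could be grouped into the ten buckets described above. It is an illustration only, not the product's implementation: the function names are hypothetical, and placing a score of exactly 1.0 in the top bucket is an assumption.

```python
from collections import Counter


def confidence_histogram(scores):
    """Count scores per 0.1-wide bucket.

    Index 0 is the 0.0-0.1 bucket and index 9 is the 0.9-1.0 bucket.
    """
    counts = Counter()
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        # Assumption: a score of exactly 1.0 belongs to the top bucket.
        counts[min(int(score * 10), 9)] += 1
    return [counts[i] for i in range(10)]


def share_in_healthy_range(scores):
    """Fraction of scores in the 0.7-1.0 range (buckets 7, 8 and 9)."""
    hist = confidence_histogram(scores)
    total = sum(hist)
    return sum(hist[7:]) / total if total else 0.0
```

A low value from share_in_healthy_range corresponds to the "spike in the lower buckets" pattern on the chart.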

Low-confidence conversations

A table lists every conversation where at least one AI reply fell below that agent's configured confidence threshold. Columns show the customer, agent, outcome, date, and the lowest confidence score seen. Use this table to find conversations worth reviewing when improving knowledge or agent instructions. The table can be exported as CSV for offline review.
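
The selection rule above can be sketched in Python as follows. This is a hedged illustration: the dictionary keys (ai_confidences and so on), the per-agent thresholds mapping, and the CSV column names are all hypothetical and do not reflect the actual export format.

```python
import csv
import io

# Hypothetical column names mirroring the columns described in the article.
FIELDS = ["customer", "agent", "outcome", "date", "lowest_confidence"]


def low_confidence_rows(conversations, thresholds):
    """Keep conversations whose lowest AI confidence is below the agent's threshold."""
    rows = []
    for conv in conversations:
        scores = conv["ai_confidences"]
        if not scores:
            continue  # no AI replies, nothing to flag
        lowest = min(scores)
        # "At least one reply below the threshold" is the same as "minimum below it".
        if lowest < thresholds[conv["agent"]]:
            rows.append({
                "customer": conv["customer"],
                "agent": conv["agent"],
                "outcome": conv["outcome"],
                "date": conv["date"],
                "lowest_confidence": lowest,
            })
    return rows


def to_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Because each agent has its own threshold, the same score can flag a conversation for one agent and not for another.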

Tool call usage

A ranked list shows which tools (knowledge search, CRM lookup, Zendesk actions, etc.) were called most often in the period. This is useful for understanding agent behaviour and identifying over- or under-used capabilities.
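
As a rough illustration of the ranking, the following Python sketch counts tool calls and lists tools that were never called. The tool identifiers and function names are hypothetical; the product may record and name tool calls differently.

```python
from collections import Counter


def rank_tools(tool_calls):
    """Return (tool, count) pairs, most frequently called first."""
    return Counter(tool_calls).most_common()


def unused_tools(available_tools, tool_calls):
    """Return the available tools that were never called, sorted by name."""
    return sorted(set(available_tools) - set(tool_calls))
```

For example, a tool that appears in unused_tools for an agent that should be using it may point to a gap in the agent's instructions.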

Agent filter

Use the agentId URL parameter (set automatically when clicking an agent row on the Agent Performance page) to scope all Quality charts to a single agent.
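
If you need to build such a link yourself, a minimal Python sketch is shown below. Only the agentId parameter name comes from this article; the host, path, and the period parameter in the usage note are placeholders, and the function name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_agent_filter(url, agent_id):
    """Return url with its agentId query parameter set to agent_id.

    Other query parameters are preserved; an existing agentId is replaced.
    """
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "agentId"
    ]
    query.append(("agentId", agent_id))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Calling with_agent_filter("https://app.example.com/reports/quality?period=30d", "agent_123") on this placeholder URL appends agentId=agent_123 while keeping the existing parameter.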