AI Demo Evaluation Worksheet
Score each run from 1 (needs work) to 5 (excellent).
| Dimension | What good looks like |
|---|---|
| Clarity | Anyone can understand the answer quickly. |
| Usefulness | The output leads to a concrete next action. |
| Accuracy | The response aligns with source facts and context. |
| Trust | Limits, assumptions, and risks are clearly disclosed. |
| Consistency | Similar inputs produce stable, reliable outputs. |
Debrief prompts
- What surprised technical reviewers?
- What confused non-technical reviewers?
- What change would improve confidence the most?