Exercise 2: The Comparison
Take a piece of AI-generated output in your domain. This should be something real—not a hypothetical. Something you've actually generated using an AI tool, or something you could realistically generate given your role.
Now take your own best work on a similar task from the recent past. Something you're proud of. Something that represents your actual quality when you're doing well.
Evaluate both pieces against the same criteria. Use the rubric you're building (or if you haven't built it yet, identify four to six criteria that matter in your domain, then apply them). For each criterion, ask: Does the AI output meet this standard? Does my work? Where's the gap?
Write up what the comparison reveals. What does AI's output do well? Where does it typically fall short? Where does your own work fall short? What does this tell you about where AI can be trusted to work independently in your domain and where it needs human direction?
The point isn't to judge AI or to defend your work. It's to be honest about what each does well and where the gaps are. This clarity is essential for being a good director of AI work.
Module Deliverable: Your Evaluation Rubric
Write a rubric for the type of work you most frequently produce or oversee. Your rubric should be specific enough that someone unfamiliar with your domain could apply it consistently, and honest enough that it would actually distinguish excellent work from adequate work.
Your rubric should include:
1. The Domain: What type of work is this rubric for? Be specific. Not "writing" but "persuasive business proposals" or "product documentation" or "email communication." Not "design" but "UI design for data tools" or "brand identity work."
2. Four to Eight Criteria: These should be observable, testable qualities that distinguish excellent work from adequate work. Each criterion should have a short title and a 1-2 sentence explanation of what it means.
3. Three Levels of Quality: For each criterion, describe what excellent, adequate, and poor performance looks like. You don't need to write a paragraph for each—a sentence or two per level is sufficient. But be specific enough that someone could look at a piece of work and place it on that spectrum.
4. An Explanation of Choices: A brief section (2-3 paragraphs) explaining where these criteria came from. What examples of excellent work informed your choices? Where did you notice the most important gaps between good and adequate? Are there any criteria you considered but didn't include, and why?
The rubric should be practical. Someone should be able to use it to evaluate work in your domain within a reasonable amount of time: it should not take hours to apply, and it should let the evaluator give consistent, useful feedback.
How to Build It
Start by identifying examples. Find three pieces of genuinely excellent work in your domain. Find three pieces of adequate work. Compare them. What's actually different? Write those differences down as observations, not as judgments. From those observations, build your criteria.
Test your rubric by applying it to work you didn't produce. Does it give you consistent results? Can you use the rubric to explain why a piece of work rated high or low?
Refine it. Adjust criteria that were too vague. Add criteria that matter but that your first draft missed. Remove criteria that don't actually distinguish quality.
Remember: A rubric won't capture all of what makes work excellent. Some of that is context-specific, some is judgment-based, some is genuinely intuitive. But a good rubric captures the core. It makes standards visible. It makes feedback specific. That's the goal.
Submission
Your rubric should be 2-3 pages including the criteria, the quality levels, and your explanation of choices. It should be specific enough to your domain that it actually means something: neither so generic that it could apply to any work nor so narrow that it applies to only one specific task.
This rubric becomes your reference standard. You'll use it going forward to evaluate work in your domain: to give feedback, to recognize where both your own work and AI output need development, and to guide your own work toward excellence. It's the written version of the taste and judgment you're developing.