How the credential is built and graded.
The credential is only as meaningful as the rigour behind it. This page documents how the test site is constructed, how submissions are scored, and the design decisions that prevent gaming.
The test site
The test site is Lumora, a fully built Next.js 15 e-commerce store with 18 products across 5 categories, a blog, search, a cart and structured data: the full surface area of a production site. It is built and configured the way a real engineering team would build and configure it.
60+ SEO defects are planted across every layer: rendered HTML, response headers, structured data, internal linking, robots/sitemap, render-vs-source drift, schema-vs-content mismatches, and patterns specifically designed to evade casual auditing. The defects are reviewed and revised on a scheduled cadence so the test doesn't decay into a memorisation exercise.
The master answer key
Every planted defect has an entry on a private master answer key with: a stable ID, a one-line canonical description, the page surface(s) it appears on, the severity tier (basic / intermediate / advanced / guru), and a list of acceptable phrasings a candidate might use to describe it.
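The entry fields listed above could be modelled as a TypeScript type. This is a minimal sketch with a shape of our own choosing. The field names, the `validateEntry` helper and the placeholder entry are hypothetical, and the placeholder deliberately describes no real planted defect.

```typescript
// Severity tiers as named on this page.
type SeverityTier = "basic" | "intermediate" | "advanced" | "guru";

// Hypothetical shape for one master-key entry; field names are illustrative.
interface MasterKeyEntry {
  id: string; // stable ID
  description: string; // one-line canonical description
  surfaces: string[]; // page surface(s) the defect appears on
  severity: SeverityTier;
  acceptablePhrasings: string[]; // ways a candidate might describe it
}

// Hypothetical sanity check an entry could pass before joining the key.
// Returns a list of problems; an empty list means the entry looks well-formed.
function validateEntry(entry: MasterKeyEntry): string[] {
  const problems: string[] = [];
  if (entry.id.trim() === "") problems.push("missing id");
  if (entry.description.includes("\n")) problems.push("description must be one line");
  if (entry.surfaces.length === 0) problems.push("no surfaces listed");
  if (entry.acceptablePhrasings.length === 0) problems.push("no acceptable phrasings");
  return problems;
}

// Placeholder entry: deliberately not a real planted defect.
const example: MasterKeyEntry = {
  id: "EXAMPLE-000",
  description: "Placeholder description",
  surfaces: ["/example-page"],
  severity: "basic",
  acceptablePhrasings: ["placeholder phrasing"],
};

console.log(validateEntry(example)); // prints []
```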
The key is not published, and spoiler-bearing copy is excluded from every public marketing surface, including this page. Its contents are open to internal review but are never shown to candidates.
Why crawlers are blocked
The test site denies known bot user-agents and the IP ranges of major audit suites at the network level — Screaming Frog, Sitebulb, Ahrefs Site Audit, Semrush, GPTBot, ClaudeBot, Googlebot, and the rest are blocked. This is intentional and structural, not a polite robots.txt request.
The reason is anti-cheating. If candidates could pipe the test site through GPT-4 or paste in a Sitebulb report, the credential would measure tool ownership, not auditing skill. Closing the site to automation forces candidates to audit the way real auditors do when the tools fail: by reading the page, opening dev tools, and noticing.
The marketing site (this page) stays fully open to search engines so that the credential is discoverable. The two sites are separate deployments.
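The block itself sits at the network level, but the user-agent half of the rule amounts to a case-insensitive substring match against known crawler tokens. The sketch below uses an illustrative token list that is an assumption, not the test site's real blocklist, and it does not cover the IP-range half of the block.

```typescript
// Illustrative tokens only; the test site's real blocklist is not published.
const BLOCKED_UA_TOKENS: readonly string[] = [
  "screaming frog",
  "sitebulb",
  "ahrefsbot",
  "ahrefssiteaudit",
  "semrushbot",
  "gptbot",
  "claudebot",
  "googlebot",
];

// Returns true when the User-Agent header contains any blocked token.
// A missing header is let through here; a real deployment may choose otherwise.
function isBlockedUserAgent(userAgent: string | null | undefined): boolean {
  if (!userAgent) return false;
  const ua = userAgent.toLowerCase();
  return BLOCKED_UA_TOKENS.some((token) => ua.includes(token));
}

const googlebot =
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
console.log(isBlockedUserAgent(googlebot)); // prints true
```

User-agent strings are trivial to spoof, which is why a string match alone would not be enough and the block also covers audit-suite IP ranges.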
How grading works
Submissions are graded against the master answer key by an AI grader (Anthropic Claude) operating against a published rubric. For each finding in the submission, the grader decides which master-list entry it matches and at what confidence level.
- High: The finding clearly describes the same issue. A senior SEO comparing the two would say "yes, that's the same thing" without hesitation. Counts toward score.
- Medium: The finding strongly suggests the master issue but is not perfectly specific. A senior SEO would nod yes; a strict grader might want more detail. Counts toward score.
- Low: The finding is too vague, generic, or off-target to credit. "The metadata could be better." "The site has issues." Does not count.
The final score is the number of unique master-key entries matched at high or medium confidence. The tier follows from the score plus the kinds of patterns the candidate caught: a candidate who finds 30 basic issues but no advanced or guru patterns does not earn Specialist, because the tier weights pattern depth, not pattern count alone.
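The scoring rule reduces to counting distinct master-key entries credited at high or medium confidence, so two findings that map to the same entry score once. A minimal sketch, assuming the grader emits one record per finding. The `GradedFinding` shape, the `scoreSubmission` name and the example IDs are hypothetical.

```typescript
type Confidence = "high" | "medium" | "low";

// Hypothetical grader output: one record per finding in the submission.
interface GradedFinding {
  finding: string;
  masterId: string | null; // null when the grader matched no master-key entry
  confidence: Confidence;
}

// Score = number of distinct master-key entries matched at high or medium confidence.
function scoreSubmission(findings: GradedFinding[]): number {
  const credited = new Set<string>();
  for (const f of findings) {
    const counts = f.confidence === "high" || f.confidence === "medium";
    if (counts && f.masterId !== null) credited.add(f.masterId);
  }
  return credited.size;
}

const graded: GradedFinding[] = [
  { finding: "finding A", masterId: "EX-1", confidence: "high" },
  { finding: "finding A, reworded", masterId: "EX-1", confidence: "medium" }, // duplicate entry
  { finding: "finding B", masterId: "EX-2", confidence: "medium" },
  { finding: "The site has issues.", masterId: "EX-3", confidence: "low" }, // too vague
  { finding: "finding C", masterId: null, confidence: "low" }, // no match
];

console.log(scoreSubmission(graded)); // prints 2
```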
Why AI grading, not human panels
Consistency. Human graders drift over time and disagree with each other: the same submission graded by two senior SEOs can come back with two different scores. An AI grader working from a fixed master key and a published rubric applies the same criteria to every submission. The judgement is made up front by humans, in the master key; the grader executes it.
We use Anthropic Claude. Prompts and master-key entries are not used to train the model (per Anthropic API policy). The grader receives only the candidate's submission text and the master key — never the candidate's name, email, or other PII.
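One way to keep PII away from the grader is to build its input from an allow-list of fields rather than stripping fields from the full record. The sketch assumes a hypothetical `Submission` record. The field names and `buildGraderInput` are illustrative, and the sketch stops before any API call.

```typescript
// Hypothetical stored submission: contains PII that must not reach the grader.
interface Submission {
  candidateName: string;
  email: string;
  submittedAt: Date;
  text: string;
}

// Hypothetical master-key entry, reduced to what the grader needs here.
interface KeyEntry {
  id: string;
  description: string;
  acceptablePhrasings: string[];
}

// The only two things the grader receives.
interface GraderInput {
  submissionText: string;
  masterKey: KeyEntry[];
}

// Copies allowed fields explicitly, so any PII field added to Submission
// later is excluded by default.
function buildGraderInput(sub: Submission, key: KeyEntry[]): GraderInput {
  return { submissionText: sub.text, masterKey: key };
}

const input = buildGraderInput(
  {
    candidateName: "Jane Doe",
    email: "jane@example.com",
    submittedAt: new Date("2025-01-01T00:00:00Z"),
    text: "Finding 1: placeholder text",
  },
  [{ id: "EX-1", description: "Placeholder", acceptablePhrasings: [] }],
);

console.log(Object.keys(input)); // prints [ 'submissionText', 'masterKey' ]
```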
Anti-gaming measures
- Automation block: The test site denies known bot user-agents and audit-suite IP ranges. Crawler-piping doesn't work.
- Specificity required: Findings must be specific enough to map to one master-list entry. Bulk noise from automated tools fails to match.
- Conservative grader: Low-confidence matches don't count. The grader is tuned to prefer false negatives over false positives.
- No published key: The master answer key is not visible to candidates. The repository surface that holds it is not served on any public route.
- No miss disclosure: Scorecards return tier, score, and headroom, never the list of issues a candidate missed. Past scorecards do not become study guides.
- Cooldown: A 24-hour cooldown applies between attempts. Submissions are timestamped and tier history is kept.
- Pattern revision: Planted issues are reviewed and revised on a scheduled cadence so circulating findings lose value over time.
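The cooldown check is plain date arithmetic. A minimal sketch, assuming the last submission timestamp is stored per candidate. `canSubmit` and `nextAttemptAt` are hypothetical names.

```typescript
const COOLDOWN_MS = 24 * 60 * 60 * 1000; // 24 hours in milliseconds

// Earliest moment a candidate may submit again.
function nextAttemptAt(lastSubmittedAt: Date): Date {
  return new Date(lastSubmittedAt.getTime() + COOLDOWN_MS);
}

// First attempts are always allowed; later ones wait out the full 24 hours.
function canSubmit(lastSubmittedAt: Date | null, now: Date = new Date()): boolean {
  if (lastSubmittedAt === null) return true;
  return now.getTime() >= nextAttemptAt(lastSubmittedAt).getTime();
}

const last = new Date("2025-01-01T09:00:00Z");
console.log(canSubmit(last, new Date("2025-01-02T08:59:59Z"))); // prints false
console.log(canSubmit(last, new Date("2025-01-02T09:00:00Z"))); // prints true
```

Comparing epoch milliseconds measures 24 hours of elapsed time, so the check is unaffected by the candidate's time zone or daylight-saving changes.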
Tier calibration
Tier thresholds (8 / 18 / 30 / 42) are calibrated against the master key so they map to industry seniority levels:
- Apprentice: Junior SEO competence. Can audit obvious surface issues with a methodical pass.
- Practitioner: Mid-level SEO competence. Reads underlying structure, connects symptoms to causes.
- Specialist: Senior SEO competence. Reads past the rendered page; reconciles validators.
- Guru: Mastery-level. Reconciles every layer of a page against every other; finds patterns built to evade casual auditing.
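Combining the score thresholds with the pattern-depth rule from the grading section, tier assignment might look like the sketch below. It assumes the thresholds apply to the tiers in the order listed (8 Apprentice, 18 Practitioner, 30 Specialist, 42 Guru). The depth gate is invented for illustration: Specialist needs at least one advanced or guru match, and Guru needs at least one guru match. The real depth rule is not published, and `assignTier` and the `"None"` result are hypothetical.

```typescript
type Severity = "basic" | "intermediate" | "advanced" | "guru";
type Tier = "None" | "Apprentice" | "Practitioner" | "Specialist" | "Guru";

// Minimum scores, highest tier first, assuming the listed order of tiers.
const THRESHOLDS: ReadonlyArray<readonly [Tier, number]> = [
  ["Guru", 42],
  ["Specialist", 30],
  ["Practitioner", 18],
  ["Apprentice", 8],
];

// HYPOTHETICAL depth gate: the real rule is not published.
function meetsDepthGate(tier: Tier, matched: Severity[]): boolean {
  const has = (s: Severity) => matched.includes(s);
  switch (tier) {
    case "Guru":
      return has("guru");
    case "Specialist":
      return has("advanced") || has("guru");
    default:
      return true;
  }
}

// `matched` holds the severity of each unique credited master-key entry,
// so its length is the score.
function assignTier(matched: Severity[]): Tier {
  const score = matched.length;
  for (const [tier, min] of THRESHOLDS) {
    if (score >= min && meetsDepthGate(tier, matched)) return tier;
  }
  return "None";
}

const thirtyBasic: Severity[] = Array.from({ length: 30 }, (): Severity => "basic");
console.log(assignTier(thirtyBasic)); // prints Practitioner
```

Under this sketch, 30 basic-only matches clear the Specialist score threshold but fail the depth gate and fall through to Practitioner, which mirrors the example in the grading section.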
Thresholds are reviewed against submission distributions on a scheduled basis. Any change to thresholds is grandfathered — already-issued credentials remain valid at the tier they were earned at.
Open questions
We're still calibrating. As submission volume grows, we'll publish anonymised grader-vs-human-reviewer agreement data on this page. If you're a senior SEO who'd like to spot-check a sample of graded submissions for an external sanity check, get in touch.
Now you know how it works.
The credential is rigorous because the constraints are rigorous: manual audit only, conservative grading, undisclosed answer key. Sign up and find out what tier you earn.