WAF_BLOCKED
The target site's WAF or CDN refused the Onto crawler (origin returned 401 / 403).
| Code | HTTP status | Retryable? |
|---|---|---|
| WAF_BLOCKED | 403 | No |
What this means
WAF_BLOCKED means the block happened at the origin. The target site's web application firewall, Cloudflare Bot Fight Mode, Akamai Bot Manager, or a similar protection layer returned 401 or 403 to the Onto-Reader user agent. The site is saying "crawlers stay out" through active enforcement, not through robots.txt. (The robots.txt case is a separate error; see ROBOTS_BLOCKED.)
When you'll see it
The API responds with HTTP 403, including when the origin itself returned 401. The body always includes code: "WAF_BLOCKED". Branch on code, never on the human-readable message: the wording can change without notice, while the code is the stable contract.
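A minimal sketch of branching on the code field rather than on the message. The helper name isWafBlocked and the second sample body are illustrative assumptions, not part of the Onto API; only the code value and the first sample body come from this page.

```javascript
// Hypothetical helper: decide from the parsed JSON body alone.
// Only `code` is inspected; `message` is ignored on purpose because
// its wording can change without notice.
function isWafBlocked(body) {
  return body !== null && typeof body === 'object' && body.code === 'WAF_BLOCKED';
}

// Same code, different wording: both bodies are recognized.
const original = {
  status: 'error',
  code: 'WAF_BLOCKED',
  message: 'Target site refused crawler access (origin returned 403)',
};
const reworded = {
  status: 'error',
  code: 'WAF_BLOCKED',
  message: 'Some future wording', // made-up text, for illustration only
};

console.log(isWafBlocked(original), isWafBlocked(reworded));
```

Matching on the message string would stop working as soon as the wording changed, which is why the check looks only at code.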
Example response
{
  "status": "error",
  "code": "WAF_BLOCKED",
  "message": "Target site refused crawler access (origin returned 403)"
}
How to handle
Skip the URL — don't retry with a different user agent or try to bypass the WAF. Many large sites intentionally block all bot traffic, and respecting that block is both ethical and legally safer. If you control the target site, allowlist the Onto-Reader user-agent in your WAF rules. For aggregator use cases, present a clear error to the end user and let them visit the source directly.
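For the aggregator case, one way to turn a WAF_BLOCKED response into something the end user can act on is sketched here. The function name toUserFacingResult, the shape of the returned object, and the notice text are assumptions for illustration; only the code check and the "not retryable" flag reflect what this page documents.

```javascript
// Hypothetical mapper from a parsed Onto response body to what an
// aggregator UI could show. `url` is the page the user asked for.
// Returns null for any body that is not WAF_BLOCKED, so the caller's
// normal handling applies to everything else.
function toUserFacingResult(url, body) {
  if (body && body.code === 'WAF_BLOCKED') {
    return {
      ok: false,
      retry: false, // WAF_BLOCKED is documented as not retryable
      notice: 'This site does not allow automated access. Open the original page instead.',
      sourceUrl: url, // lets the UI link straight to the source
    };
  }
  return null;
}

const result = toUserFacingResult('https://example.com/article', {
  status: 'error',
  code: 'WAF_BLOCKED',
  message: 'Target site refused crawler access (origin returned 403)',
});
console.log(result.notice, result.sourceUrl);
```

The sketch never retries and never changes the user agent; it only passes the original URL back so the user can visit the source directly.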
Suggested handling in a Node client:
if (data.code === 'WAF_BLOCKED') {
  // Origin won't serve us. Skip and move on; don't retry.
  return null;
}
Related errors
| Code | Status | What it means |
|---|---|---|
| ROBOTS_BLOCKED | HTTP 403 | The target site's robots.txt explicitly disallows GPTBot or a wildcard user-agent. |
| URL_NOT_FOUND | HTTP 404 | The target URL returned 404 — the page doesn't exist on the origin. |
See the full error index for the complete catalog with the handling switch statement covering every code at once.