WAF_BLOCKED

The target site's WAF or CDN refused the Onto crawler (origin returned 401 / 403).

| Code | HTTP status | Retryable? |
| --- | --- | --- |
| WAF_BLOCKED | 403 | No |

What this means

WAF_BLOCKED reports an origin-side block. The target site's web application firewall, Cloudflare Bot Fight Mode, Akamai Bot Manager, or a similar protection layer returned 401 or 403 to the Onto-Reader user agent. The site is telling crawlers to stay out through active enforcement, not through robots.txt. (For the robots.txt case, see ROBOTS_BLOCKED; it is a different error.)

When you'll see it

The response has HTTP status 403, and the body always includes code: "WAF_BLOCKED". Branch on code, never on the human-readable message: the wording can change without notice, while the code is the stable contract.

Example response

```json
{
  "status": "error",
  "code": "WAF_BLOCKED",
  "message": "Target site refused crawler access (origin returned 403)"
}
```
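
The advice to branch on code rather than on message can be sketched as a small check. This is a minimal illustration, not part of any official client: the `OntoErrorBody` interface and the `isWafBlocked` helper are hypothetical names, and the body shape is inferred only from the example response above.

```typescript
// Hypothetical shape of an error body, inferred from the example response.
interface OntoErrorBody {
  status: string;
  code: string;
  message: string;
}

// Returns true only when the parsed body carries code "WAF_BLOCKED".
// The message field is deliberately ignored because its wording may change.
function isWafBlocked(body: unknown): boolean {
  if (typeof body !== "object" || body === null) {
    return false;
  }
  return (body as Partial<OntoErrorBody>).code === "WAF_BLOCKED";
}
```

A caller would parse the response body as JSON and pass the result to `isWafBlocked` before deciding how to proceed.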

How to handle

Skip the URL — don't retry with a different user agent or try to bypass the WAF. Many large sites intentionally block all bot traffic, and respecting that block is both ethical and legally safer. If you control the target site, allowlist the Onto-Reader user-agent in your WAF rules. For aggregator use cases, present a clear error to the end user and let them visit the source directly.

Do NOT retry blindly. The origin refused access. Skip the URL and move on.

Suggested handling in a Node client:

```ts
// `data` is the parsed JSON body of the response.
if (data.code === 'WAF_BLOCKED') {
  // The origin won't serve us. Skip this URL and move on; don't retry.
  return null;
}
```
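
For a batch of URLs, the same skip-and-move-on rule might look like the sketch below. It assumes results have already been collected into a hypothetical `ReadResult` union that pairs each URL with its outcome (the `url` field and the `"ok"` variant are assumptions for illustration; only `status: "error"`, `code`, and `message` appear in the documented response). The `triage` function name is likewise hypothetical.

```typescript
// Hypothetical per-URL outcome. Only the error fields mirror the documented body.
type ReadResult =
  | { status: "ok"; url: string; content: string }
  | { status: "error"; url: string; code: string; message: string };

interface Triage {
  pages: { url: string; content: string }[]; // successfully read
  skipped: string[];                         // WAF_BLOCKED: never retried
  other: { url: string; code: string }[];    // left for other error handlers
}

// Sorts results into buckets. WAF_BLOCKED URLs are recorded and dropped,
// with no retry and no attempt to change the user agent.
function triage(results: ReadResult[]): Triage {
  const out: Triage = { pages: [], skipped: [], other: [] };
  for (const r of results) {
    if (r.status === "ok") {
      out.pages.push({ url: r.url, content: r.content });
    } else if (r.code === "WAF_BLOCKED") {
      out.skipped.push(r.url);
    } else {
      out.other.push({ url: r.url, code: r.code });
    }
  }
  return out;
}
```

Keeping the skipped URLs in their own list makes it straightforward to show end users a clear error and a link to the source, as suggested for aggregator use cases.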

| Code | Status | What it means |
| --- | --- | --- |
| ROBOTS_BLOCKED | HTTP 403 | The target site's robots.txt explicitly disallows GPTBot or a wildcard user-agent. |
| URL_NOT_FOUND | HTTP 404 | The target URL returned 404; the page doesn't exist on the origin. |

See the full error index for the complete catalog with the handling switch statement covering every code at once.