ROBOTS_BLOCKED

The target site's robots.txt disallows GPTBot, either by name or through a wildcard `User-agent: *` rule.

| Code | HTTP status | Retryable? |
|---|---|---|
| ROBOTS_BLOCKED | 403 | No |

What this means

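ROBOTS_BLOCKED is the polite version of WAF_BLOCKED. Both mean the site won't serve Onto, but ROBOTS_BLOCKED comes from the site's published crawling policy rather than from a firewall refusing the request. Onto checks `/robots.txt` before issuing the actual fetch. If the site disallows GPTBot, or has a wildcard `User-agent: *` group with `Disallow: /`, the request short-circuits with ROBOTS_BLOCKED and no outbound fetch is made. The robots.txt check itself does not count against your quota.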
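To make the blocking conditions concrete, here is a simplified sketch of how a GPTBot robots.txt check can be evaluated. It is not Onto's implementation: the names `parseRobots` and `isDisallowedForAgent` are hypothetical, only plain prefix matching is supported (no `*` or `$` patterns), and it assumes group selection as specified in RFC 9309. Under that standard, a group that names GPTBot takes precedence over `User-agent: *`, so a wildcard `Disallow: /` does not apply when a GPTBot-specific group exists; whether Onto resolves conflicts exactly this way is an assumption.

```ts
// Simplified GPTBot robots.txt check, assuming RFC 9309 group selection.
// NOT Onto's implementation: plain prefix matching only (no `*` or `$`
// patterns), and directives other than User-agent/Allow/Disallow are ignored.

interface Rule {
  allow: boolean;
  path: string;
}

interface Group {
  agents: string[]; // lowercased User-agent tokens
  rules: Rule[];
}

function parseRobots(text: string): Group[] {
  const groups: Group[] = [];
  let current: Group | null = null;
  let inAgentRun = false; // true while reading consecutive User-agent lines
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (key === "user-agent") {
      // Consecutive User-agent lines share one group.
      if (current === null || !inAgentRun) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      inAgentRun = true;
    } else {
      inAgentRun = false;
      // An empty Allow/Disallow value matches nothing, so it adds no rule.
      if (current !== null && (key === "allow" || key === "disallow") && value !== "") {
        current.rules.push({ allow: key === "allow", path: value });
      }
    }
  }
  return groups;
}

function isDisallowedForAgent(robotsTxt: string, path: string, agent = "GPTBot"): boolean {
  const token = agent.toLowerCase();
  const groups = parseRobots(robotsTxt);
  // Groups that name the agent win; otherwise fall back to `User-agent: *`.
  let matching = groups.filter((g) => g.agents.indexOf(token) !== -1);
  if (matching.length === 0) {
    matching = groups.filter((g) => g.agents.indexOf("*") !== -1);
  }
  // The longest matching rule path wins; on a tie, Allow beats Disallow.
  let best: Rule | null = null;
  for (const group of matching) {
    for (const rule of group.rules) {
      if (path.slice(0, rule.path.length) !== rule.path) continue;
      if (
        best === null ||
        rule.path.length > best.path.length ||
        (rule.path.length === best.path.length && rule.allow)
      ) {
        best = rule;
      }
    }
  }
  return best !== null && !best.allow;
}
```

For example, `"User-agent: *\nDisallow: /\n"` blocks GPTBot everywhere, while appending a `User-agent: GPTBot` group with `Allow: /` (the fix a site owner would make) lifts the block for GPTBot only.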

When you'll see it

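You'll get it on any fetch whose target host's robots.txt disallows GPTBot for that URL. The response has HTTP status 403, and the body always includes `"code": "ROBOTS_BLOCKED"`. Branch on `code`, never on the human-readable `message`: the wording can change without notice, while the code is the stable contract.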

Example response

```json
{
  "status": "error",
  "code": "ROBOTS_BLOCKED",
  "message": "robots.txt disallows GPTBot for this host"
}
```
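Because only `code` is a stable contract, a client can validate the parsed body at runtime and branch on `code` alone. This is a minimal sketch: the `OntoErrorBody` interface and the `isOntoError` / `isRobotsBlocked` helpers are hypothetical names, and the interface assumes only the three fields shown in the example response.

```ts
// Error envelope as shown in the example response. Only `status`, `code`, and
// `message` appear on this page; no other fields are assumed.
interface OntoErrorBody {
  status: "error";
  code: string;
  message: string;
}

// Runtime check for a parsed JSON value of unknown shape.
function isOntoError(value: unknown): value is OntoErrorBody {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return v.status === "error" && typeof v.code === "string" && typeof v.message === "string";
}

// Branch on the stable `code`; never pattern-match the human-readable `message`.
function isRobotsBlocked(value: unknown): boolean {
  return isOntoError(value) && value.code === "ROBOTS_BLOCKED";
}

const body: unknown = JSON.parse(
  '{"status":"error","code":"ROBOTS_BLOCKED","message":"robots.txt disallows GPTBot for this host"}',
);
console.log(isRobotsBlocked(body)); // true
```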

How to handle

Honor the block. Respecting robots.txt isn't optional: it's the social contract that keeps the web crawler-friendly for everyone. If you aggregate content for end users, surface the block as "the source has opted out of AI crawling" and let the user click through to the original site. If you own the target site and want Onto to read it, edit `robots.txt` to allow `GPTBot`.
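For an aggregator, the opt-out can be turned into a user-facing notice that keeps the link to the original page. A minimal sketch; `OptOutNotice`, `optOutNotice`, and `renderNotice` are hypothetical helpers, not part of the Onto API, and the notice text simply reuses the wording suggested above.

```ts
// Hypothetical aggregator helper (not part of the Onto API): turns a
// ROBOTS_BLOCKED result into a notice with a click-through link.
interface OptOutNotice {
  url: string;    // the original page, so the user can read it directly
  notice: string; // wording suggested on this page
}

function optOutNotice(url: string, code: string): OptOutNotice | null {
  // Only ROBOTS_BLOCKED means the source opted out; other codes need their own handling.
  if (code !== "ROBOTS_BLOCKED") return null;
  return { url, notice: "The source has opted out of AI crawling." };
}

// Plain-text rendering; swap in your UI's link component.
function renderNotice(n: OptOutNotice): string {
  return `${n.notice} Read it on the original site: ${n.url}`;
}

const n = optOutNotice("https://example.com/post", "ROBOTS_BLOCKED");
if (n !== null) console.log(renderNotice(n));
// prints "The source has opted out of AI crawling. Read it on the original site: https://example.com/post"
```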

Do NOT retry automatically. The site's robots.txt refuses access, so a retry will be blocked the same way. Skip the URL and move on.

Suggested handling in a Node client:

```ts
// `data` is the parsed JSON body of the Onto error response.
if (data.code === 'ROBOTS_BLOCKED') {
  // The site explicitly opted out. Honor it: skip this URL, no retry.
  return null;
}
```
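Extending the snippet above to a batch of URLs, each ROBOTS_BLOCKED URL gets exactly one attempt and is recorded as skipped. This is a sketch under stated assumptions: `fetchViaOnto` is a hypothetical caller-supplied function that calls the Onto API and resolves to the parsed JSON body (it is not an Onto SDK function), and any body whose `status` is not `"error"` is assumed to be a success.

```ts
// Sketch: process a batch of URLs, skipping ROBOTS_BLOCKED ones without retrying.
interface OntoResponse {
  status: string; // "error" on failure, per the example response
  code?: string;
}

// Hypothetical: supplied by the caller, wraps the actual Onto API request.
type FetchFn = (url: string) => Promise<OntoResponse>;

interface BatchResult {
  fetched: string[];
  skipped: string[];                        // robots.txt opted out; never retried
  failed: { url: string; code: string }[]; // other error codes, handled elsewhere
}

async function fetchAll(urls: string[], fetchViaOnto: FetchFn): Promise<BatchResult> {
  const result: BatchResult = { fetched: [], skipped: [], failed: [] };
  for (const url of urls) {
    const data = await fetchViaOnto(url);
    if (data.code === "ROBOTS_BLOCKED") {
      // The site explicitly opted out. Honor it: one attempt, then move on.
      result.skipped.push(url);
    } else if (data.status === "error") {
      result.failed.push({ url, code: data.code ?? "UNKNOWN" });
    } else {
      // Assumption: any non-error body counts as a successful fetch.
      result.fetched.push(url);
    }
  }
  return result;
}
```

The loop is sequential for clarity; the same branching works unchanged inside a concurrent worker pool.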

Related codes

| Code | Status | What it means |
|---|---|---|
| WAF_BLOCKED | HTTP 403 | The target site's WAF or CDN refused the Onto crawler (origin returned 401 / 403). |
| URL_NOT_FOUND | HTTP 404 | The target URL returned 404: the page doesn't exist on the origin. |

See the full error index for the complete catalog, including a handling `switch` statement that covers every code at once.