IMAGE_PDF

The URL resolved to a PDF that contains only scanned images — no machine-readable text to extract.

Code	HTTP status	Retryable?
IMAGE_PDF	422	No

What this means

IMAGE_PDF fires when /v1/read or /v1/extract resolves to a PDF, but the PDF holds only scanned or photographed pages with no embedded text layer. Onto extracts text from PDFs deterministically — no OCR, no vision — so an image-only PDF yields nothing to return. The request is automatically refunded and never counts against your quota.

When you'll see it

The response carries HTTP status 422, and its body always includes code: "IMAGE_PDF". Branch on code, never on the human-readable message: the wording can change without notice, while the code is the stable contract.
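As a minimal sketch of branching on the code rather than the message, the check below inspects only the status and code fields of a parsed response body. The helper name isImagePdfError is hypothetical and not part of any Onto client; the field names are taken from the example response on this page.

```javascript
// Hypothetical helper (not part of any Onto SDK): returns true only when the
// parsed response body is an error whose stable code is IMAGE_PDF.
// The human-readable `message` is deliberately never inspected.
function isImagePdfError(body) {
  return Boolean(body) && body.status === 'error' && body.code === 'IMAGE_PDF';
}

// The check keeps working even if the message wording changes.
const reworded = { status: 'error', code: 'IMAGE_PDF', message: 'wording may differ' };
console.log(isImagePdfError(reworded)); // true
```

Matching on the message text instead would break silently the first time the wording is revised.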

Example response

{
  "status": "error",
  "code": "IMAGE_PDF",
  "message": "PDF contains only scanned images; no extractable text"
}

How to handle

Run the PDF through OCR upstream so it carries a real text layer, or read the original HTML source if one exists. Onto deliberately does no OCR or vision — it stays a deterministic text primitive. If you control the document pipeline, export PDFs with selectable text rather than flattened images.
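The two options above can be sketched as a small decision helper. Everything named here is hypothetical: planImagePdfFallback, the htmlSourceUrl and canRunOcr inputs, and the action labels are illustrations supplied by the caller, not Onto features. Preferring the HTML source over OCR is also an assumption, made on the grounds that reading HTML avoids an extra processing step.

```javascript
// Hypothetical planner for what to do after an IMAGE_PDF error.
// `htmlSourceUrl`: URL of an HTML version of the document, if the caller knows one.
// `canRunOcr`: whether the caller has its own OCR step available upstream.
function planImagePdfFallback({ htmlSourceUrl = null, canRunOcr = false } = {}) {
  if (htmlSourceUrl) {
    // An HTML original exists: read that instead of the scanned PDF.
    return { action: 'read_html', url: htmlSourceUrl };
  }
  if (canRunOcr) {
    // Add a text layer with the caller's own OCR, then submit the new PDF.
    return { action: 'ocr_then_resubmit' };
  }
  // No fallback available: record the document as unreadable and move on.
  return { action: 'skip' };
}

console.log(planImagePdfFallback({ canRunOcr: true }).action); // ocr_then_resubmit
```

Returning a plain description of the next step, rather than performing it, keeps the OCR tooling and any follow-up request entirely in the caller's hands.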

⚠ Do NOT retry blindly. Check the handling guidance above before retrying.

Suggested handling in a Node client:

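// `data` is the parsed JSON body of the response; the function name is illustrative.
function handleResponse(data) {
  if (data.code === 'IMAGE_PDF') {
    // Image-only PDF: nothing to extract (and you were refunded). OCR upstream if needed.
    return null;
  }
  return data;
}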

Related errors

Code	Status	What it means
EXTRACTION_FAILED	HTTP 500	The cleaning engine errored mid-process — rare, usually transient.
URL_NOT_FOUND	HTTP 404	The target URL returned 404 — the page doesn't exist on the origin.
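For the three codes mentioned on this page, a handling switch might look like the sketch below. The function name nextStepFor and the returned labels are hypothetical. Retrying EXTRACTION_FAILED is an assumption drawn from its "usually transient" description, not documented behavior.

```javascript
// Hypothetical mapping from an error code to a next step for the caller.
// Covers only the codes mentioned on this page; anything else is deferred.
function nextStepFor(code) {
  switch (code) {
    case 'IMAGE_PDF':
      return 'ocr_or_html_fallback'; // not retryable as-is: no text layer to extract
    case 'URL_NOT_FOUND':
      return 'drop_url'; // the page doesn't exist on the origin
    case 'EXTRACTION_FAILED':
      return 'retry_with_backoff'; // assumption: rare and usually transient
    default:
      return 'consult_error_index'; // unknown here: look it up in the full catalog
  }
}

console.log(nextStepFor('IMAGE_PDF')); // ocr_or_html_fallback
```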

See the full error index for the complete catalog with the handling switch statement covering every code at once.
