IMAGE_PDF

The URL resolved to a PDF that contains only scanned images — no machine-readable text to extract.

| Code | HTTP status | Retryable? |
| --- | --- | --- |
| IMAGE_PDF | 422 | No |

What this means

IMAGE_PDF fires when /v1/read or /v1/extract resolves to a PDF, but the PDF holds only scanned or photographed pages with no embedded text layer. Onto extracts text from PDFs deterministically — no OCR, no vision — so an image-only PDF yields nothing to return. The request is automatically refunded and never counts against your quota.

When you'll see it

The response has HTTP status 422, and the body always includes code: "IMAGE_PDF". Branch on code, never on the human-readable message: the wording can change without notice, while the code is the stable contract.

Example response

json
{
  "status": "error",
  "code": "IMAGE_PDF",
  "message": "PDF contains only scanned images; no extractable text"
}
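
Given the response shape above, a client can check the body before branching. The following is a minimal sketch, not part of any Onto SDK: the names OntoErrorBody, isOntoError, and isImagePdf are hypothetical, and the field list is inferred only from the example response.

```typescript
// Error body shape, inferred from the example response (an assumption).
interface OntoErrorBody {
  status: "error";
  code: string;
  message?: string; // human-readable and may change; do not branch on it
}

// Type guard: true only when `body` looks like the error shape above.
function isOntoError(body: unknown): body is OntoErrorBody {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return b.status === "error" && typeof b.code === "string";
}

// Branch on the stable `code` field, never on `message`.
function isImagePdf(body: unknown): boolean {
  return isOntoError(body) && body.code === "IMAGE_PDF";
}
```

Passing the parsed JSON body through isImagePdf keeps the check independent of the message text, so a change in wording does not affect the client.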

How to handle

Run the PDF through OCR upstream so it carries a real text layer, or read the original HTML source if one exists. Onto deliberately does no OCR or vision — it stays a deterministic text primitive. If you control the document pipeline, export PDFs with selectable text rather than flattened images.

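Do NOT retry blindly. This error is marked as not retryable, so resending the same URL unchanged is not expected to succeed. Apply the handling guidance above before retrying.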
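
The guidance above can be sketched as a small decision helper. This is an illustration under stated assumptions: the action names, the FallbackOptions fields, and the choice to prefer an HTML source over OCR are hypothetical and not defined by Onto.

```typescript
// Hypothetical action names; they are not part of the Onto API.
type ImagePdfAction = "read_html_source" | "ocr_upstream" | "skip";

interface FallbackOptions {
  htmlSourceUrl?: string; // original HTML version of the document, if you know one
  ocrAvailable: boolean;  // whether your own pipeline can add a text layer via OCR
}

// Decide what to do after IMAGE_PDF. Retrying the same URL is never an option here.
function planImagePdfFallback(opts: FallbackOptions): ImagePdfAction {
  // Assumption: reading existing HTML is cheaper than running OCR, so try it first.
  if (opts.htmlSourceUrl) return "read_html_source";
  if (opts.ocrAvailable) return "ocr_upstream";
  return "skip";
}
```

The helper deliberately has no "retry" outcome: each branch either changes the input (a different URL, or a PDF with a text layer) or gives up.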

Suggested handling in a Node client:

ts
if (data.code === 'IMAGE_PDF') {
  // Image-only PDF — nothing to extract (and you were refunded). OCR upstream if needed.
  return null;
}

| Code | Status | What it means |
| --- | --- | --- |
| EXTRACTION_FAILED | HTTP 500 | The cleaning engine errored mid-process; rare, usually transient. |
| URL_NOT_FOUND | HTTP 404 | The target URL returned 404; the page doesn't exist on the origin. |

See the full error index for the complete catalog, including a switch statement that handles every code at once.