IMAGE_PDF
The URL resolved to a PDF that contains only scanned images — no machine-readable text to extract.
| Code | HTTP status | Retryable? |
|---|---|---|
| IMAGE_PDF | 422 | No |
What this means
IMAGE_PDF fires when /v1/read or /v1/extract resolves to a PDF, but the PDF holds only scanned or photographed pages with no embedded text layer. Onto extracts text from PDFs deterministically — no OCR, no vision — so an image-only PDF yields nothing to return. The request is automatically refunded and never counts against your quota.
When you'll see it
The response has HTTP status 422, and the body always includes code: "IMAGE_PDF". Branch on code, never on the human-readable message: the wording can change without notice, while the code is the stable contract.
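To make the "branch on code" advice concrete, here is a minimal sketch. The helper name `isImagePdf` and the reworded message are hypothetical and exist only for illustration; the only field taken from this page is `code`.

```javascript
// Hypothetical helper: true for an IMAGE_PDF error body, whatever the message says.
function isImagePdf(body) {
  return body != null && body.code === 'IMAGE_PDF';
}

const sample = {
  status: 'error',
  code: 'IMAGE_PDF',
  message: 'PDF contains only scanned images; no extractable text',
};
// Made-up future wording, used only to show why matching on message is fragile.
const reworded = { ...sample, message: 'Scanned PDF, no text layer' };

console.log(isImagePdf(sample));   // true
console.log(isImagePdf(reworded)); // true
// Fragile alternative: stops matching as soon as the message is reworded.
console.log(/scanned images/.test(reworded.message)); // false
```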
Example response
{
  "status": "error",
  "code": "IMAGE_PDF",
  "message": "PDF contains only scanned images; no extractable text"
}
How to handle
Run the PDF through OCR upstream so it carries a real text layer, or read the original HTML source if one exists. Onto deliberately does no OCR or vision — it stays a deterministic text primitive. If you control the document pipeline, export PDFs with selectable text rather than flattened images.
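One way to wire up the fallback described above is sketched here. Everything in it is an assumption rather than part of Onto's API: `readWithFallback` is a hypothetical helper, `readUrl` stands for whatever function you already use to call /v1/read (assumed to resolve to the parsed JSON body), and the `source` labels are arbitrary.

```javascript
// Hypothetical sketch: try the PDF, then fall back to the HTML source if one is known.
// `readUrl(url)` is assumed to call /v1/read and resolve to the parsed JSON body.
async function readWithFallback(readUrl, pdfUrl, htmlUrl) {
  const first = await readUrl(pdfUrl);
  if (first.code !== 'IMAGE_PDF') {
    // Success or a different error: pass it through unchanged.
    return { source: 'pdf', data: first };
  }
  // IMAGE_PDF is not retryable, so do not request the same PDF again.
  if (htmlUrl) {
    // Prefer the original HTML source when one exists.
    return { source: 'html', data: await readUrl(htmlUrl) };
  }
  // No HTML alternative: the caller should route the PDF to its own OCR step.
  return { source: 'needs_ocr', data: null };
}
```

A caller would then inspect `source`: `'needs_ocr'` means the document has to go through OCR outside Onto before it is submitted again.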
Suggested handling in a Node client:
// Assumes `data` is the parsed JSON response body and this runs inside a function.
if (data.code === 'IMAGE_PDF') {
  // Image-only PDF: nothing to extract (and you were refunded). OCR upstream if needed.
  return null;
}
Related errors
| Code | Status | What it means |
|---|---|---|
| EXTRACTION_FAILED | HTTP 500 | The cleaning engine errored mid-process — rare, usually transient. |
| URL_NOT_FOUND | HTTP 404 | The target URL returned 404 — the page doesn't exist on the origin. |
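For the three codes mentioned on this page, a handler could be sketched as below. The function name `actionFor` and the action labels are hypothetical. Skipping IMAGE_PDF follows the "Retryable? No" entry above, but retrying EXTRACTION_FAILED and dropping URL_NOT_FOUND are assumptions drawn from their descriptions, not documented policy.

```javascript
// Hypothetical mapping from error code to a client-side action.
function actionFor(code) {
  switch (code) {
    case 'IMAGE_PDF':
      // Listed as not retryable: skip, or route the PDF to an OCR step.
      return 'skip';
    case 'EXTRACTION_FAILED':
      // Described as usually transient, so a retry is assumed to be reasonable.
      return 'retry';
    case 'URL_NOT_FOUND':
      // The origin returned 404; assumed here to be permanent.
      return 'drop';
    default:
      // Codes not covered on this page.
      return 'unknown';
  }
}
```

Calling `actionFor(data.code)` keeps the decision keyed on the stable code rather than on the message text.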
See the full error index for the complete catalog, including a switch statement that handles every code at once.