Document OCR that
actually reads the page.
Drop in any TIFF, PDF or scan. BotifyOCR runs a vision-language model on every page and returns clean Markdown, structured JSON and bounding-box overlays you can review side by side.
No signup · drag & drop in the browser · 100 free pages / day
- Lungs are clear with no focal consolidation.
- Cardiomediastinal silhouette unremarkable.
- No pleural effusion or pneumothorax.
Everything you need to turn scans into clean, reviewable data.
BotifyOCR was built to replace brittle OCR + regex stacks with a single vision-language pipeline you can audit page-by-page.
Vision-language OCR
A multimodal LLM reads the whole page — not just glyphs — so it handles handwriting, multi-column layouts, tables and stamps.
Bounding-box review
Every token has coordinates. Toggle overlays on the page to verify, in one click, what was extracted and where.
Markdown, JSON & text
Get a structured Markdown view, raw text, or strict JSON tokens — whichever your downstream pipeline prefers.
Multi-page batching
PDFs and multi-page TIFFs are split, scheduled and recombined automatically. Pages process in parallel on GPU.
GPU-accelerated
Runs on vLLM with paged attention on a single RTX 4090 / 5090 — typical pages finish in 1-2 seconds.
Self-hostable
Docker-compose, environment-driven, no SaaS lock-in. Keep your documents inside your VPC.
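As a rough illustration of the multi-page batching described above (split, schedule, recombine), here is a minimal Python sketch. It is not BotifyOCR's code: `ocr_page` is a hypothetical stand-in for the real per-page model call, and the thread pool is an assumption used only to show how parallel results can be put back in page order.

```python
from concurrent.futures import ThreadPoolExecutor


def ocr_page(page: bytes) -> str:
    """Hypothetical stand-in for the per-page model call."""
    return page.decode("utf-8").upper()


def ocr_document(pages: list[bytes], max_workers: int = 4) -> list[str]:
    """Process pages concurrently and return results in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the document
        # is recombined correctly even if pages finish out of order.
        return list(pool.map(ocr_page, pages))
```

Relying on ordered results keeps the recombine step trivial: no page index has to be carried through the worker and sorted afterwards.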
From scanned page to structured data in four steps.
Upload
Drop a TIFF, PDF, PNG or JPG up to 50 MB. Multi-page documents are split client-side for instant feedback.
Detect & route
Each page is normalized, deskewed, and routed to the best pipeline — vision LLM, layout-aware OCR or both.
Extract
A vLLM-served vision-language model reads the page and emits tokens with bounding boxes, reading order and confidence.
Review & export
Review side-by-side with overlays, edit if needed, then export as Markdown, JSON or plain text.
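To make the extract and review steps above more concrete, the sketch below models a token that carries a bounding box, a reading-order index and a confidence score, then derives plain text and a list of tokens worth a second look. The `Token` fields, the `(x0, y0, x1, y1)` pixel box and the 0.8 threshold are all assumptions for illustration, not BotifyOCR's actual JSON schema.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical token record; field names are illustrative only."""
    text: str
    bbox: tuple[int, int, int, int]  # assumed (x0, y0, x1, y1) in pixels
    order: int                       # reading-order index on the page
    confidence: float                # assumed to lie in 0.0 to 1.0


def to_plain_text(tokens: list[Token]) -> str:
    """Join token text in reading order, regardless of input order."""
    return " ".join(t.text for t in sorted(tokens, key=lambda t: t.order))


def needs_review(tokens: list[Token], threshold: float = 0.8) -> list[Token]:
    """Return tokens whose confidence falls below the threshold."""
    return [t for t in tokens if t.confidence < threshold]
```

A reviewer could use the output of `needs_review` to decide which overlays to highlight first, while `to_plain_text` corresponds to the plain-text export.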
Wherever paper meets pipelines.
Healthcare
Radiology, pathology and lab reports — preserve sections, tables and stamps without manual templating.
Legal & insurance
Scanned contracts, claim forms and court filings extracted as structured Markdown ready for downstream LLMs.
Finance & ops
Invoices, POs and bank statements with line items, totals and bounding boxes for audit.
Government archives
Decades-old TIFF archives digitized at scale, including handwriting, faded ink and dot-matrix prints.
Start free. Scale on your terms.
Free
For trying it out and small personal projects.
- 100 pages / day
- Markdown + JSON export
- Browser studio
- Community support
Pro
For teams running OCR at production volume.
- Unlimited pages
- API + webhook access
- Workspace seats
- Priority queue
Self-hosted
For regulated industries that keep data on-prem.
- Docker-compose stack
- vLLM + RTX 4090/5090
- No data egress
- Email support
Ready to see what your scans really say?
Open the studio, drop in a document, and watch the OCR happen in seconds. No signup, no credit card.