document-ocr

OCR is rarely a single-model problem. This demo runs three model classes through one SIE server: a VLM-OCR model recognizes the document into Markdown, a fine-tuned Donut model emits a JSON tree directly, and a zero-shot NER model (GLiNER) pulls typed fields out of the recognition output. Pick a sample on the left, swap any of the three models in the dropdowns, and watch SIE hot-swap them with one identifier change.

image

↓

one SIE server · client.extract(model_id, item)

↓↓↓

VLM-OCR (LightOnOCR-2-1B, PaddleOCR-VL, GLM-OCR)

Donut (end-to-end JSON)

GLiNER (zero-shot NER)

Why SIE

Three different model architectures (a vision-language model, a fine-tuned encoder-decoder, and a span-based NER model), one inference engine, one HTTP API, one SDK call. Without SIE, this demo would need three separate inference services with three SDKs, three auth flows, and three rate limits. With SIE, you swap a string in client.extract(...) and the underlying architecture changes.
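A minimal sketch of the swap-a-string idea, assuming nothing about the real SDK beyond the client.extract(model_id, item) shape shown above. StubSIEClient, its handlers, the "donut" identifier, and the item and result shapes are all hypothetical placeholders for illustration, not the SIE API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class StubSIEClient:
    """Hypothetical stand-in for an SIE client. Not the real SDK."""

    handlers: Dict[str, Callable[[Any], Any]]

    def extract(self, model_id: str, item: Any) -> Any:
        # One entry point: the model identifier alone selects the architecture.
        if model_id not in self.handlers:
            raise KeyError(f"unknown model: {model_id}")
        return self.handlers[model_id](item)


# Placeholder handlers; each returns an assumed result shape.
client = StubSIEClient(handlers={
    "LightOnOCR-2-1B": lambda item: {"markdown": f"# text from {item['name']}"},
    "donut": lambda item: {"json": {"source": item["name"]}},
    "gliner_multi": lambda item: {"entities": []},
})

item = {"name": "receipt.png"}

# The call site never changes; only the string does.
results = {
    model_id: client.extract(model_id, item)
    for model_id in ("LightOnOCR-2-1B", "donut", "gliner_multi")
}
```

The point of the sketch is the call site: the loop body is identical for all three architectures, which is what makes a dropdown swap a one-parameter change.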

Try these moments

Click any sample on the left. All three models run in one pipeline. The footer prints per-stage timings as each one lands.
Open "See the SIE call" in any panel, then swap the model dropdown above. The snippet updates with the one parameter that changed. That is the swap-a-string pitch in action.
Click the receipt, then the multi-column page. Donut (fine-tuned on receipts) does best on the first; the recognition model does best on the second. Same pipeline, different model wins.
Switch NER from gliner_multi to gliner_large. Same labels, same input text, different confidence scores. Model quality is a single dropdown away.
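The moments above all exercise the same three-stage pipeline with per-stage timings. One way it could be sketched is below, assuming an extract(model_id, item) callable shaped like client.extract. The item dictionaries, the stage names, the "donut" identifier, and fake_extract are hypothetical; the real demo may structure its calls differently.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Extract = Callable[[str, dict], Any]


def run_pipeline(
    extract: Extract,
    image: bytes,
    recognizer_id: str,
    structured_id: str,
    ner_id: str,
    labels: List[str],
) -> Tuple[Dict[str, Any], Dict[str, float]]:
    """Run recognition, structured extraction, and NER; time each stage."""
    outputs: Dict[str, Any] = {}
    timings: Dict[str, float] = {}

    def timed(stage: str, model_id: str, item: dict) -> Any:
        start = time.perf_counter()
        outputs[stage] = extract(model_id, item)
        timings[stage] = time.perf_counter() - start
        return outputs[stage]

    # Stage 1: image -> Markdown (VLM-OCR).
    markdown = timed("recognition", recognizer_id, {"image": image})
    # Stage 2: image -> JSON tree (Donut), independent of stage 1.
    timed("structured", structured_id, {"image": image})
    # Stage 3: NER runs on the recognition output, not on the image.
    timed("fields", ner_id, {"text": markdown, "labels": labels})
    return outputs, timings


def fake_extract(model_id: str, item: dict) -> Any:
    """Hypothetical stub so the sketch runs without a server."""
    if model_id.startswith("gliner"):
        return [{"label": label, "text": ""} for label in item["labels"]]
    if model_id == "donut":
        return {"total": None}
    return "# recognized markdown"


outputs, timings = run_pipeline(
    fake_extract, b"", "LightOnOCR-2-1B", "donut", "gliner_multi", ["total"]
)
for stage, seconds in timings.items():
    print(f"{stage}: {seconds * 1000:.2f} ms")
```

Switching gliner_multi to gliner_large changes only the ner_id argument; the labels and the recognition text passed to the third stage stay the same.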

Sample documents

Recognition (Markdown)

See the SIE call

// pick a recognition model in the dropdown

Click a sample on the left.

Extraction

See the SIE calls

// structured (Donut)

// NER (GLiNER)

Typed fields will appear here.