OCR Reader – Documentation

Overview

OCR Reader is a Windows desktop application that extracts text from PDFs and images using the Tesseract OCR engine. It supports template-driven parsing and exports the structured data to CSV or TXT files.

Installation

1. Download package

Download the latest release package containing:

OCR_Reader.exe
tesseract folder
poppler folder
config.yaml example

2. Folder structure

OCR_Reader/
 ├── OCR_Reader.exe
 ├── config.yaml
 ├── tools/
 │    ├── tesseract/
 │    │     └── tesseract.exe
 │    └── poppler/
 └── input/

3. First run

Run OCR_Reader.exe. On first start, the application loads its configuration from config.yaml.

Requirements

Windows 10 / 11
Tesseract OCR (bundled or external)
Poppler (for PDF processing)

Configuration (YAML)

The application is configured using a config.yaml file.

Example configuration

tesseract_path: tools/tesseract/tesseract.exe
poppler_path: tools/poppler/bin

input_folder: input
output_folder: output

output_format: csv   # csv | txt

language: eng

batch_mode: true

template:
  name: invoice_template
  fields:
    invoice_number:
      type: text
      pattern: "Invoice No:\\s*(.*)"
    date:
      type: text
      pattern: "Date:\\s*(.*)"
    total:
      type: number
      pattern: "Total:\\s*([0-9.,]+)"
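A configuration like the one above can be sanity-checked before a run. The following is only a sketch and not part of OCR Reader: it assumes the YAML file has already been parsed into a plain Python dict (for example by a YAML library), and the helper name `check_config` is hypothetical. Note that in a double-quoted YAML string `\\s` is an escaped backslash, so the parsed pattern contains the regex token `\s`.

```python
import re

# The example configuration as it would look after YAML parsing (assumed).
config = {
    "tesseract_path": "tools/tesseract/tesseract.exe",
    "poppler_path": "tools/poppler/bin",
    "input_folder": "input",
    "output_folder": "output",
    "output_format": "csv",
    "language": "eng",
    "batch_mode": True,
    "template": {
        "name": "invoice_template",
        "fields": {
            "invoice_number": {"type": "text", "pattern": r"Invoice No:\s*(.*)"},
            "date": {"type": "text", "pattern": r"Date:\s*(.*)"},
            "total": {"type": "number", "pattern": r"Total:\s*([0-9.,]+)"},
        },
    },
}


def check_config(cfg):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    if cfg.get("output_format") not in ("csv", "txt"):
        problems.append("output_format must be csv or txt")
    fields = cfg.get("template", {}).get("fields", {})
    if not fields:
        problems.append("template.fields is empty")
    for name, rule in fields.items():
        if rule.get("type") not in ("text", "number"):
            problems.append(f"{name}: type must be text or number")
        try:
            re.compile(rule.get("pattern", ""))  # catches malformed regexes
        except re.error as exc:
            problems.append(f"{name}: invalid regex ({exc})")
    return problems


print(check_config(config))  # prints []
```

An empty list means the structure and all patterns look plausible; it does not verify that the configured paths exist.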

YAML Parameters Explained

Paths

tesseract_path – path to tesseract.exe
poppler_path – path to Poppler binaries (PDF rendering)
input_folder – folder with input files (PDF/images)
output_folder – output directory for results

Processing

output_format – output type (csv or txt)
language – OCR language (e.g. eng, deu, ces)
batch_mode – enables processing of multiple files

Template section

template.name – name of the active extraction template
template.fields – mapping of field names to their extraction rules (type and pattern)

Field properties

type – data type of the field (text or number)
pattern – regular expression used to extract the value from the OCR text
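To illustrate how such field rules might behave, here is a minimal sketch using Python's `re` module. It is not the application's actual implementation. It assumes that the first capture group of each pattern is taken as the value and that numbers use a comma as thousands separator and a dot as decimal mark; the function name `extract_fields` is hypothetical.

```python
import re

fields = {
    "invoice_number": {"type": "text", "pattern": r"Invoice No:\s*(.*)"},
    "date": {"type": "text", "pattern": r"Date:\s*(.*)"},
    "total": {"type": "number", "pattern": r"Total:\s*([0-9.,]+)"},
}


def extract_fields(ocr_text, fields):
    """Apply each field's regex to the OCR text (hypothetical helper)."""
    result = {}
    for name, rule in fields.items():
        match = re.search(rule["pattern"], ocr_text)
        if match is None:
            result[name] = None  # field not found in the text
            continue
        value = match.group(1).strip()  # assumption: group 1 holds the value
        if rule["type"] == "number":
            try:
                # Assumption: "," is a thousands separator, "." the decimal mark.
                value = float(value.replace(",", ""))
            except ValueError:
                value = None
        result[name] = value
    return result


sample = "Invoice No: INV-1023\nDate: 2026-04-10\nTotal: 1,250.50\n"
print(extract_fields(sample, fields))
# prints {'invoice_number': 'INV-1023', 'date': '2026-04-10', 'total': 1250.5}
```

Because `.` does not match a newline by default, `(.*)` captures only the rest of the current line. Documents that use a decimal comma would need a different number conversion.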

How it works

1. Load the PDF or image
2. Convert to image (via Poppler, if the input is a PDF)
3. Run OCR (Tesseract)
4. Apply the template regex rules
5. Export structured output
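The loading, conversion, and OCR steps above could be sketched in Python as follows. This is an illustration under assumptions, not OCR Reader's source: it uses the third-party packages pdf2image, Pillow, and pytesseract, which the documentation does not name, and the supported image extensions and function names are hypothetical. The third-party imports are done lazily so the module loads even where those packages are not installed.

```python
from pathlib import Path

# Assumed set of image types; the documentation only says "PDF/images".
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def load_pages(path, poppler_path=None):
    """Load and convert: return a list of page images for one input file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        from pdf2image import convert_from_path  # third-party, needs Poppler
        return convert_from_path(str(path), poppler_path=poppler_path)
    if suffix in IMAGE_SUFFIXES:
        from PIL import Image  # third-party (Pillow)
        return [Image.open(path)]
    raise ValueError(f"unsupported file type: {path.name}")


def ocr_pages(pages, language="eng", tesseract_path=None):
    """Run OCR: run Tesseract on every page and join the text."""
    import pytesseract  # third-party wrapper around the tesseract binary
    if tesseract_path:
        pytesseract.pytesseract.tesseract_cmd = str(tesseract_path)
    return "\n".join(
        pytesseract.image_to_string(page, lang=language) for page in pages
    )
```

The returned text would then be matched against the template patterns and written to the output folder.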

Output Example

invoice_number, date, total
INV-1023, 2026-04-10, 1250.50
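Output in this shape can be consumed with Python's standard `csv` module. The sketch below assumes the file looks exactly like the example above, including the space after each comma, which `skipinitialspace=True` strips; the function name `read_results` is hypothetical.

```python
import csv
import io


def read_results(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return list(reader)


sample = "invoice_number, date, total\nINV-1023, 2026-04-10, 1250.50\n"
rows = read_results(sample)
total = sum(float(row["total"]) for row in rows)
print(rows[0]["invoice_number"], total)  # prints INV-1023 1250.5
```

For a file on disk, the same reader can be given a file object opened with `newline=""` instead of the `io.StringIO` wrapper.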

Support

For custom templates or integration help, contact:
nezval.software@gmail.com
