
OCR Reader – Documentation

Overview

OCR Reader is a Windows desktop application for extracting text from PDFs and images with the Tesseract OCR engine. It supports template-driven parsing and exports the extracted fields as structured data to CSV or TXT files.

Installation

1. Download package

Download the latest release package. It contains the files and folders shown below.

2. Folder structure

OCR_Reader/
 ├── OCR_Reader.exe
 ├── config.yaml
 ├── tools/
 │    ├── tesseract/
 │    │     └── tesseract.exe
 │    └── poppler/
 └── input/

3. First run

Run OCR_Reader.exe. On first start, the application loads its configuration from config.yaml.

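Requirements

  Windows (OCR Reader is a Windows desktop application)
  Tesseract and Poppler, both shipped in the tools/ folder of the release package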

Configuration (YAML)

The application is configured using a config.yaml file.

Example configuration

tesseract_path: tools/tesseract/tesseract.exe
poppler_path: tools/poppler/bin

input_folder: input
output_folder: output

output_format: csv   # csv | txt

language: eng

batch_mode: true

template:
  name: invoice_template
  fields:
    invoice_number:
      type: text
      pattern: "Invoice No:\\s*(.*)"
    date:
      type: text
      pattern: "Date:\\s*(.*)"
    total:
      type: number
      pattern: "Total:\\s*([0-9.,]+)"

YAML Parameters Explained

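Paths

  tesseract_path: path to tesseract.exe
  poppler_path: path to the Poppler bin folder
  input_folder: folder containing the PDFs and images to process
  output_folder: folder where the CSV or TXT results are written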

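Processing

  output_format: csv or txt
  language: OCR language code (eng = English)
  batch_mode: true or false; enables batch processing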

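Template section

  name: identifier of the template (invoice_template in the example)
  fields: one entry per value to extract; each field name (invoice_number, date, total) becomes a column in the output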

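Field properties

  type: value type of the field; the example uses text and number
  pattern: regular expression matched against the OCR text; the part inside the parentheses (the capture group) is the extracted value. In double-quoted YAML strings backslashes are doubled, so "Invoice No:\\s*(.*)" is read as the regex Invoice No:\s*(.*)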
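A minimal Python sketch of how one template field could be evaluated against OCR text. It assumes the value is the first capture group of the first match and that number fields use commas as thousands separators; FIELDS and extract_field are hypothetical names, not part of OCR Reader.

```python
import re
from decimal import Decimal, InvalidOperation

# Hypothetical field definitions mirroring the example config.yaml.
# Raw strings hold the regexes as YAML delivers them ("\\s" becomes \s).
FIELDS = {
    "invoice_number": {"type": "text", "pattern": r"Invoice No:\s*(.*)"},
    "date": {"type": "text", "pattern": r"Date:\s*(.*)"},
    "total": {"type": "number", "pattern": r"Total:\s*([0-9.,]+)"},
}

def extract_field(ocr_text, spec):
    """Return the field value, or None if the pattern does not match."""
    match = re.search(spec["pattern"], ocr_text)
    if match is None:
        return None
    # Assumption: the first capture group holds the value.
    value = match.group(1).strip()
    if spec["type"] == "number":
        # Assumption: commas are thousands separators, e.g. "1,250.50".
        try:
            return Decimal(value.replace(",", ""))
        except InvalidOperation:
            return None  # e.g. the pattern captured only "." or ","
    return value

sample = "Invoice No: INV-1023\nDate: 2026-04-10\nTotal: 1,250.50\n"
print({name: extract_field(sample, spec) for name, spec in FIELDS.items()})
# {'invoice_number': 'INV-1023', 'date': '2026-04-10', 'total': Decimal('1250.50')}
```

Decimal is used instead of float so that amounts such as 1250.50 keep their exact digits.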

How it works

  1. Load PDF or image
  2. Convert PDF pages to images with Poppler (image files skip this step)
  3. Run OCR (Tesseract)
  4. Apply template regex rules
  5. Export structured output
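Steps 2 and 3 could be driven through the bundled command-line tools. The sketch below assumes OCR Reader runs pdftoppm (from Poppler) and tesseract.exe as external processes, which this documentation does not confirm; the helper names and the 300 DPI default are illustrative choices. The options used (-png and -r for pdftoppm, -l and the stdout output base for Tesseract) are standard options of those tools.

```python
import subprocess
from pathlib import Path

def pdf_to_png_cmd(poppler_path, pdf_file, out_prefix, dpi=300):
    """Build a pdftoppm command that renders every PDF page to a PNG file."""
    # Assumption: poppler_path is the Poppler bin folder from config.yaml.
    exe = Path(poppler_path) / "pdftoppm.exe"
    return [str(exe), "-png", "-r", str(dpi), str(pdf_file), str(out_prefix)]

def tesseract_cmd(tesseract_path, image_file, language="eng"):
    """Build a Tesseract command; the 'stdout' output base prints the text."""
    return [str(tesseract_path), str(image_file), "stdout", "-l", language]

def run(cmd):
    """Run a tool, raising CalledProcessError on failure; return its stdout."""
    result = subprocess.run(cmd, capture_output=True, encoding="utf-8", check=True)
    return result.stdout
```

For a PDF, pdftoppm writes one PNG per page using out_prefix as the file name stem; each of those images would then be passed to run(tesseract_cmd(...)) to obtain the text for step 4.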

Output Example

invoice_number, date, total
INV-1023, 2026-04-10, 1250.50
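A sketch of step 5 for output_format: csv using Python's standard csv module. It assumes one record (a dict of field name to extracted value) per processed document and uses the template's field names, in order, as the header row; records_to_csv is a hypothetical helper.

```python
import csv
import io

def records_to_csv(records, columns):
    """Render records as CSV text: a header row, then one row per record.

    Fields missing from a record are written as empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

columns = ["invoice_number", "date", "total"]
records = [{"invoice_number": "INV-1023", "date": "2026-04-10", "total": "1250.50"}]
print(records_to_csv(records, columns), end="")
# invoice_number,date,total
# INV-1023,2026-04-10,1250.50
```

Unlike the example above, csv.DictWriter writes no space after each comma, which keeps the column names free of leading spaces when the file is read back.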

Support

For custom templates or integration help, contact:
nezval.software@gmail.com
