
Line2Page

Usage

Merges line images and their corresponding text files into page images and PAGE XML files.

Usage: pagetools line2page [OPTIONS]

  Merges line images and line texts into combined images and XML files

Options:
  -c, --creator TEXT              Creator tag for PAGE XML  [default:
                                  PAGETools]
  -s, --source-folder TEXT        Path to images and GT  [required]
  -i, --image-folder TEXT         Path to images  [default: ]
  -gt, --gt-folder TEXT           Path to GT  [default: ]
  -d, --dest-folder TEXT          Path where output gets stored  [default:
                                  /home/ocr4all/merged]
  -e, --ext TEXT                  Image extension  [default: .bin.png]
  -p, --pred                      Sets flag to also include .pred.txt
                                  [default: False]
  -l, --lines INTEGER RANGE       Lines per page  [default: 20;x>=0]
  -ls, --line-spacing INTEGER RANGE
                                  Spacing between lines (in pixel)  [default:
                                  5;x>=0]
  -b, --border INTEGER RANGE...   Border (in pixel): TOP BOTTOM LEFT RIGHT
                                  [default: 10, 10, 10, 10;x>=0]
  --debug [10|20|30|40|50]        Sets the level of feedback to receive:
                                  DEBUG=10, INFO=20, WARNING=30, ERROR=40,
                                  CRITICAL=50  [default: 20]
  --threads INTEGER RANGE         Thread count to be used  [default: 16;x>=1]
  --xml-schema [2017|2019]        Sets the year of the xml-Schema to be used
                                  [default: 2019]
  --help                          Show this message and exit.
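
The help text does not describe how `--lines`, `--line-spacing` and `--border` interact, so the following is only one plausible reading, not taken from the PAGETools source. It assumes that lines are stacked vertically, that the spacing is inserted between consecutive lines, and that the border is added once around the page. The function names `page_count` and `page_size` are hypothetical.

```python
import math


def page_count(total_lines, lines_per_page=20):
    """Number of merged pages needed for `total_lines` line images.

    Assumption: every page holds up to `lines_per_page` lines and the last
    page may be shorter. The behaviour for 0 lines per page is not documented,
    so this sketch rejects it.
    """
    if lines_per_page < 1:
        raise ValueError("lines_per_page must be at least 1 in this sketch")
    return math.ceil(total_lines / lines_per_page)


def page_size(line_sizes, line_spacing=5, border=(10, 10, 10, 10)):
    """Estimate (width, height) in pixels of one merged page.

    line_sizes: list of (width, height) tuples, one per line image.
    border: (top, bottom, left, right), the order used by --border.
    """
    top, bottom, left, right = border
    # Assumed layout: the widest line determines the page width.
    width = left + max(w for w, _ in line_sizes) + right
    # Assumed layout: spacing appears only between lines, not after the last.
    gaps = line_spacing * (len(line_sizes) - 1)
    height = top + sum(h for _, h in line_sizes) + gaps + bottom
    return width, height


print(page_count(45))                      # pages for 45 lines at 20 per page
print(page_size([(100, 20), (80, 30)]))    # two lines with default settings
```

Under these assumptions, larger `--lines` values give fewer but taller pages, while `--line-spacing` and `--border` only add padding.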

Example

INFO

TODO
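
Until a full example is available, here is a minimal sketch of how an invocation could be assembled from Python. It uses only options listed in the help text above. The helper `build_line2page_command` and the folder names `data/lines` and `data/merged` are hypothetical and chosen for illustration only.

```python
import shlex


def build_line2page_command(source_folder, dest_folder, ext=".bin.png",
                            lines=20, include_pred=False):
    """Assemble an argument list for `pagetools line2page` (hypothetical helper)."""
    cmd = [
        "pagetools", "line2page",
        "--source-folder", str(source_folder),  # the only required option
        "--dest-folder", str(dest_folder),
        "--ext", ext,
        "--lines", str(lines),
    ]
    if include_pred:
        cmd.append("--pred")  # a flag, so it takes no value
    return cmd


# Hypothetical folders: adjust them to your own data layout.
cmd = build_line2page_command("data/lines", "data/merged",
                              ext=".nrm.png", lines=25, include_pred=True)
print(shlex.join(cmd))  # the equivalent shell command line
```

With PAGETools installed, the resulting list can be passed to `subprocess.run(cmd, check=True)`.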

Info

Please note that each image file must share its base name with its Ground Truth file; only the extensions differ.

foo.nrm.png -> foo.gt.txt (& foo.pred.txt)
bar.bin.png -> bar.gt.txt (& bar.pred.txt)
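
The naming rule above can be checked before running the tool. The sketch below assumes that the shared base name is everything before the first dot of the file name, which matches both examples but is not confirmed by the documentation. The function `expected_text_files` is a hypothetical helper, not part of PAGETools.

```python
from pathlib import Path


def expected_text_files(image_path, include_pred=False):
    """Return the text file paths expected next to a line image.

    Assumption: the base name is the part of the file name before the
    first dot, e.g. "foo" for "foo.nrm.png".
    """
    image_path = Path(image_path)
    base = image_path.name.split(".", 1)[0]
    names = [base + ".gt.txt"]
    if include_pred:
        # Only relevant when --pred is set.
        names.append(base + ".pred.txt")
    # Keep the files in the same folder as the image.
    return [image_path.with_name(name) for name in names]


for image in ["foo.nrm.png", "bar.bin.png"]:
    print(image, "->", [str(p) for p in expected_text_files(image, include_pred=True)])
```

Listing the image folder and testing each returned path with `Path.exists()` is a quick way to find line images that lack a matching text file.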