Get Codec

Usage

Usage: pagetools get-codec [OPTIONS] FILES...

  Retrieves codec of PAGE XML files.

Options:
  -l, --level [region|line|word|glyph]
                                  [default: line]
  -idx, --index INTEGER           Considers only text from TextEquiv elements
                                  with a certain index.
  -mc, --most-common INTEGER      Only prints n most common entries. Shows all
                                  by default.
  -o, --output TEXT               File to which results are written.
  -rw, --remove-whitespace
  -of, --output-format [json|csv|txt]
                                  Available result formats.
  -freq, --frequencies            Outputs character frequencies.
  -nu, --normalize-unicode [NFC|NFD|NFKC|NFKD]
                                  Normalize unicode for both rules and PAGE
                                  XML tests.
  --text-output-newline           Inserts new line after every character in
                                  txt output. Only applies when frequencies
                                  aren't output.
  --verbose / --silent            Choose between verbose or silent output.
  --help                          Show this message and exit.
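
To make the options above more concrete, the following is a minimal Python sketch of what collecting a codec (the set of characters and their frequencies) from PAGE XML could look like. It is not the pagetools implementation: the function `count_characters`, the `LEVEL_TAGS` mapping, and the inline sample document are all assumptions made for illustration. The parameters loosely mirror `--level`, `--index`, `--remove-whitespace`, and `--normalize-unicode`.

```python
import unicodedata
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily reduced PAGE XML document used only for this sketch.
# The second line stores "ä" as "a" + combining diaeresis (U+0308).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="sample.png" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="l1">
        <TextEquiv index="0"><Unicode>ab a</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="l2">
        <TextEquiv index="0"><Unicode>ba\u0308</Unicode></TextEquiv>
        <TextEquiv index="1"><Unicode>zz</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""

# Assumed mapping from the --level choices to PAGE XML element names.
LEVEL_TAGS = {
    "region": "TextRegion",
    "line": "TextLine",
    "word": "Word",
    "glyph": "Glyph",
}


def local_name(tag):
    """Strip the XML namespace so any PAGE schema version matches."""
    return tag.rsplit("}", 1)[-1]


def count_characters(xml_text, level="line", index=None,
                     remove_whitespace=False, normalize=None):
    """Count characters in the TextEquiv/Unicode text of one element level."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for element in root.iter():
        if local_name(element.tag) != LEVEL_TAGS[level]:
            continue
        # Only direct TextEquiv children, so a region does not pick up
        # the text of the lines nested inside it.
        for text_equiv in element:
            if local_name(text_equiv.tag) != "TextEquiv":
                continue
            if index is not None and text_equiv.get("index") != str(index):
                continue
            for unicode_el in text_equiv:
                if local_name(unicode_el.tag) != "Unicode" or not unicode_el.text:
                    continue
                text = unicode_el.text
                if normalize:
                    text = unicodedata.normalize(normalize, text)
                if remove_whitespace:
                    text = "".join(text.split())
                counts.update(text)
    return counts


if __name__ == "__main__":
    codec = count_characters(SAMPLE, level="line", index=0,
                             remove_whitespace=True, normalize="NFC")
    # Comparable in spirit to --most-common 2 combined with --frequencies.
    for char, freq in codec.most_common(2):
        print(repr(char), freq)
```

Without normalization, the decomposed "ä" in the sample is counted as two separate code points; with `normalize="NFC"` it is counted as one. This is the kind of difference the `--normalize-unicode` option is meant to remove.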

Example

INFO

TODO