The batch highlighting tool can process all PDF documents in a directory tree and create new PDFs with built-in highlight annotations. Documents can be highlighted for a single query or for multiple queries specified in a CSV file.
Run the batch highlighter from the command line. When started without arguments, it lists the available options:
--check-file-time highlight only if input PDF or query file is newer than output file
--delete-input-pdf delete input pdf after highlighting
-i,--input-dir <dir> input folder
-l,--language <lang> language
-nav add navigation links to document
-o,--output-dir <dir> output folder
-q,--query <str> search query with keywords to highlight
-qenc,--charset <file> charset of the query file,
-qf,--query-file <file> file in csv format with keywords to highlight
-t,--threads <num> thread count
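The --check-file-time rule comes down to comparing file modification times. The Python sketch below illustrates that comparison; it is not the tool's own code, the function name is hypothetical, and treating a missing output file as "needs highlighting" is an assumption.

```python
import os
import tempfile


def needs_highlighting(input_pdf, output_pdf, query_file=None):
    """Hypothetical sketch of the --check-file-time rule.

    Returns True if the output PDF does not exist yet (an assumption), or if
    the input PDF or the query file was modified after the output PDF.
    """
    if not os.path.exists(output_pdf):
        return True
    out_mtime = os.path.getmtime(output_pdf)
    sources = [input_pdf] if query_file is None else [input_pdf, query_file]
    return any(os.path.getmtime(src) > out_mtime for src in sources)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = [os.path.join(d, n) for n in ("in.pdf", "queries.csv", "out.pdf")]
        for p in paths:
            open(p, "w").close()
        inp, queries, out = paths
        # os.utime(path, (atime, mtime)) sets explicit timestamps for the demo
        os.utime(inp, (100, 100))
        os.utime(queries, (100, 100))
        os.utime(out, (200, 200))
        print(needs_highlighting(inp, out, queries))  # False: output is newest
        os.utime(queries, (300, 300))
        print(needs_highlighting(inp, out, queries))  # True: query file changed
```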
The batch highlighter works offline and does not need Highlighter Server to be running; in fact, the server and the batch tool cannot run at the same time.
Simple Query Highlighting
To highlight documents for a single query, use the -q option:
batch-pdf-highlight -i D:\input\pdfs -o D:\output-dir -l en -q "home style price"
This command highlights the terms home, style, and price using English language rules.
Using Query Files
With a query file, you can highlight all your PDF documents for many keywords and phrases, possibly hundreds, at once.
The query file is a plain CSV file with query, color, tag, and bookmark columns.
The query column is the only required one. Multi-term queries are handled as phrases, with no need to put them in quotes. The color column sets the desired RGB highlight color for the query's keywords.
Tags can be used to categorize and group keywords. If a color is not specified but a tag is, all queries with the same tag are assigned the same color. In addition, the PDF bookmarks created for the highlights are grouped by tag. (More details about customizing bookmark creation and PDF output can be found here.)
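The tag-based color rule can be pictured with a small Python sketch. It is not the tool's implementation: the palette, the hex color notation, and the row dictionaries with query/color/tag keys are all hypothetical. A query without an explicit color takes the color already given to its tag, or the next palette color the first time that tag appears; explicit colors are left untouched.

```python
from itertools import cycle

# Hypothetical palette; the tool's actual default colors are not documented here.
PALETTE = ["#FFFF00", "#00FF00", "#00FFFF", "#FF00FF"]


def assign_colors(rows):
    """Fill in missing colors so that queries sharing a tag share a color."""
    palette = cycle(PALETTE)
    tag_colors = {}
    result = []
    for row in rows:
        color = row.get("color")
        tag = row.get("tag")
        if not color and tag:
            # First query seen with this tag picks the next palette color.
            if tag not in tag_colors:
                tag_colors[tag] = next(palette)
            color = tag_colors[tag]
        result.append({**row, "color": color})  # explicit colors are kept as-is
    return result


if __name__ == "__main__":
    rows = [
        {"query": "home", "tag": "real estate"},
        {"query": "price", "tag": "finance"},
        {"query": "style", "tag": "real estate"},
        {"query": "mortgage", "tag": "finance", "color": "#FF0000"},
    ]
    for row in assign_colors(rows):
        print(row["query"], row["color"])
```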
The bookmark column controls PDF bookmark creation per query by specifying a bookmark path template string. If it is not defined, the default path template is used; see PDF bookmark options.
To highlight PDF documents for a query file, use the -qf option:
batch-pdf-highlight -i D:\input\pdfs -o D:\output-dir -l en -qf queries.csv
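When driving the batch highlighter from a script, building the command as an argument list rather than a shell string lets a multi-word query such as "home style price" reach -q as a single argument without extra quoting. The Python sketch below uses only the options listed above. The function names are hypothetical, it assumes batch-pdf-highlight can be found on the PATH (on Windows you may need the launcher's full file name or path), and accepting exactly one of a query or a query file is a simplification, not a documented rule.

```python
import subprocess


def build_command(input_dir, output_dir, language, query=None, query_file=None,
                  threads=None, check_file_time=False):
    """Build a batch-pdf-highlight argument list from the documented options."""
    if (query is None) == (query_file is None):
        raise ValueError("pass exactly one of query or query_file")
    cmd = ["batch-pdf-highlight", "-i", input_dir, "-o", output_dir, "-l", language]
    if query is not None:
        cmd += ["-q", query]  # one list element, so no shell quoting is needed
    else:
        cmd += ["-qf", query_file]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if check_file_time:
        cmd.append("--check-file-time")
    return cmd


def run_batch(*args, **kwargs):
    # check=True raises CalledProcessError if the tool exits with a nonzero code
    return subprocess.run(build_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    print(build_command(r"D:\input\pdfs", r"D:\output-dir", "en",
                        query_file="queries.csv", threads=4))
```

Passing the list directly to subprocess.run (without shell=True) avoids any dependence on the quoting rules of a particular shell.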
The queries CSV file can list queries only, or it can include colors and tags as well.
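A query file can also be generated from a script. The Python sketch below writes rows of query, color, and tag with the standard csv module, which takes care of quoting fields that contain commas; a multi-term query is simply one field. The column order, the absence of a header row, and the hex color notation are assumptions for illustration, so check them against the query file layout your version expects. If you write the file in a non-default encoding, the -qenc option can be used to declare its charset.

```python
import csv
import io


def format_query_file(rows):
    """Render (query, color, tag) tuples as CSV text.

    Column order and color notation are assumptions, not a documented format.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # quotes fields containing commas or quotes
    for query, color, tag in rows:
        writer.writerow([query, color or "", tag or ""])
    return buffer.getvalue()


def write_query_file(path, rows, encoding="utf-8"):
    # newline="" keeps the csv module's own line endings intact
    with open(path, "w", newline="", encoding=encoding) as f:
        f.write(format_query_file(rows))


if __name__ == "__main__":
    rows = [
        ("home style price", None, "real estate"),  # multi-term query = phrase
        ("mortgage rate", "#FF0000", "finance"),    # hypothetical RGB notation
        ("Smith, John", None, None),                # comma forces CSV quoting
    ]
    print(format_query_file(rows), end="")
```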
To quickly test how PDF highlighting works for your documents and keywords, use our online demo.
The batch highlighter configuration file is named application.conf and is loaded from the <highlighter>/conf/batch/ directory. This file does not exist by default, but the directory contains application.conf.sample, which lists the most commonly used options.
The batch highlighter creates a log of all handled files, listing any issue or error that occurred. The default path to the log file is <highlighter>/logs/batch-highlighter.log.
The file is overwritten every time the batch tool runs.
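Because each run overwrites the log, you may want to keep a copy of the previous one. The Python sketch below copies batch-highlighter.log to a timestamped file before you start the next run. The function name is hypothetical, highlighter_home stands for the <highlighter> installation directory, and the archive file name is a made-up naming scheme.

```python
import os
import shutil
import tempfile
import time


def archive_log(highlighter_home):
    """Copy logs/batch-highlighter.log to a timestamped file.

    Returns the path of the copy, or None if no log exists yet.
    """
    logs_dir = os.path.join(highlighter_home, "logs")
    log_path = os.path.join(logs_dir, "batch-highlighter.log")
    if not os.path.isfile(log_path):
        return None
    # Name the copy after the log's last-modified time (hypothetical scheme).
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(os.path.getmtime(log_path)))
    archive_path = os.path.join(logs_dir, f"batch-highlighter-{stamp}.log")
    shutil.copy2(log_path, archive_path)  # copy2 also preserves timestamps
    return archive_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as home:
        os.makedirs(os.path.join(home, "logs"))
        with open(os.path.join(home, "logs", "batch-highlighter.log"), "w") as f:
            f.write("previous run\n")
        print(archive_log(home))
```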