Burning Highlights Into PDF

This highlighting method creates a new PDF document on the fly, injecting highlights and navigation annotations into the original document. The resulting document can be opened in any PDF viewer; term highlights are shown as long as the viewer supports PDF annotations.

Check live example here.

[Figure: PDF with highlights burned in]

PDF Bookmarks

For every matched term or phrase, a PDF bookmark is created under the "Highlights" bookmark node. With the default settings, the bookmark title is the matched text followed by the page number.

The following options control bookmark creation:

highlighter.pdf {
  bookmarksForMatches {
    enabled = true
    pathTemplate = "Highlights > {tag} > {match} (pg {page})"
    sortNodes = true
  }
}

The pathTemplate string defines the hierarchical organization of the created bookmarks:

  • Bookmark node levels are separated by the > character surrounded by whitespace.

  • The placeholders {tag}, {match} and {page} are replaced with the query tag, the matched text and the page number, respectively.

  • Any other text is used as is.

  • If the node text is empty (e.g. query tagging is not used), the level is omitted and any child nodes are added directly to the parent node.

With the default bookmark path template shown above, bookmarks to hits (which look like "Solder Pallet (pg 123)") are grouped into sections named after their tags (if defined), and those sections are added to the top-level "Highlights" node.
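As an illustration of these rules, the sketch below overrides only pathTemplate so that hits are grouped by page rather than by tag. The template value is a hypothetical example built from the documented placeholders; it has not been verified against a running Highlighter.

```hocon
highlighter.pdf {
  bookmarksForMatches {
    # Hypothetical template: "Page " is literal text, {page} and {match} are placeholders.
    # Levels are split on " > ", so this should give: Highlights > Page 123 > Solder Pallet
    pathTemplate = "Highlights > Page {page} > {match}"
  }
}
```

Because the {tag} placeholder is not used here, query tags should have no effect on the bookmark tree.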

A tag can be associated with a query using multi-query PDF highlighting requests (submitted as JSON) or using batch highlighting.

[Figure: Example PDF bookmarks with tags and matches]

Customizing Navigation Elements

The settings section below lists configuration properties that define the style and text of messages and links that Highlighter adds to PDF documents:

highlighter.pdf.nav {
  noteFontSize = 12
  noteColor = AAAAAA
  linkFontSize = 12
  linkColor = 0000FF

  # Note: Placeholder {linkedPage} can be used in link labels
  prevLinkText = "< Previous Hit (pg.{linkedPage})"
  nextLinkText = "Next Hit (pg.{linkedPage}) >"
  firstMatchingPageLinkText = "Go to First Hit (pg.{linkedPage})"
  searchMatchingPage = "Page {currentMatchingPage}/{totalMatchingPages} with Hits"
}

To override a property, add it to your application.conf file.
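For instance, an override might look like the sketch below. The property names come from the listing above; the color and label values are made-up examples, and the sketch assumes that properties you leave out keep their defaults.

```hocon
# application.conf: override only the properties you want to change
highlighter.pdf.nav {
  # Example value: red links instead of the blue (0000FF) shown above
  linkColor = FF0000

  # Example wording; {linkedPage} is the placeholder documented for link labels
  prevLinkText = "< Previous match (page {linkedPage})"
  nextLinkText = "Next match (page {linkedPage}) >"
}
```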

Post-processing PDF

After burning highlights and navigation into the PDF, Highlighter runs it through a post-processing phase. By default, this phase converts the PDF to the linearized format required for so-called "fast web view", and you can extend it by calling an external command for additional PDF filtering.

Post-processing options are grouped under the highlighter.pdf.postProcessing config section.

Linearization

Linearized PDF is an optimized PDF format that lets PDF viewers show the first page of a document as soon as that page is loaded, without waiting for the whole document to download. PDF Highlighter enables linearization for all PDF files above a certain size (e.g. 100KB). To disable linearization or change the size threshold, use the following options:

highlighter.pdf.postProcessing.linearizeInternal = true
highlighter.pdf.postProcessing.linearizeInternalMinFileSize = 100k
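The same two options can also be written in block form in application.conf. The sketch below raises the threshold; the 500k value is only an example that mirrors the size syntax of the 100k value above, so treat other units or formats as untested assumptions.

```hocon
highlighter.pdf.postProcessing {
  # Keep internal linearization on, but only for larger files (example threshold)
  linearizeInternal = true
  linearizeInternalMinFileSize = 500k

  # To turn internal linearization off entirely, use this instead:
  # linearizeInternal = false
}
```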

Running external command

You can run any external (command line) program for additional PDF processing. The command and its parameters are given in the cmd option, which expects an array.

The example below shows how to linearize a PDF using the QPDF tool.

As of Highlighter v1.1, PDF linearization is supported internally, and there is no need to run an external tool to do it.

Adding QPDF to highlighting pipeline

From a system shell, we would use the following command to linearize a PDF:

qpdf --linearize input.pdf output.pdf

To do this automatically after a PDF is highlighted, add the following to Highlighter's application.conf (on Linux):

highlighter.pdf.postProcessing {
  cmd = [ "/usr/bin/qpdf", "--linearize", "{inputFile}", "{outputFile}" ]
}

On Windows, only the path to the executable differs:

highlighter.pdf.postProcessing {
  cmd = [ "C:/qpdf-5.1.2/bin/qpdf", "--linearize", "{inputFile}", "{outputFile}" ]
}

Use the placeholders {inputFile} and {outputFile} exactly as shown. Highlighter replaces them with temporary file paths.

