Text Analysis

When highlighting a document for a search query, Highlighter extracts words from the document's text and compares them against the terms and phrases in the search query.

Text analysis can be generic or language-specific, depending on the "language" parameter passed to Highlighter. Language-specific rules help match different forms of the same word (stemming). Text analysis rules can also be customized to support synonyms.

Internally, Highlighter uses Apache Solr for text analysis. Text analysis rules are defined in Solr's schema.xml config file, located in Highlighter's conf/search/config-lib/Default/conf directory. When no language is selected, Highlighter takes its rules from the schema field named "text_general". When the language parameter is provided, it takes them from the field named "text_<language>", for example "text_en" for "en" (English).
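The field selection rule described above can be sketched in a few lines of Python. This is only an illustration of the naming convention, not Highlighter's actual code: the function name analysis_field_name is hypothetical, and the sketch assumes that an empty or missing language value is treated the same as "no language selected".

```python
from typing import Optional


def analysis_field_name(language: Optional[str] = None) -> str:
    """Return the schema.xml field name whose analysis rules would apply.

    Hypothetical helper: it mirrors the documented naming convention only
    and is not part of Highlighter's API.
    """
    # Assumption: None or an empty string means no language was selected.
    if not language:
        return "text_general"
    # Language-specific rules live in a field named "text_<language>".
    return f"text_{language}"


if __name__ == "__main__":
    print(analysis_field_name())      # text_general
    print(analysis_field_name("en"))  # text_en
```

To check which rules apply to a given request, look up the resulting name in schema.xml. Whether a field exists for a particular language code depends on the schema.xml shipped with your installation.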

