Filtering Text Field Searches

For more detailed information about analyzers, filters and tokenizers, see the following link:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Every text field uses the com.ptc.solr.analysis.PTCWordDelimiterFilterFactory filter. This filter splits words into subwords and performs optional transformations on subword groups. Words are split into subwords using the following rules:

Rule	Example
Split on intra-word delimiters (by default, all non alpha-numeric characters).	"Wi-Fi" splits into "Wi" and "Fi".
Split on case transitions.	"TransAM" splits into "Trans" and "AM".
Leading and trailing intra-word delimiters on each subword are ignored.	"__hello---there, 'dude'" splits into "hello", "there", and "dude".
Trailing "'s" characters are removed for each subword. This step is not performed in a separate filter because of possible subword combinations.	"O'Neil's" splits into "O" and "Neil".

This filter is a replica of solr.WordDelimiterFilter, which is shipped with Solr. It has been customized to protect the following characters: ".", "-", and "_".
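
The splitting rules listed above can be approximated in a few lines of Python. This is only an illustrative sketch, not the PTC or Solr implementation. The function name split_subwords is hypothetical, every non-alphanumeric ASCII character is assumed to be a delimiter, and neither the protected characters nor the filter parameters are modeled.

```python
import re

# Assumption: any run of non-alphanumeric ASCII characters is a delimiter.
DELIMS = r"[^A-Za-z0-9]+"


def split_subwords(text):
    """Approximate the documented splitting rules (not the real filter)."""
    subwords = []
    for word in text.split():
        # Leading and trailing delimiters are ignored.
        word = re.sub(rf"^{DELIMS}|{DELIMS}$", "", word)
        # A trailing "'s" is removed before splitting, so no stray "s" remains.
        if word.endswith("'s"):
            word = word[:-2]
        # Split on intra-word delimiters.
        for part in re.split(DELIMS, word):
            # Split on lower-to-upper case transitions.
            pieces = re.split(r"(?<=[a-z])(?=[A-Z])", part)
            subwords.extend(p for p in pieces if p)
    return subwords


print(split_subwords("Wi-Fi"))
print(split_subwords("O'Neil's"))
```

The possessive is stripped before the delimiter split because splitting first would leave a separate "s" subword, which may be what the table means by "possible subword combinations".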

Splitting is affected by the following parameters:

• generateWordParts=1

Parts of words are generated: "whistle-blower" = "whistle" "blower"

• generateNumberParts=1

Number subwords are generated: "500-42" = "500" "42"

• catenateWords=1

Maximum runs of word parts are catenated: "re-confirm" = "reconfirm"

• catenateNumbers=1

Maximum runs of number parts are catenated: "500-42" = "50042"

• catenateAll=1

All subword parts are catenated: "wi-fi-4000" = "wifi4000"

• splitOnCaseChange=1

Split on case transitions: "PowerShot" = "Power" "Shot"

• preserveOriginal=1

Includes original words in subwords: "500-42" = "500" "42" "500-42"
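
To see how the parameters above combine, the following Python sketch reproduces the listed examples. It is a simplified model under stated assumptions, not the filter's code: the function expand and its keyword arguments are hypothetical names, parts are separated only at non-alphanumeric characters, a part counts as a number only if it is all digits, and splitOnCaseChange is not modeled.

```python
import re
from itertools import groupby


def expand(token, generate_word_parts=False, generate_number_parts=False,
           catenate_words=False, catenate_numbers=False,
           catenate_all=False, preserve_original=False):
    """Return the sub-tokens a token would yield for the given flags."""
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", token) if p]
    out = []
    # generateWordParts / generateNumberParts: emit the individual parts.
    for part in parts:
        if part.isdigit():
            if generate_number_parts:
                out.append(part)
        elif generate_word_parts:
            out.append(part)
    # catenateWords / catenateNumbers: join maximal runs of the same kind.
    for is_number, run in groupby(parts, key=str.isdigit):
        run = list(run)
        if len(run) > 1 and (catenate_numbers if is_number else catenate_words):
            out.append("".join(run))
    # catenateAll: join every part regardless of kind.
    if catenate_all and len(parts) > 1:
        out.append("".join(parts))
    # preserveOriginal: keep the unsplit token as well.
    if preserve_original:
        out.append(token)
    return out


print(expand("500-42", generate_number_parts=True, preserve_original=True))
```

The order of the returned sub-tokens is a choice made for this sketch and is not meant to reflect the order in which the real filter emits tokens.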

The com.ptc.solr.analysis.PTCSpecialCharacterFilterFactory filter is also used. This filter creates sub-tokens for tokens that end with PTC protected special characters. Currently there are only three protected special characters:

• dot or period (.)

• dash (-)

• underscore (_)

Sub-tokens are created with the following rules:

Rule	Example
Tokens ending with a period (.)	"dot." = "dot.", "dot"
Tokens ending with a dash (-)	"dash-" = "dash-", "dash"
Tokens ending with an underscore (_)	"under_" = "under_", "under"
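
The sub-token rules in the table above can be sketched as follows. The function name special_char_subtokens is hypothetical, and the sketch assumes that only a single trailing protected character is removed. The source does not say how tokens ending in several protected characters are handled.

```python
# The three protected special characters named in this section.
PROTECTED = (".", "-", "_")


def special_char_subtokens(token):
    """Return the token plus a sub-token without its trailing protected character."""
    # Assumption: a token consisting of one protected character is left alone.
    if len(token) > 1 and token.endswith(PROTECTED):
        return [token, token[:-1]]
    return [token]


print(special_char_subtokens("dot."))
print(special_char_subtokens("plain"))
```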

Ensure that the tokenizer and filters are applied in the same order at index time and at query time, so that a given word produces the same tokens in both cases.

Stop Words

The words listed in $solr-home\wblib\conf\stopwords.txt are not indexed. These should be words that a user would not enter in a meaningful search, for example “if” or “not”. To include these words in searches, remove them from stopwords.txt.

For English, the text field is used, and it is configured with the StopFilterFactory filter.
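
As a rough illustration of what stop word filtering does to a token stream, consider the Python sketch below. It is not Solr's StopFilterFactory. The function names are hypothetical, the word list is a made-up sample rather than the shipped stopwords.txt, and case-insensitive matching is an assumption.

```python
def load_stopwords(text):
    """Parse stop words, one per line; lines starting with '#' are comments."""
    words = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            words.add(line.lower())
    return words


def remove_stopwords(tokens, stopwords):
    """Drop every token that appears in the stop word set (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stopwords]


# Hypothetical sample content, not the real stopwords.txt.
sample = "# sample stop words\nif\nnot\n"
stop = load_stopwords(sample)
print(remove_stopwords(["search", "if", "part", "Not", "found"], stop))
```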

Synonyms

The synonym entries in $solr-home\wblib\conf\synonyms.txt ensure that searching on one word can find records with synonymous words. You can edit this file to enter or remove synonyms.

The SynonymFilterFactory filter is configured for English text fields.
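
The effect of a synonym entry can be sketched in Python as shown below. This is an illustration only, not Solr's SynonymFilterFactory. It assumes the comma-separated form of synonyms.txt, in which all words on a line are treated as equivalent, and it skips lines that use the "=>" mapping form. The function names and the sample words are hypothetical.

```python
def load_synonym_groups(text):
    """Map each word to the set of words listed on the same line."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and explicit "a => b" mappings.
        if not line or line.startswith("#") or "=>" in line:
            continue
        group = [w.strip().lower() for w in line.split(",") if w.strip()]
        for word in group:
            groups.setdefault(word, set()).update(group)
    return groups


def expand_term(term, groups):
    """Return the term together with its synonyms, sorted for stable output."""
    term = term.lower()
    return sorted(groups.get(term, {term}))


# Hypothetical sample entry, not taken from the shipped synonyms.txt.
groups = load_synonym_groups("bolt, screw, fastener\n")
print(expand_term("Screw", groups))
```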

autoCommit

Windchill uses the Solr auto commit feature to commit the index information automatically after certain criteria are met.

You can configure autoCommit in solrconfig.xml.

These criteria are specified under the autoCommit element by the following child elements:

maxDocs

maxDocs is the maximum number of uncommitted Windchill business object documents allowed before an autocommit is triggered.

maxTime

maxTime is the maximum time (in milliseconds) after a Windchill business object document is added before an autocommit is triggered.

Indexed searches perform better when the maxTime and maxDocs values are higher.

However, an object does not appear in search results until its index information is committed.

Use higher values when you run bulk indexing.
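
To check which autoCommit values are currently configured, a small script along the following lines may help. It is a sketch under assumptions: the embedded XML is a made-up fragment with illustrative values, not the solrconfig.xml that ships with Windchill, the function name read_auto_commit is hypothetical, and the values are assumed to be literal integers rather than property placeholders.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; the numbers are not recommended settings.
SAMPLE = """
<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>10000</maxDocs>
      <maxTime>60000</maxTime>
    </autoCommit>
  </updateHandler>
</config>
"""


def read_auto_commit(xml_text):
    """Return the maxDocs and maxTime values found under autoCommit."""
    root = ET.fromstring(xml_text)
    node = root.find(".//autoCommit")
    if node is None:
        return {}
    return {
        child.tag: int(child.text)
        for child in node
        if child.tag in ("maxDocs", "maxTime")
    }


print(read_auto_commit(SAMPLE))
```

With these sample values, a commit would be triggered once 10000 documents are pending or 60000 milliseconds have passed since a document was added.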

Enabling Alphanumeric Splits

By default, Windchill search does not tokenize alphanumeric transitions. For example, the string “ABC123” is indexed as “ABC123”.

You can customize Solr to enable alphanumeric splits. When enabled, the string “ABC123” is indexed as the following:

ABC123

ABC

123
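
The difference between the two behaviors can be modeled with a short Python sketch. The function index_terms and its split_on_numerics argument are hypothetical names, and the logic only imitates the example above. It is not the filter's implementation.

```python
import re


def index_terms(token, split_on_numerics=False):
    """Return the terms indexed for a token, with or without alphanumeric splits."""
    if not split_on_numerics:
        return [token]
    # Break the token at every letter/digit transition.
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    # Keep the original token and add the parts only if a split occurred.
    return [token] + parts if len(parts) > 1 else [token]


print(index_terms("ABC123"))
print(index_terms("ABC123", split_on_numerics=True))
```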

To enable alphanumeric splitting, perform the following actions:

1. Stop Windchill.

2. Navigate to the following file:

/solr-home/wblib/conf/conf_generic_field_types.xml

3. Locate all instances of splitOnNumerics="0" and replace them with splitOnNumerics="1".

4. Restart Windchill.

5. Once Windchill is restarted, re-index data using the Bulk Index Tool.
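
The replacement in step 3 can also be scripted. The following Python sketch is one possible approach and is not a PTC-provided tool. The function names are hypothetical, and keeping a .bak copy is a precaution added for this example. Run it only while Windchill is stopped, and pass the path from step 2 as it applies to your installation.

```python
from pathlib import Path

OLD = 'splitOnNumerics="0"'
NEW = 'splitOnNumerics="1"'


def enable_numeric_splits(xml_text):
    """Return the updated text and the number of replacements made."""
    return xml_text.replace(OLD, NEW), xml_text.count(OLD)


def update_file(path):
    """Rewrite the file in place, keeping a .bak copy of the original."""
    path = Path(path)
    original = path.read_text(encoding="utf-8")
    updated, count = enable_numeric_splits(original)
    if count:
        path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
        path.write_text(updated, encoding="utf-8")
    return count


# Example call (path as given in step 2; adjust for your installation):
# update_file("/solr-home/wblib/conf/conf_generic_field_types.xml")
```

A plain text replacement is used instead of an XML parser so that comments and formatting in the file are left untouched.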