|
For more detailed information about analyzers, filters and tokenizers, see the following link:

| Rule | Example |
| --- | --- |
| Split on intra-word delimiters (by default, all non-alphanumeric characters). | "Wi-Fi" splits into "Wi" and "Fi". |
| Split on case transitions. | "TransAM" splits into "Trans" and "AM". |
| Leading and trailing intra-word delimiters on each subword are ignored. | "__hello---there, 'dude'" splits into "hello", "there", and "dude". |
| Trailing "'s" characters are removed for each subword. | "O'Neil's" splits into "O" and "Neil". |
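
The four splitting rules above can be approximated in a few lines of Python. This is a minimal sketch for illustration only, not the search engine's actual filter: the function name `split_word` is hypothetical, and the sketch assumes ASCII letters and digits and handles only the rules listed in the table.

```python
import re
from typing import List


def split_word(text: str) -> List[str]:
    """Approximate the four word-delimiter rules (illustrative only)."""
    # Remove a trailing "'s" from each subword (only when it ends a subword).
    text = re.sub(r"'s(?![A-Za-z0-9])", "", text)
    # Split on intra-word delimiters: any run of non-alphanumeric characters.
    # Dropping empty pieces ignores leading and trailing delimiters.
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", text) if p]
    # Split each piece on lower-to-upper case transitions.
    subwords: List[str] = []
    for part in parts:
        subwords.extend(re.sub(r"(?<=[a-z])(?=[A-Z])", " ", part).split())
    return subwords


print(split_word("Wi-Fi"))                    # ['Wi', 'Fi']
print(split_word("TransAM"))                  # ['Trans', 'AM']
print(split_word("__hello---there, 'dude'"))  # ['hello', 'there', 'dude']
print(split_word("O'Neil's"))                 # ['O', 'Neil']
```
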

| Rule | Example |
| --- | --- |
| Tokens ending with a period (.) | "dot." = "dot.", "dot" |
| Tokens ending with a dash (-) | "dash-" = "dash-", "dash" |
| Tokens ending with an underscore (_) | "under_" = "under_", "under" |
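
One way to read the table above is that a token ending in one of these characters is kept as-is and also emitted without the trailing character. The sketch below follows that reading. The function name `expand_trailing_punctuation` is hypothetical, and stripping only a single trailing character is an assumption, since the table does not cover repeated characters.

```python
from typing import List

# Characters named in the table: period, dash, underscore.
TRAILING_CHARS = (".", "-", "_")


def expand_trailing_punctuation(token: str) -> List[str]:
    """Return the token, plus a copy without one trailing '.', '-' or '_'."""
    if len(token) > 1 and token.endswith(TRAILING_CHARS):
        # Assumption: only the last character is stripped.
        return [token, token[:-1]]
    return [token]


print(expand_trailing_punctuation("dot."))    # ['dot.', 'dot']
print(expand_trailing_punctuation("dash-"))   # ['dash-', 'dash']
print(expand_trailing_punctuation("under_"))  # ['under_', 'under']
print(expand_trailing_punctuation("plain"))   # ['plain']
```
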
|
Ensure that tokenizers are applied in the same order at index time and at query time. For any given word, the tokens generated at query time should match the tokens generated at index time.
|
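
To see why the order matters, consider a toy analysis chain with two steps. This is a hedged sketch: the step functions `lowercase` and `split_on_case` and the helper `run_chain` are hypothetical names for illustration, not part of any real analyzer configuration. Applying the same steps in a different order at query time produces tokens that were never indexed.

```python
import re
from typing import Callable, List

Step = Callable[[List[str]], List[str]]


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def split_on_case(tokens: List[str]) -> List[str]:
    # Split each token on lower-to-upper case transitions.
    out: List[str] = []
    for t in tokens:
        out.extend(re.sub(r"(?<=[a-z])(?=[A-Z])", " ", t).split())
    return out


def run_chain(text: str, steps: List[Step]) -> List[str]:
    tokens = text.split()  # whitespace tokenization as a starting point
    for step in steps:
        tokens = step(tokens)
    return tokens


index_chain = [split_on_case, lowercase]    # order used at index time
swapped_chain = [lowercase, split_on_case]  # same steps, different order

indexed = set(run_chain("TransAM", index_chain))
same_order = run_chain("TransAM", index_chain)
swapped = run_chain("TransAM", swapped_chain)

print(same_order)                            # ['trans', 'am']
print(swapped)                               # ['transam']
print(all(t in indexed for t in same_order)) # True
print(any(t in indexed for t in swapped))    # False
```

With the swapped order, lowercasing removes the case transition before the split can see it, so the query token "transam" matches nothing in the index.
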