The TEXT opType is currently designed to handle English text. Non-English text is untested: it might appear to work, but it can yield unpredictable results.
Examples:
• Dog, DOG, and dog are considered the same word (case is ignored).
• dog, dogs, and dog’s are considered different words (no stemming is applied).
• As a result of this tokenization step, a text entry of “The quick brown fox jumps over the lazy dog.” becomes the following list of words: the, quick, brown, fox, jumps, over, the, lazy, dog
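The tokenization behavior in these examples can be approximated in a few lines of Python. This is a minimal sketch, not the TEXT opType's actual implementation: the `tokenize` function and its regular expression are hypothetical, and assume that the text is lowercased and split on anything other than letters, digits, underscores, and apostrophes (so that “dog’s” stays distinct from “dog”), with no stemming.

```python
import re

# Hypothetical word pattern: runs of word characters (letters, digits,
# underscore) plus straight or curly apostrophes. Spaces and other
# punctuation act as separators and are discarded.
WORD_PATTERN = re.compile(r"[\w'’]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words, without stemming."""
    return WORD_PATTERN.findall(text.lower())


print(tokenize("The quick brown fox jumps over the lazy dog."))
# → ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

# Case is ignored, so all three tokens are identical.
print(tokenize("Dog DOG dog"))
# → ['dog', 'dog', 'dog']

# No stemming, so all three tokens stay distinct.
print(tokenize("dog dogs dog’s"))
# → ['dog', 'dogs', 'dog’s']
```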
Example:
• Stop words include: a, an, the, i, she, he, her, him, it, and, but, if, of, at, for
• As a result of this step, word list 1 below becomes word list 2:
1. the, quick, brown, fox, jumps, over, the, lazy, dog
2. quick, brown, fox, jumps, lazy, dog
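Stop-word removal amounts to filtering the word list against a fixed set. In this sketch, `STOP_WORDS` and `remove_stop_words` are hypothetical names, and the set is deliberately incomplete: it holds the example stop words plus “over”, which the example also removes. The actual stop-word list used by the TEXT opType is not given here.

```python
# Hypothetical, partial stop-word list: the example words plus "over",
# which the example also removes. The real list may be longer.
STOP_WORDS = frozenset(
    ["a", "an", "the", "i", "she", "he", "her", "him", "it",
     "and", "but", "if", "of", "at", "for", "over"]
)


def remove_stop_words(words: list[str]) -> list[str]:
    """Drop every word found in STOP_WORDS, keeping the original order."""
    return [w for w in words if w not in STOP_WORDS]


word_list_1 = ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
print(remove_stop_words(word_list_1))
# → ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
```

A frozenset is used so that each membership check is a constant-time hash lookup rather than a scan of the list.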
Examples:
• 1-grams: quick, brown, fox, jumps, lazy, dog
• 2-grams: quick brown, brown fox, fox jumps, jumps lazy, lazy dog
• 3-grams: quick brown fox, brown fox jumps, fox jumps lazy, jumps lazy dog
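An n-gram is a run of n consecutive words, so a list of k words yields k − n + 1 n-grams (6, 5, and 4 in the examples). The sketch below uses a hypothetical `ngrams` helper and assumes, as the examples show, that n-grams are built from the list after stop-word removal; this is why “jumps lazy” appears as a 2-gram.

```python
def ngrams(words: list[str], n: int) -> list[str]:
    """Return every run of n consecutive words, joined by single spaces."""
    # range() is empty when n > len(words), so no n-grams are produced.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


words = ["quick", "brown", "fox", "jumps", "lazy", "dog"]
for n in (1, 2, 3):
    print(f"{n}-grams:", ngrams(words, n))
# 1-grams: ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
# 2-grams: ['quick brown', 'brown fox', 'fox jumps', 'jumps lazy', 'lazy dog']
# 3-grams: ['quick brown fox', 'brown fox jumps', 'fox jumps lazy', 'jumps lazy dog']
```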
Example:
• If the vocabulary for the dataset consists of quick, brown, fox, brown fox, red, and red fox, then a row containing the text entry “The quick brown fox jumps over the lazy dog.” will contain 6 new fields with the following counts: 1, 1, 1, 1, 0, 0
• A row containing the text entry “The red fox jumps over the brown fox.” will contain the same 6 fields with the following counts: 0, 1, 2, 1, 1, 1
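Putting the steps together, every vocabulary entry becomes one count field per row. The sketch below reproduces the two example rows. The function name `count_features`, the tokenization pattern, the partial stop-word set, and the use of 1-grams and 2-grams only are all assumptions for illustration; how the vocabulary itself is chosen from the dataset is not described here, so it is passed in explicitly.

```python
import re
from collections import Counter

# Assumptions (hypothetical): lowercase tokenization on runs of word
# characters and apostrophes, a partial stop-word list, and n-grams of
# length 1 up to max_n.
WORD_PATTERN = re.compile(r"[\w'’]+")
STOP_WORDS = frozenset(
    ["a", "an", "the", "i", "she", "he", "her", "him", "it",
     "and", "but", "if", "of", "at", "for", "over"]
)


def count_features(text: str, vocabulary: list[str], max_n: int = 2) -> list[int]:
    """Return one count per vocabulary entry for a single text value."""
    words = [w for w in WORD_PATTERN.findall(text.lower()) if w not in STOP_WORDS]
    counts: Counter[str] = Counter()
    for n in range(1, max_n + 1):
        # Count every n-gram of length n in the filtered word list.
        counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    # A Counter returns 0 for entries that never occur in this row.
    return [counts[term] for term in vocabulary]


vocabulary = ["quick", "brown", "fox", "brown fox", "red", "red fox"]
print(count_features("The quick brown fox jumps over the lazy dog.", vocabulary))
# → [1, 1, 1, 1, 0, 0]
print(count_features("The red fox jumps over the brown fox.", vocabulary))
# → [0, 1, 2, 1, 1, 1]
```

Because the vocabulary is fixed for the dataset, every row gets the same fields in the same order, even when most of its counts are 0.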
For prescriptive scoring, free-text fields cannot be used as lever features. For predictive scoring, a model with text inputs supports scoring without important fields but does not support scoring with important fields.
Note:
• For the maxAllowedFields parameter in the model training configuration, each resulting n-gram count field counts as an individual field and is treated the same as any other feature included for training the model.
• The n-gram count fields are not displayed as separate features in the Analytics Builder.
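As a rough illustration of the first note, the number of fields a model uses can be tallied as its other features plus one field per vocabulary entry of each text input. The `total_fields` helper and the `max_allowed_fields` value below are hypothetical; the sketch assumes that only the n-gram count fields (not the raw text field) count toward maxAllowedFields.

```python
def total_fields(other_features: int, vocabulary_sizes: list[int]) -> int:
    """Count model input fields, treating each n-gram count field as one field.

    other_features: number of non-text features (hypothetical input).
    vocabulary_sizes: vocabulary size of each TEXT input (hypothetical input).
    """
    return other_features + sum(vocabulary_sizes)


# 10 other features plus one text input with the 6-entry example vocabulary.
fields = total_fields(10, [6])
print(fields)  # → 16

max_allowed_fields = 20  # hypothetical maxAllowedFields setting
print(fields <= max_allowed_fields)  # → True
```

The point of the arithmetic is that a single text input can add many fields at once: a larger vocabulary consumes more of the maxAllowedFields budget than any single numeric feature.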