Optimizing Leading and Trailing Wildcard Queries Using Substring Indexes

Substrings are strings containing both leading and trailing wildcard characters, such as *ABC*. Queries containing substring criteria can take significantly longer to run than queries for strings ending with a wildcard, such as ABC*. Standard database indexes cannot support queries with substring criteria efficiently.

A standard database index is like a phone directory with entries sorted by last name. If you know the exact name, or even the first few letters of the last name, you can find a particular entry quickly. If you search for a string of letters that occurs somewhere within the last name, however, you have to look through every entry. The letters that you know are a substring of the complete name. Just as with the phone directory, if the leading characters of the string are not known, a standard database index is of no use.
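
To make the phone directory analogy concrete, the following Python sketch uses a sorted list as a stand-in for a standard index. It is an illustration only, not how any particular database implements its indexes: a binary search (the standard bisect module) finds names that start with a prefix, as in ABC*, but a search for a fragment anywhere in the name, as in *ABC*, has to examine every entry.

```python
import bisect

# A sorted list of last names stands in for a standard, sorted index.
directory = sorted(["Anderson", "Baker", "Carter", "Mcdonald", "Donaldson", "Ronald"])

def prefix_search(sorted_names, prefix):
    """Names starting with prefix (like ABC*): binary search, then a short scan."""
    start = bisect.bisect_left(sorted_names, prefix)
    results = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix):
            break  # sorted order guarantees no later name can match
        results.append(name)
    return results

def substring_search(sorted_names, fragment):
    """Names containing fragment (like *ABC*): sort order cannot help, so check every entry."""
    return [name for name in sorted_names if fragment in name]

print(prefix_search(directory, "Don"))       # ['Donaldson']
print(substring_search(directory, "onald"))  # ['Donaldson', 'Mcdonald', 'Ronald']
```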

Additional database schema objects, known as substring indexes, can be created to support efficient execution of these queries and improve end-user query response time. A substring index can be created for each string attribute that is queried using substring criteria. In terms of the phone directory, a substring index takes each entry and stores smaller portions of each last name as individual index entries. This allows the system to find entries efficiently when only a few characters of the name are provided.
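
The following is a minimal sketch of the idea in Python, assuming a simple model in which a substring index stores every fixed-length substring of each value as its own entry. The class SubstringIndex and its methods are hypothetical, and index_length only borrows its name from the indexLength property described later in this section; the actual structure of substring indexes in the product may differ. A search splits the fragment into index-sized pieces, intersects the rows found for each piece, and then confirms the full match against the stored value.

```python
from collections import defaultdict

class SubstringIndex:
    """Hypothetical substring index: each length-k substring of a value is an entry."""

    def __init__(self, index_length):
        self.index_length = index_length
        self.entries = defaultdict(set)  # substring -> ids of rows containing it
        self.values = {}                 # row id -> original value (uppercased)

    def add(self, row_id, value):
        value = value.upper()
        self.values[row_id] = value
        for i in range(len(value) - self.index_length + 1):
            self.entries[value[i:i + self.index_length]].add(row_id)

    def search(self, fragment):
        fragment = fragment.upper()
        k = self.index_length
        if len(fragment) < k:
            # Too short for an index lookup; fall back to a full scan.
            return sorted(r for r, v in self.values.items() if fragment in v)
        # Cover the fragment with length-k pieces; the last piece may overlap.
        starts = list(range(0, len(fragment) - k + 1, k))
        if starts[-1] != len(fragment) - k:
            starts.append(len(fragment) - k)
        candidates = set.intersection(
            *(self.entries.get(fragment[s:s + k], set()) for s in starts))
        # Index hits only show that each piece occurs; confirm the whole fragment.
        return sorted(r for r in candidates if fragment in self.values[r])

idx = SubstringIndex(index_length=4)
for row_id, name in enumerate(["Anderson", "Mcdonald", "Baker", "Ronald"], start=1):
    idx.add(row_id, name)
print(idx.search("donald"))  # [2]
```

Because the final check compares the full fragment, the index only narrows the candidate rows; it never changes which rows match.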

Administrators can use SQL monitoring tools, such as the TopSQL MBean, to identify queries that consume excessive time or system resources. These queries might benefit from substring indexes. For example, you might observe numerous searches for parts in which users commonly search for a substring of the part name or number. Users may also report to an administrator or IT staff that queries with leading wildcards take much longer than other searches.

Once you have identified the attributes commonly involved in substring searches, you can create substring indexes so that these searches perform more efficiently.

Keep in mind that substring indexes are additional schema objects: they require storage space, and maintaining them consumes system processing resources. This means that substring indexes may not be suitable for every string attribute. There is a trade-off between extra system resources and end-user response time.
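
Under the same hypothetical fixed-length model, the storage side of this trade-off can be estimated: a value of length L produces L - k + 1 substring entries for an index length k, compared with one entry per value in a standard index. The function and the sample part numbers below are illustrative assumptions, not measurements from any real system.

```python
def substring_entry_count(values, index_length):
    """Estimate substring index entries under the hypothetical fixed-length model."""
    # A value shorter than index_length contributes no entries.
    return sum(max(len(v) - index_length + 1, 0) for v in values)

# Hypothetical sample part numbers.
part_numbers = ["WCX-1000-A", "WCX-1001-B", "PUMP-HOUSING"]
print(len(part_numbers))                       # 3 entries in a standard index
print(substring_entry_count(part_numbers, 4))  # 23 entries in the substring index
```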

To create a substring index, you must:

When substring indexes are used, queries are optimized for substrings up to a certain number of characters. This number is the product of the indexLength and joinLimit properties specified in the configuration file. For example, if indexLength is 4 and joinLimit is 3, substring queries perform best when the search string contains 12 or fewer characters. Communicate this optimal character count to your users. If a user enters more characters than the product of indexLength and joinLimit, the query still works, but it is less optimized. Note that the more characters specified in the query criteria, the more refined the result set will be. If search criteria commonly exceed the optimal character count and users report poor query performance, consider increasing indexLength.
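
The arithmetic can be sketched in Python as follows, assuming that joinLimit caps how many indexLength-sized pieces of the search string are looked up in the substring index and joined. The source states only that the optimal length is the product of the two properties, so this interpretation and the helper names optimal_length and plan_query are hypothetical. In this model, characters beyond the optimal length are still matched, but only by filtering the candidate rows, which is consistent with such queries working but being less optimized.

```python
def optimal_length(index_length, join_limit):
    """Longest search string fully covered by index lookups."""
    return index_length * join_limit

def plan_query(fragment, index_length, join_limit):
    """Hypothetical plan: up to join_limit pieces use the index; the rest is filtered."""
    pieces = [fragment[i:i + index_length]
              for i in range(0, len(fragment), index_length)]
    return pieces[:join_limit], "".join(pieces[join_limit:])

print(optimal_length(4, 3))                  # 12
print(plan_query("ABCDEFGHIJKLMN", 4, 3))    # (['ABCD', 'EFGH', 'IJKL'], 'MN')
```

With indexLength 4 and joinLimit 3, a 14-character search leaves "MN" outside the indexed pieces; raising indexLength to 5 would make the optimal length 15 and cover the whole string.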