www.openlinksw.com
docs.openlinksw.com

Book Home

Contents
Preface

Free Text Search

Basic Concepts
Creating Free Text Indexes
Querying Free Text Indexes
Text Triggers
Generated Tables and Internals
Removing A Text Index
Removing A Text Trigger
Internationalization & Unicode
Performance
Free Text Functions

19.8. Internationalization & Unicode

The text being indexed and the text query expression may both be wide strings. The word boundaries used to cut the text in words in both queries and index maintenance may depend on a language declared for the text index.

The default language has white space and punctuation as word delimiters and will recognize Unicode ideographic characters as self standing. A single non-ideographic character will always be considered noise and not indexed.

Non-ASCII Unicode values are converted to UTF8 before being stored into the word table as narrow strings. Narrow 8 bit strings are stored in the words table as is.

See Also:

The LANGUAGE option in CREATE TEXT INDEX.