net.sf.saxon.codenorm

Class Normalizer

public class Normalizer extends Object

Implements Unicode Normalization Forms C, D, KC, and KD. Copyright (c) 1991-2005 Unicode, Inc. For terms of use, see http://www.unicode.org/terms_of_use.html. For documentation, see UAX #15.
The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information here.

Author: Mark Davis
Updates for supplementary code points: Vladimir Weinstein & Markus Scherer
Modified to remove dependency on ICU code: Michael Kay

Field Summary
static byte C
Normalization Form Selector
static byte COMPATIBILITY_MASK
Masks for the form selector
static byte COMPOSITION_MASK
Masks for the form selector
static byte D
Normalization Form Selector
static byte KC
Normalization Form Selector
static byte KD
Normalization Form Selector
static byte NO_ACTION
Normalization Form Selector
Constructor Summary
Normalizer(byte form)
Create a normalizer for a given form.
Normalizer(CharSequence formCS)
Create a normalizer for a given form, expressed as a character string
Method Summary
boolean getExcluded(char ch)
Just accessible for testing.
String getRawDecompositionMapping(char ch)
Just accessible for testing.
CharSequence normalize(CharSequence source)
Normalizes text according to the chosen form

Field Detail

C

public static final byte C
Normalization Form Selector

COMPATIBILITY_MASK

static final byte COMPATIBILITY_MASK
Masks for the form selector

COMPOSITION_MASK

static final byte COMPOSITION_MASK
Masks for the form selector

D

public static final byte D
Normalization Form Selector

KC

public static final byte KC
Normalization Form Selector

KD

public static final byte KD
Normalization Form Selector

NO_ACTION

public static final byte NO_ACTION
Normalization Form Selector
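
The field names suggest that each form selector combines a "compatibility" bit and a "composition" bit. The sketch below shows one way such selectors could be encoded and tested. The class name `FormSelectorSketch`, the helper methods, and all numeric values are assumptions for illustration; this page does not list the actual constant values, and NO_ACTION is left out because its encoding is not described here.

```java
public class FormSelectorSketch {
    // Hypothetical values: the real constants are not documented on this page.
    static final byte COMPATIBILITY_MASK = 1;
    static final byte COMPOSITION_MASK = 2;

    // Each selector is a combination of the two mask bits (assumed layout).
    static final byte D = 0;
    static final byte C = COMPOSITION_MASK;
    static final byte KD = COMPATIBILITY_MASK;
    static final byte KC = (byte) (COMPATIBILITY_MASK | COMPOSITION_MASK);

    // True if the form applies compatibility mappings (KC, KD).
    static boolean isCompatibility(byte form) {
        return (form & COMPATIBILITY_MASK) != 0;
    }

    // True if the form recomposes after decomposition (C, KC).
    static boolean isComposing(byte form) {
        return (form & COMPOSITION_MASK) != 0;
    }

    public static void main(String[] args) {
        System.out.println("KC compatibility: " + isCompatibility(KC));
        System.out.println("KC composing: " + isComposing(KC));
        System.out.println("D composing: " + isComposing(D));
    }
}
```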

Constructor Detail

Normalizer

public Normalizer(byte form)
Create a normalizer for a given form.

Parameters: form - the normalization form required, for example C or D

Normalizer

public Normalizer(CharSequence formCS)
Create a normalizer for a given form, expressed as a character string.

Parameters: formCS - the name of the normalization form required, for example "NFC" or "NFD"
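
As a rough illustration of selecting a form by name, the sketch below maps a string such as "NFC" to a normalization form. It uses the JDK's `java.text.Normalizer.Form` enum as a stand-in, not this Saxon class. The class name `FormNameSketch`, the method `parseForm`, and the trimming and upper-casing of the input are assumptions of the sketch; how this constructor treats whitespace, case, or unknown names is not stated on this page.

```java
import java.text.Normalizer;
import java.util.Locale;

public class FormNameSketch {
    // Hypothetical helper: resolves a form name to the JDK's Form enum.
    // Throws IllegalArgumentException for names other than NFC, NFD, NFKC, NFKD.
    static Normalizer.Form parseForm(CharSequence formName) {
        String name = formName.toString().trim().toUpperCase(Locale.ROOT);
        return Normalizer.Form.valueOf(name);
    }

    public static void main(String[] args) {
        System.out.println(parseForm("NFC"));
        System.out.println(parseForm(" nfkd "));
    }
}
```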

Method Detail

getExcluded

boolean getExcluded(char ch)
Accessible only for testing.

Parameters: ch - a character

Returns: true if the character is an excluded character

getRawDecompositionMapping

String getRawDecompositionMapping(char ch)
Accessible only for testing.

Parameters: ch - a character

Returns: the raw decomposition mapping of the character

normalize

public CharSequence normalize(CharSequence source)
Normalizes text according to the chosen form.

Parameters: source - the original text, unnormalized

Returns: the resulting normalized text
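
To make the effect of the four forms concrete, the sketch below normalizes two short strings with the JDK's `java.text.Normalizer`, used here as a stand-in because it implements the same Unicode forms (UAX #15). It is not the Saxon class documented on this page. The class name `NormalizationFormsDemo` and its helper methods are hypothetical.

```java
import java.text.Normalizer;
import java.util.stream.Collectors;

public class NormalizationFormsDemo {
    // Normalizes text with the JDK normalizer (a stand-in for this class).
    static String apply(CharSequence source, Normalizer.Form form) {
        return Normalizer.normalize(source, form);
    }

    // Renders each code point as U+XXXX so composed and decomposed
    // strings can be told apart when printed.
    static String toHex(String text) {
        return text.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301"; // letter e followed by a combining acute accent
        String ligature = "\uFB01";    // the "fi" ligature, a compatibility character

        // Canonical forms compose or decompose the accented letter.
        System.out.println("NFC:  " + toHex(apply(decomposed, Normalizer.Form.NFC)));
        System.out.println("NFD:  " + toHex(apply("\u00E9", Normalizer.Form.NFD)));

        // Compatibility forms also replace the ligature with plain letters.
        System.out.println("NFC:  " + toHex(apply(ligature, Normalizer.Form.NFC)));
        System.out.println("NFKC: " + toHex(apply(ligature, Normalizer.Form.NFKC)));
    }
}
```

With the Saxon class, the corresponding call would presumably construct a Normalizer with the desired form and pass the text to normalize(CharSequence), which returns a CharSequence rather than a String.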