java.lang.Object | |
↳ | sun.text.normalizer.Trie |
A trie is a kind of compressed, serializable table of values associated with Unicode code points (0..0x10ffff).
This class defines the basic structure of a trie and provides methods to retrieve the offsets to the actual data.
Data will be in the form of an array of a basic type, char or int.
The actual data format will have to be specified by the user in the inner static interface com.ibm.icu.impl.Trie.DataManipulate.
This trie implementation is optimized for getting offsets while walking forward through a UTF-16 string. Therefore, the simplest and fastest access macros are the fromLead() and fromOffsetTrail() methods. The fromBMP() methods are a little more complicated; they get offsets even for lead surrogate code points, while the fromLead() methods get special "folded" offsets for lead surrogate code units if there is relevant data associated with them. From such a folded offset, an offset needs to be extracted to supply to the fromOffsetTrail() methods. To handle such supplementary code points, some offset information is kept in the data.
Methods in com.ibm.icu.impl.Trie.DataManipulate are called to retrieve that offset from the folded value for the lead surrogate unit.
For examples of use, see com.ibm.icu.impl.CharTrie or com.ibm.icu.impl.IntTrie.
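As a rough illustration of the forward walk described above, the following sketch labels each UTF-16 code unit of a string with the kind of lookup the description associates with it. The class and method names are hypothetical and are not part of Trie; only standard java.lang.Character calls are used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of sun.text.normalizer.Trie.
final class Utf16WalkSketch {
    /** Labels every UTF-16 code unit of s as "bmp", "lead" or "trail". */
    static List<String> classify(String s) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                paths.add("lead");   // a folded offset would be read for this unit
            } else if (Character.isLowSurrogate(c)) {
                paths.add("trail");  // resolved using the offset taken from the lead
            } else {
                paths.add("bmp");    // a single direct lookup is enough
            }
        }
        return paths;
    }
}
```

A supplementary character therefore costs two steps, one for the lead unit and one for the trail unit, while every other character costs one.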
Nested Classes | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Trie.DataManipulate | Character data in com.ibm.icu.impl.Trie has a different user-specified format for different purposes. |
Constants | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
int | INDEX_STAGE_1_SHIFT_ | Shift size for shifting right the input index. | |||||||||
int | INDEX_STAGE_2_SHIFT_ | Shift size for shifting left the index array values. | |||||||||
int | INDEX_STAGE_3_MASK_ | Mask for getting the lower bits from the input index. | |||||||||
int | LEAD_INDEX_OFFSET_ | Lead surrogate code points' index displacement in the index array. | |||||||||
int | SURROGATE_MASK_ | Surrogate mask to use when shifting offset to retrieve supplementary values |
Fields | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
m_dataLength_ | Length of the data array | ||||||||||
m_dataManipulate_ | Internal TrieValue which handles the parsing of the data value. | ||||||||||
m_dataOffset_ | Start index of the data portion of the trie. | ||||||||||
m_index_ | Index or UTF16 characters |
Protected Constructors | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Trie constructor for CharTrie use.
| |||||||||||
Trie constructor
|
Protected Methods | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Gets the offset to the data which the BMP character points to.
Treats a lead surrogate as a normal code point.
| |||||||||||
Internal trie getter from a code point.
| |||||||||||
Gets the default initial value
| |||||||||||
Gets the offset to the data which this lead surrogate character points to.
| |||||||||||
Gets the offset to the data which the index ch after variable offset points to.
| |||||||||||
Gets the offset to the data which the surrogate pair points to.
| |||||||||||
Gets the value at the argument index
| |||||||||||
Determines if this is a 16 bit trie
| |||||||||||
Determines if this is a 32 bit trie
| |||||||||||
Parses the input stream and creates the trie index with it. |
Shift size for shifting right the input index. Range: 1..9.
Shift size for shifting left the index array values. Increases possible data size with 16-bit index values at the cost of compactness. This requires blocks of stage 2 data to be aligned by DATA_GRANULARITY. Range: 0..INDEX_STAGE_1_SHIFT.
Mask for getting the lower bits from the input index. Equal to DATA_BLOCK_LENGTH_ - 1.
Lead surrogate code points' index displacement in the index array. Computed as (0x10000 - 0xd800) >> INDEX_STAGE_1_SHIFT_, that is, 0x2800 >> INDEX_STAGE_1_SHIFT_.
Surrogate mask to use when shifting offset to retrieve supplementary values
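The relationships between these constants can be worked through with concrete numbers. This is a sketch only: the shift value 5 is an assumption chosen from the documented 1..9 range, and the definition of the block length as 1 << shift is likewise assumed rather than stated on this page. The class name is hypothetical.

```java
// Hypothetical holder class; the values are illustrative assumptions.
final class TrieConstantsSketch {
    // Assumed value inside the documented range 1..9.
    static final int INDEX_STAGE_1_SHIFT = 5;
    // Assumed: one data block holds 2^shift entries.
    static final int DATA_BLOCK_LENGTH = 1 << INDEX_STAGE_1_SHIFT;
    // Documented relationship: DATA_BLOCK_LENGTH_ - 1.
    static final int INDEX_STAGE_3_MASK = DATA_BLOCK_LENGTH - 1;
    // Documented relationship: (0x10000 - 0xd800) >> INDEX_STAGE_1_SHIFT_.
    static final int LEAD_INDEX_OFFSET = (0x10000 - 0xd800) >> INDEX_STAGE_1_SHIFT;
}
```

With a shift of 5 the block length is 32, the mask is 31 (0x1f), and the lead surrogate displacement is 0x2800 >> 5 = 0x140.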
Length of the data array
Internal TrieValue which handles the parsing of the data value. This class is to be implemented by the user.
Start index of the data portion of the trie. CharTrie combines index and data into a char array, so this is used to indicate the initial offset to the data portion. Note this index always points to the initial value.
Index or UTF16 characters
Trie constructor for CharTrie use.
inputStream | ICU data file input stream which contains the trie |
---|---|
dataManipulate | object containing the information to parse the trie data |
IOException | thrown when input stream does not have the right header. |
---|
Trie constructor
index | array to be used for index |
---|---|
options | used by the trie |
dataManipulate | object containing the information to parse the trie data |
Gets the offset to the data which the BMP character points to. Treats a lead surrogate as a normal code point.
ch | BMP character |
---|
Internal trie getter from a code point. Gets the offset to the data which the code point points to. Could be faster(?) but longer with an initial check of the form if((c32)<=0xd7ff) { (result)=_TRIE_GET_RAW(trie, data, 0, c32); }
ch | codepoint |
---|
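The range checks implied by this description might be organised as in the sketch below. It only reports which lookup path a code point would take; it does not read any trie data. The class, enum, and method names are hypothetical, and the exact order of the checks in the real method is an assumption.

```java
// Hypothetical dispatcher, not the actual Trie method.
final class CodePointDispatchSketch {
    enum Path { RAW, BMP, SURROGATE_PAIR, OUT_OF_RANGE }

    /** Chooses a lookup path for a code point in 0..0x10ffff. */
    static Path pathFor(int cp) {
        if (cp < 0) {
            return Path.OUT_OF_RANGE;
        }
        if (cp < 0xd800) {
            return Path.RAW;            // below the surrogates: raw lookup with offset 0
        }
        if (cp < 0x10000) {
            return Path.BMP;            // rest of the BMP, lead surrogates included
        }
        if (cp <= 0x10ffff) {
            return Path.SURROGATE_PAIR; // supplementary: split into lead and trail
        }
        return Path.OUT_OF_RANGE;
    }
}
```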
Gets the default initial value
Gets the offset to the data which this lead surrogate character points to. Data at the returned offset may contain folding offset information for the next trailing surrogate character.
ch | lead surrogate character |
---|
Gets the offset to the data which the index ch after variable offset points to. Note that for locating the data offset of a non-supplementary character, calling
getRawOffset(0, ch);
will do. Otherwise, if it is a supplementary character formed by the surrogates lead and trail, we would have to call getRawOffset() with getFoldingIndexOffset(). See getSurrogateOffset().
offset | index offset which ch is to start from |
---|---|
ch | index to be used after offset |
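One plausible way to combine the shift and mask constants into a raw offset is sketched below. This is an assumption based on the constant descriptions on this page, not a copy of the actual method body; the class name, the standalone index parameter, and the constant values 5 and 2 are all hypothetical.

```java
// Hypothetical stand-alone version of a raw offset computation.
final class RawOffsetSketch {
    static final int INDEX_STAGE_1_SHIFT = 5; // assumed, range 1..9
    static final int INDEX_STAGE_2_SHIFT = 2; // assumed, range 0..INDEX_STAGE_1_SHIFT
    static final int INDEX_STAGE_3_MASK = (1 << INDEX_STAGE_1_SHIFT) - 1;

    /**
     * Looks up the block entry for ch in the index array, starting at offset,
     * scales it to a data position, then adds the position inside the block.
     */
    static int rawOffset(char[] index, int offset, char ch) {
        int blockStart = index[offset + (ch >> INDEX_STAGE_1_SHIFT)] << INDEX_STAGE_2_SHIFT;
        return blockStart + (ch & INDEX_STAGE_3_MASK);
    }
}
```

For example, with the index array {0, 8, 16, 24} the character 'A' (0x41) selects index entry 2, whose value 16 scales to block start 64, and the low five bits add 1, giving offset 65.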
Gets the offset to the data which the surrogate pair points to.
lead | lead surrogate |
---|---|
trail | trailing surrogate |
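A subclass could resolve a surrogate pair roughly as follows: read the value stored for the lead surrogate, extract the folded index offset from it, then perform a raw lookup for the trail surrogate from that offset. This is a sketch under several assumptions: the IntUnaryOperator stands in for the user-supplied Trie.DataManipulate, the constant values are illustrative, 0x3ff is assumed for the surrogate mask, and returning -1 when no folded offset is present is a convention chosen for this example.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical stand-alone version of a surrogate pair lookup.
final class SurrogateOffsetSketch {
    static final int SHIFT_1 = 5;            // assumed INDEX_STAGE_1_SHIFT_
    static final int SHIFT_2 = 2;            // assumed INDEX_STAGE_2_SHIFT_
    static final int MASK_3 = (1 << SHIFT_1) - 1;
    static final int SURROGATE_MASK = 0x3ff; // assumed: low 10 bits of the trail unit

    static int rawOffset(char[] index, int offset, char ch) {
        return (index[offset + (ch >> SHIFT_1)] << SHIFT_2) + (ch & MASK_3);
    }

    /** Returns the data offset for the pair, or -1 if the lead has no folded offset. */
    static int surrogateOffset(char[] index, char[] data, IntUnaryOperator folding,
                               char lead, char trail) {
        int leadValue = data[rawOffset(index, 0, lead)];  // value stored for the lead unit
        int foldedIndexOffset = folding.applyAsInt(leadValue);
        if (foldedIndexOffset <= 0) {
            return -1;                                    // no supplementary data for this lead
        }
        return rawOffset(index, foldedIndexOffset, (char) (trail & SURROGATE_MASK));
    }
}
```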
Gets the value at the argument index
index | value at index will be retrieved |
---|
Determines if this is a 16 bit trie
Determines if this is a 32 bit trie
Parses the input stream and creates the trie index with it.
This is overridden by the child classes.
inputStream | input stream containing the trie information |
---|
IOException | thrown when data reading fails. |
---|