java.lang.Object
   ↳ sun.text.normalizer.UTF16
Standalone utility class providing UTF16 character conversions and indexing conversions.
Code that uses strings alone rarely needs modification.
By design, UTF-16 does not allow overlap, so searching for strings is a safe
operation. Similarly, concatenation is always safe. Substringing is safe if
the start and end are both on UTF-32 boundaries. In normal code, the values
for start and end are on those boundaries, since they arose from operations
like searching. If not, the nearest UTF-32 boundaries can be determined
using bounds().
The following examples illustrate use of some of these methods.

    // iteration forwards: Original
    for (int i = 0; i < s.length(); ++i) {
        char ch = s.charAt(i);
        doSomethingWith(ch);
    }

    // iteration forwards: Changes for UTF-32
    int ch;
    for (int i = 0; i < s.length(); i += UTF16.getCharCount(ch)) {
        ch = UTF16.charAt(s, i);
        doSomethingWith(ch);
    }

    // iteration backwards: Original
    for (int i = s.length() - 1; i >= 0; --i) {
        char ch = s.charAt(i);
        doSomethingWith(ch);
    }

    // iteration backwards: Changes for UTF-32
    // the loop runs while i >= 0 so that the character at index 0 is also visited
    int ch;
    for (int i = s.length() - 1; i >= 0; i -= UTF16.getCharCount(ch)) {
        ch = UTF16.charAt(s, i);
        doSomethingWith(ch);
    }

Notes:
High and low surrogates are called Lead and Trail in the API, which gives a better sense of their ordering in a string. offset16 and offset32 are used to distinguish offsets to UTF-16 boundaries from offsets to UTF-32 boundaries. int char32 is used to contain UTF-32 characters, as opposed to char16, which is a UTF-16 code unit.
A UTF-16 offset can be converted to a UTF-32 offset and back without loss only if bounds(string, offset16) != TRAIL.
UCharacter.isLegal() can be used to check for validity if desired.
Constants | | |
---|---|---|
int | CODEPOINT_MAX_VALUE | The highest Unicode code point value (scalar value) according to the Unicode Standard. |
int | CODEPOINT_MIN_VALUE | The lowest Unicode code point value. |
int | LEAD_SURROGATE_MAX_VALUE | Lead surrogate maximum value. |
int | LEAD_SURROGATE_MIN_VALUE | Lead surrogate minimum value. |
int | SUPPLEMENTARY_MIN_VALUE | The minimum value for supplementary code points. |
int | SURROGATE_MIN_VALUE | Surrogate minimum value. |
int | TRAIL_SURROGATE_MAX_VALUE | Trail surrogate maximum value. |
int | TRAIL_SURROGATE_MIN_VALUE | Trail surrogate minimum value. |
Public Constructors |
---|

Public Methods |
---|
Append a single UTF-32 value to the end of a StringBuffer. |
Extract a single UTF-32 value from a string. |
Extract a single UTF-32 value from a substring. |
Determines how many chars this char32 requires. |
Returns the lead surrogate. |
Returns the trail surrogate. |
Determines whether the character is a lead surrogate. |
Determines whether the code value is a surrogate. |
Determines whether the character is a trail surrogate. |
Shifts offset16 by the argument number of codepoints within a subarray. |
Convenience method corresponding to String.valueOf(char). |
Append a single UTF-32 value to the end of a StringBuffer.
If a validity check is required, use isLegal() on char32 before calling.
target | the buffer to append to |
---|---|
char32 | value to append. |
IllegalArgumentException | thrown when char32 does not lie within the range of the Unicode codepoints |
---|
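The append behaviour described above can be sketched with the standard library. Because `sun.*` classes are JDK internals that application code generally cannot rely on, this sketch uses `StringBuffer.appendCodePoint` and `Character.isValidCodePoint` as stand-ins; the class `AppendSketch` and its `append` helper are hypothetical names introduced only for illustration.

```java
final class AppendSketch {
    // Hypothetical helper: appends one code point to the buffer and
    // rejects values outside the Unicode range U+0000..U+10FFFF.
    static StringBuffer append(StringBuffer target, int char32) {
        if (!Character.isValidCodePoint(char32)) {
            throw new IllegalArgumentException(
                "Illegal code point: " + Integer.toHexString(char32));
        }
        // Writes one char for BMP values and a surrogate pair otherwise.
        return target.appendCodePoint(char32);
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("a");
        append(sb, 0x1F600);             // supplementary code point
        System.out.println(sb.length()); // prints 3: 'a' plus two surrogates
    }
}
```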
Extract a single UTF-32 value from a string.
Used when iterating forwards or backwards (with UTF16.getCharCount()), as well as for random access. If a validity check is required, use UCharacter.isLegal() on the return value.
If the char retrieved is part of a surrogate pair, its supplementary character will be returned. If a complete supplementary character is not found, the incomplete character will be returned.
source | array of UTF-16 chars |
---|---|
offset16 | UTF-16 offset to the start of the character. |
The boundaries of the returned code point are the same as those given by bounds32().
IndexOutOfBoundsException | thrown if offset16 is out of bounds. |
---|
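The lookup rule described above (return the whole supplementary character when the offset lands on either half of a surrogate pair, otherwise return the single code unit) can be sketched as follows. This is not the class's own source; it is a minimal illustration built on `java.lang.Character`, and `CharAtSketch` is a hypothetical name. Note that `String.codePointAt` alone is not an exact substitute, since it does not look backwards from a trail surrogate.

```java
final class CharAtSketch {
    // Returns the code point that contains the char at offset16.
    // An unpaired surrogate is returned as-is.
    static int charAt(String source, int offset16) {
        // String.charAt throws StringIndexOutOfBoundsException when out of range.
        char single = source.charAt(offset16);
        if (Character.isHighSurrogate(single)) {
            // Lead surrogate: look forward for the matching trail.
            if (offset16 + 1 < source.length()) {
                char trail = source.charAt(offset16 + 1);
                if (Character.isLowSurrogate(trail)) {
                    return Character.toCodePoint(single, trail);
                }
            }
        } else if (Character.isLowSurrogate(single)) {
            // Trail surrogate: look backward for the matching lead.
            if (offset16 > 0) {
                char lead = source.charAt(offset16 - 1);
                if (Character.isHighSurrogate(lead)) {
                    return Character.toCodePoint(lead, single);
                }
            }
        }
        return single;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";
        // Offsets 1 and 2 both resolve to the same supplementary code point.
        System.out.println(Integer.toHexString(charAt(s, 1)));
        System.out.println(Integer.toHexString(charAt(s, 2)));
    }
}
```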
Extract a single UTF-32 value from a substring.
Used when iterating forwards or backwards (with UTF16.getCharCount()), as well as for random access. If a validity check is required, use UCharacter.isLegal() on the return value.
If the char retrieved is part of a surrogate pair, its supplementary character will be returned. If a complete supplementary character is not found, the incomplete character will be returned.
source | array of UTF-16 chars |
---|---|
start | offset of the start of the substring in the source array |
limit | offset of the end of the substring in the source array |
offset16 | UTF-16 offset relative to start |
The boundaries of the returned code point are the same as those given by bounds32().
IndexOutOfBoundsException | thrown if offset16 is not within the range of start and limit. |
---|
Determines how many chars this char32 requires.
If a validity check is required, use isLegal() on char32 before calling.
char32 | the input codepoint. |
---|
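As a rough illustration of the char count, a code point below U+10000 occupies one UTF-16 code unit and anything from U+10000 upward occupies two. The sketch below assumes nothing about the internal implementation; `CharCountSketch` and its methods are hypothetical names, and `String.codePointAt` stands in for the class's own lookup method.

```java
final class CharCountSketch {
    // Number of UTF-16 code units needed for char32 (no validity check).
    static int getCharCount(int char32) {
        return char32 < 0x10000 ? 1 : 2;
    }

    // Counts code points by stepping forward one whole character at a time.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i += getCharCount(s.codePointAt(i))) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length());         // prints 4 (code units)
        System.out.println(countCodePoints(s)); // prints 3 (code points)
    }
}
```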
Returns the lead surrogate.
If a validity check is required, use isLegal() on char32 before calling.
char32 | the input character. |
---|
Returns the trail surrogate.
If a validity check is required, use isLegal() on char32 before calling.
char32 | the input character. |
---|
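For reference, the UTF-16 encoding form splits a supplementary code point into its two surrogates with simple arithmetic. The sketch below shows that arithmetic under the assumption that the input is a supplementary code point (U+10000..U+10FFFF); what the documented methods return for other inputs is not stated here and is not modelled. The class name, method names, and the constant values (taken from the Unicode Standard) are introduced for illustration.

```java
final class SurrogateSplitSketch {
    static final int SUPPLEMENTARY_MIN_VALUE = 0x10000;
    static final int LEAD_SURROGATE_MIN_VALUE = 0xD800;
    static final int TRAIL_SURROGATE_MIN_VALUE = 0xDC00;

    // Upper 10 bits of (char32 - 0x10000), offset into the lead range.
    static char leadSurrogate(int char32) {
        return (char) (LEAD_SURROGATE_MIN_VALUE
                + ((char32 - SUPPLEMENTARY_MIN_VALUE) >> 10));
    }

    // Lower 10 bits of char32, offset into the trail range.
    static char trailSurrogate(int char32) {
        return (char) (TRAIL_SURROGATE_MIN_VALUE + (char32 & 0x3FF));
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        System.out.println(Integer.toHexString(leadSurrogate(cp)));  // prints d83d
        System.out.println(Integer.toHexString(trailSurrogate(cp))); // prints de00
    }
}
```

The results agree with `Character.highSurrogate` and `Character.lowSurrogate` for supplementary code points.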
Determines whether the character is a lead surrogate.
char16 | the input character. |
---|
Determines whether the code value is a surrogate.
char16 | the input character. |
---|
Determines whether the character is a trail surrogate.
char16 | the input character. |
---|
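The three surrogate tests above amount to range checks against the surrogate constants. A minimal sketch follows, assuming the standard Unicode ranges (lead U+D800..U+DBFF, trail U+DC00..U+DFFF); the numeric values are not shown on this page and are taken from the Unicode Standard, and `SurrogateCheckSketch` is a hypothetical name.

```java
final class SurrogateCheckSketch {
    static final int LEAD_SURROGATE_MIN_VALUE = 0xD800;
    static final int LEAD_SURROGATE_MAX_VALUE = 0xDBFF;
    static final int TRAIL_SURROGATE_MIN_VALUE = 0xDC00;
    static final int TRAIL_SURROGATE_MAX_VALUE = 0xDFFF;

    static boolean isLeadSurrogate(char char16) {
        return LEAD_SURROGATE_MIN_VALUE <= char16 && char16 <= LEAD_SURROGATE_MAX_VALUE;
    }

    static boolean isTrailSurrogate(char char16) {
        return TRAIL_SURROGATE_MIN_VALUE <= char16 && char16 <= TRAIL_SURROGATE_MAX_VALUE;
    }

    // The surrogate block is the lead range followed directly by the trail range.
    static boolean isSurrogate(char char16) {
        return LEAD_SURROGATE_MIN_VALUE <= char16 && char16 <= TRAIL_SURROGATE_MAX_VALUE;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00";
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            System.out.println(Integer.toHexString(c)
                + " lead=" + isLeadSurrogate(c)
                + " trail=" + isTrailSurrogate(c));
        }
    }
}
```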
Shifts offset16 by the argument number of codepoints within a subarray.
source | char array |
---|---|
start | start position of the subarray to operate on |
limit | end position of the subarray to operate on |
offset16 | UTF-16 position to shift, relative to start |
shift32 | number of codepoints to shift |
IndexOutOfBoundsException | if the new offset16 is out of bounds with respect to the subarray or the subarray bounds are out of range. |
---|
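A comparable operation exists in the standard library as `Character.offsetByCodePoints`, which takes an absolute index and a length rather than a relative offset and a limit. The sketch below adapts between the two conventions. It assumes that the returned offset is relative to start, which this page does not state, and `MoveOffsetSketch` is a hypothetical name.

```java
final class MoveOffsetSketch {
    // Shifts offset16 (relative to start) by shift32 code points inside
    // source[start..limit). Assumption: the result is also relative to start.
    static int moveCodePointOffset(char[] source, int start, int limit,
                                   int offset16, int shift32) {
        // offsetByCodePoints throws IndexOutOfBoundsException if the subarray
        // is out of range or the shift would leave the subarray.
        int moved = Character.offsetByCodePoints(
            source, start, limit - start, start + offset16, shift32);
        return moved - start;
    }

    public static void main(String[] args) {
        char[] text = "xa\uD83D\uDE00b".toCharArray();
        // Subarray is text[1..5): 'a', a surrogate pair, 'b'.
        System.out.println(moveCodePointOffset(text, 1, 5, 0, 2)); // prints 3
    }
}
```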
Convenience method corresponding to String.valueOf(char). Returns a one- or two-char string containing the UTF-32 value in UTF-16 format. If a validity check is required, use isLegal() on char32 before calling.
char32 | the input character. |
---|
IllegalArgumentException | thrown if char32 is an invalid code point. |
---|
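The conversion described above has a close standard-library counterpart in `Character.toChars`, which also throws `IllegalArgumentException` for values outside the Unicode range. The following is a sketch using that stand-in rather than the internal class; `ValueOfSketch` is a hypothetical name.

```java
final class ValueOfSketch {
    // Builds a one- or two-char String holding char32 in UTF-16 form.
    static String valueOf(int char32) {
        // toChars rejects values that are not valid Unicode code points.
        return new String(Character.toChars(char32));
    }

    public static void main(String[] args) {
        System.out.println(valueOf(0x41).length());    // prints 1
        System.out.println(valueOf(0x1F600).length()); // prints 2
    }
}
```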