java.lang.Object | |
↳ | sun.text.normalizer.UCharacter |
The UCharacter class provides extensions to the java.lang.Character class. These extensions provide support for Unicode 3.2 properties and together with the UTF16 class, provide support for supplementary characters (those with code points above U+FFFF).
Code points are represented in these APIs using ints. While it would be more convenient in Java to have a separate primitive datatype for them, ints suffice in the meantime.
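As a minimal sketch of what handling code points as ints looks like, the following uses only the standard java.lang.Character and String methods, not UCharacter or UTF16 themselves. The class name CodePointDemo and the helper countCodePoints are hypothetical names chosen for illustration.

```java
public class CodePointDemo {
    /** Counts code points (not chars) by walking the string one int code point at a time. */
    public static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);     // a code point is held in an int
            i += Character.charCount(cp);  // 1 char for BMP, 2 for supplementary
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // U+1D11E lies above U+FFFF, so it occupies two chars in UTF-16.
        String s = "A" + new String(Character.toChars(0x1D11E));
        System.out.println(s.length());          // number of chars
        System.out.println(countCodePoints(s));  // number of code points
    }
}
```

The loop advances by Character.charCount so that a supplementary character is visited once rather than once per surrogate.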
To use this class, add the jar file named icu4j.jar to the class path, since it contains the data files that supply the information used by this class.
For example, on Windows:
set CLASSPATH=%CLASSPATH%;$JAR_FILE_PATH/ucharacter.jar
Alternatively, copy the files uprops.dat and unames.icu from the icu4j source subdirectory $ICU4J_SRC/src/com.ibm.icu.impl.data to your class directory $ICU4J_CLASS/com.ibm.icu.impl.data.
Aside from the additions for UTF-16 support, and the updated Unicode 3.1 properties, the main differences between UCharacter and Character are:
Further detailed differences can be determined by running the program com.ibm.icu.dev.test.lang.UCharacterCompare.
This class is not subclassable.
Nested Classes | |
---|---|
UCharacter.ECharacterCategory | This interface is deprecated. This is a draft API and might change in a future release of ICU. |
UCharacter.HangulSyllableType | Hangul Syllable Type constants. |
UCharacter.NumericType | Numeric Type constants. |
Constants | | |
---|---|---|
int | MAX_VALUE | The highest Unicode code point value (scalar value) according to the Unicode Standard. |
int | MIN_VALUE | The lowest Unicode code point value. |
double | NO_NUMERIC_VALUE | Special value that is returned by getUnicodeNumericValue(int) when no numeric value is defined for a code point. |
int | SUPPLEMENTARY_MIN_VALUE | The minimum value for supplementary code points. |
Public Methods |
---|
Retrieves the numeric value of a decimal digit code point. |
The given string is mapped to its case folding equivalent according to UnicodeData.txt and CaseFolding.txt; if any character has no case folding equivalent, the character itself is returned. |
Get the "age" of the code point. |
Returns a code point corresponding to the two UTF16 characters. |
Returns the bidirectional property of a code point. |
Gets the property value for a Unicode property type of a code point. |
Returns a value indicating a code point's Unicode category. |
Get the numeric value for a Unicode code point as defined in the Unicode Character Database. |
The highest Unicode code point value (scalar value) according to the Unicode Standard.
This is a 21-bit value (21 bits, rounded up).
Up-to-date Unicode implementation of java.lang.Character.MAX_VALUE
The lowest Unicode code point value.
Special value that is returned by getUnicodeNumericValue(int) when no numeric value is defined for a code point.
The minimum value for Supplementary code points
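The "21-bit" remark and the supplementary boundary can be checked with a short sketch. It relies on the standard JDK constants Character.MAX_CODE_POINT and Character.MIN_SUPPLEMENTARY_CODE_POINT, on the assumption that they hold the same Unicode-defined limits as the constants described here. The class and helper names are hypothetical.

```java
public class CodePointRangeDemo {
    /** Number of bits needed to represent a positive int value. */
    public static int bitsNeeded(int value) {
        return 32 - Integer.numberOfLeadingZeros(value);
    }

    public static void main(String[] args) {
        // Highest code point defined by the Unicode Standard.
        System.out.println(Integer.toHexString(Character.MAX_CODE_POINT));               // 10ffff
        // It fits in 21 bits.
        System.out.println(bitsNeeded(Character.MAX_CODE_POINT));                        // 21
        // First code point above U+FFFF.
        System.out.println(Integer.toHexString(Character.MIN_SUPPLEMENTARY_CODE_POINT)); // 10000
    }
}
```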
Retrieves the numeric value of a decimal digit code point.
This method observes the semantics of java.lang.Character.digit(). Note that this will return positive values for code points for which isDigit returns false, just like java.lang.Character.
Semantic Change: In release 1.3.1 and prior, this did not treat the European letters as having a digit value, and also treated numeric letters and other numbers as digits. This has been changed to conform to the Java semantics.
A code point is a valid digit if and only if:
ch | the code point to query |
---|---|
radix | the radix |
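Since this method is documented as following java.lang.Character.digit(), the behaviour noted above (a digit value for a code point that isDigit rejects) can be illustrated with the JDK method itself. This is a sketch against java.lang.Character, not against UCharacter. The class name DigitDemo and the helper hexValue are hypothetical.

```java
public class DigitDemo {
    /** Digit value of a code point in radix 16, or -1 if it is not a hex digit. */
    public static int hexValue(int codePoint) {
        return Character.digit(codePoint, 16);
    }

    public static void main(String[] args) {
        System.out.println(Character.digit('7', 10)); // 7
        System.out.println(hexValue('f'));            // 15: a letter with a digit value
        System.out.println(Character.isDigit('f'));   // false: yet not a decimal digit
        System.out.println(Character.digit('f', 10)); // -1: out of range for radix 10
    }
}
```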
The given string is mapped to its case folding equivalent according to UnicodeData.txt and CaseFolding.txt; if any character has no case folding equivalent, the character itself is returned. "Full", multiple-code point case folding mappings are returned here. For "simple" single-code point mappings use the API foldCase(int ch, boolean defaultmapping).
str | the String to be converted |
---|---|
defaultmapping | Indicates whether all mappings defined in CaseFolding.txt are to be used; otherwise the mappings for dotted I and dotless i marked with 'I' in CaseFolding.txt will be skipped. |
Get the "age" of the code point.
The "age" is the Unicode version when the code point was first designated (as a non-character or for Private Use) or assigned a character.
This can be useful to avoid emitting code points to receiving processes that do not accept newer characters.
The data is from the UCD file DerivedAge.txt.
ch | The code point. |
---|
Returns a code point corresponding to the two UTF16 characters.
lead | the lead char |
---|---|
trail | the trail char |
IllegalArgumentException | thrown when argument characters do not form a valid codepoint |
---|
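The combination of a lead and trail char into one code point follows the standard UTF-16 arithmetic. The sketch below is a hypothetical stand-in written from that arithmetic, not the class's actual implementation. It assumes that "do not form a valid codepoint" means the arguments are not a high surrogate followed by a low surrogate.

```java
public class SurrogatePairDemo {
    /** Combines a UTF-16 surrogate pair into a single code point. */
    public static int toCodePoint(char lead, char trail) {
        // Assumption: invalid input means "not a high/low surrogate pair".
        if (!Character.isHighSurrogate(lead) || !Character.isLowSurrogate(trail)) {
            throw new IllegalArgumentException("Not a valid surrogate pair");
        }
        // Each surrogate carries 10 bits; the result is offset by 0x10000.
        return ((lead - 0xD800) << 10) + (trail - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        int cp = toCodePoint('\uD834', '\uDD1E');
        System.out.println(Integer.toHexString(cp)); // 1d11e
    }
}
```

The JDK's own Character.toCodePoint performs the same arithmetic but does not validate its arguments, which is why the explicit check is written out here.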
Returns the bidirectional property of a code point. For example, 0x0041 (letter A) has the LEFT_TO_RIGHT directional property.
Result returned belongs to the interface UCharacterDirection.
ch | the code point whose direction is to be determined |
---|
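For comparison, the JDK exposes the same Unicode property through Character.getDirectionality. The sketch below uses that JDK method and its constants only. The numeric values of the UCharacterDirection constants are not shown here and should not be assumed to match the JDK ones. The class and helper names are hypothetical.

```java
public class DirectionDemo {
    /** True if the JDK reports the code point as strongly left-to-right. */
    public static boolean isLeftToRight(int codePoint) {
        return Character.getDirectionality(codePoint)
                == Character.DIRECTIONALITY_LEFT_TO_RIGHT;
    }

    public static void main(String[] args) {
        System.out.println(isLeftToRight(0x0041)); // true: LATIN CAPITAL LETTER A
        System.out.println(isLeftToRight(0x05D0)); // false: HEBREW LETTER ALEF is right-to-left
    }
}
```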
Gets the property value for a Unicode property type of a code point. Also returns binary and mask property values.
Unicode, especially in version 3.2, defines many more properties than the original set in UnicodeData.txt.
The properties APIs are intended to reflect Unicode properties as defined in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR). For details about the properties see http://www.unicode.org/.
For names of Unicode properties see the UCD file PropertyAliases.txt.
Sample usage:
int ea = UCharacter.getIntPropertyValue(c, UProperty.EAST_ASIAN_WIDTH);
int ideo = UCharacter.getIntPropertyValue(c, UProperty.IDEOGRAPHIC);
boolean b = (ideo == 1);
ch | code point to test. |
---|---|
type | UProperty selector constant, identifies which binary property to check. Must be UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or UProperty.INT_START <= type < UProperty.INT_LIMIT or UProperty.MASK_START <= type < UProperty.MASK_LIMIT. |
See also: UProperty
Returns a value indicating a code point's Unicode category.
Up-to-date Unicode implementation of java.lang.Character.getType(), except for the above-mentioned code points that had their category changed.
Return results are constants from the interface UCharacterCategory.
NOTE: the UCharacterCategory values are not compatible with
those returned by java.lang.Character.getType. UCharacterCategory values
match the ones used in ICU4C, while java.lang.Character type
values, though similar, skip the value 17.
ch | code point whose type is to be determined |
---|
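The gap at 17 on the Java side can be seen directly from the java.lang.Character constants. This sketch shows only the JDK values. The corresponding UCharacterCategory values are not listed in this document, so no mapping between the two is attempted. The class and helper names are hypothetical.

```java
public class CategoryValueDemo {
    /** General category of a code point as reported by java.lang.Character. */
    public static int javaType(int codePoint) {
        return Character.getType(codePoint);
    }

    public static void main(String[] args) {
        // Consecutive java.lang.Character category constants jump from 16 to 18.
        System.out.println(Character.FORMAT);      // 16
        System.out.println(Character.PRIVATE_USE); // 18
        // U+E000 is the first private-use code point in the BMP.
        System.out.println(javaType(0xE000));      // 18
    }
}
```

Because of this gap, category values should be compared against the named constants of the API that produced them rather than passed between the two APIs as raw ints.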
Get the numeric value for a Unicode code point as defined in the Unicode Character Database.
A "double" return type is necessary because some numeric values are fractions, negative, or too large for int.
For characters without any numeric values in the Unicode Character Database, this function will return NO_NUMERIC_VALUE.
API Change: In release 2.2 and prior, this API had a return type of int and returned -1 when the argument ch did not have a corresponding numeric value. This has been changed to stay in sync with ICU4C.
This corresponds to the ICU4C function u_getNumericValue.
ch | Code point to get the numeric value for. |
---|
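To see why an int return type is limiting, consider the JDK's int-returning Character.getNumericValue, which can only signal "fraction or otherwise unrepresentable" with a sentinel. The sketch below uses that JDK method for contrast and does not call getUnicodeNumericValue. The class and helper names are hypothetical.

```java
public class NumericValueDemo {
    /** Numeric value as reported by the int-returning JDK method. */
    public static int jdkNumericValue(int codePoint) {
        return Character.getNumericValue(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(jdkNumericValue('7'));    // 7
        // U+00BD VULGAR FRACTION ONE HALF: the value 0.5 cannot be returned as an int.
        System.out.println(jdkNumericValue(0x00BD)); // -2
        // U+0024 DOLLAR SIGN has no numeric value at all.
        System.out.println(jdkNumericValue('$'));    // -1
    }
}
```

With a double return type, a fraction can be reported as its actual value, and a dedicated constant such as NO_NUMERIC_VALUE marks the "no value" case.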