Python Unicode data:
The 'unicodedata' module in Python provides access to the Unicode
Character Database (UCD), which defines the properties of every
Unicode character.
This section covers three of its functions:
1. unicodedata.decimal(chr)
2. unicodedata.digit(chr)
3. unicodedata.numeric(chr)
Before describing these functions, let's be clear on what the decimal,
digit and numeric values of a character are.
Each character in the Unicode character set has a Numeric Type
property. This property can take one of four values:
Numeric-Type=Decimal
Numeric-Type=Digit
Numeric-Type=Numeric
Numeric-Type=None
The first type, Numeric-Type=Decimal, is limited to numeric characters
that are used in decimal-radix numbers and for which a full set of digits
has been encoded in a contiguous range, in ascending order of numeric value,
with the digit zero as the first code point in the range. Characters with this
property are therefore plain decimal digits. This excludes some characters,
such as the Chinese, Japanese and Korean ideographic digits, because they are
not encoded in a contiguous range. Subscripts and superscripts are also excluded.
The second type, Numeric-Type=Digit, covers characters that represent
a single digit value but do not satisfy all the strict rules of
Numeric-Type=Decimal. Superscript and subscript digits are typical
examples.
The third type, Numeric-Type=Numeric, covers all characters that
represent a number but fit neither Numeric-Type=Decimal nor
Numeric-Type=Digit, such as fractions and Roman numerals.
The fourth type, Numeric-Type=None, includes all characters that do
not represent a number in any form.
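The unicodedata module does not expose the Numeric Type property directly, but it can be inferred from which of the three functions accepts a character. The following is a minimal sketch under that assumption; the helper name numeric_type is hypothetical and not part of the module.

```python
import unicodedata

def numeric_type(ch):
    """Hypothetical helper: infer the Numeric Type of a single character."""
    # Try the strictest category first and fall through to the loosest.
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return "None"

# Sample characters written as escapes so the file stays ASCII-only.
samples = ["7", "\u00b2", "\u00bd", "a"]  # 7, SUPERSCRIPT TWO, ONE HALF, a
for ch in samples:
    print(unicodedata.name(ch), "->", numeric_type(ch))
```

The order of the checks matters: a Decimal character also has a digit value and a numeric value, so testing numeric() first would label every numeric character as "Numeric".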
The functions described below extract the decimal, digit or numeric
value of a character whose Numeric-Type is Decimal, Digit or Numeric
respectively. Because the types are nested, digit() also accepts Decimal
characters, and numeric() accepts characters of all three types.
If a character with Numeric-Type=None is passed to any of these functions,
the function raises a ValueError, unless a default value is supplied as a
second argument.
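As a short illustration of the failure case, the sketch below passes a letter (Numeric-Type=None) to numeric(), first without and then with a second argument. The variable names are illustrative only.

```python
import unicodedata

letter = "a"  # a character with no numeric value

# Without a default, the call raises ValueError.
try:
    unicodedata.numeric(letter)
    raised = False
except ValueError:
    raised = True

# With a default as the second positional argument, that object is returned.
with_default = unicodedata.numeric(letter, -1)

print(raised, with_default)
```

The default is returned unchanged, so it does not have to be a number; passing None is a common way to test whether a value exists.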
unicodedata.decimal(chr):
This function, defined in the unicodedata module, returns the decimal
value assigned to the character 'chr' as an integer.
If no such value is defined, the optional second argument, 'default', is
returned if it was supplied; otherwise a ValueError is raised.
The value returned is of type 'int'.
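A brief example of decimal(), assuming a standard CPython installation. It shows that decimal digits from other scripts are accepted, while a superscript digit is not.

```python
import unicodedata

ascii_nine = unicodedata.decimal("9")
arabic_indic_three = unicodedata.decimal("\u0663")  # ARABIC-INDIC DIGIT THREE

# SUPERSCRIPT TWO is Numeric-Type=Digit, not Decimal, so only the
# supplied default (None here) comes back.
superscript_two = unicodedata.decimal("\u00b2", None)

print(ascii_nine, arabic_indic_three, superscript_two)
```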
unicodedata.digit(chr):
This function, defined in the unicodedata module, returns the digit
value assigned to the character 'chr' as an integer.
If no such value is defined, the optional second argument, 'default', is
returned if it was supplied; otherwise a ValueError is raised.
The value returned is of type 'int'.
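A brief example of digit(), again assuming a standard CPython installation. Unlike decimal(), it accepts superscript and circled digits, but it still rejects fractions.

```python
import unicodedata

plain_five = unicodedata.digit("5")
superscript_two = unicodedata.digit("\u00b2")  # SUPERSCRIPT TWO
circled_one = unicodedata.digit("\u2460")      # CIRCLED DIGIT ONE

# VULGAR FRACTION ONE HALF has no digit value, so the default is returned.
one_half = unicodedata.digit("\u00bd", None)

print(plain_five, superscript_two, circled_one, one_half)
```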
unicodedata.numeric(chr):
This function, defined in the unicodedata module, returns the numeric
value assigned to the character 'chr' as a float.
If no such value is defined, the optional second argument, 'default', is
returned if it was supplied; otherwise a ValueError is raised.
The value returned is of type 'float'.
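A brief example of numeric(), assuming a standard CPython installation. It accepts the widest range of characters and always yields a float when a value exists.

```python
import unicodedata

plain_eight = unicodedata.numeric("8")
one_half = unicodedata.numeric("\u00bd")      # VULGAR FRACTION ONE HALF
roman_twelve = unicodedata.numeric("\u216b")  # ROMAN NUMERAL TWELVE

# A letter has no numeric value, so the default is returned instead.
letter = unicodedata.numeric("x", None)

print(plain_eight, one_half, roman_twelve, letter)
```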