Python Unicode data:
The 'unicodedata' module in Python's standard library provides access to
the Unicode Character Database (UCD), which defines the properties of
every Unicode character.
The three functions described here are:
1. unicodedata.category(chr)
2. unicodedata.bidirectional(chr)
3. unicodedata.combining(chr)
unicodedata.category(chr):
This function returns the general category assigned to the Unicode
character ‘chr’ as a string.
Every Unicode character is assigned a general category. The categories
are listed here:
[Cc] - Other, Control
[Cf] - Other, Format
[Cn] - Other, Not Assigned (no characters in the UCD data file have this property)
[Co] - Other, Private Use
[Cs] - Other, Surrogate
[LC] - Letter, Cased
[Ll] - Letter, Lowercase
[Lm] - Letter, Modifier
[Lo] - Letter, Other
[Lt] - Letter, Titlecase
[Lu] - Letter, Uppercase
[Mc] - Mark, Spacing Combining
[Me] - Mark, Enclosing
[Mn] - Mark, Non-spacing
[Nd] - Number, Decimal Digit
[Nl] - Number, Letter
[No] - Number, Other
[Pc] - Punctuation, Connector
[Pd] - Punctuation, Dash
[Pe] - Punctuation, Close
[Pf] - Punctuation, Final quote (may behave like Ps or Pe depending on usage)
[Pi] - Punctuation, Initial quote (may behave like Ps or Pe depending on usage)
[Po] - Punctuation, Other
[Ps] - Punctuation, Open
[Sc] - Symbol, Currency
[Sk] - Symbol, Modifier
[Sm] - Symbol, Math
[So] - Symbol, Other
[Zl] - Separator, Line
[Zp] - Separator, Paragraph
[Zs] - Separator, Space
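The function returns one of these two-letter codes as a string. Note that
LC is a grouping of Lu, Ll and Lt rather than a value assigned to
individual characters, so the function does not return it.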
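As a minimal sketch of how the lookup behaves, the snippet below calls
unicodedata.category() on a handful of characters. The sample characters
and the helper name 'categories' are chosen here for illustration only and
are not part of the library.

```python
import unicodedata

# Sample characters picked for illustration (an assumption, not from the UCD docs).
samples = ["A", "a", "5", " ", "-", "$", "+", "\n", "\u0301"]


def categories(chars):
    """Map each character to its two-letter general category code."""
    return {ch: unicodedata.category(ch) for ch in chars}


result = categories(samples)
for ch, cat in result.items():
    # Print the code point rather than the character so that control
    # characters and combining marks stay readable in the output.
    print(f"U+{ord(ch):04X} -> {cat}")

# U+0041 -> Lu   (uppercase letter)
# U+0061 -> Ll   (lowercase letter)
# U+0035 -> Nd   (decimal digit)
# U+0020 -> Zs   (space separator)
# U+002D -> Pd   (dash punctuation)
# U+0024 -> Sc   (currency symbol)
# U+002B -> Sm   (math symbol)
# U+000A -> Cc   (control)
# U+0301 -> Mn   (non-spacing mark)
```

The first letter of each code gives the major class (L, M, N, P, S, Z or
C), so result["A"][0] can be used when only the broad class matters.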
unicodedata.bidirectional(chr):
This function returns the bidirectional class assigned to the character
‘chr’ as a string. If no such value is defined, it returns an empty
string.
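The following sketch shows the kind of values unicodedata.bidirectional()
returns for left-to-right, right-to-left and neutral characters. The
sample characters and the dictionary name 'bidi' are illustrative choices
and not part of the library.

```python
import unicodedata

# Sample characters picked for illustration (an assumption).
samples = [
    "A",       # LATIN CAPITAL LETTER A
    "\u05D0",  # HEBREW LETTER ALEF
    "\u0627",  # ARABIC LETTER ALEF
    "1",       # DIGIT ONE
    " ",       # SPACE
]

# Map each character to its bidirectional class abbreviation.
bidi = {ch: unicodedata.bidirectional(ch) for ch in samples}

for ch, cls in bidi.items():
    print(f"U+{ord(ch):04X} -> {cls}")

# U+0041 -> L    (left-to-right)
# U+05D0 -> R    (right-to-left)
# U+0627 -> AL   (right-to-left Arabic)
# U+0031 -> EN   (European number)
# U+0020 -> WS   (whitespace)
```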
unicodedata.combining(chr):
This function returns the canonical combining class assigned to the
character ‘chr’ as an integer. If no combining class is defined, it
returns 0.
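A short sketch of the lookup, using one base letter and three combining
marks. The sample characters and the dictionary name 'classes' are
illustrative assumptions, not part of the library.

```python
import unicodedata

# Sample characters picked for illustration (an assumption).
samples = [
    "e",       # LATIN SMALL LETTER E
    "\u0301",  # COMBINING ACUTE ACCENT
    "\u0323",  # COMBINING DOT BELOW
    "\u094D",  # DEVANAGARI SIGN VIRAMA
]

# Map each character to its canonical combining class (an int).
classes = {ch: unicodedata.combining(ch) for ch in samples}

for ch, ccc in classes.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {ccc}")

# U+0065 LATIN SMALL LETTER E: 0
# U+0301 COMBINING ACUTE ACCENT: 230
# U+0323 COMBINING DOT BELOW: 220
# U+094D DEVANAGARI SIGN VIRAMA: 9
```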
The canonical combining class determines the order in which combining
characters attached to the same base character are arranged when text is
put into canonical order. Characters whose combining class is 0 include
all base characters, and characters with a nonzero combining class are
combining characters.
Combining classes are limited to the values from 0 to 255.
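To see the effect of the combining class on ordering, the sketch below
attaches two marks to a base letter in "wrong" order and then normalizes
the string. It relies on unicodedata.normalize(), a function from the same
module that is not otherwise covered here, and the variable names are
illustrative assumptions.

```python
import unicodedata

acute = "\u0301"      # COMBINING ACUTE ACCENT, combining class 230
dot_below = "\u0323"  # COMBINING DOT BELOW, combining class 220

# Marks written with the higher class first.
typed = "a" + acute + dot_below

# NFD decomposition also applies canonical ordering: marks that follow
# the same base character are sorted by ascending combining class.
normalized = unicodedata.normalize("NFD", typed)

order = [unicodedata.combining(ch) for ch in normalized]
print([f"U+{ord(ch):04X}" for ch in normalized])  # ['U+0061', 'U+0323', 'U+0301']
print(order)                                      # [0, 220, 230]
```

Both input orders normalize to the same sequence, which is why two strings
that differ only in the order of such marks can still compare equal after
normalization.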
The combining classes the function can return are:
0: Spacing, split, enclosing, reordrant and Tibetan subjoined
1: Overlays and interior
7: Nuktas
8: Hiragana/Katakana voicing marks
9: Viramas
10: Start of fixed position classes
199: End of fixed position classes
200: Below left attached
202: Below attached
204: Below right attached
208: Left attached
210: Right attached
212: Above left attached
214: Above attached
216: Above right attached
218: Below left
220: Below
222: Below right
224: Left (reordrant around single base character)
226: Right
228: Above left
230: Above
232: Above right
233: Double below
234: Double above
240: Below (iota subscript)
Some of the combining classes present in the list do not have members
but are specified here for completeness.