Python Unicodedata Library category bidirectional and combining functions



Python Unicode data:


The 'unicodedata' library in Python provides access to the Unicode
Character Database (UCD), which defines the properties of every
character in the Unicode standard. The functions in this library look up
those properties for a given character.


The next three functions covered here are:

1. unicodedata.category(chr)

2. unicodedata.bidirectional(chr)

3. unicodedata.combining(chr)


unicodedata.category(chr):


This is a function in the unicodedata library for Python that returns
the general category assigned to the Unicode character ‘chr’ as a string.

Each Unicode character is assigned exactly one general category. The
categories are listed here:


[Cc]    -    Other, Control
[Cf]    -    Other, Format
[Cn]    -    Other, Not Assigned (no characters in the file have this property)
[Co]    -    Other, Private Use
[Cs]    -    Other, Surrogate
[LC]    -    Letter, Cased (a grouping of Lu, Ll and Lt; not returned for an individual character)
[Ll]    -    Letter, Lowercase
[Lm]    -    Letter, Modifier
[Lo]    -    Letter, Other
[Lt]    -    Letter, Titlecase
[Lu]    -    Letter, Uppercase
[Mc]    -    Mark, Spacing Combining
[Me]    -    Mark, Enclosing
[Mn]    -    Mark, Non-spacing
[Nd]    -    Number, Decimal Digit
[Nl]    -    Number, Letter
[No]    -    Number, Other
[Pc]    -    Punctuation, Connector
[Pd]    -    Punctuation, Dash
[Pe]    -    Punctuation, Close
[Pf]    -    Punctuation, Final quote (may behave like Ps or Pe depending on usage)
[Pi]    -    Punctuation, Initial quote (may behave like Ps or Pe depending on usage)
[Po]    -    Punctuation, Other
[Ps]    -    Punctuation, Open
[Sc]    -    Symbol, Currency
[Sk]    -    Symbol, Modifier
[Sm]    -    Symbol, Math
[So]    -    Symbol, Other
[Zl]    -    Separator, Line
[Zp]    -    Separator, Paragraph
[Zs]    -    Separator, Space

The function returns one of these two-letter category codes as a string.
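
As a minimal sketch of how category() behaves, the snippet below looks up a handful of common characters. The sample characters and the helper name is_letter are chosen here purely for illustration and are not part of the library.

```python
import unicodedata

# A few sample characters; the selection is arbitrary and only for illustration.
samples = ["A", "a", "5", " ", "$", "-", "(", "_", "+", "\n"]

# Map each sample character to its two-letter general category code.
categories = {ch: unicodedata.category(ch) for ch in samples}

for ch, cat in categories.items():
    print(f"{ch!r:>6} -> {cat}")


def is_letter(ch):
    """Hypothetical helper: True when the category code starts with 'L'."""
    return unicodedata.category(ch).startswith("L")
```

Because the first letter of the code names the major class (L for letters, N for numbers, P for punctuation and so on), testing only that first letter is a common way to group characters.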






unicodedata.bidirectional(chr):


This is a function in the unicodedata library in Python that returns the bidirectional class assigned to the character ‘chr’ as a string.

The bidirectional classes are:
L      -    Left-to-Right
LRE    -    Left-to-Right Embedding
LRO    -    Left-to-Right Override
R      -    Right-to-Left
AL     -    Right-to-Left Arabic
RLE    -    Right-to-Left Embedding
RLO    -    Right-to-Left Override
PDF    -    Pop Directional Format
LRI    -    Left-to-Right Isolate
RLI    -    Right-to-Left Isolate
FSI    -    First Strong Isolate
PDI    -    Pop Directional Isolate
EN     -    European Number
ES     -    European Number Separator
ET     -    European Number Terminator
AN     -    Arabic Number
CS     -    Common Number Separator
NSM    -    Non-Spacing Mark
BN     -    Boundary Neutral
B      -    Paragraph Separator
S      -    Segment Separator
WS     -    Whitespace
ON     -    Other Neutrals
The function returns the class assigned to the character.
An empty string is returned if no such value is defined.
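
The following is a small illustrative sketch of bidirectional() applied to characters from different scripts. The sample characters and their descriptions are picked here as examples only.

```python
import unicodedata

# Sample characters with a short description of each (illustrative choices).
samples = {
    "A": "Latin capital letter",
    "1": "ASCII digit",
    "\u05d0": "Hebrew letter alef",
    "\u0627": "Arabic letter alef",
    "\u0661": "Arabic-Indic digit one",
    " ": "space",
}

# Look up the bidirectional class of each sample character.
bidi_classes = {ch: unicodedata.bidirectional(ch) for ch in samples}

for ch, desc in samples.items():
    print(f"U+{ord(ch):04X} {desc}: {bidi_classes[ch]}")
```

Latin letters come back as L, Hebrew letters as R and Arabic letters as AL, while digits and spaces get their own classes (EN, AN, WS).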






unicodedata.combining(chr):


This is a function in the unicodedata library in Python that returns the
canonical combining class assigned to the character ‘chr’ as an integer.
If no canonical combining class is defined, the function returns 0.


The canonical combining class indicates the priority of a combining
character relative to other combining characters attached to the same base
character: during normalization, such marks are sorted in ascending order of
combining class. Characters whose combining class is 0 are the base
characters, and all other characters are combining characters.
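
To make the ordering role of the combining class concrete, here is a brief sketch. It relies on unicodedata.normalize(), a function of the same library that this article does not otherwise cover, and the constant names ACUTE and DOT_BELOW are chosen here for readability only.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, combining class 230 (above)
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220 (below)

# Base letter followed by two marks, typed with the higher class first.
text = "a" + ACUTE + DOT_BELOW

# NFD normalization reorders adjacent marks by ascending combining class.
decomposed = unicodedata.normalize("NFD", text)

classes_before = [unicodedata.combining(ch) for ch in text]
classes_after = [unicodedata.combining(ch) for ch in decomposed]

print(classes_before)  # [0, 230, 220]
print(classes_after)   # [0, 220, 230]
```

The base letter (class 0) stays in place, and the two marks swap so that the lower class comes first.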


Combining classes are limited to values from 0 to 255.



The combining class values and their meanings are:

0:      Spacing, split, enclosing, reordrant and Tibetan subjoined
1:      Overlays and interior
7:      Nuktas
8:      Hiragana/Katakana voicing marks
9:      Viramas
10:     Start of fixed position classes
199:    End of fixed position classes
200:    Below left attached
202:    Below attached
204:    Below right attached
208:    Left attached
210:    Right attached
212:    Above left attached
214:    Above attached
216:    Above right attached
218:    Below left
220:    Below
222:    Below right
224:    Left (reordrant around single base character)
226:    Right
228:    Above left
230:    Above
232:    Above right
233:    Double below
234:    Double above
240:    Below (iota subscript)



Some of the combining classes present in the list do not have members
but are specified here for completeness.
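
A short sketch of combining() on a base letter and several combining marks is given below. The characters are examples selected here, and unicodedata.name() is used only to label the output.

```python
import unicodedata

# A base letter followed by combining marks from different scripts
# (an illustrative selection).
samples = ["a", "\u0301", "\u0323", "\u093c", "\u094d", "\u3099", "\u0345"]

# Map each character to its canonical combining class (an integer).
combining_classes = {ch: unicodedata.combining(ch) for ch in samples}

for ch, ccc in combining_classes.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {ccc}")
```

The base letter reports 0, while each mark reports the class that describes where it attaches.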





Author: Arkaja Sharan