Python Unicode data:
The 'unicodedata' module in Python provides access to the Unicode Character Database (UCD), which defines the properties (name, general category, decomposition and so on) of all Unicode characters.
UCD file link: https://www.unicode.org/Public/9.0.0/ucd/UnicodeData.txt
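As a quick illustration of the kind of properties the module exposes, here is a short sketch using a few of its lookup functions (name(), category(), decomposition() and lookup()). The sample characters are chosen arbitrarily.

```python
import unicodedata

ch = "\u00e9"  # 'é', U+00E9

# Properties stored in the Unicode Character Database for this character
print(unicodedata.name(ch))           # LATIN SMALL LETTER E WITH ACUTE
print(unicodedata.category(ch))       # Ll (letter, lowercase)
print(unicodedata.decomposition(ch))  # 0065 0301 ('e' + combining acute accent)

# Reverse direction: find a character by its official name
alpha = unicodedata.lookup("GREEK SMALL LETTER ALPHA")
print(alpha, f"U+{ord(alpha):04X}")   # α U+03B1
```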
The next function discussed here is:
1. unicodedata.is_normalized(form, unistr)
The unicodedata module also provides the following attribute and object:
1. unicodedata.unidata_version
2. unicodedata.ucd_3_2_0
Let us discuss these in detail.
unicodedata.is_normalized(form, unistr):
As explained in the previous article, Unicode normalization deals with canonical and compatibility equivalence: it converts strings that are equivalent into a single standard representation, so that they can be compared reliably.
As stated before, there are two types each of decomposition and composition:
Decomposition:
1. NFD- Canonical Decomposition
2. NFKD- Compatibility Decomposition
Composition:
1. NFC- Canonical Decomposition followed by Canonical Composition
2. NFKC- Compatibility Decomposition followed by Canonical Composition
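The four forms above can be compared side by side with unicodedata.normalize(). In this sketch the input string is an arbitrary choice: the "ﬁ" ligature (which has only a compatibility decomposition) followed by a precomposed "é" (which has a canonical one), so each form treats the two characters differently.

```python
import unicodedata

# U+FB01 LATIN SMALL LIGATURE FI followed by U+00E9 (precomposed 'é')
s = "\ufb01\u00e9"

results = {}
for form in ("NFD", "NFC", "NFKD", "NFKC"):
    normalized = unicodedata.normalize(form, s)
    # Record code points so the differences between forms are visible
    results[form] = [f"U+{ord(c):04X}" for c in normalized]
    print(form, results[form])

# Expected:
# NFD  ['U+FB01', 'U+0065', 'U+0301']            ligature kept, 'é' decomposed
# NFC  ['U+FB01', 'U+00E9']                      ligature kept, 'é' composed
# NFKD ['U+0066', 'U+0069', 'U+0065', 'U+0301']  ligature split into 'f' + 'i'
# NFKC ['U+0066', 'U+0069', 'U+00E9']
```

Only the compatibility forms (NFKD, NFKC) replace the ligature, while both decomposition forms split "é" into "e" plus a combining accent.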
While unicodedata.normalize(form, unistr) converts the string 'unistr' to the normal form 'form' given as its argument, unicodedata.is_normalized(form, unistr) tells us whether 'unistr' is already in that normal form. The function was added in Python 3.8.
The first parameter 'form' can take the values 'NFD', 'NFC', 'NFKD' and 'NFKC'.
The second parameter is the string to be checked.
The function returns True if the string is in the given normal form and False otherwise.
EXAMPLE-1:
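A minimal sketch of a canonical-form check, assuming Python 3.8 or later. The same visible word "café" is built once with the precomposed "é" and once with "e" plus a combining accent.

```python
import unicodedata

composed = "caf\u00e9"     # 'é' as a single code point (U+00E9)
decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(unicodedata.is_normalized("NFC", composed))    # True
print(unicodedata.is_normalized("NFC", decomposed))  # False
print(unicodedata.is_normalized("NFD", decomposed))  # True

# The strings are canonically equivalent but not equal until normalized
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```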
EXAMPLE-2:
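A second sketch (again Python 3.8 or later) shows the difference between canonical and compatibility forms. The helper normal_forms() is hypothetical, written only for this illustration; it lists every form a string is already in.

```python
import unicodedata

text = "x\u00b2"  # 'x²', using U+00B2 SUPERSCRIPT TWO

# '²' has only a compatibility decomposition (to '2'), so the canonical
# forms leave it alone while the compatibility forms do not
print(unicodedata.is_normalized("NFC", text))   # True
print(unicodedata.is_normalized("NFKC", text))  # False
print(unicodedata.normalize("NFKC", text))      # x2


def normal_forms(s):
    """Return the normal forms that s is already in (hypothetical helper)."""
    return [f for f in ("NFC", "NFD", "NFKC", "NFKD") if unicodedata.is_normalized(f, s)]


print(normal_forms(text))  # ['NFC', 'NFD']
print(normal_forms("x2"))  # ['NFC', 'NFD', 'NFKC', 'NFKD']
```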
Module Attributes and Objects:
1. unicodedata.unidata_version
This attribute is a string giving the version of the Unicode database used by the module. Its value depends on the Python release.
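A small sketch of reading the attribute. Because the value is a string, the sketch converts it to a tuple of integers before comparing; the (9, 0, 0) threshold is only an arbitrary example.

```python
import unicodedata

# A plain string such as "13.0.0"; the exact value depends on the Python build
version_str = unicodedata.unidata_version
version = tuple(int(part) for part in version_str.split("."))
print("Unicode database version:", version_str)

# Comparing tuples avoids string-comparison pitfalls
# ("9.0.0" > "13.0.0" is True when compared as strings)
if version >= (9, 0, 0):
    print("At least Unicode 9.0 data is available")
```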
2. unicodedata.ucd_3_2_0
This is an object in the unicodedata module that has the same methods as the module itself, but uses version 3.2 of the Unicode database instead, for applications that require this specific version (such as IDNA).
It gives access to the Unicode 3.2 data so that the module stays compatible with older IDNA applications.
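A short sketch contrasting the module with the ucd_3_2_0 object. U+1F600 GRINNING FACE, added to Unicode long after version 3.2, serves as an example of a character the old database does not know; name() then returns the supplied default instead of raising ValueError.

```python
import unicodedata

old = unicodedata.ucd_3_2_0

print(old.unidata_version)  # 3.2.0
print(old.name("A"))        # LATIN CAPITAL LETTER A

# Functions such as normalize() work the same way on the old database
print(old.normalize("NFC", "e\u0301") == "\u00e9")  # True

# A character assigned after Unicode 3.2 is unknown to the old database
smiley = "\U0001F600"
print(unicodedata.name(smiley))                  # GRINNING FACE
print(old.name(smiley, "<not in Unicode 3.2>"))  # <not in Unicode 3.2>
```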