Python Introduction to stringprep Library















































Python stringprep Library:

Another useful library in Python is stringprep.
Stringprep describes a framework for preparing Unicode text strings so that string input and string comparison are more likely to work in ways that make sense for typical users throughout the world.
Resources on the internet are often identified by strings, and finding a resource means matching and comparing those strings. Exactly how the comparison should be made depends on the application domain: for example, whether it is case-sensitive, and whether whitespace is significant or can be ignored.

RFC 3454 defines the procedure for preparing Unicode strings before they are transmitted over the wire; after preparation, the strings are in a certain normalized form.
RFC 3454 also defines a set of tables that can be combined into profiles. Each profile must define which tables it uses and which other optional parts of the stringprep procedure it includes.
One example of a stringprep profile is nameprep, which is used for internationalized domain names.
There are two kinds of tables: sets and mappings.
Set Tables:
A set table answers a membership question: the lookup returns True if the character is in the table and False otherwise.
Mapping Tables:
In a mapping table, a key is passed in and the associated value is returned.
The stringprep module exposes the tables from RFC 3454. Because these tables would be very large to represent as dictionaries or lists, the module uses the Unicode Character Database internally. The tables are therefore exposed as functions, not as data structures.
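
The difference between the two kinds of tables can be seen in the return types. The following is a minimal sketch, assuming Python 3, where each function is called with a one-character string; the variable names are only illustrative.

```python
import stringprep

# Set table: the function answers "is this character in the table?" with a bool.
membership = stringprep.in_table_b1("a")

# Mapping table: the function returns the replacement text as a string.
mapped = stringprep.map_table_b3("a")

print(type(membership).__name__)  # bool
print(type(mapped).__name__)      # str
```

Because the tables are functions, there is nothing to iterate over or index into; each lookup is a call on a single character.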

Let's talk about some of its functions:

1. stringprep.in_table_a1(code):
This function returns True if the code passed as the argument is in Table A.1, and False if it is not.
Table A.1 lists all the unassigned code points in Unicode 3.2.

First of all, what is a code point in Unicode?
A code point is the number assigned to an abstract character in a system for representing text. Unicode code points are written in the form "U+1234", where "1234" is the assigned number in hexadecimal.
Example: the character "A" is assigned the code point "U+0041".
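
As a quick illustration of the "U+0041" notation, the built-in ord() and chr() functions convert between a character and its code point. This is a small sketch using only built-ins; the variable names are arbitrary.

```python
ch = "A"

code_point = ord(ch)            # the code point as an integer
label = f"U+{code_point:04X}"   # the conventional U+XXXX notation
back = chr(0x0041)              # from the code point back to the character

print(code_point, label, back)  # 65 U+0041 A
```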

So, if the code point passed as the argument is unassigned, it is present in Table A.1 and the function returns True. If it is an assigned Unicode code point, it is not present in Table A.1 and the function returns False.

EXAMPLE:
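
The original example is not available here, so the following is a minimal sketch of how the function might be called. It assumes Python 3, where the argument is a one-character string, and it picks U+0378 as a sample code point that is unassigned in Unicode 3.2.

```python
import stringprep

# "A" (U+0041) is an assigned character, so it is not in Table A.1.
assigned = stringprep.in_table_a1("A")

# U+0378 is unassigned in Unicode 3.2, so it is in Table A.1.
unassigned = stringprep.in_table_a1("\u0378")

print(assigned)    # False
print(unassigned)  # True
```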



2. stringprep.in_table_b1(code):
This function returns True or False depending on whether the code passed as the argument is in Table B.1.
Table B.1 contains the Unicode characters that are commonly mapped to nothing.

EXAMPLE:
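
The original example is not available here, so this is a minimal sketch, assuming Python 3 and one-character string arguments. The soft hyphen (U+00AD) is one of the characters listed in Table B.1. The helper strip_b1 is hypothetical and not part of stringprep; it only shows what "mapped to nothing" could look like in practice.

```python
import stringprep

soft_hyphen = stringprep.in_table_b1("\u00ad")  # soft hyphen is in Table B.1
letter = stringprep.in_table_b1("a")            # an ordinary letter is not

def strip_b1(text):
    """Hypothetical helper: drop every character that is in Table B.1."""
    return "".join(ch for ch in text if not stringprep.in_table_b1(ch))

cleaned = strip_b1("co\u00adoperate")

print(soft_hyphen, letter)  # True False
print(cleaned)              # cooperate
```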


3. stringprep.map_table_b2(code):
Unlike the previous two functions, this function does not return True or False; it returns the mapped value for the code according to Table B.2. Table B.2 holds the mappings for case folding used with the NFKC normalization form.

Case folding removes case distinctions by replacing upper case and title case characters with their lower case equivalents.
Compatibility mappings substitute characters with their compatibility decompositions. Many compatibility mappings are foldings; some are multigraph expansions, which replace a multigraph with an equivalent series of single characters (for example, a double prime expands into two single primes). Multigraph expansions are a subset of compatibility mappings.
This kind of case folding is useful in fuzzy searches.

EXAMPLE:
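
The original example is not available here, so this is a minimal sketch, assuming Python 3 and one-character string arguments. The helper fold_b2 is hypothetical and not part of stringprep; it applies the mapping to each character of a string. The telephone sign (U+2121) is included only as a sample compatibility character, and its result is printed rather than assumed.

```python
import stringprep

lower_a = stringprep.map_table_b2("A")       # 'a'
sharp_s = stringprep.map_table_b2("\u00df")  # sharp s folds to 'ss'

def fold_b2(text):
    """Hypothetical helper: apply the Table B.2 mapping character by character."""
    return "".join(stringprep.map_table_b2(ch) for ch in text)

folded = fold_b2("Stra\u00dfe")

print(lower_a, sharp_s, folded)  # a ss strasse

# A compatibility character, to show where NFKC comes into play.
print(repr(stringprep.map_table_b2("\u2121")))
```

Note that the return value is a string that may be longer than one character, so the result of a mapping cannot always be fed back into a function that expects a single character.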


4. stringprep.map_table_b3(code):
This function returns the mapped value for the code according to Table B.3. Table B.3 holds the mappings for case folding used with no normalization. For most characters, this simply converts them to lower case.

EXAMPLE:
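
The original example is not available here, so this is a minimal sketch, assuming Python 3 and one-character string arguments. It shows an ordinary letter, a character with no case, and the sharp s, which is one of the cases where the mapping is more than plain lowercasing.

```python
import stringprep

lower_a = stringprep.map_table_b3("A")       # plain lowercasing
digit = stringprep.map_table_b3("7")         # characters without case are unchanged
sharp_s = stringprep.map_table_b3("\u00df")  # special case: maps to two characters

print(lower_a)  # a
print(digit)    # 7
print(sharp_s)  # ss
```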




