Python Introduction to stringprep Library

Python stringprep Library:

Another useful module in Python's standard library is stringprep.
Stringprep (RFC 3454) describes a framework for preparing Unicode text strings in order to increase the likelihood that string input and string comparison work in ways that make sense for typical users throughout the world.
When we look up resources on the internet, we often identify them by matching and comparing strings. Exactly how these comparisons should be made depends entirely on the application domain: for example, whether matching should be case-sensitive, and whether whitespace is significant or can be ignored.

RFC 3454 defines a procedure for preparing Unicode strings before they are transmitted over the wire; after going through this preparation, the strings have a certain normalized form.
RFC 3454 also defines a set of tables, which can be combined into profiles. Each profile must define which tables it uses and which of the other optional parts of the stringprep procedure it includes.
One example of a stringprep profile is nameprep, which is used for internationalized domain names.
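Python ships an implementation of the nameprep profile, built on top of stringprep, in the encodings.idna module. A minimal sketch of calling it, assuming a CPython standard library where encodings.idna.nameprep is available; note that it prepares a single label, not a whole dotted domain name, and the sample labels are made up for illustration:

```python
from encodings.idna import nameprep

# nameprep() applies the nameprep profile to one label: it removes table B.1
# characters, case-folds with table B.2, NFKC-normalizes, and then rejects
# prohibited characters.
for label in ["ExAmPle", "Stra\u00dfe"]:
    print(repr(label), "->", repr(nameprep(label)))
```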
There are two kinds of tables: sets and mappings.
Set Tables:
If a character is present in a set table, the lookup returns "True"; otherwise it returns "False".
Mapping Tables:
In a mapping table, passing a key (a character) returns the associated value.
The stringprep module exposes the tables from RFC 3454. Because these tables would be very large if represented as dictionaries or lists, the module uses the Unicode Character Database internally, and the tables are therefore exposed as functions rather than as data structures.
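To illustrate how the two kinds of tables surface in the module: a set table becomes a predicate that returns a bool, and a mapping table becomes a function that returns a string. The characters used here are just examples:

```python
import stringprep

# Set table B.1 as a predicate: is U+200B ZERO WIDTH SPACE in the table?
is_in_b1 = stringprep.in_table_b1("\u200b")

# Mapping table B.3 as a function: what does "A" map to?
mapped = stringprep.map_table_b3("A")

print(type(is_in_b1).__name__, is_in_b1)    # bool True
print(type(mapped).__name__, repr(mapped))  # str 'a'
```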

Let's talk about some of its functions:

1. stringprep.in_table_a1(code):
This function returns "True" if the character passed as the argument is in Table A.1 and "False" if it is not. Despite the parameter name, code is a one-character string such as "A", not an integer.
Table A.1 lists all code points that are unassigned in Unicode 3.2.

First of all, what is a code point in Unicode?
A code point is a number assigned to represent an abstract character in a system for representing text. Unicode code points are written in the form "U+1234", where "1234" is the assigned number in hexadecimal.
Example: the character "A" is assigned the code point "U+0041".
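In Python, ord() returns the code point of a one-character string and chr() converts a code point back into a character. The helper below, code_point_label, is a hypothetical name used only for illustration; it formats a code point in the U+XXXX notation:

```python
def code_point_label(ch: str) -> str:
    """Format the code point of a single character as U+XXXX (hex, at least 4 digits)."""
    return f"U+{ord(ch):04X}"

print(code_point_label("A"))  # U+0041
print(chr(0x41))              # A
```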

So, if the code point passed as the argument was unassigned in Unicode 3.2, it is listed in Table A.1 and the function returns "True". If it is an assigned code point, it is not in Table A.1 and the function returns "False". Note that "assigned" is judged against Unicode 3.2, not the current Unicode version, so a character added in a later version also counts as unassigned.
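A short sketch of in_table_a1 on a few sample characters. The samples are chosen for illustration: U+0378 is a code point that has never been assigned, and U+1F600 (GRINNING FACE) was assigned only in a Unicode version much later than 3.2, so both should be reported as unassigned:

```python
import stringprep

samples = {
    "A": "LATIN CAPITAL LETTER A, assigned in Unicode 3.2",
    "\u0378": "never-assigned code point",
    "\U0001F600": "GRINNING FACE, assigned long after Unicode 3.2",
}

for ch, note in samples.items():
    # in_table_a1 expects a one-character string, not an integer.
    print(f"U+{ord(ch):04X}", stringprep.in_table_a1(ch), note)
```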


2. stringprep.in_table_b1(code):
This function returns "True" or "False" depending on whether the character passed as the argument is present in Table B.1.
Table B.1 contains the Unicode characters that are commonly mapped to nothing, such as U+00AD SOFT HYPHEN and U+200B ZERO WIDTH SPACE.
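Because Table B.1 lists characters that a profile typically maps to nothing, a common use is to filter them out of a string. A minimal sketch; remove_b1 is a hypothetical helper and the sample string is made up for illustration:

```python
import stringprep

def remove_b1(text: str) -> str:
    """Drop every character listed in Table B.1 (commonly mapped to nothing)."""
    return "".join(ch for ch in text if not stringprep.in_table_b1(ch))

# U+00AD SOFT HYPHEN and U+200B ZERO WIDTH SPACE are both in Table B.1.
raw = "soft\u00adhyphen\u200b"
print(repr(remove_b1(raw)))  # 'softhyphen'
```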


3. stringprep.map_table_b2(code):
Unlike the previous two functions, this function does not return "True" or "False"; instead, it returns the mapped value for the character. Table B.2 is the mapping for case folding used with the NFKC normalization form.

Case folding removes case distinctions by replacing upper-case and title-case characters with their lower-case equivalents.
Compatibility mappings substitute characters with their compatibility decompositions. Many compatibility mappings are foldings; some are multigraph expansions, which replace a multigraph such as the double prime (″, U+2033) with an equivalent sequence of single characters, in this case two single primes (′′, U+2032 U+2032). These expansions are a subset of compatibility mappings.
Folding of this kind is useful in fuzzy searches, where strings that differ only in case or in compatibility variants should still match.
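A minimal sketch of Table B.2 in action, together with the double-prime compatibility mapping, checked with unicodedata.normalize. The sample characters are chosen for illustration: U+2122 TRADE MARK SIGN has no case of its own, but its NFKC form "TM" does, so B.2 folds it to "tm":

```python
import stringprep
import unicodedata

# Table B.2: case folding intended for use together with NFKC.
print(repr(stringprep.map_table_b2("A")))       # 'a'
print(repr(stringprep.map_table_b2("\u2122")))  # 'tm' (TRADE MARK SIGN)

# Compatibility mapping: DOUBLE PRIME expands to two PRIMEs under NFKC.
print(unicodedata.normalize("NFKC", "\u2033") == "\u2032\u2032")  # True
```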


4. stringprep.map_table_b3(code):
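This function from the stringprep module also returns the mapped value for the character. Table B.3 is the mapping for case folding used with no normalization. For most characters this simply converts them to lower case, but full case folding can also expand one character into several: for example, "ß" (U+00DF) maps to "ss".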


Author: Arkaja Sharan