Python codecs Library: Error Handlers
Error handlers tell a codec what to do when it meets an error while encoding or decoding data.
The codecs module defines several error handling schemes to simplify and standardize error handling.
A scheme is selected by passing its name as the 'errors' string argument to the encoding and decoding functions in this library (such as codecs.encode() and codecs.decode()), as well as to str.encode() and bytes.decode().
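As a minimal sketch of how a scheme is selected, the same 'errors' string can be passed to the built-in str.encode() and bytes.decode() methods or to codecs.encode() and codecs.decode(). The sample text and the variable names are arbitrary choices for illustration.

```python
import codecs

text = "naïve"     # 'ï' (U+00EF) cannot be represented in ASCII

# The errors argument is just a handler name; each entry point accepts it.
from_str = text.encode("ascii", errors="replace")      # b'na?ve'
from_codecs = codecs.encode(text, "ascii", "replace")  # b'na?ve'

# Decoding accepts the same argument (0xEF is not a valid ASCII byte).
dec_str = b"na\xefve".decode("ascii", errors="replace")   # 'na\ufffdve'
dec_codecs = codecs.decode(b"na\xefve", "ascii", "replace")

print(from_str == from_codecs, dec_str == dec_codecs)  # True True
```

Both entry points look up the handler by name, so the results match.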
The following error handlers can be used with all Python standard encodings:
1. "strict":
When this is passed as the 'errors' argument, an encoding or decoding error raises UnicodeError or one of its subclasses.
This is implemented by the module function 'codecs.strict_errors(exception)'.
This is the default error handling scheme.
2. "ignore":
When this is passed as the 'errors' argument, the malformed data is ignored and encoding or decoding continues without further notice.
This is implemented by the module function 'codecs.ignore_errors(exception)'.
3. "replace":
When this is passed as the 'errors' argument, malformed data is replaced with a replacement marker. On encoding, the marker is '?' (an ASCII character); on decoding, it is '�' (U+FFFD, the official REPLACEMENT CHARACTER).
This is implemented by the module function 'codecs.replace_errors(exception)'.
4. "backslashreplace":
When this is passed as the 'errors' argument, malformed data is replaced with backslashed escape sequences. On encoding, the hexadecimal form of the Unicode code point is used, in the format "\xhh", "\uxxxx" or "\Uxxxxxxxx". On decoding, the hexadecimal form of the byte value is used, in the format "\xhh".
This is implemented by the module function 'codecs.backslashreplace_errors(exception)'.
5. "surrogateescape":
When this is passed as the 'errors' argument, malformed data is replaced with surrogate code points. On decoding, each offending byte is replaced with an individual surrogate code point in the range U+DC80 to U+DCFF. When the 'surrogateescape' error handler is used again while encoding, these code points are turned back into the same bytes.
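The five handlers above can be compared side by side with a short sketch. It assumes an arbitrary sample string with two non-ASCII characters and the Latin-1 bytes for "café", which are not valid UTF-8. The last lines call the handler functions directly to show that each one receives the exception and returns a replacement plus the position at which to resume.

```python
import codecs

text = "café €"    # 'é' (U+00E9) and '€' (U+20AC) are outside ASCII
raw = b"caf\xe9"   # Latin-1 bytes for "café"; a lone 0xE9 is invalid UTF-8

# Encoding (str -> bytes) with the ASCII codec
encoded = {h: text.encode("ascii", errors=h)
           for h in ("ignore", "replace", "backslashreplace")}
print(encoded)
# {'ignore': b'caf ', 'replace': b'caf? ?', 'backslashreplace': b'caf\\xe9 \\u20ac'}

# Decoding (bytes -> str) with the UTF-8 codec
decoded = {h: raw.decode("utf-8", errors=h)
           for h in ("ignore", "replace", "backslashreplace", "surrogateescape")}
# decoded == {'ignore': 'caf', 'replace': 'caf\ufffd',
#             'backslashreplace': 'caf\\xe9', 'surrogateescape': 'caf\udce9'}

# "strict" is the default, so omitting errors raises a UnicodeError subclass
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)   # strict: ordinal not in range(128)

# surrogateescape round-trips: U+DCE9 is encoded back to the byte 0xE9
restored = decoded["surrogateescape"].encode("utf-8", errors="surrogateescape")
print(restored == raw)             # True

# Handler functions take the exception and return
# (replacement, position at which to resume).
err = UnicodeEncodeError("ascii", "é", 0, 1, "ordinal not in range(128)")
print(codecs.replace_errors(err))           # ('?', 1)
print(codecs.backslashreplace_errors(err))  # ('\\xe9', 1)
print(codecs.lookup_error("replace") is codecs.replace_errors)  # True
```

Note that "surrogateescape" only helps when encoding surrogates it produced itself; encoding 'é' to ASCII with it would still raise UnicodeEncodeError.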
The following error handlers are applicable only when encoding data:
1. "xmlcharrefreplace":
This is used only with encoding functions. When passed as the 'errors' argument, it replaces each error with an XML/HTML numeric character reference, which is the decimal form of the Unicode code point in the format '&#num;'.
This is implemented by the module function 'codecs.xmlcharrefreplace_errors(exception)'.
2. "namereplace":
This is used only with encoding functions. When passed as the 'errors' argument, it replaces each error with a '\N{…}' escape sequence, where the text in the braces is the Name property from the Unicode Character Database.
This is implemented by the module function 'codecs.namereplace_errors(exception)'.
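A small sketch of the two encode-only handlers, using an arbitrary sample string. It also shows that passing one of them to a decoding call fails: the handler is given a UnicodeDecodeError it cannot process, and CPython raises TypeError.

```python
text = "café €"    # 'é' is U+00E9 (233), '€' is U+20AC (8364)

xml = text.encode("ascii", errors="xmlcharrefreplace")
print(xml)     # b'caf&#233; &#8364;'

names = text.encode("ascii", errors="namereplace")
print(names)   # b'caf\\N{LATIN SMALL LETTER E WITH ACUTE} \\N{EURO SIGN}'

# These handlers only understand UnicodeEncodeError, so decoding fails.
decode_error = None
try:
    b"caf\xe9".decode("utf-8", errors="xmlcharrefreplace")
except TypeError as exc:
    decode_error = exc
print(type(decode_error).__name__)   # TypeError
```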
In addition, the following error handler works only with specific codecs:
1. "surrogatepass":
This error handler is specific to the following codecs: utf-8, utf-16, utf-32, utf-16-be, utf-16-le, utf-32-be and utf-32-le.
It allows surrogate code points (U+D800 to U+DFFF) to be encoded and decoded as normal code points. Otherwise, these codecs treat the presence of a surrogate code point in a str as an error.
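A minimal sketch of "surrogatepass", assuming a lone high surrogate U+D800 as the sample. The default handler rejects it, while "surrogatepass" writes it out as an ordinary code point and reads it back unchanged.

```python
lone = "\ud800"    # a lone high surrogate: allowed in str, not in strict UTF-8

try:
    lone.encode("utf-8")           # default "strict" handler
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)   # strict: surrogates not allowed

utf8 = lone.encode("utf-8", errors="surrogatepass")       # 3-byte form of U+D800
utf16 = lone.encode("utf-16-le", errors="surrogatepass")  # code unit 0xD800, little-endian
back = utf8.decode("utf-8", errors="surrogatepass")

print(utf8, utf16, back == lone)   # b'\xed\xa0\x80' b'\x00\xd8' True
```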