Python bz2 Introduction



What is bzip2?

bzip2 is a popular, free and open-source file compression tool. It compresses single files only: it is not an archiver, so bundling several files into one requires a separate tool such as tar. Python's standard library ships the bz2 module, which provides a comprehensive interface for compressing and decompressing data with the bzip2 algorithm.

Now let's have a quick glance at the bz2 library:


First, we have to import the module:
import bz2

Let's look at some use cases of bz2:

Since this is a data compression and decompression library, we will need some data! Consider the following bytes object as our example data:
bytes_string = b"Hello, I am learning bz2 compression and about to compress this string. This is so exciting!"


Now, let's compress our data using the bz2.compress() function:
>>> compressed_string = bz2.compress(bytes_string)
>>> print(compressed_string)
b'BZh91AY&SY...'
(Output truncated: after the bzip2 header, the remaining bytes are binary data that does not print legibly.)

Cool! But how do we decompress and get our data back? bz2 provides the bz2.decompress() function:
>>> decompressed_string = bz2.decompress(compressed_string)
>>> print(decompressed_string)
b'Hello, I am learning bz2 compression and about to compress this string. This is so exciting!'


To verify that our data was not altered by the compression and decompression round trip, we can compare the original and the result with the equality operator:
>>> bytes_string == decompressed_string
True

This can be helpful when our data is large and cannot be manually compared!
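
The steps above can be collected into a small script. This is only a minimal sketch: the helper name round_trip and the repeated payload are made up for illustration, and the only library calls used are bz2.compress() and bz2.decompress().

```python
import bz2


def round_trip(data: bytes) -> bytes:
    """Compress data with bz2 and immediately decompress it again."""
    compressed = bz2.compress(data)
    return bz2.decompress(compressed)


# A larger, repetitive payload that would be tedious to compare by eye.
payload = b"Hello, I am learning bz2 compression. " * 1000

restored = round_trip(payload)

# The equality check confirms that no byte was changed in the round trip.
assert restored == payload

# Print the original size next to the compressed size.
print(len(payload), len(bz2.compress(payload)))
```

Repetitive data like this compresses to far fewer bytes than the original. A very short input can come out larger than it went in, because of the fixed bzip2 header overhead.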


Caution: bz2.compress() requires a bytes-like object. Passing a str instead raises the following error:
>>> string = "hello, I am learning bz2 compression and about to compress this string!"
>>> compressed_string = bz2.compress(string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\..\miniconda3\lib\bz2.py", line 334, in compress
    return comp.compress(string) + comp.flush()
TypeError: a bytes-like object is required, not 'str'
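
One way to avoid this error is to encode the str to bytes before compressing and decode it again after decompressing. The sketch below assumes UTF-8 is an acceptable encoding for the text. The variable names are illustrative only.

```python
import bz2

text = "hello, I am learning bz2 compression and about to compress this string!"

# str -> bytes: bz2.compress() only accepts bytes-like objects.
compressed = bz2.compress(text.encode("utf-8"))

# bytes -> str: decode with the same encoding that was used above.
restored_text = bz2.decompress(compressed).decode("utf-8")

print(restored_text == text)
```

The same encoding must be used on both sides. Decoding with a different one can raise an error or silently produce different text.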

Footnotes:


There are various features in the bz2 library which we will explore in the coming articles. We will cover:
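  • bz2.compress() and bz2.decompress() functions -- one-shot compression and decompression
  • BZ2Decompressor and BZ2Compressor classes -- incremental compression and decompression, useful when data arrives in chunks
  • open function and BZ2File class -- basic features for handling file I/O operations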
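
As a rough preview of the items listed above, here is a minimal sketch of the incremental classes and the file interface. The chunk contents and the file name example.bz2 are hypothetical, and the file is written to a temporary directory so the snippet does not touch existing data.

```python
import bz2
import os
import tempfile

# Hypothetical data arriving in pieces.
chunks = [b"first chunk, ", b"second chunk, ", b"third chunk"]

# Incremental compression: feed each piece, then flush to finish the stream.
compressor = bz2.BZ2Compressor()
compressed = b"".join(compressor.compress(c) for c in chunks) + compressor.flush()

# Incremental decompression of the resulting stream.
decompressor = bz2.BZ2Decompressor()
restored = decompressor.decompress(compressed)

# File I/O: bz2.open() in binary mode returns a BZ2File object.
path = os.path.join(tempfile.mkdtemp(), "example.bz2")  # hypothetical file name
with bz2.open(path, "wb") as f:
    f.write(b"".join(chunks))
with bz2.open(path, "rb") as f:
    from_file = f.read()

print(restored == from_file)
```

A compressor or decompressor object handles a single stream. Once flush() has been called, create a new BZ2Compressor for the next stream.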


