Python bz2 Introduction


What is bzip2?

bzip2 is a popular free and open-source file compression tool. A key characteristic of the bzip2 program is that it compresses single files only: it is not an archiver, so it cannot bundle several files together. Because bzip2 is so widely used, Python's standard library includes the bz2 module, which provides a comprehensive interface for bzip2 compression and decompression.

Now let's take a quick look at the bz2 module.

First, we import it:
import bz2

Let's look at some use cases of bz2:

Since this is a data compression and decompression library, we need some data! Consider the following bytes object as our example data:
bytes_string = b"hello, I am learning bz2 compression and about to compress this string. This is so exciting!"

Now, let's compress our data using the bz2.compress() function:
>>> compressed_string = bz2.compress(bytes_string)
>>> print(compressed_string)
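The printed result is raw binary data. As a small sketch of what it contains (using its own copy of the example bytes), the snippet below inspects the stream header: every bzip2 stream begins with b"BZh" followed by a digit for the compresslevel argument, which ranges from 1 to 9 (block sizes of 100 kB to 900 kB) and defaults to 9.

```python
import bz2

data = b"hello, I am learning bz2 compression and about to compress this string. This is so exciting!"

# Default compresslevel is 9 (900 kB blocks)
default_output = bz2.compress(data)
# compresslevel=1 uses the smallest block size (100 kB)
small_block_output = bz2.compress(data, compresslevel=1)

# The header is b"BZh" followed by the level digit
print(default_output[:4])      # b'BZh9'
print(small_block_output[:4])  # b'BZh1'

# Both streams decompress to the same original bytes
print(bz2.decompress(small_block_output) == data)  # True

# For an input this short, header and checksum overhead can outweigh any savings
print(len(data), len(default_output))
```

For short inputs the level makes little practical difference; it matters mostly for large data, where bigger blocks usually compress better at the cost of more memory.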

Cool! But how do we decompress it and get our string back? bz2 provides the bz2.decompress() function:
>>> decompressed_string = bz2.decompress(compressed_string)
>>> print(decompressed_string)
b'hello, I am learning bz2 compression and about to compress this string. This is so exciting!'
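bz2.decompress() also accepts input made of several bzip2 streams joined end to end, and returns the decompressed contents of all of them in order. A minimal sketch (the part names are illustrative):

```python
import bz2

# Two independently compressed streams, concatenated byte for byte
part_one = bz2.compress(b"hello, ")
part_two = bz2.compress(b"bz2!")
combined = part_one + part_two

# decompress() processes every stream in the input, not just the first
print(bz2.decompress(combined))  # b'hello, bz2!'
```

A single BZ2Decompressor object, by contrast, stops at the end of the first stream, so the module-level function is the simpler choice for concatenated data.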

To check that our data was not altered by compressing and decompressing it, we can use a simple equality comparison:
>>> bytes_string == decompressed_string
True

This check is especially helpful when the data is too large to compare by eye!
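For example, with a hypothetical payload built by repeating one line 10,000 times, the == comparison checks every byte for us, and the repetition lets bzip2 shrink the data substantially:

```python
import bz2

# Hypothetical payload: one 72-byte line repeated 10,000 times (720,000 bytes)
line = b"Hello, I am learning bz2 compression and about to compress this string.\n"
big_data = line * 10_000

compressed = bz2.compress(big_data)
restored = bz2.decompress(compressed)

# == compares every byte, which would be impractical to verify by eye
print(restored == big_data)  # True

# Highly repetitive data compresses very well
print(f"{len(big_data)} bytes -> {len(compressed)} bytes")
```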

Caution: bz2.compress() accepts only bytes-like objects, such as bytes. Passing a str raises the following error:
>>> string = "hello, I am learning bz2 compression and about to compress this string!"
>>> compressed_string = bz2.compress(string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\..\miniconda3\lib\", line 334, in compress
    return comp.compress(string) + comp.flush()
TypeError: a bytes-like object is required, not 'str'
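To compress text, encode the str to bytes first and decode the result after decompression. This sketch assumes UTF-8; any codec works as long as the same one is used in both directions.

```python
import bz2

text = "hello, I am learning bz2 compression and about to compress this string!"

# str -> bytes (UTF-8 is an assumption), then compress
compressed = bz2.compress(text.encode("utf-8"))

# decompress() returns bytes, so decode with the same codec to get a str back
restored_text = bz2.decompress(compressed).decode("utf-8")

print(restored_text == text)  # True
```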

Footnotes:

The bz2 library has several more features that we will explore in upcoming articles. We will cover:
  • bz2.compress() and bz2.decompress() functions
  • BZ2Compressor and BZ2Decompressor classes, a very useful feature for compressing and decompressing data incrementally
  • bz2.open() function and BZ2File class, the basic tools for file I/O on compressed files
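As a brief preview of the last two items, here is a minimal sketch of incremental compression with BZ2Compressor/BZ2Decompressor and of reading and writing a compressed text file with bz2.open(). The chunk contents, the 10-byte read size, and the file name example.txt.bz2 are all hypothetical choices for illustration.

```python
import bz2
import os
import tempfile

chunks = [b"Hello, ", b"I am learning ", b"bz2 compression!"]

# Incremental compression: feed data piece by piece, then flush() to finish the stream
compressor = bz2.BZ2Compressor()
compressed = b"".join(compressor.compress(chunk) for chunk in chunks) + compressor.flush()

# Incremental decompression: feed the compressed stream in arbitrary-sized pieces
decompressor = bz2.BZ2Decompressor()
restored = b"".join(
    decompressor.decompress(compressed[i:i + 10])
    for i in range(0, len(compressed), 10)
)
print(restored)  # b'Hello, I am learning bz2 compression!'

# File I/O: bz2.open() compresses on write and decompresses on read
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt.bz2")  # hypothetical file name
    with bz2.open(path, "wt", encoding="utf-8") as f:
        f.write("Hello from a .bz2 file!\n")
    with bz2.open(path, "rt", encoding="utf-8") as f:
        file_text = f.read()

print(file_text == "Hello from a .bz2 file!\n")  # True
```

The incremental classes are useful when the data arrives in pieces or is too large to hold in memory at once, while bz2.open() hides the compression behind an ordinary file object.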