Library_Tarfile



INTRODUCTION-

 

Python's tarfile module is used to read and write tar archives. Python provides excellent tools and modules for managing compressed files, including file and directory compression with gzip, bz2 and lzma.

"Tar" is an archiving format that has become popular in the open source world. It takes several files and bundles them into one file, which makes it a popular method both for archiving and for sending multiple files over the internet, for example as software downloads. Python's standard library comes with a module that makes creating and extracting tar files very simple.

There are three tar formats that can be created with the tarfile module:

1. The POSIX.1-1988 ustar format (USTAR_FORMAT)-

It supports file names up to a length of at best 256 characters and link names up to 100 characters. The maximum file size is 8 GiB. This is an old and limited but widely supported format.

2. The GNU tar format (GNU_FORMAT)-

It supports long file names and link names, files bigger than 8 GiB and sparse files. It is the de facto standard on GNU/Linux systems. tarfile fully supports the GNU tar extensions for long names; sparse file support is read-only.

3. The POSIX.1-2001 pax format (PAX_FORMAT)-

It is the most flexible format with virtually no limits. It supports long file names and link names, large files, and stores path names in a portable way. However, not all tar implementations today are able to handle pax archives properly.
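As a minimal sketch of how one of the three formats is selected, the example below writes the same small file into three archives, one per format constant, and reads the member names back. The helper name build_archives and the file names are illustrative assumptions, not part of the original article; the format keyword argument of tarfile.open() is what picks the format.

```python
import os
import tarfile
import tempfile

def build_archives(workdir):
    """Write one sample file into three archives, one per tar format."""
    sample = os.path.join(workdir, "hello.txt")
    with open(sample, "w") as fh:
        fh.write("hello tar\n")

    # Hypothetical archive names mapped to the module's format constants
    formats = {
        "ustar.tar": tarfile.USTAR_FORMAT,
        "gnu.tar": tarfile.GNU_FORMAT,
        "pax.tar": tarfile.PAX_FORMAT,
    }
    result = {}
    for filename, fmt in formats.items():
        path = os.path.join(workdir, filename)
        with tarfile.open(path, "w", format=fmt) as tar:
            tar.add(sample, arcname="hello.txt")
        # Reading does not need the format; tarfile detects it
        with tarfile.open(path) as tar:
            result[filename] = tar.getnames()
    return result

with tempfile.TemporaryDirectory() as tmp:
    archives = build_archives(tmp)
    for filename, names in archives.items():
        print(filename, names)
```

All three archives list the same member here; the formats only start to differ for long names, very large files or extended metadata.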

 

Read and write tar archive files using Python (tarfile)


open()-

This function returns a TarFile object for the file name passed to it as a parameter. A second, optional parameter called mode takes the form 'filemode[:compression]' and defaults to 'r', which opens the archive for reading with transparent compression. The other modes are listed below.

 

Sr. No.   Mode             Action

1.        'r' or 'r:*'     Open for reading with transparent compression.
2.        'r:'             Open for reading without compression.
3.        'r:gz'           Open for reading with gzip compression.
4.        'r:bz2'          Open for reading with bzip2 compression.
5.        'r:xz'           Open for reading with lzma compression.
6.        'x' or 'x:'      Create a tar file exclusively without compression.
7.        'x:gz'           Create a tar file with gzip compression.
8.        'x:bz2'          Create a tar file with bzip2 compression.
9.        'x:xz'           Create a tar file with lzma compression.
10.       'a' or 'a:'      Open for appending with no compression.
11.       'w' or 'w:'      Open for uncompressed writing.
12.       'w:gz'           Open for gzip compressed writing.
13.       'w:bz2'          Open for bzip2 compressed writing.
14.       'w:xz'           Open for lzma compressed writing.
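The modes in the table can be tried out with a short round trip. The sketch below is only an illustration under stated assumptions: the helper name roundtrip, the archive name demo.tar and the member name data.bin are invented for the example. It writes one in-memory member with a write mode, reads the names back with a read mode, and then shows that an 'x' mode refuses to overwrite an existing file.

```python
import io
import os
import tarfile
import tempfile

def roundtrip(workdir, write_mode, read_mode):
    """Write one in-memory member with write_mode, list names with read_mode."""
    path = os.path.join(workdir, "demo.tar")
    payload = b"some data"
    info = tarfile.TarInfo(name="data.bin")
    info.size = len(payload)
    with tarfile.open(path, write_mode) as tar:
        tar.addfile(info, io.BytesIO(payload))
    with tarfile.open(path, read_mode) as tar:
        return tar.getnames()

with tempfile.TemporaryDirectory() as tmp:
    # Explicit gzip compression on both the writing and the reading side
    gz_names = roundtrip(tmp, "w:gz", "r:gz")
    # Plain 'r' detects the bzip2 compression on its own
    auto_names = roundtrip(tmp, "w:bz2", "r")

    # 'x' creates exclusively: demo.tar already exists, so opening fails
    try:
        tarfile.open(os.path.join(tmp, "demo.tar"), "x")
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True
```

Using 'r' for reading is usually the simplest choice, because the compression does not need to be known in advance.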

 

TarFile Object-

A TarFile object provides an interface to a tar archive. A tar archive is a sequence of blocks. An archive member is made up of a header block followed by data blocks. It is possible to store a file in a tar archive several times.

A TarFile object can be used as a context manager in a with statement. It will automatically be closed when the block is completed.
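The two points above, automatic closing and storing a file several times, can be sketched as follows. The archive name twice.tar and the member name notes.txt are assumptions made for the example; the closed attribute is used only to observe the state of the object.

```python
import io
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "twice.tar")

    with tarfile.open(path, "w") as tar:
        # Store the same member name twice; the archive keeps both copies
        for payload in (b"first version", b"second version"):
            info = tarfile.TarInfo(name="notes.txt")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
        open_inside = not tar.closed   # still open inside the block

    closed_after = tar.closed          # closed once the block has exited

    with tarfile.open(path) as tar:
        stored_names = tar.getnames()

print(open_inside, closed_after, stored_names)
```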

A TarFile object has the following methods:-

TarFile.getmembers()

Return the members of the archive as a list of TarInfo objects. The list has the same order as the members in the archive.

TarFile.getnames()

Return the members as a list of their names. It has the same order as the list returned by getmembers().

TarFile.list()

Print a table of contents. If the verbose argument is False, only the names of the members are printed. If it is True (the default), output similar to that of ls -l is produced.

TarFile.next()

Return the next member of the archive as a TarInfo object when the TarFile is opened for reading. Return None if there are no more members.

TarFile.extractall()

Extract all members from the archive to the current working directory or to the directory given by the path argument.

TarFile.extract()

Extract a member from the archive to the current working directory, using its full name.

TarFile.close()

Close the TarFile.
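A short sketch of these methods working together is shown below. It builds its own small archive first so that it does not depend on any existing file; the helper name make_archive and the names sample.tar, a.txt, b.txt and out are assumptions for illustration only.

```python
import io
import os
import tarfile
import tempfile

def make_archive(path):
    """Create a small uncompressed archive with two in-memory members."""
    with tarfile.open(path, "w") as tar:
        for name, payload in (("a.txt", b"alpha"), ("b.txt", b"beta!")):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "sample.tar")
    make_archive(archive)

    with tarfile.open(archive) as tar:
        names = tar.getnames()                        # member names, in order
        sizes = [m.size for m in tar.getmembers()]    # TarInfo objects
        tar.list(verbose=False)                       # prints only the names

    with tarfile.open(archive) as tar:
        first = tar.next()
        second = tar.next()
        third = tar.next()    # no more members left

    out_dir = os.path.join(tmp, "out")
    os.makedirs(out_dir)
    with tarfile.open(archive) as tar:
        tar.extract("a.txt", path=out_dir)   # extract a single member
    extracted = os.listdir(out_dir)
```

Python 3.12 and later also accept a filter argument on extract() and extractall() that restricts what an untrusted archive may do; it is left out here to keep the sketch short.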


TarInfo Objects-

A TarInfo object represents one member in a TarFile. Aside from storing all required attributes of a file (like file type, size, time, permissions and owner), it provides some useful methods to determine its type. It does not contain the file's data itself.


A TarInfo object has the following data attributes:

TarInfo.name

Name of the archive member.

TarInfo.size

Size in bytes.

TarInfo.mtime

Time of last modification, in seconds since the epoch.

TarInfo.mode

Permission bits.

TarInfo.uid

User ID of the user who originally stored this member.

TarInfo.gid

Group ID of the user who originally stored this member.

TarInfo.uname

User name.

TarInfo.gname

Group name.

A TarInfo object also provides the following query methods:

TarInfo.isfile()

Return True if the TarInfo object is a regular file.

TarInfo.isreg()

 Same as isfile().

TarInfo.isdir()

 Return True if it is a directory.

TarInfo.issym()

 Return True if it is a symbolic link.

TarInfo.islnk()

 Return True if it is a hard link.

TarInfo.ischr()

 Return True if it is a character device.

TarInfo.isblk()

 Return True if it is a block device.

TarInfo.isfifo()

 Return True if it is a FIFO.

TarInfo.isdev()

 Return True if it is one of character device, block device or FIFO.
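The attributes and query methods listed above can be inspected as in the following sketch. It builds a tiny archive in memory so that it runs without any file on disk; the helper name describe, the member names docs and docs/hello.py, and the user name demo are assumptions chosen for the example.

```python
import io
import tarfile

def describe(member):
    """Return a short type label for a TarInfo object."""
    if member.isdir():
        return "directory"
    if member.issym():
        return "symbolic link"
    if member.isfile():
        return "regular file"
    return "other"

# Build an archive in memory with one directory and one regular file
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    folder = tarfile.TarInfo(name="docs")
    folder.type = tarfile.DIRTYPE
    folder.mode = 0o755
    tar.addfile(folder)

    payload = b"print('hi')\n"
    script = tarfile.TarInfo(name="docs/hello.py")
    script.size = len(payload)
    script.mode = 0o644
    script.uname = "demo"
    tar.addfile(script, io.BytesIO(payload))

# Read the archive back and collect a few attributes per member
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    summary = [(m.name, m.size, oct(m.mode), describe(m))
               for m in tar.getmembers()]

for row in summary:
    print(row)
```

Note that name, size and mode are read as plain attributes, while isdir(), issym() and isfile() are called as methods.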

 

EXAMPLES-


How to: Check if a file is a tar file

import tarfile

def check_tar(tar_file_name):
    # is_tarfile() returns True if the file is a tar archive that tarfile can read
    print(tarfile.is_tarfile(tar_file_name))

check_tar("simplejson-3.17.0.tar.gz")


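Output (when simplejson-3.17.0.tar.gz is present in the current working directory and is a valid tar archive):-

True
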

How to: Extract a subset of files (only the .py files) from a tar file


import os
import tarfile

def py_files(members):
    # Yield only the members whose names end in ".py"
    for tarinfo in members:
        if os.path.splitext(tarinfo.name)[1] == ".py":
            yield tarinfo

# The with statement closes the archive automatically
with tarfile.open("simplejson-3.17.0.tar.gz") as tar:
    tar.extractall(members=py_files(tar))

print("Extracted successfully.\nSee the current working directory for the extracted files.")


Output:-

[Screenshot: the simplejson-3.17.0 folder after extraction from simplejson-3.17.0.tar.gz]

[Screenshot: the files inside the simplejson-3.17.0 folder]


