C++ Greedy Approach-Huffman Codes














































C++ Greedy Approach-Huffman Codes




Huffman Codes:

Huffman coding is a lossless data compression algorithm. In this algorithm, a variable-length code is assigned  to input different characters. The code length is related to how frequently characters are used. Most frequent characters have  the smallest codes and longer codes for least frequent characters.

Huffman  codes are constructed with the help of prefix codes.

Prefix Codes, means the codes (bit sequences) are assigned in such a way that the code assigned to one character is not the prefix of code assigned to any other character. This is how Huffman Coding makes sure that there is no ambiguity when decoding the generated bitstream.
Let us understand prefix codes with a counter example. Let there be four characters a, b, c and d, and their corresponding variable length codes be 00, 01, 0 and 1. This coding leads to ambiguity because code assigned to c is the prefix of codes assigned to a and b. If the compressed bit stream is 0001, the de-compressed output may be cccd or ccb or
or acd or ab .

There are mainly two major parts in Huffman Coding:

1) Build a Huffman Tree from input characters.
2) Traverse the Huffman Tree and assign codes to characters.

Steps to build Huffman Tree:

Input is an array of unique characters along with their frequency of occurrences and output is Huffman Tree.

1. Create  a leaf node for each unique character and build a min heap of all leaf nodes (Min Heap is used as a priority queue. The value of frequency field is used to compare two nodes in min heap. Initially, the least frequent character is at root)

2. Extract two nodes with the minimum frequency from the min heap.

3. Create a new internal node with a frequency equal to the sum of the two nodes frequencies. Make the first extracted node as its left child and the other extracted node as its right child. Add this node to the min heap.

4. Repeat  steps#2 and #3 until the heap contains only one node. The remaining node is the
root node and the tree is complete.

Let us understand the algorithm
with an example:


character   Frequency

-----------------------------

    a            10

    b            5

    c            14

    d            14

    e             7

    f              9

Step 1. Build a min heap that contains 6 nodes
where each node represents root of a tree with single node.

Step 2 Extract
two minimum frequency nodes from min heap. Add a new internal node with
frequency 5 + 7 = 12.




Now min heap contains 5 nodes where 1 node is internal node containing b and c nodes with combined frequency 12


character                                 Frequency
       a                                           10
       Internal Node(be)    12
       d                                           14
    c                   14
       f                                            9


Step 3: Extract two minimum frequency nodes from heap. Add a new internal node with frequency 10+9=19



Now min heap contains 4 nodes where 2 nodes are roots of trees with single element each, and two heap nodes are root of tree with more than one nodes.

character           Frequency
Internal Node          12
Internal Node          19 
      c                14
      d                14

Step 4: Extract two minimum frequency nodes. Add a new internal
node with frequency 14 + 12 = 26



Now min heap contains 3 nodes.

character          Frequency
Internal Node         19
Internal Node         26
      d                       14

Step 5: Extract
two minimum frequency nodes. Add a new internal node with frequency 19 + 14= 33

Now min heap contains 2 nodes.

character           Frequency
 Internal Node         26
Internal Node          33

Step 6: Extract
two minimum frequency nodes. Add a new internal node with frequency 33+26=59

Now min heap contains only one node.

character      Frequency
Internal Node    59

Since the heap contains only one node, the algorithm stops here.

Steps to print codes from Huffman Tree:
Traverse the tree formed starting from the root. Maintain an auxiliary array. While moving to the left child, write 0 to the array. While moving to the right child, write 1 to the array. Print the array when a leaf node is encountered. 
The codes are as follows:

character   code-word
    b          000
    e          001
    c          01
    d          10
    a          110
    f          111

Below is the implementation of
above approach:

// C++ program for Huffman Coding

#include <bits/stdc++.h>

using namespace std;

 

// A Huffman tree node

struct MinHeapNode {

    // One of the input characters

    char data;


    // Frequency of the character

    unsigned freq;

    // Left and right child

    MinHeapNode *left, *right;


    MinHeapNode(char data, unsigned freq)

    {


        left = right = NULL;

        this->data = data;

        this->freq = freq;

    }

  };

 

// For comparison of

// two heap nodes (needed in min heap)

struct compare {

 

    bool operator()(MinHeapNode* l, MinHeapNode* r)

   {

  return (l->freq > r->freq);

 }

};

// Prints huffman codes from
// the root of Huffman Tree.
void printCodes(struct MinHeapNode* root, string str)
{

    if (!root)
        return;

    if (root->data != '$')
        cout << root->data << ": " << str << "\n";

    printCodes(root->left, str + "0");
    printCodes(root->right, str + "1");
}

// The main function that builds a Huffman Tree and
// print codes by traversing the built Huffman Tree
void HuffmanCodes(char data[], int freq[], int size)
{
    struct MinHeapNode *left, *right, *top;

    // Create a min heap & inserts all characters of data[]
    priority_queue<MinHeapNode*, vector<MinHeapNode*>, compare> minHeap;

    for (int i = 0; i < size; ++i)
        minHeap.push(new MinHeapNode(data[i], freq[i]));

    // Iterate while size of heap doesn't become 1
    while (minHeap.size() != 1) {

        // Extract the two minimum
        // freq items from min heap
        left = minHeap.top();
        minHeap.pop();

        right = minHeap.top();
        minHeap.pop();

        // Create a new internal node with
        // frequency equal to the sum of the
        // two nodes frequencies. Make the
        // two extracted node as left and right children
        // of this new node. Add this node
        // to the min heap '$' is a special value
        // for internal nodes, not used
        top = new MinHeapNode('$', left->freq + right->freq);

        top->left = left;
        top->right = right;

        minHeap.push(top);
    }

    // Print Huffman codes using
    // the Huffman tree built above
    printCodes(minHeap.top(), "");
}

// Driver program to test above functions
int main()
{

    char arr[] = { 'a', 'b', 'c', 'd', 'e', 'f' };
    int freq[] = { 10, 5, 14, 14, 7, 9 };

    int size = sizeof(arr) / sizeof(arr[0]);

    HuffmanCodes(arr, freq, size);

    return 0;
}

----------------------------------------------------------------------------
------------------------------------------------------------------------------------
Output:
b:000
e:001
c:01
d:10
f:111
a:110

 


More Articles of M Mounika:

Name Views Likes
C++ Segmented Sieve (Print Primes In a Range) 146 0
C++ Sieve Of Erastosthenes 105 0
C++ Gold Mine Problem 247 0
C++ Merge K Sorted Arrays 97 0
C++ K Centers Problem 217 0
C++ Find Nth Catalan Number 289 0
C++ Inplace Rotate square matrix by 90 degrees 259 0
C++ Find Non Repeating Elements in Array 63 0
C++ Merge Two Binary Trees 90 0
C++ Sum of Numbers From Root To Leaf Paths 70 0
C++ Meta Strings 76 0
C++ Flood Fill Algorithm 375 0
C++ smallest substring with maximum distinct characters 113 0
C++ Smallest window with all characters in string 72 0
C++ Minimum Removal of Characters from string to make its permutation as palindrome 71 0
C++ Minimum characters added at front of string in palindrome conversion 55 0
C++ Number of Bracket Reversals needed to make expression Balanced 59 0
C++ String to Palindrome with Append Function 58 0
C++ WildCard pattern matching 61 0
C++ Anagram substring Search 55 0
C++ Manachars Algorithm 60 0
C++ Search String in Grid 66 0
C++ String Matching(Z Algorithm) 55 0
C++ String Matching(Naive Algorithm) 82 0
C++ String Matching(KMP Algorithm) 118 0
C++ Remove Duplicates From String 71 0
C++ Basics of String Manipulation 71 1
C++ Disjoint Data Structure Cycle Detection 73 0
C++ Problem On Disjoint Data Structures 65 0
C++ Disjoint Data Structures Part3 65 0
Disjoint Data Structures Part2 79 0
Disjoint Data Structures 81 1
C++ Segment Trees 291 2
C++ Trie Cost of Data 259 1
C++ Trie Datastructure 251 1
C++ Greedy Approach Minimum number of coins 457 0
C++ Greedy Approach Maximum height Pyramid 293 1
C++ Greedy Approach String lexicographically largest subsequence 202 0
C++ Greedy Approach Lexicographically largest subsequence 339 0
C++ Greedy Approach Prims MST 360 1
C++ Greedy Approach Krushkals MST 421 1
C++ Greedy Approach N-array maximum sum 314 1
C++ Greedy Approach Policemen Catch Thieves 539 1
C++ Greedy Approach Maximum product Subset 519 1
C++ Greedy Approach Minimum Product Subset 319 1
C++ Greedy Approach Fractional Knapsack 655 1
C++ Greedy Approach-Activity Selection Problem 653 1
C++ Greedy Approach-Egyptian Fractions 592 0
C++ Greedy Approach-Huffman Codes 815 1
C++ Introduction to Greedy Approach 923 2

Comments