### C++ Greedy Approach-Huffman Codes

Huffman Codes:

Huffman coding is a lossless data compression algorithm. In this algorithm, a variable-length code is assigned  to input different characters. The code length is related to how frequently characters are used. Most frequent characters have  the smallest codes and longer codes for least frequent characters.

Huffman  codes are constructed with the help of prefix codes.

Prefix Codes, means the codes (bit sequences) are assigned in such a way that the code assigned to one character is not the prefix of code assigned to any other character. This is how Huffman Coding makes sure that there is no ambiguity when decoding the generated bitstream.
Let us understand prefix codes with a counter example. Let there be four characters a, b, c and d, and their corresponding variable length codes be 00, 01, 0 and 1. This coding leads to ambiguity because code assigned to c is the prefix of codes assigned to a and b. If the compressed bit stream is 0001, the de-compressed output may be cccd or ccb or
or acd or ab .

There are mainly two major parts in Huffman Coding:

1) Build a Huffman Tree from input characters.
2) Traverse the Huffman Tree and assign codes to characters.

Steps to build Huffman Tree:

Input is an array of unique characters along with their frequency of occurrences and output is Huffman Tree.

1. Create  a leaf node for each unique character and build a min heap of all leaf nodes (Min Heap is used as a priority queue. The value of frequency field is used to compare two nodes in min heap. Initially, the least frequent character is at root)

2. Extract two nodes with the minimum frequency from the min heap.

3. Create a new internal node with a frequency equal to the sum of the two nodes frequencies. Make the first extracted node as its left child and the other extracted node as its right child. Add this node to the min heap.

4. Repeat  steps#2 and #3 until the heap contains only one node. The remaining node is the
root node and the tree is complete.

Let us understand the algorithm
with an example:

character   Frequency

-----------------------------

a            10

b            5

c            14

d            14

e             7

f              9

Step 1. Build a min heap that contains 6 nodes
where each node represents root of a tree with single node.

Step 2 Extract
two minimum frequency nodes from min heap. Add a new internal node with
frequency 5 + 7 = 12.

Now min heap contains 5 nodes where 1 node is internal node containing b and c nodes with combined frequency 12

character                                 Frequency
a                                           10
Internal Node(be)    12
d                                           14
c                   14
f                                            9

Step 3: Extract two minimum frequency nodes from heap. Add a new internal node with frequency 10+9=19

Now min heap contains 4 nodes where 2 nodes are roots of trees with single element each, and two heap nodes are root of tree with more than one nodes.

character           Frequency
Internal Node          12
Internal Node          19
c                14
d                14

Step 4: Extract two minimum frequency nodes. Add a new internal
node with frequency 14 + 12 = 26

Now min heap contains 3 nodes.

character          Frequency
Internal Node         19
Internal Node         26
d                       14

Step 5: Extract
two minimum frequency nodes. Add a new internal node with frequency 19 + 14= 33

Now min heap contains 2 nodes.

character           Frequency
Internal Node         26
Internal Node          33

Step 6: Extract
two minimum frequency nodes. Add a new internal node with frequency 33+26=59

Now min heap contains only one node.

character      Frequency
Internal Node    59

Since the heap contains only one node, the algorithm stops here.

Steps to print codes from Huffman Tree:
Traverse the tree formed starting from the root. Maintain an auxiliary array. While moving to the left child, write 0 to the array. While moving to the right child, write 1 to the array. Print the array when a leaf node is encountered.
The codes are as follows:

character   code-word
b          000
e          001
c          01
d          10
a          110
f          111

Below is the implementation of
above approach:

// C++ program for Huffman Coding

#include <bits/stdc++.h>

using namespace std;

// A Huffman tree node

struct MinHeapNode {

// One of the input characters

char data;

// Frequency of the character

unsigned freq;

// Left and right child

MinHeapNode *left, *right;

MinHeapNode(char data, unsigned freq)

{

left = right = NULL;

this->data = data;

this->freq = freq;

}

};

// For comparison of

// two heap nodes (needed in min heap)

struct compare {

bool operator()(MinHeapNode* l, MinHeapNode* r)

{

return (l->freq > r->freq);

}

};

// Prints huffman codes from
// the root of Huffman Tree.
void printCodes(struct MinHeapNode* root, string str)
{

if (!root)
return;

if (root->data != '\$')
cout << root->data << ": " << str << "\n";

printCodes(root->left, str + "0");
printCodes(root->right, str + "1");
}

// The main function that builds a Huffman Tree and
// print codes by traversing the built Huffman Tree
void HuffmanCodes(char data[], int freq[], int size)
{
struct MinHeapNode *left, *right, *top;

// Create a min heap & inserts all characters of data[]
priority_queue<MinHeapNode*, vector<MinHeapNode*>, compare> minHeap;

for (int i = 0; i < size; ++i)
minHeap.push(new MinHeapNode(data[i], freq[i]));

// Iterate while size of heap doesn't become 1
while (minHeap.size() != 1) {

// Extract the two minimum
// freq items from min heap
left = minHeap.top();
minHeap.pop();

right = minHeap.top();
minHeap.pop();

// Create a new internal node with
// frequency equal to the sum of the
// two nodes frequencies. Make the
// two extracted node as left and right children
// of this new node. Add this node
// to the min heap '\$' is a special value
// for internal nodes, not used
top = new MinHeapNode('\$', left->freq + right->freq);

top->left = left;
top->right = right;

minHeap.push(top);
}

// Print Huffman codes using
// the Huffman tree built above
printCodes(minHeap.top(), "");
}

// Driver program to test above functions
int main()
{

char arr[] = { 'a', 'b', 'c', 'd', 'e', 'f' };
int freq[] = { 10, 5, 14, 14, 7, 9 };

int size = sizeof(arr) / sizeof(arr[0]);

HuffmanCodes(arr, freq, size);

return 0;
}

----------------------------------------------------------------------------
------------------------------------------------------------------------------------
Output:
b:000
e:001
c:01
d:10
f:111
a:110

#### More Articles of M Mounika:

Name Views Likes
C++ Inplace Rotate square matrix by 90 degrees 788 0
C++ Introduction to Greedy Approach 2517 2
C++ Gold Mine Problem 1294 0
C++ Anagram substring Search 584 0
C++ Segment Trees 819 2
Disjoint Data Structures 597 1
C++ Greedy Approach-Activity Selection Problem 2804 1
C++ Trie Datastructure 3151 1
C++ Minimum Removal of Characters from string to make its permutation as palindrome 594 0
C++ Greedy Approach Minimum number of coins 1731 0
C++ Greedy Approach-Huffman Codes 5545 1
C++ Manachars Algorithm 1761 0
C++ smallest substring with maximum distinct characters 1436 0
C++ Trie Cost of Data 1186 1
C++ Greedy Approach Maximum product Subset 1006 1
C++ Greedy Approach Maximum height Pyramid 1022 1
C++ Greedy Approach-Egyptian Fractions 1456 0
C++ String Matching(KMP Algorithm) 996 0
C++ K Centers Problem 1240 0
C++ Find Non Repeating Elements in Array 1070 0
C++ Greedy Approach Minimum Product Subset 1170 1
C++ Merge K Sorted Arrays 662 0
C++ Number of Bracket Reversals needed to make expression Balanced 322 0
C++ Basics of String Manipulation 411 1
C++ Problem On Disjoint Data Structures 466 0
C++ Find Nth Catalan Number 1007 0
C++ Greedy Approach String lexicographically largest subsequence 1130 0
C++ Merge Two Binary Trees 705 0
C++ Remove Duplicates From String 3401 0
C++ Minimum characters added at front of string in palindrome conversion 413 0
Disjoint Data Structures Part2 420 0
C++ Greedy Approach Prims MST 1146 1
C++ Greedy Approach N-array maximum sum 776 1
C++ String Matching(Naive Algorithm) 1816 0
C++ Flood Fill Algorithm 2833 0
C++ WildCard pattern matching 2657 0
C++ String Matching(Z Algorithm) 640 0
C++ Meta Strings 685 0
C++ Sum of Numbers From Root To Leaf Paths 675 0
C++ Greedy Approach Lexicographically largest subsequence 902 0
C++ Disjoint Data Structures Part3 377 0
C++ Sieve Of Erastosthenes 534 0
C++ String to Palindrome with Append Function 579 0
C++ Search String in Grid 1583 0
C++ Greedy Approach Policemen Catch Thieves 2047 1
C++ Smallest window with all characters in string 901 0
C++ Greedy Approach Fractional Knapsack 2852 1
C++ Disjoint Data Structure Cycle Detection 569 0
C++ Segmented Sieve (Print Primes In a Range) 1940 0
C++ Greedy Approach Krushkals MST 946 1