C++ Trie Datastructure














































C++ Trie Datastructure



Tries:

Tries are an extremely special and useful data-structure that are based on the prefix of a string. They are used to represent the "Retrieval" of data and thus the name Trie.

Prefix : What is prefix:

The prefix of a string is nothing but any n letters n%u2264|S| that can be considered beginning strictly from the starting of a string. For example , the word "abacaba" has the following prefixes:

a
ab
aba
abac
abaca
abacab

A Trie is a special data structure used to store strings that can be visualized like a graph. It consists of nodes and edges. Each node consists of at max 26 children and edges connect each parent node to its children. These 26 pointers are nothing but pointers for each of the 26 letters of the English alphabet A separate edge is maintained for every edge.

Strings are stored in a top to bottom manner on the basis of their prefix in a trie. All prefixes of length 1 are stored at until level 1, all prefixes of length 2 are sorted at until level 2 and so on.

For example , consider the following diagram : enter image description here

Now, one would be wondering why to use a data structure such as a trie for processing a single string? Actually, Tries are generally used on groups of strings, rather than a single string. When given multiple strings , we can solve a variety of problems based on them. For example, consider an English dictionary and a single string s, find the prefix of maximum length from the dictionary strings matching the string s. Solving this problem using a naive approach would require us to match the prefix of the given string with the prefix of every other word in the dictionary and note the maximum. The is an expensive process considering the amount of time it would take. Tries can solve this problem in much more efficient way.

Before processing each Query of the type where we need to search the length of the longest prefix, we first need to add all the existing words into the dictionary. A Trie consists of a special node called the root node. This node doesn't have any incoming edges. It only contains 26 outgoing edfes for each letter in the alphabet and is the root of the Trie.

So, the insertion of any string into a Trie starts from the root node. All prefixes of length one are direct children of the root node. In addition, all prefixes of length 2 become children of the nodes existing at level one.

The pseudo code for insertion of a string into a tire would look as follows:

void insert(String s)
{
for(every char in string s)
{
if(child node belonging to current char is null)
{
child node
=new Node();
}
current_node
=child_node;
}
}

The pseudo code to check wether a single word exists in a dictionary of words or not is as follows:

boolean check(String s)
{
for(every char in String s)
{
if(child node is null)
{
return false;
}
}
return true;
}
Lets us see one program on trie

We're going to make our own Contacts application! The application must perform two types of operations:

  1. add name, where name is a string denoting a contact name. This must store name as a new contact in the application.
  2. find partial, where partial is a string denoting a partial name to search the application for. It must count the number of contacts starting with partial and print the count on a new line.

Given n sequential add and find operations, perform each operation in order.

Input Format:

The first line contains a single integer, n, denoting the number of operations to perform. 
Each line i of the n subsequent lines contains an operation in one of the two forms defined above.

Output Format

For each find partial operation, print the number of contact names starting with partial on a new line.

Below is the implementation in C++ :

#include<bits/stdc++.h>
#include<iostream>
using namespace std;
const int alpha=26;
struct node{
node* child[alpha];
bool end;
int countchild;
};
node* getnode(){
node* pnode=new node;
pnode->end=false;
pnode->countchild=0;
for(int i=0;i<alpha;i++){
pnode->child[i]=NULL;
}
return pnode;
}
void insert(node* root,string key){
node* pcrawl=root;
int size=key.length();
for(int i=0;i<size;i++){
int index=key[i]-'a';
if(!pcrawl->child[index]){
pcrawl->child[index]=getnode();
}
pcrawl->countchild+=1;
pcrawl=pcrawl->child[index];
}
pcrawl->end=true;
}

bool search(node* root,string key){
node* pcrawl=root;
int size=key.length();
for(int i=0;i<size;i++){
int index=key[i]-'a';
if(!pcrawl->child[index]){
return false;
}
pcrawl=pcrawl->child[index];
}
return (pcrawl!=NULL and pcrawl->end);
}
int searchchild(node* root,string key){
node* pcrawl=root;
int size=key.length();
for(int i=0;i<size;i++){
int index=key[i]-'a';
if(!pcrawl->child[index]){
return 0;
}
pcrawl=pcrawl->child[index];
}
if (pcrawl==NULL){
return 0;
}
else{
return pcrawl->countchild;
}
}
int main(){
int n;
string s,s1,s2;
node* root=getnode();
cin>>n;
cin.sync();
for(int i=0;i<n;i++){
getline(cin,s);
bool find=false;
s1="";
s2="";
for(unsigned int i=0;i<s.length();i++){
if(s[i]==' '){
find=true;
continue;
}
if(!find){
s1+=s[i];
}
else{
s2+=s[i];
}
}
if(s1.compare("add")==0){
insert(root,s2);
}
else{
cout<<searchchild(root,s2)<<endl;
}

}
}

Sample Output:

4
add hack
add hackerrank
find hac
find hak

  2

0



More Articles of M Mounika:

Name Views Likes
C++ Inplace Rotate square matrix by 90 degrees 788 0
C++ Introduction to Greedy Approach 2517 2
C++ Gold Mine Problem 1294 0
C++ Anagram substring Search 584 0
C++ Segment Trees 819 2
Disjoint Data Structures 597 1
C++ Greedy Approach-Activity Selection Problem 2804 1
C++ Trie Datastructure 3152 1
C++ Minimum Removal of Characters from string to make its permutation as palindrome 594 0
C++ Greedy Approach Minimum number of coins 1731 0
C++ Greedy Approach-Huffman Codes 5545 1
C++ Manachars Algorithm 1761 0
C++ smallest substring with maximum distinct characters 1436 0
C++ Trie Cost of Data 1186 1
C++ Greedy Approach Maximum product Subset 1006 1
C++ Greedy Approach Maximum height Pyramid 1022 1
C++ Greedy Approach-Egyptian Fractions 1456 0
C++ String Matching(KMP Algorithm) 996 0
C++ K Centers Problem 1240 0
C++ Find Non Repeating Elements in Array 1070 0
C++ Greedy Approach Minimum Product Subset 1170 1
C++ Merge K Sorted Arrays 662 0
C++ Number of Bracket Reversals needed to make expression Balanced 322 0
C++ Basics of String Manipulation 411 1
C++ Problem On Disjoint Data Structures 466 0
C++ Find Nth Catalan Number 1007 0
C++ Greedy Approach String lexicographically largest subsequence 1130 0
C++ Merge Two Binary Trees 705 0
C++ Remove Duplicates From String 3401 0
C++ Minimum characters added at front of string in palindrome conversion 413 0
Disjoint Data Structures Part2 420 0
C++ Greedy Approach Prims MST 1146 1
C++ Greedy Approach N-array maximum sum 776 1
C++ String Matching(Naive Algorithm) 1816 0
C++ Flood Fill Algorithm 2833 0
C++ WildCard pattern matching 2657 0
C++ String Matching(Z Algorithm) 640 0
C++ Meta Strings 685 0
C++ Sum of Numbers From Root To Leaf Paths 675 0
C++ Greedy Approach Lexicographically largest subsequence 902 0
C++ Disjoint Data Structures Part3 377 0
C++ Sieve Of Erastosthenes 534 0
C++ String to Palindrome with Append Function 579 0
C++ Search String in Grid 1583 0
C++ Greedy Approach Policemen Catch Thieves 2047 1
C++ Smallest window with all characters in string 901 0
C++ Greedy Approach Fractional Knapsack 2852 1
C++ Disjoint Data Structure Cycle Detection 569 0
C++ Segmented Sieve (Print Primes In a Range) 1940 0
C++ Greedy Approach Krushkals MST 946 1

Comments