C++ Anagram substring Search














































C++ Anagram substring Search



Anagram Substring Search 


Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[]) that prints all occurrences of pat[] and its permutations (or anagrams) in txt[]. You may assume that n > m.
Expected time complexity is O(n)

Examples:

1) Input:  txt[] = "BACDGABCDA"  pat[] = "ABCD"
Output: Found at Index 0
Found at Index 5
Found at Index 6
2) Input: txt[] = "AAABABAA" pat[] = "AABA"
Output: Found at Index 0
Found at Index 1
Found at Index 4

This problem is slightly different from standard pattern searching problem, here we need to search for anagrams as well. Therefore, we cannot directly apply standard pattern searching algorithms like KMPRabin KarpBoyer Moore, etc.

A simple idea is to modify Rabin Karp Algorithm. For example we can keep the hash value as sum of ASCII values of all characters under modulo of a big prime number. For every character of text, we can add the current character to hash value and subtract the first character of previous window. This solution looks good, but like standard Rabin Karp, the worst case time complexity of this solution is O(mn). The worst case occurs when all hash values match and we one by one match all characters.

We can achieve O(n) time complexity under the assumption that alphabet size is fixed which is typically true as we have maximum 256 possible characters in ASCII. The idea is to use two count arrays:

1) The first count array store frequencies of characters in pattern.
2) The second count array stores frequencies of characters in current window of text.

The important thing to note is, time complexity to compare two count arrays is O(1) as the number of elements in them are fixed (independent of pattern and text sizes). Following are steps of this algorithm.
1) Store counts of frequencies of pattern in first count array countP[]. Also store counts of frequencies of characters in first window of text in array countTW[].

2) Now run a loop from i = M to N-1. Do following in loop.
.........a) If the two count arrays are identical, we found an occurrence.
.........b) Increment count of current character of text in countTW[]
.........c) Decrement count of first character in previous window in countWT[]

3) The last window is not checked by above loop, so explicitly check it.

Following is the implementation of above algorithm.

// C++ program to search all anagrams of a pattern in a text
#include<iostream>
#include<cstring>
#define MAX 256
using namespace std;

// This function returns true if contents of arr1[] and arr2[]
// are same, otherwise false.
bool compare(char arr1[], char arr2[])
{
    for (int i=0; i<MAX; i++)
        if (arr1[i] != arr2[i])
            return false;
    return true;
}

// This function search for all permutations of pat[] in txt[]
void search(char *pat, char *txt)
{
    int M = strlen(pat), N = strlen(txt);

    // countP[]: Store count of all characters of pattern
    // countTW[]: Store count of current window of text
    char countP[MAX] = {0}, countTW[MAX] = {0};
    for (int i = 0; i < M; i++)
    {
        (countP[pat[i]])++;
        (countTW[txt[i]])++;
    }

    // Traverse through remaining characters of pattern
    for (int i = M; i < N; i++)
    {
        // Compare counts of current window of text with
        // counts of pattern[]
        if (compare(countP, countTW))
            cout << "Found at Index " << (i - M) << endl;

        // Add current character to current window
        (countTW[txt[i]])++;

        // Remove the first character of previous window
        countTW[txt[i-M]]--;
    }

    // Check for the last window in text
    if (compare(countP, countTW))
        cout << "Found at Index " << (N - M) << endl;
}

/* Driver program to test above function */
int main()
{
    char txt[] = "BACDGABCDA";
    char pat[] = "ABCD";
    search(pat, txt);
    return 0;
}


Output:

Found at Index 0
Found at Index 5
Found at Index 6



More Articles of M Mounika:

Name Views Likes
C++ Segmented Sieve (Print Primes In a Range) 162 0
C++ Sieve Of Erastosthenes 135 0
C++ Gold Mine Problem 294 0
C++ Merge K Sorted Arrays 116 0
C++ K Centers Problem 239 0
C++ Find Nth Catalan Number 311 0
C++ Inplace Rotate square matrix by 90 degrees 285 0
C++ Find Non Repeating Elements in Array 86 0
C++ Merge Two Binary Trees 120 0
C++ Sum of Numbers From Root To Leaf Paths 88 0
C++ Meta Strings 91 0
C++ Flood Fill Algorithm 402 0
C++ smallest substring with maximum distinct characters 199 0
C++ Smallest window with all characters in string 93 0
C++ Minimum Removal of Characters from string to make its permutation as palindrome 86 0
C++ Minimum characters added at front of string in palindrome conversion 69 0
C++ Number of Bracket Reversals needed to make expression Balanced 72 0
C++ String to Palindrome with Append Function 83 0
C++ WildCard pattern matching 75 0
C++ Anagram substring Search 72 0
C++ Manachars Algorithm 74 0
C++ Search String in Grid 83 0
C++ String Matching(Z Algorithm) 67 0
C++ String Matching(Naive Algorithm) 113 0
C++ String Matching(KMP Algorithm) 140 0
C++ Remove Duplicates From String 110 0
C++ Basics of String Manipulation 84 1
C++ Disjoint Data Structure Cycle Detection 86 0
C++ Problem On Disjoint Data Structures 94 0
C++ Disjoint Data Structures Part3 78 0
Disjoint Data Structures Part2 90 0
Disjoint Data Structures 93 1
C++ Segment Trees 321 2
C++ Trie Cost of Data 290 1
C++ Trie Datastructure 278 1
C++ Greedy Approach Minimum number of coins 525 0
C++ Greedy Approach Maximum height Pyramid 328 1
C++ Greedy Approach String lexicographically largest subsequence 246 0
C++ Greedy Approach Lexicographically largest subsequence 364 0
C++ Greedy Approach Prims MST 398 1
C++ Greedy Approach Krushkals MST 457 1
C++ Greedy Approach N-array maximum sum 333 1
C++ Greedy Approach Policemen Catch Thieves 563 1
C++ Greedy Approach Maximum product Subset 546 1
C++ Greedy Approach Minimum Product Subset 348 1
C++ Greedy Approach Fractional Knapsack 737 1
C++ Greedy Approach-Activity Selection Problem 744 1
C++ Greedy Approach-Egyptian Fractions 639 0
C++ Greedy Approach-Huffman Codes 1030 1
C++ Introduction to Greedy Approach 955 2

Comments