Home / Competitive programming using C++

Pattern Searching (KMP Algorithm, Rabin-Karp Algorithm)

Pattern searching is a fundamental problem in computer science and is widely used in various applications such as text search engines, data mining, and bioinformatics. It involves finding occurrences of a specific pattern within a larger text or string. In this article, we will explore two popular algorithms for pattern searching: the Knuth-Morris-Pratt (KMP) algorithm and the Rabin-Karp algorithm.

Knuth-Morris-Pratt (KMP) Algorithm

The KMP algorithm is an efficient pattern searching algorithm that avoids unnecessary comparisons by utilizing information about previously matched characters. It achieves this by constructing a prefix array (also known as the longest proper prefix that is also a suffix) of the pattern. The array helps to skip over characters that have already been matched, thereby reducing the number of comparisons required.

The KMP algorithm works as follows:

Construct the prefix array, which stores the longest proper prefix for each substring of the pattern.
Start comparing the characters of the pattern and the text from left to right.
If a mismatch occurs, use the information from the prefix array to determine the next character to be compared.
Continue the process until a match is found or the end of the text is reached.

The time complexity of the KMP algorithm is O(n), where n is the length of the text.

Rabin-Karp Algorithm

The Rabin-Karp algorithm is another efficient pattern searching algorithm based on hashing. It uses a rolling hash function to calculate the hash value of the pattern and sliding window technique to compare the hash values of substrings in the text. By using a rolling hash, the Rabin-Karp algorithm avoids recomputing the entire hash value for each substring, resulting in improved performance.

The Rabin-Karp algorithm works as follows:

Calculate the hash value of the pattern.
Calculate the first hash value of the substring of the text whose length is equal to the pattern's length.
Compare the hash value of the pattern with the hash value of the current substring. If they match, perform a character-by-character comparison.
If there is a mismatch, calculate the hash value of the next substring by using the rolling hash function.
Repeat steps 3 and 4 until a match is found or the end of the text is reached.

The Rabin-Karp algorithm's time complexity is O(m + n), where m is the length of the pattern and n is the length of the text. However, the worst-case time complexity can be O(mn) if the rolling hash function produces a lot of spurious matches.

Conclusion

Pattern searching is a crucial task in many applications, and efficient algorithms like the KMP and Rabin-Karp algorithms are essential for achieving optimal performance. The KMP algorithm utilizes the prefix array to avoid unnecessary comparisons, while the Rabin-Karp algorithm leverages hashing and sliding window techniques for faster pattern matching. By understanding and implementing these algorithms, programmers can efficiently search for patterns in text or strings, enabling various applications to perform tasks like search, data retrieval, and pattern recognition more effectively.