Z-Function Algorithm: Substring Search
Agenda:
- Quick Intro
- Input
- Output
- Complexity
- Algo Logic
- C++ Code
- Real-Life use cases
- Practice Problem
- Conclusion
- Sources for further reading
Quick Intro
- The Z-function is a powerful string-matching algorithm used for pattern matching and searching within a given string. Its primary purpose is to find all occurrences of a specified pattern in the text.
- Definition: Given a string s of length n, the Z-function produces an array Z where Z[i] represents the length of the longest substring starting from s[i] that is also a prefix of s.
Input
- Text(T): The input to the Z-function algorithm is a single string
T
of lengthn
.
Output
- Z-Array(z): The output of the Z-function algorithm is an array
Z
of lengthn
where each elementZ[i]
represents the length of the longest substring starting from the positioni
inT
that matches a prefix ofT
.
Complexity
- Time: O(n)
- Space: O(n)
- Where, n ---> length of input Text.
Algo Logic
Analogy
Imagine you have a string and you're trying to measure how much of it, starting from each position, looks like the very beginning of the string.
Steps
- Full Match: The first position is the whole string.
- Slide: Move right, compare characters to the start.
- Save and Extend: If you’ve seen a match before, use that info to skip checks. If not, check afresh.
Remember
It's like measuring how much each part of a string copies the start of it. You slide through the string, match, and note down the lengths of these matches.
Visualization
Imagine reading a book and checking from each page how much the text matches the first page. That's what the z-function does for strings.
C++ Code
For better understanding, Let's first look at brute force implementation of it:
1) Bruteforce O(N^2) --
We iterate through each position i
and update z[i]
based on matches with previous characters.
// O(n^2).
vector<int> z_function(const string &s)
{
int n = s.size();
vector<int> z(n, 0);
for (int i=1; i<n; i++) {
while ( i+z[i] < n && s[z[i]] == s[i + z[i]] )
z[i]++;
}
return z;
}
2) Linear Time -- O(N):
// O(n).
vector<int> z_function(const string& s)
{
int n = s.size();
vector<int> z(n);
int l = 0, r = 0;
for ( int i=1; i<n; i++ )
{
if ( i < r )
z[i] = min(r - i, z[i - l]);
while(i + z[i] < n && s[z[i]] == s[i + z[i]])
z[i]++;
if ( i+z[i] > r ) {
l = i;
r = i + z[i];
}
}
return z;
}
Real-Life Use Cases
- Text Editors for Pattern Matching:
- It's employed in search engines and text editors to quickly locate substrings.
- Bioinformatics:
- Used in DNA sequence analysis to find motifs and repetitive elements.
- Data Compression:
- Helps in identifying repeated substrings which can be used for efficient data compression.
Practice Problem
- Here is a nice problem to apply this learning: https://leetcode.com/problems/minimum-time-to-revert-word-to-initial-state-ii/
- It was featured in Leet Code Weekly contests 383
- You can find the solutions for it here: 2 Solutions | Prefix Sum, Z-Function - LeetCode Discuss
Conclusion
The Z-function algorithm is a fundamental tool in string processing with applications in various fields including text processing, bioinformatics, and data compression. Its linear time complexity makes it suitable for handling large datasets efficiently.
Comments
Post a Comment