Z-Function Algorithm: Substring Search

Agenda:

  1. Quick Intro
  2. Input
  3. Output
  4. Complexity
  5. Algo Logic
  6. C++ Code
  7. Real-Life use cases
  8. Practice Problem
  9. Conclusion
  10. Sources for further reading

Quick Intro

  • The Z-function is a powerful string-matching algorithm used for pattern matching and searching within a given string. Its primary purpose is to find all occurrences of a specified pattern in the text.
  • Definition: Given a string s of length n, the Z-function produces an array Z where Z[i] represents the length of the longest substring starting from s[i] that is also a prefix of s.

Input

  • Text(T): The input to the Z-function algorithm is a single string T of length n.

Output

  • Z-Array(z): The output of the Z-function algorithm is an array Z of length n where each element Z[i] represents the length of the longest substring starting from the position i in T that matches a prefix of T.

Complexity

  • Time: O(n)
  • Space: O(n)
    • Where, n ---> length of input Text.

Algo Logic

Analogy

Imagine you have a string and you're trying to measure how much of it, starting from each position, looks like the very beginning of the string.

Steps
  1. Full Match: The first position is the whole string.
  2. Slide: Move right, compare characters to the start.
  3. Save and Extend: If you’ve seen a match before, use that info to skip checks. If not, check afresh.
Remember

It's like measuring how much each part of a string copies the start of it. You slide through the string, match, and note down the lengths of these matches.

Visualization

Imagine reading a book and checking from each page how much the text matches the first page. That's what the z-function does for strings.

C++ Code

For better understanding, Let's first look at brute force implementation of it:

1) Bruteforce O(N^2) --

We iterate through each position i and update z[i] based on matches with previous characters.

// O(n^2).
vector<int> z_function(const string &s)
{
    int n = s.size();
    vector<int> z(n, 0);

    for (int i=1; i<n; i++) {
        while ( i+z[i] < n && s[z[i]] == s[i + z[i]] )
            z[i]++;
    }
    return z;
}

2) Linear Time -- O(N):

// O(n).
vector<int> z_function(const string& s) 
{
    int n = s.size();
    vector<int> z(n);
    int l = 0, r = 0;

    for ( int i=1; i<n; i++ ) 
    {
        if ( i < r )
            z[i] = min(r - i, z[i - l]);

        while(i + z[i] < n && s[z[i]] == s[i + z[i]])
            z[i]++;

        if ( i+z[i] > r ) {
            l = i;
            r = i + z[i];
        }
    }
    return z;
}

Real-Life Use Cases

  1. Text Editors for Pattern Matching:
    • It's employed in search engines and text editors to quickly locate substrings.
  2. Bioinformatics:
    • Used in DNA sequence analysis to find motifs and repetitive elements.
  3. Data Compression:
    • Helps in identifying repeated substrings which can be used for efficient data compression.

Practice Problem

Conclusion

The Z-function algorithm is a fundamental tool in string processing with applications in various fields including text processing, bioinformatics, and data compression. Its linear time complexity makes it suitable for handling large datasets efficiently.

Sources for Further Reading

Comments

Popular posts from this blog

KMP Algorithm: Pattern Searching in Text

Back of the envelope estimations