What I Learned From Codeforces 7D

Codeforces is a really amazing place to dig these carefully designed contest problems.

They are concise (I just ignored the messy ones) and kind of hard. The type I like.

7D is basically asking to find all the prefixes of a string, which are palindromes. (The degree part is purely BS.)

The first approach of course, I instinctively thought about precompute a suffix tree/LCA to have a O(1) time complexity answering machine about whether a substring is a palindrome.

The problem is, I don’t know how to build a suffix tree. (Or, I know, it is in the book Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology but I cannot memorize it and I don’t think I can complete it in a reasonable amount of time)

So I turned to suffix array.

Suffix Array and LCP (Longest Common Prefix) constructions are both very interesting problems. Even the O(nlognlogn) construction algorithm is very thoughtful. Some people call it stupid – to me, it is only stupid if you try to use it to construct a suffix array.

There is a simple suffix array construction algorithm with time complexity O(nlogn), and LCP at O(n). n is the length of the string.

The basic idea is iterative improvement. I’m always amazed by iterative improvement algorithms because they see the problem as a whole. Think about some simple dynamic programming examples like LCS and word distance,  they always focus on a part in contrast.

The Suffix Array nlogn algorithm:

  1. We have an string S[0…n)
  2. Create a new index array. SA[0…n), where S[i] = i initially. Each element S[j] stands for a subarray starting from S[j].
  3. Sort SA by the first character in the suffixes.
  4. Now sort SA by the first 2 characters in the suffixes (and yes we plan to do it with first 4, 8, 16… chars). The tricky part is that –
    1. if you remove first character in the suffixes, the remaining strings are still suffixes of the original string – and they are already sorted (so we kind of know their order)
    2. if you already grouped suffixes together by their first chars, in the second iteration, suffixes with different first chars will have the same relative order. You just need to sort and regroup inside previously grouped suffixes. For example, for string ababa, we have suffixes ababa, baba, aba, ba, a. Each time we have suffixes sorted, we have them grouped together: (first 1) [ababa, aba, a], [baba, ba] -> (first 2) [a] [ababa, aba] [baba, ba] – ababa will always come earlier than baba.
    3. So we first create a list of bucket from last sort result. Then for each suffix, we first record their original buckets. Then we put them into the new buckets according to the original bucket of the suffixes formed by removing the first chars from original suffixes.
    4. The suffixes in the new bucket are sorted by their second chars.
    5. Now it’s the time for miracle. We put the suffixes back to their original buckets, by scanning the new buckets in order. In this way:
      1. Suffixes will be put back to their original bucket so all past sorts results are retained.
      2. In each bucket, suffixes are sorted by lexical order of first 2 chars, because they have same first 1 char, and sorted by the second.
      3. The whole SA is sorted by the first 2 chars.
  5. Then sort with first 4 chars.
  6. 8, 16, …
  7. At last we have O(nlogn) construction of suffix array.

Then we construct LCP. We already have suffix array, so we start with the longest suffix (string itself), and the immediate following suffix in the sorted SA.

Say they are SA[i] and SA[i + 1]. SA[i] = 0. Obviously Substr(SA[i]) < Substr(SA[i + 1]). We first find the LCP between these 2 suffixes.This is an O(N) operation, and say the LCP(S[i], S[i + 1]) is L[i]. Here we assume L[i] > 0. If not, this step costs O(1) and continue to next iteration.

Then we move to the second longest suffix S[j] = 1.

We can safely say that LCP(S[j], S[j + 1]) is greater or equal to L[i] – 1, because, if you look at S[i + 1] + 1, we have Substr(SA[i] + 1) < Substr(SA[i + 1] + 1) (because we assumed L[i] > 0). Also LCP(SA[i] + 1, SA[i + 1] + 1) = L[i] – 1.

SA[i] + 1 is SA[j].

So we can safely move on with last step’s result.

So we have LCP.

If we concatenate a string S with reverse(S), and compute SA, then we can start from the joint point SA[i] = len(S) (and extending from there to SA[0] and SA[len – 1]), calculate the LCP between SA[i] and all other suffixes, we can get all the longest prefix palindrome of suffixes. If they happen to have same length as the suffix, then the suffix itself is a palindrome.

Easy, right? (hmmm)

Unfortunately this algorithm timed out.

There are 2 easier solutions to this problem.

  1. Rabin-Karp. For each prefix we just keep calculate its hash bi-directional. H-forward = H-forward * 131 + last-char; H-backward = last-char * 131^(len-1) + H-backward
  2. Rabin-Karp is not precise – there are collisions. Z-Algorithm solves the problem perfectly. Still we need to concatenate S and reverse(S), then remember: The KMP preprocessing means to calculate, for each prefix of the original string, the longest prefix which equals to the suffix (of the prefix). I don’t explain more. You figure out.

Lessons learned: Good to dive deep, but better to think high (wut? no I mean big.). Yeah I don’t know what I’m talking about.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s