Is there a way to determine whether a list of integers can be a prefix function?

Question

Say you had
(0,0,0,1,2,3,4,5,6,7,8,9,10)
or
(0,1,0,1,0,1,2,3,0,1,0,0,1)
Could you use, for example, the KMP algorithm to deduce the validity of the above lists as prefix functions? I know there is a way to find whether a substring exists in a string using KMP, but is there a way to do it the other way around, starting at the apparent prefix function and ascertaining that it is indeed a possible prefix function, without knowing the string/substring itself at the start?

plop · Answer

Denote by $p$ the input prefix function. Refer to the characters of a word having $p$ as prefix function by their indexes $V={0,1,2,...,operatorname{length}(p)-1}$. We will keep a set of pairs $D$ of elements of $V$ to represent the characters that must be different. Finally, $V$ will be partitioned into equivalence classes according to the characters of the word that must be equal. I assume that to answer this question it is also given which is the alphabet $Sigma$ of allowed letters to use. A possible variant of the question would give also a dictionary $L$ of allowed words.

Algorithm

A non-zero as first entry tells you right away that the sequence is not a prefix function.

Each positive value in $p$ tells you equality between certain pairs of characters of the potential word having that (presumptive) prefix function. Keep track of the equivalence classes that these equalities define, maybe with a disjoint set union data structure.

If you ever find an increase by more than $1$ ($p[k+1]>p[k]+1$) you can return immediately that it isn't a prefix function.

Whenever, the sequence doesn't increase it will also tell you one pair of characters that must be different. Collect these pairs in $D$.

Run by the pairs $D$ of vertices that must be different. If any belong to the same class return false.

Finally, consider the graph $G$ with vertices given by the equivalence classes obtained above and where the edges are the elements of $D$. If only the alphabet $Sigma$ is given, then we compute if the vertices of $G$ can be colored with the elements of $Sigma$ such that adjacent vertices don't get the same color. In the variant in which the dictionary $L$ is given, then we check if any word in $L$ has the length of $p$ and its characters that are in positions being in the same equivalence classes are equal, and the characters forming a pair in $D$ are different. If these conditions are satisfied we output true. Otherwise we output false.

Let's do your examples.
Example 1: Given $p=(0,0,0,1,2,3,4,5,6,7,8,9,10)$, then $V={0,1,2,...,12}$. That $p[0]=0$ tells us that we don't return false right from the beginning. That $p[1]=0$ tells us that characters $0$ and $1$ must be different. So, we insert ${(0,1)}$ in $D$. That $p[2]=0$ makes us insert $(0,2)$ in $D$. The rest of the entries of $p$ increase by $1$. So, we never return with a false caused by an increase larger than $1$. Moreover, there are no new insertions in $D$, but we do get the equivalence classes ${0,3,6,9,12},{1,4,7,10},{2,5,8,11}$. Since the two pairs in $D$, $(0,1)$ and $(0,2)$ don't consist of elements belonging the same equivalence class then we output true. Well, conditional of the variants of the problem in which the alphabet is small and the variant in which the dictionary of all words are given.
For example, the word $1231231231231$ has $p$ as prefix function. Well, if the alphabet has less than $2$ letters then we could give the example $1221221221221$. If the alphabet has only $1$ letter, we would need to output false.
Example 2: Given $p=(0,1,0,1,0,1,2,3,0,1,0,0,1)$, then $V={0,1,2,...,12}$. That $p[0]=0$ makes us not return false from the beginning. That $p[1]=1$ tells use that $0$ and $1$ are in the same equivalence class. That $p[2]=0$ makes us insert $(0,2)$ in $D$. That $p[3]=1$ tells us that $0,1,3$ are in the same equivalence class. That $p[4]=0$ makes us insert $(0,4)$ in $D$. that $[5]=1$, tells us that ${0,1,3,5}$ are in a equivalence class. I guess you can continue. At the end the only equivalence class with more than $1$ element is ${0,1,3,5,6,9,12}$. The rest of the characters are on their own equivalence classes. In $D$ we have the pairs $(0,2),(0,4),(0,7),(0,8),(0,10),(0,11)$. None of these consist of characters from the same equivalence class. So, we return true.
For example, the word $0010200130450$ has $p$ as prefix function.
Let me add an example that returns false to have an example of that case.
Example 3: Given $p=(0,0,1,2,3,4,2,0)$, then $V={0,1,2,...,6}$. Since $p[0]=0$ we don't return false from the beginning. That $p[1]=0$ makes us insert $(0,1)$ in $D$. That $p[2]=1$ tells us that $0,2$ are in the same equivalence class. That $p[3]=2$, tell us that $1,3$ are in the same equivalence class and that $0,2$, which we already knew are also equivalent. That $p[4]=3$ tells that $0,4$ are equivalent, $1,3$, which we knew, and $2,4$, which by now we knew too. That $p[5]=4$ tells that $1,3,5$ are equivalent. Now, that $p[6]=2$ tells us that $6$ and $1$ are equivalent. That's okay so far. But it also tells us that $0$ and $5$ are equivalent. Now, the problem with that is that that makes $1$ (which is equivalent to $5$) in the same equivalence class as $0$, but $(0,1)$ is a pair in $D$. So, we will have to return false.

Is there a way to determine whether a list of integers can be a prefix function?

One Answer

Add your own answers!

Ask a Question