Theoretical Computer Science Asked on October 30, 2021
Compaction is a particularly weak form of sorting. The problem can be phrased as follows:
Given an array $A$ of $N$ cells, with at most $R$ of the cells marked (say by a bit), produce an array $D$ of size at most $O(R)$ containing all of the marked cells
Compaction is said to be:

- tight if $D$ has size exactly $R$ (rather than $O(R)$),
- order-preserving if the marked cells appear in $D$ in the same relative order as in $A$, and
- data-oblivious if the sequence of memory accesses the algorithm makes depends only on $N$ and $R$, and not on which cells are marked.
One can construct tight, order-preserving, data-oblivious compaction from a sorting network. One has to use a non-standard comparison function which:

- treats every marked element as smaller than every unmarked element, and
- compares two marked elements by their original index in $A$ (this is what makes the compaction order-preserving).
But such a function is easy to specify, and can be computed in $O(\log N)$ time, since it only compares mark bits and $O(\log N)$-bit indices.
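For concreteness, here is a minimal Python sketch of such a comparison function (the (mark, index) key representation and the helper names are my own illustration, not taken from any particular paper):

    # Tag each cell with the key (unmarked_bit, original_index): marked cells
    # get a 0 bit and therefore sort before unmarked cells, and ties among
    # marked cells are broken by their position in A, which is what makes
    # the resulting compaction order-preserving. Comparing the O(log N)-bit
    # indices is where the O(log N) cost per comparison comes from.

    def make_keys(A, marked):
        """Pair each cell of A with its sort key and payload."""
        return [((0 if marked[i] else 1, i), A[i]) for i in range(len(A))]

    def compare_exchange(cells, i, j):
        """One comparator gate: put the smaller key at position i.
        In a sorting network the (i, j) wire pairs are fixed in advance,
        so the access pattern is data-oblivious even though this helper
        branches internally (in practice one would swap branchlessly)."""
        if cells[j][0] < cells[i][0]:
            cells[i], cells[j] = cells[j], cells[i]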
This means that one can construct tight, order-preserving, data-oblivious compaction from the AKS sorting network (giving an $O(N(\log N)^2)$ algorithm) or from a more practical sorting network like Batcher's odd-even mergesort (giving an $O(N(\log N)^3)$ algorithm).
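As a sketch of the whole pipeline (again my own illustration, reusing the helpers above; padding up to a power of two is a simplification), the standard iterative formulation of Batcher's network gives:

    def batcher_compact(A, marked, R):
        """Order-preserving, data-oblivious compaction via Batcher's
        odd-even mergesort: all marked cells sort to the front, so the
        first R cells are the compacted output."""
        n = 1
        while n < len(A):
            n *= 2
        # Pad with unmarked dummies; key (1, len(A)+k) sorts after every real cell.
        cells = make_keys(A, marked) + [
            ((1, len(A) + k), None) for k in range(n - len(A))
        ]
        p = 1
        while p < n:  # classic iterative odd-even mergesort over n = 2^m wires
            k = p
            while k >= 1:
                for j in range(k % p, n - k, 2 * k):
                    for i in range(min(k, n - j - k)):
                        if (i + j) // (2 * p) == (i + j + k) // (2 * p):
                            compare_exchange(cells, i + j, i + j + k)
                k //= 2
            p *= 2
        return [payload for (_, payload) in cells[:R]]

For example, compacting a small array:

    A = ['a', 'b', 'c', 'd', 'e']
    marked = [True, False, True, False, True]
    print(batcher_compact(A, marked, 3))  # ['a', 'c', 'e']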
Anyone who has worked with sorting networks before has heard that there are the asymptotically optimal algorithms (namely the AKS sorting network and ZigZag sort, both using $O(N\log N)$ comparisons), and the practically efficient algorithms (Batcher's, mentioned above, is one example, but there are many which use $O(N(\log N)^2)$ comparisons).
I’m essentially curious whether there is something similar for compaction. I have seen papers such as this which give oblivious tight compaction in $O(N)$ time <1>.
This paper’s improvement is lowering the implied constant from $2^{228}$ to roughly $9000$. I imagine that for all "reasonably sized" arrays it is still more efficient to pair Batcher’s sorting network with the aforementioned non-standard comparison function.
So, has research been done into practically efficient compaction algorithms? While I eventually want something which is tight, order-preserving, and data-oblivious, the most important keyword by far is "data-oblivious". I’m willing to tolerate a negligible (meaning $N^{-\omega(1)}$) probability of failure, as long as it has good constants.
Moreover, the data to be compacted is "random" in a strong sense: each element of the array is marked or not marked according to an i.i.d. $\mathsf{Bern}(1/2)$ random variable.
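(For instance, a standard Hoeffding bound shows why a negligible failure probability is plausible in this regime: with i.i.d. $\mathsf{Bern}(1/2)$ marks $X_1, \dots, X_N$,

$$\Pr\left[\sum_{i=1}^{N} X_i > \frac{N}{2} + t\right] \le e^{-2t^2/N},$$

so taking $t = \sqrt{N}\log N$ gives failure probability $e^{-2\log^2 N} = N^{-2\log N} = N^{-\omega(1)}$, meaning an output of size $N/2 + \sqrt{N}\log N$ overflows only with negligible probability.)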
<1> The linked algorithm has a negligible probability of failure, which is fine for my purposes.
It looks like there are some known lower bounds (see page two of Sorting Short Keys in Circuits of Size $o(n\log n)$). The relevant section is copied below:
More specifically, Lin, Shi, and Xie [LSX19] showed that any stable compaction circuit that treats the payload as indivisible must have at least $\Omega(n\log n)$ selector gates — here the indivisibility assumption requires that the circuit only move the payload around through selector gates, and not perform boolean computation on the encoding of the payload. The indivisibility restriction in Lin et al.’s lower bound [LSX19] can be removed if one assumes the Li-Li network coding conjecture [LL04] — this is implied by Afshani et al. [AFKL19]’s recent work.
Given that this is the case, it seems like the best thing to do is to use sorting networks. There seem to be a few new ones which may be competitive (see for example this), but I can't say anything definitive.
Answered by Mark on October 30, 2021