All solutions received this month had in common that the basic algorithm sorted
non-negative integers, rather than general integers. For this reason, solutions
typically included a preparation phase, where the input was made non-negative.
This can be done as follows in O(*n*):

```python
def sorting(array):
    if len(array) == 0:
        return []
    minimum = min(array)
    nnarray = [x - minimum for x in array]
    return [x + minimum for x in sorting_nonnegative(nnarray)]
```

(For clarity, code shown throughout is in Python shorthand, but it should be clear how to implement all commands in the desired complexity in the target computational model.)

Another commonality to all solutions is that they utilized the fact that manipulation of large integers can be done in O(1). They did this by packing the information from many integers into a single integer, then used arithmetic on the new integer as a form of performing parallel computation on all the integers that served as its source. We will call this packing procedure "encoding" and its reverse "decoding".
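As a minimal illustration of this packing (the function names and the fixed field *width* are my own choices, not taken from any submitted solution), here is one way to encode a list of small non-negative integers into a single big integer and decode it back:

```python
def encode(values, width):
    """Pack each value into its own width-bit field of one big integer.
    Assumes every value fits in width bits."""
    big = 0
    for i, x in enumerate(values):
        big |= x << (i * width)
    return big

def decode(big, n, width):
    """Recover the n packed values from the big integer."""
    field = (1 << width) - 1  # mask selecting one field's bits
    return [(big >> (i * width)) & field for i in range(n)]
```

For example, `decode(encode([3, 1, 4], 8), 3, 8)` returns `[3, 1, 4]`.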

At this point, solutions divided into two major approaches, depending on the
basic type of encoding used. One way was to encode the *set* of numbers
{*x_i*} by the cumulative OR of 1<<*x_i*; the other was to encode the
*vector* (*x_i*) directly, by packing each number into its own fixed-width
bit field.

The set-based approach further requires, at its core, that the set elements be unique, so most solutions kept a count of the number of times each integer appears in the input array. One way to do this is by use of an additional array of size "max(array)+1", similar to what is done in Bucket sorting.

The set-based approach lends itself naturally to very elegant solutions, as soon
as one learns the basics of how to use it. For example, note that "B&-B"
isolates the least significant '1' in "B" (yielding 2 raised to the power of
its position). Using this simple function, one can encode the original set of
numbers in O(*n*), then read them out, smallest to largest, again in O(*n*).
This solution is implemented here.
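A runnable sketch of this set-based idea, combining the multiplicity counts with the "B&-B" read-out (a hedged illustration, not any reader's actual submission; Python's `bit_length` stands in here for the model's O(1) bit-position lookup):

```python
def sorting_nonnegative(array):
    """Set-based O(n) sort sketch for non-negative integers."""
    if len(array) == 0:
        return []
    counts = [0] * (max(array) + 1)  # multiplicities, as in Bucket sorting
    B = 0
    for x in array:
        counts[x] += 1
        B |= 1 << x                  # encode the *set* of values
    result = []
    while B:
        low = B & -B                 # isolate the least '1' in B
        x = low.bit_length() - 1     # its position = smallest remaining value
        result.extend([x] * counts[x])
        B ^= low                     # remove it from the set
    return result
```

The while loop runs once per distinct value, so the read-out is O(*n*) overall.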

Using vector encoding, the basic parallel function is "ge" (greater or equal).
Given two arrays, (*a_i*) and (*b_i*), encoded as vectors "A" and "B", "ge"
computes in O(1) a vector whose *i*-th element is 1 if *a_i* >= *b_i* and 0
otherwise. The idea is two's-complement subtraction: "A-B" can be computed as
"A+~B+ones_vector", where "ones_vector" is the encoding of the all-ones vector.

The only problem with "~B+ones_vector" is that the addition may introduce
unwanted overflows between vector elements. To avoid this, we choose
the *width* parameter of the vector to be at least one larger than is
needed to store the largest integer in the vector. This gives us an additional
"flag" bit. After an addition, it is a carry bit. After subtraction, it can
be used as a sign bit. Furthermore, at any time we can zero it by AND-ing
with an appropriate mask. For this purpose we will pre-encode two additional
vectors, "flagmask_vector" and "datamask_vector". The former will have ones
only in the flag-bit position, the latter only in the data-bit positions.
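Concretely, the three helper vectors for *n* elements at a given *width* can be built as follows (a hedged sketch; the vector names follow the text, but this particular construction is my own):

```python
def make_helpers(n, width):
    # One '1' in the lowest bit of each of the n fields.
    ones_vector = sum(1 << (i * width) for i in range(n))
    # One '1' in the flag (top) bit of each field.
    flagmask_vector = ones_vector << (width - 1)
    # Ones in all data bits, i.e. everything below the flag bit of each field.
    datamask_vector = flagmask_vector - ones_vector
    return ones_vector, datamask_vector, flagmask_vector
```

Note that the subtraction borrows within each field only, leaving exactly the width-1 data bits set per field.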

Pulling all this together, we can subtract two vectors and read the sign of the result, giving us the desired greater-or-equal function:

```python
def ge(a, b, ones_vector, datamask_vector, flagmask_vector, width):
    return ((a + (~b & datamask_vector) + ones_vector)
            & flagmask_vector) >> (width - 1)
```

Here is an example solution given the vector-based approach that uses "ge" to sort in O(*n*). The way it works is by first encoding the input array, then comparing it to cyclic shifts of itself. In this way each element's rank, the number of elements it is greater than or equal to, is accumulated, and the ranks determine the sorted order.

Both solutions above can be improved considerably. They are presented as they
are for clarity. As an example, *width*, in the vector-based solution,
can be chosen to be as small as floor(log_2 *m*)+2, where
*m* is the maximum value we may wish to store in the vector. Instead,
we pick *width* to be much larger, namely *m*+1, just so as not
to go into unnecessary details. Given a few
optimizations, the real-life performance of the vector-based solution above
is comparable to that of Bubble sorting (where, by "real-life performance"
I mean performance in the computational model in which addition, subtraction,
bit-shifts and bitwise operations all take time proportional to the length of
their inputs and outputs).

This demonstrates an important property of the vector-based solutions: what
they lack in elegance they make up for in practicality and wide applicability.
A good demonstration of this is the vector-based solution sent by Sen Gu. It is
a sorting algorithm imported directly from parallel computation theory.
Using the parallelized "ge", this solution implements in O(1) a single round of
Odd-Even Transposition
Sort, a classic sorting algorithm which is a parallelized version
of Bubble sort. The entire sort takes *n* such rounds and so can be
performed in O(*n*). Sen Gu simply
substituted the O(*n*) simultaneous comparisons normally used in Odd-Even
sort with a single vector comparison.
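For reference, here is the classic sequential formulation of Odd-Even Transposition Sort (a hedged sketch of the textbook algorithm; Sen Gu's actual contribution, not reproduced here, is replacing each round's simultaneous comparisons with a single "ge"-based vector operation):

```python
def odd_even_transposition_sort(array):
    """Classic form: n rounds, each comparing disjoint adjacent pairs.
    Sequentially this is O(n^2); each round's comparisons are independent,
    which is what lets the vector encoding do a whole round in O(1)."""
    a = list(array)
    n = len(a)
    for rnd in range(n):
        start = rnd % 2  # alternate between even-indexed and odd-indexed pairs
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

Since the pairs within a round are disjoint, one vector comparison plus a conditional swap per round suffices, and *n* rounds are known to guarantee a sorted result.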

In all solution examples given above, the key ingredient in achieving the desired parallelism is a mix of bitwise operations and arithmetic operations. The last solution I present here comes from Yingjie Xu, and is worth seeing because it demonstrates that this mix in operation types is not an essential ingredient. This solution uses addition, subtraction and bit-shifts only. Using these, it counts how many values are smaller/larger than a specific element, doing so in the set-based approach. It is available here.

One last point: I was told by one reader that the fact that we allow unlimited
integers here is a trick, and that the sorting procedures I ask people to come
up with would not work in the more realistic case of bounded integers. This is
true. If you want to sort *n* bounded integers in O(*n*) time, use
Radix sort.