## December 2008 solution

This month's riddle drew a large number of different solutions.

All solutions received this month had in common that the basic algorithm sorts non-negative integers, rather than general integers. For this reason, solutions typically included a preparation phase, in which the input was made non-negative. This can be done in O(n) as follows:

```
def sorting(array):
    if len(array) == 0:
        return []
    minimum = min(array)
    # Shift the input so that its smallest element becomes 0.
    nnarray = [x - minimum for x in array]
    # Sort the non-negative array, then shift back.
    return [x + minimum for x in sorting_nonnegative(nnarray)]
```
(For clarity, code shown throughout is in Python shorthand, but it should be clear how to implement all commands in the desired complexity in the target computational model.)

Another commonality among all solutions is that they utilized the fact that manipulation of large integers can be done in O(1). They did this by packing the information from many integers into a single integer, then using arithmetic on the new integer as a form of performing parallel computation on all the integers that served as its source. We will call this packing procedure "encoding" and its reverse "decoding".
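To make the encoding/decoding idea concrete, here is a minimal sketch (the names "encode" and "decode" and the "width" parameter are illustrative, not taken from any particular submitted solution):

```
def encode(values, width):
    # Pack values[i] into bit positions [i*width, (i+1)*width) of one integer.
    big = 0
    for i, v in enumerate(values):
        big |= v << (i * width)
    return big

def decode(big, n, width):
    # Read the n packed elements back out.
    mask = (1 << width) - 1
    return [(big >> (i * width)) & mask for i in range(n)]
```

For example, "encode([3,1,2],4)" yields the single integer 0x213, and decoding it recovers the original list.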

At this point, solutions divided into two major approaches, depending on the basic type of encoding used. One way was to encode the set of numbers {x_i} by the cumulative OR of "1<<x_i". I call this "the set-based approach". The other way was to encode the vector of values x_0, ..., x_(n-1) by the sum of "x_i * 2**(i*width)", where width is chosen to be a large enough number that "max(x_i)" is a number that has at most width bits. This is "the vector-based approach".
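As a small illustration of the set-based encoding (a sketch; the function name is mine):

```
def encode_set(values):
    # Set-based encoding: bit x of the result is 1 iff x appears
    # in the input. Note that duplicates are collapsed into one bit.
    B = 0
    for x in values:
        B |= 1 << x
    return B
```

For example, "encode_set([0,2,5])" yields 0b100101; "encode_set([2,2])" yields the same result as "encode_set([2])", which is why multiplicities must be tracked separately.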

The set-based approach further requires, at its core, uniqueness of the set elements, so most solutions kept a count of the number of times each integer appears in the input array. One way to do this is by use of an additional array of size "max(array)+1". This is similar to what is done in bucket sort.
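Such a counting phase can be sketched as follows (essentially the counting step of bucket sort; the helper name is mine):

```
def count_occurrences(array):
    # Auxiliary array of size max(array)+1; counts[x] is the number
    # of times x appears in the (non-negative) input.
    counts = [0] * (max(array) + 1)
    for x in array:
        counts[x] += 1
    return counts
```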

The set-based approach lends itself naturally to very elegant solutions, as soon as one learns the basics of how to use it. For example, note that "B&-B" is a way to isolate the least significant '1' bit in "B" (from which its position can be read off). Using this simple function, one can encode the original set of numbers in O(n), then read them out, smallest to largest, again in O(n). This solution is implemented here.
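The read-out loop can be sketched as follows. This is my own illustrative version: it uses Python's "bit_length" to map the isolated bit back to its position, standing in for whatever position-extraction primitive the computational model provides in O(1).

```
def sorting_nonnegative(array):
    # Encode the input as a set, remembering multiplicities separately.
    counts = {}
    B = 0
    for x in array:
        counts[x] = counts.get(x, 0) + 1
        B |= 1 << x
    # Repeatedly extract the least '1' bit: B & -B isolates it,
    # and its position is the smallest value still in the set.
    result = []
    while B:
        low = B & -B
        x = low.bit_length() - 1
        result.extend([x] * counts[x])
        B ^= low  # remove x from the set
    return result
```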

Using vector encoding, the basic parallel function is "ge" (greater or equal). Given two vectors, (a_i) and (b_i), we wish to calculate, in O(1), the vector that is 1 in every position where a_i>=b_i and 0 everywhere else. To do this, consider first that if A is the integer encoding (a_i) and B is the integer encoding (b_i), then A+B is the integer encoding their element-by-element sum. Furthermore, if we wish to calculate their element-by-element difference, we can do so by first calculating the integer corresponding to (-b_i). This can be done using the 2's-complement formula: "-b=~b+1". In order to do this for an entire vector, we simply need to switch the "1" for an all-ones vector that we will pre-encode in advance. After that, "~B+ones_vector" yields the desired result.

The only problem with "~B+ones_vector" is that the addition may introduce unwanted overflows between vector elements. To avoid this, we choose the width parameter of the vector to be at least one larger than is needed to store the largest integer in the vector. This gives us an additional "flag" bit. After an addition, it is a carry bit. After subtraction, it can be used as a sign bit. Furthermore, at any time we can zero it by AND-ing with an appropriate mask. For this purpose we will pre-encode two additional vectors, "flagmask_vector" and "datamask_vector". The former will have ones only in the flag-bit position, the latter only in the data-bit positions.
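For concreteness, the three pre-encoded helper vectors might be built as follows (a sketch; "make_masks" is my name, and width here includes the flag bit):

```
def make_masks(n, width):
    # n elements, width bits each; the top bit of every element is the
    # flag bit, the remaining width-1 bits are the data bits.
    ones_vector = sum(1 << (i * width) for i in range(n))
    flagmask_vector = ones_vector << (width - 1)
    datamask_vector = sum(((1 << (width - 1)) - 1) << (i * width)
                          for i in range(n))
    return ones_vector, datamask_vector, flagmask_vector
```

For instance, with n=2 and width=3 this gives ones_vector=0b001001, datamask_vector=0b011011 and flagmask_vector=0b100100.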

Pulling all this together, we can subtract two vectors and read the sign of the result, giving us the desired greater-or-equal function:

```
def ge(a,b,ones_vector,datamask_vector,flagmask_vector,width):