Let the numbers to be sorted be in an array called A.
We can freely assume that all values in A are between 0 and 1. (If you cannot assume this, first calculate, in O(n) time, the maximum and minimum values in the array; if they are equal, all elements of the array are the same and the sorting result is trivial; if they are not equal, one can use them to compute an order-preserving linear transformation that maps all elements of A into [0,1).)
Now, for any such A there exists an integer s, so that for any i and j, if A[i]≠A[j] then the two numbers also differ in their first s bits. By taking "int((1<<s)*A[i])", we can transform our problem of sorting reals to a problem of sorting integers, which we know to be doable in O(n). (See, e.g., the last solution given to the December 2008 riddle.)
The question therefore boils down to whether we can find such an s in O(n) time. Zhang's solution is the following.
We will work with a candidate s, beginning with s=0, and will read the values in A sequentially. For each value read, we will check whether it collides, in its first s digits, with a previously-read value in A, and if so, we will increase s accordingly, until the two no longer have an overlap. Because the previously read values in A are known to differ in their first s digits, the collision can only ever be with a single element of A.
The question now becomes how to find the element of A with which there is a collision. This must be done in O(1). In principle, this should be easy to do: we can maintain an array of size 2s, where for each possible sequence of s bits we store which element of A the sequence collides with (if at all), but how do we update this array when s needs to be increased? We do not have time to re-read all elements of the array in order to find out what their values are in the previously discarded bits.
The ingenious part of Zhang's solution is in realising that we don't have to re-read A and don't need to know what the previously discarded bits are.
We can, instead, when increasing s to s', simply place any value in the 2s-length array in 2(s'-s) distinct positions in the new array, each corresponding to one of the 2(s'-s) possible completions of the s-length bit prefix into an s'-length prefix.
The critical observation is that because no two elements of A coincided in their first s bits, no two can overlap in any of their extensions to s' bits. While the look-up in the table will no longer ensure that one will find an A value beginning in the desired sequence of s' bits, one is guaranteed that one will find the only value in A that potentially has the desired prefix. All other elements in A are known not to share this prefix.
Techniques regarding how to adjust the look-up table in O(1) for a new value of s have been discussed in the solutions for the January 2009 riddle, in the context of width changing, but those techniques require the use of Boolean operations, which are here not permitted: we showed how to copy all values to their desired positions in O(1), but then needed a Boolean AND operation to zero out other values that migrated in the process to undesired positions.
Zhang's approach to this problem, circumventing the need for Boolean operations, is the following: the undesired values are simply ignored. Effectively, the new array created is not 2s' values long, but rather 2s'x values long, for some x, and when wanting to read value i of the 2s' possible ones, one reads address ix of the array. All values that are in position numbers that do not divide by x are simply ignored.
Zhang's code, annotated, can be found here.
Once again, thanks Zhang for this wonderful riddle!