Using your Head is Permitted

March 2008 riddle

If you've ever used file names with a numerical component (such as "FILE1.TXT" through "FILE1000.TXT") you know this problem: when listing the content of a directory or otherwise trying to access these files sequentially, you find that "FILE10.TXT", for example, is before "FILE2.TXT". The reason for this is that file names are generally sorted lexicographically, not according to their numerical values. Here is a description of how lexicographic sorting works to find the order in which any two non-identical digit strings, A and B, should be placed:

  • First, if either A or B is the empty string, it is first.
  • Otherwise, the order of A and B is according to the numerical value of their first digit.
  • If their first digits are identical, their order is what it would have been had the first digit been omitted from both strings (recursively).
The problem is that if we use the straightforward encoding of numbers to digit strings, with 1 becoming "1", 2 becoming "2" and 1000 becoming "1000", the resulting lexicographic sorting differs from sorting by numerical values. Normally, what one does to overcome this problem is to switch to a different encoding scheme where both sorting criteria coincide. For example, if we were to encode the number 1 by the digit-string "0001" and likewise pad all numbers with leading zeros to four-digit length, we will find that "FILE0001.TXT" through "FILE1000.TXT" are all sorted properly now.

This encoding has two major flaws:

  1. It can only be used for numbers in a limited range (in this case, 0 through 9999). We want an encoding that can be used for all naturals, 1 and up.
  2. It can bloat some of the numbers (in this case, the small ones) considerably. Let's say that the length of the original number, n, in its standard decimal representation, is U(n), and that the length of its encoded version is E(n). We can say that E(n)=U(n)+R(n), where R(n) is the bloating factor. We know that, in terms of orders of magnitude, U(n) is of length O(log(n)). We'd like R(n) to be less than that.
This month's question: describe an encoding of numbers into digit strings that preserves numerical ordering and satisfies R(n)<O(log(n)). The encoding should work for all naturals 1 and up, and both the encoding and decoding algorithms should be reasonably easy to implement.

As a bonus question, prove that the encoding you found has an R(n) of minimal complexity.

List of solvers:

Oded Margalit (2 March 04:41)
Hongcheng Zhu (2 March 13:48)
Omer Angel (3 March 01:38)
Mark Tilford (3 March 05:10)
Itsik Horovitz (3 March 20:41)
Albert Stadler (3 March 21:00)
Ross Millikan (6 March 03:06)
Brian Kemper (8 March 09:43)
Rani Hod (19 March 05:07)
Ananda Raidu (30 March 17:19)

Elegant solutions can be submitted to the puzzlemaster at Names of solvers will be posted on this page. Notify if you don't want your name to be mentioned.

The solution will be published at the end of the month.


To solution

Back to main page