def is_halting(prog): return floor(H*(1<<prog))%2

where a return code of 1 indicates a halting program and a 0 indicates a non-halting program. The variable 'prog' is the input program, suitably encoded as a natural.

What this program does is simply return the 'prog'th bit of *H*, where
*H* is a real constant used by the program.

The trick to this program is that, as written, the usual complexity assumptions explicitly allow the use of real constants. In practice, no such constants should be allowed in any reasonable computational model, for precisely the reason outlined above: if we are looking for a program that takes a natural and returns a bit (such as "accept" or "don't accept"), any such program can be implemented in constant time using the procedure described above, simply by choosing the appropriate real number that corresponds to the desired return values. There is always exactly one such real number in the range [0,1).

One can argue that we will never be able to *find* the correct
*H* value that will solve the halting problem, but the riddle merely asks
if such a program exists, and the answer is that such a constant and such a
program must certainly exist. If *Using your Head is Permitted* had
continued publishing riddles about the usual complexity assumptions, I would
have altered these slightly so as to explicitly disallow real constants to be
used. (In some contexts, I have been known to also disallow all integer
constants except for 0 and 1, for similar -- but not identical -- reasons.)

So, how does this construction overcome Turing's diagonalisation argument?

The answer is that it doesn't. This program cannot determine the halting properties of every program written using the usual assumptions. The trick is that the input is "a software program, suitably encoded as a natural": there are only as many encodable inputs as there are naturals. This is enough to encode all Turing machines and all programs that do not use real constants, but the number of programs that use real constants is much larger than the number of naturals, and these cannot all be encoded.

First off, without any further assumptions, we can construct explicitly
a solution for the case that *n*=2*k*+1. A line in
ℝ^{n} space can be described as the set of points
satisfying *x*+*vt*, for any real value *t*, where
*x* and *v* are vector parameters describing the line.

For our purposes, it is enough to restrict *x*_{n} to 0 and
*v*_{n} to 1. We must, however, assign a unique *v*
value to every such *x*. Let us describe this by a function,
*f*:ℝ^{2k}→ℝ^{2k}, such
that for every *x* as above, the associated *v* value satisfies

The condition of the riddle can be stated under these restrictions as
*f*(*a*)-*f*(*b*) being linearly independent of
*a*-*b* for every *a* and *b*. This is particularly easy
to attain when *f* is a linear function, i.e.
*f*(*a*)=*M***a*, for some matrix *M*.
Here, the condition on *f* boils down to *M* not having any real
eigenvalues.

When the dimension of *M* is even (in this case, 2*k*), constructing
such a matrix with no real eigenvalues is straightforward. For example, we
can pick

which is a matrix rotating each pair of dimensions by ninety degrees.

This, however, leaves open the case where *n* is even, a case where no
such *M* can be found because complex eigenvalues always come in pairs.

Another case that can be handled without further assumptions is *n*=2,
where it is clearly impossible to construct a set of lines satisfying the
requirements of the riddle.

We remain, therefore, with the case that *n* is even and greater than 2.
We will show for this case that assuming the axiom of choice it is possible
to construct a set of lines satisfying the desired conditions, but it is
not possible if the mapping is continuous in any neighbourhood *N*.

Let us first consider the case where the mapping is continuous in some
neighbourhood *N*. Let *x* be in *N* and let *v* be the
direction of the line going through *x*. Because the mapping is
continuous in the neighbourhood, there is some stricter neighbourhood of
*x* in which no line has a direction that is perpendicular to *v*.

For convenience, let us choose a frame of reference wherein
*x*=(0,..., 0) and *v*=(0,..., 0, 1), then this condition becomes
equivalent to our requirement from the previous portion of the analysis that
we can assume *x*_{n}=0 and *v*_{n}=1.

As demonstrated before, any successful construction must now satisfy that
for any *x* and *y*, *f*(*y*)-*f*(*x*) is
linearly independent of *y*-*x*.

Let us fix *x* as above, and define *g*(*y*) as the normalised
direction vector for the component of *f*(*y*)-*f*(*x*)
that is perpendicular to *y*-*x*.

The impossibility of finding such a *g* is a consequence of the
"hairy ball theorem": there must be at least one *y* in any sphere
around *x* where the perpendicular component vanishes and *g*
cannot be normalised.

We will now show that when looking at discontinuous mappings, we can construct a solution to this problem by use of the axiom of choice (or even some weaker axioms). A method for doing so is as follows.

Take a bijection between ℝ^{n-1} and the first ordinal
of cardinality *c* (continuum). Now, go over all *a* values in
ℝ^{n-1} in the order of the bijection, and for each
*a* pick an *f*(*a*) on the unit sphere that has not been
excluded due to previous choices.

Will there always be a value remaining that can still be picked? Well, each
previously chosen value generates a constraint on *f*(*a*),
forbidding it from being on a certain line. This excludes at most two values
on the unit sphere. However, there were always previously less than
*c* such already fixed values, and therefore also fewer than
*c* forbidden values (the multiplication by 2 makes no difference to
the ordinal), so there will always be more values that can be chosen.

The complete answer is therefore that if assuming the axiom of choice all
*n* values are possible except 2, whereas if assuming the mapping to
be continuous in any neighbourhood, possible values are exactly the odd
*n*.

Thanks again David for this riddle, and thanks Tom and Omer for the beautiful solution.

Before describing it, here is a recap of how to solve the more standard
riddle, where *N* guesses are allowed. (Kevin tells me he has heard this
version from "either Ross Kang or Fiona Skerman".)

We choose to turn all lights off, except for the last (i.e., the *N*'th)
light encountered in each row. That light we keep on. The gaoler now has the
choice of turning the last light off, in which case our friend merely needs
to test the *N* bulbs in the row where all are off, or the gaoler will
turn the last light on, in which case there are *N* lights on, one in
each row, and by guessing each in turn our friend can find the desired
light bulb. It is possible to tell which is the case simply by
counting the
number of lights that are on.

With this in mind, we now turn to Kevin's variation on this riddle, wanting to solve the same problem with less guesses.

To do this, let us consider the following simpler problem: we have *n*
light bulbs, and the gaoler points them to us in an order of his (or her)
choosing, at which time we must choose whether to turn the pointed-to bulb
on or off. We wish to choose a strategy for determining which lights to turn
on and which off that will satisfy the following criteria.

- The
*n*lights will ultimately form a code word in an error correcting code capable of correcting a single error. - The code word will encode the identity of
*k*bulbs which are all not the last bulb to be pointed to. (It does not need to encode any ordering among the*k*.)

Note that once a set *k*_{1} and *n*_{1} and a
set *k*_{2} and *n*_{2} are found, it is always
possible to find a solution to *k*_{1}+*k*_{2} and
*n*_{1}+*n*_{2} simply by appending the two vectors.

A corollary is that once a set of *k* and *n* is found, one can
find arbitrarily large sets with the same *k*/*n* simply by joining
*t* copies of the same construction.

Let (*k*,*n*) be one solution to the simpler problem, and let
us now return to the original problem, but instead of solving it for an
*N***N* grid we will solve it for a rectangular grid with
*n*-*k* rows and *n* columns.

Our strategy will be as follows. Whenever
the gaoler points to a bulb, we use the code-book that we devised previously
to encode the identity of *k* bulbs that are not the last the gaoler
pointed to among the *n*. However, when reaching the last bulb in any
given row, we purposefully introduce an error in the code.

At the end of the process, there will either be *n*-*k* coding
errors, in which case our friend will merely need to guess these, or there
will be one less coding errors, in which case this pinpoints the identity of
a single row, and we can use the encoding in order to narrow the search for
the last bulb chosen by the gaoler to only *n*-*k* possibilities.

In practice, it is unlikely that a set of *N***N* bulbs will avail
an exact reshaping to (*n*-*k*)**n*, but we don't actually
need all rows to be of exactly the same length. All we need is to minimise
the maximum between the number of rows and the maximum among all rows of the
*n*-*k* value used in encoding the row. (Remark: A more general
statement is that if *M* is the total number of light bulbs, whether or
not it is a perfect square, then the function that maps *M* to the
optimal number of guesses required is monotone increasing. This can be shown
by noting that if *M*_{1}<*M*_{2} then solving
for *M*_{2} can be no harder than solving for *M*_{1}
once an Oracle provides us with the identity of the first
*M*_{1}-*M*_{2} light bulbs to be chosen, which can
be no harder than solving for *M*_{2} without such an Oracle.)

Solving by use of rows of unequal lengths
is always possible to do. This can be demonstrated by the fact that
the standard solution is merely a case where *n*=1 and *k*=0.
However, the saving in using a particular choice of (*k*,*n*) pair
compared to the standard solution is the multiplicative difference between
*n*-*k* and
√(*n*-*k*)**n*.

Any *k*/*n*>0 will therefore solve the riddle, and the higher the
value the greater the saving.

One simple solution can be found with *k*=1 and *n*=6 (recalling that
once a particular *k*/*n* is found, the values of *k* and
*n* can be increased arbitrarily by appending more copies of the same
without this degrading the *k*/*n* value) by simply encoding in a
1-correcting code of length 6 the identity of the first bulb to be pointed to.

Just for fun, here's a more involved code with
*k*=4 and *n*=15. It is a
Hamming Code in
which one begins by turning the first 4 chosen lights off, then completes the
rest of the code word to encode the position of these first four lights.
The code itself is given here. The left
column indicates (using its '1's) the identity of the first four bulbs to be
pointed to; the right column shows the completion into a full code word, after
the first four pointed-to lights have all been switched off.

This code was generated by the Python program given here.

However (for the bonus riddle), can we solve this problem in o(*N*)?

The answer is that we can, and this due to the following simple observation:
if it is possible to solve for *M*_{1} bulbs with
*G*_{1} guesses and *M*_{2} bulbs with
*G*_{2} guesses, then it is possible to solve for
*M*_{1}*M*_{2} bulbs with
*G*_{1}*G*_{2} guesses.

What this means is that if there a solution for some
(*M*,*G*) pair, there is also a solution for
(*M*^{k},*G*^{k}) for any *k*,
so if *G*=*M*^{α} then there is a solution in
O(*M*^{α})=O(*N*^{β}) for a general
*M* (and *N*), where β=2α.

The original riddle showed how to do this with α=1/2, our improvement
with (*k*,*n*) schemes works with α values of
log(*n*-*k*)/log(*n*(*n*-*k*)). This places our
(*k*,*n*)=(1,6) scheme at α≈0.473197
(so, ≈O(*N*^{0.946})) and our improved
(*k*,*n*)=(4,15) scheme at a slightly better
α≈0.469628 (so, ≈O(*N*^{0.939})).
Both satisfy the o(*N*) requirement.

How do we prove that there is a solution in
*G*_{1}*G*_{2} guesses?

Divide the *M*_{1}*M*_{2} bulbs into
*M*_{1} groups of *M*_{2} bulbs each. We know that
there is some strategy affording a solution in only *G*_{2} for
*M*_{2} bulbs. We will employ this solution in each of the bulb
groups separately.

Note, however, that this (*M*_{2},*G*_{2}) solution
assumes that the gaoler has control over the setting of the last bulb. In
practice, in *M*_{1}-1 of the *M*_{1} groups, we
have control over this bulb. We will use our control in order to send an
additional bit of information to our friend, and will do so in the following
way.

We will treat each group of *M*_{2} bulbs as though it was a
single bulb. Whenever the last bulb is pointed to in any group, we know that
this group is not going to have the last bulb to be chosen. We will treat this
as we do pointing to a single bulb out of a total of *M*_{1}
(where pointing to a bulb indicates that this bulb is not going to be the
last to be chosen). If in the problem with *M*_{1} bulbs we would
have at this point chosen the light to be on, in the larger problem with
*M*_{1} groups of size *M*_{2} we will either turn
the light on or off, so that the total number of lights that are on is odd.
Conversely, if we would have chosen in the *M*_{1}-sized problem
to turn the light off, we make the total number of on lights in the group even.

By using the (*M*_{1},*G*_{1}) solution over the
parity of the number of lights on in each group, we can
ensure that, at the end, we can narrow the problem down to only
*G*_{1} potential groups where the desired last bulb can be, much
like in the original *M*_{1}-sized problem we were able to
narrow the position of the last bulb to only *G*_{1}
candidate positions.

We now go over each of these *G*_{1} potential groups, and employ
the (*M*_{2},*G*_{2}) tactic on each of them, thus
being able to rule out that they have the last bulb in only
*G*_{2} guesses each.

Thus, in a total of at most *G*_{1}*G*_{2} guesses
the desired bulb is found.

But is β=0.939 the lowest β for which there exists a solution in
O(*N*^{β}) guesses?

In the history of *Using your Head is Permitted* it was many times the
case that readers sent me more elegant solutions than I had prepared originally,
and I would publish these instead of my own solutions, but it was almost never
the case that I would get sent a solution that is quantitatively
better than what I had. How fitting it is therefore that for the very
last part of the site's very last riddle Uoti Urpala sent me a solution that
outdoes by a stretch this β=0.939, which is the best complexity I knew
of for this riddle at the time of publication. Thanks, Uoti!

While I still do not know the best β possible, Uoti's strategy shows that the riddle can be solved with any β>2/3.

I will describe here a simplified, easier to explain, but not quite as efficient version of Uoti's strategy (while still obtaining the β>2/3 result). For those interested in Uoti's full strategy, I've made his full description available here. For all others: rest assured that any ingenious part of the construction is Uoti's, while any inefficiency, in-clarity and incorrectness are entirely my fault.

Here goes.

As before, the method is based on the idea of a code of length *n* that
can encode the identity of *k* elements that are known to not be the
last one set, and that can correct one error. We will construct such a code
for which *n*-*k* is, up to some polylogarithmic
terms, O(√*n*).

Readers are welcome to verify that such a code translates, in the overall
scheme to an O(*N*^{β}) with β equal to 2/3 plus
some asymptotically vanishing terms (due to the polylogarithmic terms of
*n*-*k*), which is what we want to prove.

Our specific error correcting code requires several elements. First is the
idea of a *Hamming checksum*.

To define the Hamming checksum, let us first describe a truncated Hamming code:
to create a code of length *n* we assign to each position a unique number
from 1 to *n*, which we encode as an
*L _{n}*≈O(log(

The Hamming checksum we will use will be the bits at positions associated with
numbers that are powers of two. There are *L _{n}* of these, and
regardless of the values assigned to any of the other bits in a word, one
can complete it into a code word by a unique appropriate assignment of the
checksum bits.

Importantly, the bits can be assigned to the numbers 1 through *n* in any
order, and in particular any subset of *L _{n}* bits can be used
as checksum bits, as long as the decoder knows in advance which bits are
which.

In our case, we will use other parts of the code word in order to encode which
*L _{n}* bits are the Hamming checksum, and will take the other
bits to be assigned such that both the checksum bits and the non-checksum bits
are individually assigned monotone increasing values in the Hamming error
correcting scheme.

Notably, the information of which bits are the checksum bits must itself be stored using a 1-error correcting code. For this we will use a simple (and inefficient) code: we will simply encode the information three times, using three non-overlapping sets of bits.

The full construction is as follows. We begin by describing it from the perspective of the decoder (say, our "friend"), then go back to show that it can be utilised in practice.

We take the *n*-length word and divide it into segments of size *x*
(a length to be decided on later, but which will ultimately be more or less
√*n* in size.

In a legal code-word in our scheme, all segments are very sparse (the portion
of '0's tends to 1 as *n* grows to infinity),
but some segments are more sparse than others. There will be, namely:

- One segment with between 0 and
*L*ones. We name it "Type A"._{n} - Three segments with between 2
*L*_{n}*L*and 3_{x}*L*_{n}*L*ones, where_{x}*L*≈log(_{x}*x*) is the number of bits required to encode a number between 0 and*x*-1. We name these "Type B". - All other segments with exactly
*C*=2*L*_{n}*L*(_{x}*L*+1) ones. We name these "Type C"._{x}

Importantly, in a 1-corrupted code-word, these numbers may be off by ±1, but the differences between the number of set bits in each type of segment are significant enough that no single-bit error can cause ambiguity regarding which type each segment is.

Using a special technique, which we name *subset encoding* and define
below, each of the three "Type B" segments encode the same set of
*L _{n}* numbers in {0,...,

To make this into a (*k*,*n*) construction, we must specify which
*k* elements are known not to be the last elements set, given a specific
(legal) code-word. Here, these *k* are the zero positions in the
"Type C" segments. As can be easily ascertained, if *x* is taken to be
√*n*,
*n*-*k* is on the order of
√*n* times a
polylogarithmic term.

From the perspective of the encoder, this is handled as follows.

Initially, the encoder uses the strategy that the first *x*-*C* bits
selected in every segment are chosen to be 0, and all following bits in
that segment are chosen to be 1. This continues until there are only 4
segments for which *x*-*C* bits have not yet been selected.

At this point, the encoder chooses arbitrarily one of the 4 segments to serve
as a "Type A" and the other three as "Type B"s. The remaining segments are
the "Type C"s. Furthermore, the encoder chooses arbitrarily *L _{n}*
as-yet unselected bits in the "Type A" segment, to be used as the Hamming
checksum. The rest of the "Type A" bits will be set to zero. The rest of the
"Type B" bits will be set by the subset encoding scheme, so as to encode the
positions of the Hamming checksum bits, and the rest of the "Type C" bits will
be set to 1, continuing the original strategy and ensuring that the "Type C"
zeros are known to not contain the last selected bit.

We now complete the construction by describing *subset encoding*.

Subset encoding takes a string of *x* bits, of which all but *A* are
zeros and the rest have not been set yet, and selects value for the remaining
*A* bits in a way that encodes a desired bit string. We will show that
any bit-string of length *A*/(2(*L _{x}*+1)) can be
encoded in this way, allowing for unambiguous decoding, even by a decoder who
does not have the information of which

Once again, we begin by describing the process from the point of view of the decoder.

To decode the word encoded by a string of *x* bits, we use the following
procedure:

- If the number of set bits is zero, the string encoded is the empty string.
- If it is 2, the string encoded is '0'.
- If it is 3, the string encoded is '1'.
- If it is 4 or more, divide the input string to two halves, decode each half separately, and concatenate the two results.

In a legal encoding, the number of set bits is never 1, which is why this possibility is ignored.

In effect, we distribute the bits of *x* along the leaves of a tree, where
each leaf is associated with 0, 2 or 3 set bits. Importantly, a leaf with 0
set bits can be omitted from the tree altogether without this changing the
decoding, for which reason it is not necessary for the encoder and decoder to
use exactly the same tree decomposition.

Also, note that once the tree is pruned of its zero-weight leaves, the rest of the tree structure is unambiguous, because the smallest of the non-leaf nodes has a Hamming weight of 2+2=4, which is more than that of the heaviest leaf node (3). Together, these facts indicate that if the encoder encoded its message by use of leaves of weights 0, 2 and 3, the decoder will decode the message properly.

The encoder works in the following way: the string of *x* bits is
recursively divided into halves, with division stopping whenever the number of
settable bits in a leaf node is 4 or less. If it is 2 or less, those bits are
zeroed out. The remaining leaves have either 3 or 4 settable bits. We now prune
these to either 2 or 3 set bits, according to the values we wish to encode.

How inefficient can this encoding be? If *x* has at least 3 settable bits,
it will eventually encode at least one bit of output. We can therefore take
each bit of output, and bound the maximum number of settable bits *x* may
have had along its encoding path from root to leaf. Multiplied by
the number of encoded bits, this gives an upper bound to the number of bits
needed for encoding.

According to this upper bound, each bit encoded requires at most
2(*L _{x}*+1) settable bits in

Q.E.D.

As mentioned, it is at this time unknown (to me) whether and by how much β can be pushed down even further from this present 2/3 bound.

Thanks, Uoti for the beautiful construction, and thanks Kevin for this wonderful riddle.

And thank you to all readers, all who contributed riddles, all who contributed solutions, and all who supported this site throughout its run.

I will miss you all.

**
PS --
**

**
For anyone who is interested in reading other things I write, I have just now
started a new web-column, called "Musings of a Junk Connoisseur", where I give
movie and TV reviews. Right now there is only one review there (for "The
Orville"), but I have quite a few reviews ready, which I'll be uploading over
the next week or two, so have faith and some patience. The review page can be
found here.
**

**
In one of the shortly-to-be-uploaded reviews I will also be explaining why it
is that I felt the need to do this on a web-page of my own, rather than use
established sites such as IMDb or Rotten Tomatoes.
**