## January 2012 solution

As was shown in the solution to IBM's Ponder This September 2011 challenge, For a given n, the expectation is maximized by the following algorithm:

Consider An in base B: Ani=0tkiBi.

This is a natural partitioning of any set of cardinality An into sets of cardinality Bi, for i=0,... . The elements of each such set can be mapped by an arbitrary bijection into the set of all strings of length i from the alphabet 0,...,B-1.

There are An possible inputs after reading n input numbers. An arbitrary bijection can be used to map these into the elements of the partition's parts, and from there to the strings in the output alphabet. These strings then form the outputs.

The condition of maximality is that for all i, 0≤ki<B. This is why the partitioning given by writing down An in base B works.

The question in the present riddle is whether these strings can always be extended given an n+1'th input. To prove the positive answer to this question, consider what happens when a new input is received. Each of the original possible inputs now becomes A different possibilities. If, before, we had ki different "full sets" of i-length strings, now we have Aki.

Obviously Aki may now be greater or equal to B, so as-is this partitioning is no longer optimal. However, we can always take B such full sets of the i-length base-B strings and extend the strings of each such set by a distinct suffix, thereby creating from the B original full sets 1 new full set of the (i+1)-length base-B strings. Let r=Aki. Repeating the procedure described r div B times, the number of i-length full sets is reduced to r mod B, so, necessarily less than B.

However, the catch is that when we do this, the strings of length i-1 may be extended to form new strings of length i, and, once again, optimality will not be achieved. To overcome this problem, we work systematically from i=0 and up. Each time, optimality is only threatened for larger i values, which then get fixed. Ultimately, the procedure ensures that all new ki values are in their proper range and optimality is preserved.

Here is a shorter and more intuitive way to view this entire process: we have shown that the base-B representation of An gives all necessary information to describe our outputs. When a new input is received, what we need to do amounts to multiplying An by A in base B. The "clumping" of B full sets of strings of length i into 1 full set of length i+1 (with the consequence of increasing output length) now simply becomes a "carry" operation.

As described, this algorithm is highly inefficient in memory. However, in fact, because we are only ever interested in one specific input sequence, it can be implemented more efficiently as given here in pseudo-Python code:

```x=0 # This variable keeps track of which "clump" we are on.
n=0 # This is the number of integers output so far.
i=0 # This is the number of integers input so far.
loop:
i+=1
x=x*A+inp
y=(((A**(i-1)) % (B**(n+1)))*A)/(B**n) # y calculates how many clumps are now
# of length n, after multiplication by
# A and including any carry from
# previous digits.
while x/B != y/B:
output(x%B)
n+=1
x=(x/B)+((A**(i-1)/(B**n))%B)*A        # Perform carry operation.
y=(((A**(i-1)) % (B**(n+1)))*A)/(B**n) # Update y to reflect new n.
x=x%B
```

Both of this month's solvers asked me if the solution I am proposing can be implemented in reasonable memory. My answer to this is as follows:

The code above requires no storage from iteration to iteration of the loop, other than x,i and n. The variable x is in the range 0≤x<B, and the two counters require space that is logarithmic in the input and output lengths, respectively. In terms of storage between invocations of the loop, we are therefore reasonable already.

During each iteration, the computation of y is not space-efficient as described. However, it can be made efficient. The crux of how to do this is given as next month's riddle.