First, consider the main problem of the algorithm presented: it has to handle too many sets explicitly. Let's begin by considering which sets we actually need and which can be infered. Consider the following more compact description of an "intermediate status" of the board. Regarding each of the N2xN2 positions, we will maintain a list of the i values we have still not eliminated from appearance in the position. This set will be denoted xmax for the position x, with ximax being an indicator variable whose value is 1 if i is in xmax or 0 otherwise. ximax is a renaming of what was formerly Ximax for X={x}.
This new explicit description translates to the following implicit description in the terminology of the previous algorithm:
For a set A that is a subset of a basic constraint C, Aimax=0 if A is the union of positions from which i has been eliminated. In this case, (C\A)imin=1. For all other sets, Aimax=1 and Aimin=0.
There are two forms of deduction methods that can be used to eliminate further elements from appearance in certain positions:
It can easily be shown that putting all these rules together, any reasonable Sudoku problem can be solved. The rules cover all the rules used by the brute-force solution. This, indeed, is what is implemented by the following given program. Its interface is very much like that of the program given in the question, except its "explain" method returns different explanations and, for obvious reasons, does not support queries on sets of positions, but only on single positions. Here is a sample session with the new program:
>>> from sudoku import * >>> f=open("board.txt") >>> board=f.read() >>> f.close() >>> p=sudoku() >>> p.solve(board,False) 876315492 532489617 149276385 425967831 718543926 963128754 384752169 651894273 297631548 >>> p.explain("I4","4") I4 is 4. This was discovered while checking ['4'] on column I and finding it only on I4. >>> p.explain("I4","2") 2 is not on I4 . This was discovered while checking row 4 and discovering that E4 is 2. >>> p.explain("G7","2") 2 is not on G7 . This was discovered while reading the board. >>> p.explain("G8","2") 2 is not on G8 . This was discovered while checking row 8 and discovering that C8 is 2.
The way the new program works is by re-checking, whenever a possibility, i, has been eliminated from an xmax, all basic constraints, C, that include x, trying to apply the two rules of deduction. It checks for subsets X of C for which |X| equals the number of elements that appear in the union of ymax over y in X, then removes i from all zmax for z in C\X. It also checks whether the list of positions where i has not been eliminated did not become, due to the recent elimination, a subset of a different basic constraint, D, in which case i should be eliminated from all zmax for z in D\C.
These improvements make the new version much faster (it takes roughly 1 second to solve a Sudoku board) but it's still hyper-exponential. The reason for this has to do with the need to check every subset X of each C checked. Though the number of basic constraints is polynomial, the number of subsets is hyper-exponential in N.
Here is an algorithm description for a simpler-to-analyze version of the same algorithm (though, admittedly, a slower one in practice):
1. Initialize the state by considering all elements as possible in each position. 2. Read the visible board positions. 3. For each visible board position, x: 3.1. Let i be the element in the position. 3.2. Eliminate i from the list of possibles in each of the three basic constraints that x is part of. 4. RecentChanges := True 5. While RecentChanges is True: 5.1. For each basic constraint, C: 5.1.1. Eliminate i from the possible list of position x in C if there exists a subset X of C such that a. The union of the possible lists of the positions of X is of size |X|. b. The union contains i. c. X does not contain x. 5.1.2. For each i: 5.1.2.1. For each basic constraint D other than C: 5.1.2.1.1. If the list of positions in C that may still contain i is a subset of D: 5.1.2.1.1.1. Eliminate i as a possibility from all other positions in D. 5.2. RecentChanges := the list of possibilities has been updated during the current "while" iteration.
Clearly, the entire algorithm would have become polynomial had we managed to implement line 5.1.1 in a polynomial manner. The rest of the lines are polynomially computable, and the "while" loop must perform at least one elimination every iteration, so the number of iterations is bounded by the number of elements times the number of positions, which is polynomial.
The problem is that the given implementation of 5.1.1 enumerates over all the possibilities for X, which is not polynomial. Luckily, there is a different solution.
Consider the fact that 5.1.1 eliminates i from the possibility list of x based on the fact that no assignment of elements to positions of C that includes i in the x position can be composed solely of elements listed as possible in their positions, while still having each element exactly once. We can therefore ask the question directly: is there a matching of the places to the elements that matches each place with a possible element and has i in the x place?
In this formulation, this is clearly a question of finding a perfect matching (or proving none exists) in the bipartite graph with N2 elements on one side and N2 positions on the other, with an edge between a position and an element indicating that the element is possible in the position. In order to insure that i is matched with x we can, for example, erase all other edges to i. Repeating this for each i and each x (a polynomial number of repetitions), the algorithm is complete, recalling that finding a perfect matching in a bipartite graph is a problem with a well-known polynomial solution.
An implementation of a more sophisticated (but still polynomial) version of this solution is provided here. For N=3 it is slightly slower than the hyper-exponential solution provided above, taking roughly 1.5 seconds to the other solution's one second, at least on my computer. For any larger values of N, the polynomial solution would, of course, be much faster. Already for N=4, the polynomial solution takes only a few seconds, compared to several minutes for the hyper-exponential solution.
Notably, the polynomial solution provided does not have any "explaining" functionality, nor does it provide much of the board-consistency validation that is present in the other versions given here.
Nevertheless, they do.
Here's an example:
452678391 6 792 458 8 94 5672 41 56789 687 95 4 79584 36 583 94 67 974 6 8 5 126587943
Though the algorithms presented here for the solution of reasonable Sudoku problems can not complete any of the missing digits of this board, simple trial-and-error shows that the solution is unique. Trying to set D3 to anything but "1" reaches an impasse very quickly. On the other hand, setting it to "1" makes the problem once again reasonable, yielding the following unique solution:
452678391 637921458 819435672 241356789 368719524 795842136 583194267 974263815 126587943Back to riddle