To see that it can be done in log(log(N))+O(1) bits, consider the following protocol: Alice considers her two numbers. Because they are distinct, they must be distinct in at least one of their log(N) bits. Alice communicates to Bob, in log(log(N))+O(1) bits, the position of one of these distinctive bits. Bob replies by sending the value of his number in the communicated position. This takes another bit, for a total of log(log(N))+O(1) bits, as required. (Some solvers pointed out to the existence of another solution with the same number of sent bits, where Alice omits one of the bits from the protocol described above [e.g., she may choose to omit the least bit], in which case Bob sends the identity of his bits in both potential positions.)
More difficult is to show that this is a tight lower bound. Most solvers did this by showing that there must be an optimal protocol in which Alice sends k bits (for some k) and then Bob sends one last bit, this indicating that there exists a protocol in which Bob sends 2k bits and Alice sends nothing, this indicating that 2k bits is at least log2N. I'd like to prove the claim using slightly more flashy means (although, arguably, ones that are essentially equivalent to the above). To do this, let us consider a slight variant on the set-up.
In our variant, a new player, Eve, is the one doing all the speaking and everything she says is heard by both Alice and Bob. Alice, Bob and Eve can set up, ahead of time, a strategy so that Eve, upon seeing both Alice's numbers and Bob's number, communicates a minimum number of bits in order to tell Alice which of her numbers is Bob's.
This could have been very easy to do (in O(1) bits, Eve can always say "The higher one!" or "The lower one!"), but the problem is that neither Alice nor Bob trusts Eve. So, the protocol has to be chosen in such a way that if Eve tries to bluff, at least one of Alice and Bob must be able to detect this.
I claim that the number of bits it takes Eve to communicate in the new game cannot be higher than the number of bits it takes Alice and Bob to communicate in the original game. To see this, consider that for any Alice-Bob protocol, Eve can simply parrot the entire Alice-Bob conversation. Both Alice and Bob can then verify that what Eve says is compatible with what they would have said at any given point in the conversation, and so the entire conversation is verified.
Now, let us consider each possible state of the game as an ordered tuple, (a,b), where a is the number that only Alice has, and b is the number that both Alice and Bob have.
Eve's strategy must satisfy that in any case (x,y) she will say something different than in the case (y,z), for any z. The reason for this is that if the two are equal, Eve can fool both Alice and Bob by maliciously sending this message in the case (z,y), making Alice believe that the situation is (y,z) and Bob believe that the situation is (x,y), with neither of them able to call Eve's bluff, which will cause Alice to make the wrong decision regarding Bob's number.
For a general directed graph G=<V,E>, a function from E that satisfies the criterion that its value for any (x,y) in E is different to its value on any (y,z) in E is called an edge colouring of G. In particular, Eve's strategy in this case is an edge colouring of the directed clique in N vertices. Such an edge colouring is known to require log(N) colours, so communicating one such colour will require, in the worst case, log(log(N)) bits.
For completion, here's a short proof regarding the edge colouring of the directed clique.
Consider any two vertices, x and y in the clique, and consider X, the set of all colours appearing in edges going out from x, and Y, the set of all colours appearing in edges going out from y.
I claim that for every x and y, these sets must be distinct.
Suppose, to the contrary, that they were identical. The colour of the edge (x,y) would by definition appear in X, and therefore by assumption also in Y, but that would indicate that there is an edge going out from y with this colour, contradicting the condition of a proper edge colouring.
However, if for each of the clique's N vertices the set of colours used in its outgoing edges is distinct, this means that the base set that is the union of all these sets must contain at least log(N) distinct elements. Hence, any edge colouring of the clique must contain at least this many elements.
Q.E.D.
As a side note, if we were interested in the expected number of communication bits, rather than the worst-case, better ways are possible, and in general it is possible for Alice and Bob to transfer the necessary information in an expected O(1) bits of communication. For example, if Alice and Bob share a common randomness source, this source can generate a mask of log(N) bits, and Bob can send over to Alice the dot product of his number (written out in bits) and the mask. This will disambiguate Alice's two numbers with probability 1/2.