Friday, February 8, 2019

Published 6:08 PM by with 0 comment

Find All Ints Less Than N with M Digits Equal to 1 in Their Binary Representation

We stumbled on a fun algorithm problem in the wild at work this week.

Real Life Problem

We needed to test commands coming into a system on two paths to make sure there were no race conditions in the processing. Example commands are 'set mode = Voltage; set Range = 10' and 'set mode = Current; set Range = 1'. To test this, we needed to generate an array that consisted of all combinations of the two command arrays such that the order of elements from each array was maintained.

As an example, consider arrays with commands 1, 2 and commands a, b. We wanted all combinations of those two such that 2 was always after 1, and b was always after a:

1,2,a,b
1,a,2,b
a,1,2,b
1,a,b,2
a,1,b,2
a,b,1,2

Solution

It turns out that this problem is the same as finding all ints less than 2^N where N is the length of the combined array, such that the number of 1's in their binary representation is equal to the length of the second array. To see this, consider the same problem as above but replace 1, 2 with 0, 0, and a,b with 1,1:

0011 = 3
0101 = 5
1001 = 9
0110 = 6
1010 = 10
1100 = 12

N = 4, so 2^N is 16. That list is all ints less than 16 with two 1's in their binary representation.

Algorithms

How do you find those?

Our simplest solution was to generate all numbers from 1 to 2^N and count the 1's in their binary representations. Here is a sample implementation:

import time; 
ms = time.time()*1000.0

list1 = [0]*10
list2 = [1]*10
tlen = len(list1) + len(list2)

#get array of all powers of 2 needed here
twos = {}
for i in range(tlen):
  twos[i] = 2**i
  
#generate all numbers up to 2^total length
results = []
for i in range(2**tlen):
  
  #get binary representation of i
  btemp = [int(x) for x in bin(i)[2:]]
    
  #keep number if sum of digits is correct
  if sum(btemp) == len(list2):
    val = 0
    for j in range(len(btemp)):
      if btemp[j] == 1:
        len(btemp) - j
        val += twos[len(btemp) - j - 1]
        
    results.append(val)
    
print(time.time()*1000.0 - ms)

This is really clean and simple. There's one huge problem though. If you work through the complexity of the algorithm, it's O(2^N). That's enormous. If each array is 25 elements, that's ~1x10E15 iterations. For a typical computer, your execution time is measured in years for that. Can you do better?

Look at the pattern of numbers in the example above. You can see a general trend:
  • start with 2^M - 1 where M is the length of the second array
  • shift the leftmost '1' a single position to the left
  • repeat until it reaches the leftmost position
  • if next '1' element is greater than 1 position to the right of the leftmost '1', shift it down one, set the leftmost 1 position left of that '1', and repeat; do this for all 1's
  • when all 1's are adjacent and leftmost is in first position, you're done
What does that look like? Here is a sample implementation. 'Refs' is just an array tracking the positions of the '1' elements in the array:

import time; 
ms = time.time()*1000.0

list1 = [0]*10
list2 = [1]*10
tlen = len(list1) + len(list2)

#build list of refs that tracks 1's in the array
refs = [0]*len(list2)
for i in range(len(list2)):
  refs[i] = len(list1) + i
  
root = 2**(len(list2)) - 1

#build dictionary of powers of two that are needed
twos = {}
for i in range(tlen):
  twos[i] = 2**i
  
#run until all refs are in their leftmost position
flag = True
while(flag):
  
  #walk leftmost ref down to zero
  while refs[0] >= 0:
    root += twos[tlen - refs[0] - 1]
    refs[0] -= 1

  #check if all refs in leftmost position; if not, shift according to first ref that can be left-shifted
  for i in range(1, len(refs)):
    if refs[i] > i:
      refs[i] -= 1
      j = i - 1
      while(j >= 0):
        refs[j] = refs[j + 1] - 1
        j -= 1
      root = 0
      for j in refs:
        root += twos[(tlen - j - 1)]
      flag = True
      break
    flag = False
    
print(time.time()*1000.0 - ms)

The 'walk leftmost ref down to zero' loop has a weird 'add 2^(tlen - ref[0] - 1)' line. What is that? Consider the number 011. That's 3. Now consider 101. That's 5. What you did to get from 3 to 5 is drop the 2^1 bit and add a 2^2 bit. 0101 to 1001 is 5 to 9. You dropped a 2^2 and added a 2^3. 2^2 - 2^1 = 2. 2^3 - 2^2 = 4. This continues. 2^(x) - 2^(x - 1) = 2^(x - 1). That line is thus handling that bit swap by just adding the correct 2^x value.

You can write this with essentially identical logic using arrays of 0's and 1's and swapping adjacent indices if you want. That algorithm was actually very slightly faster for me, but is a bit harder to read so I went with this one here.

What's the complexity? Letting N be the length of the combined array and M be the length of the second array, the complexity is O(N!/[(N - M)!*M!]). It might not be obvious, but that's much better than O(2^N). If each array is 25 elements, that's ~1E14 iterations...a full order of magnitude better than 2^N. I've included a plot of those two below assuming N = 2*M:


That also happens to be the number of combinations. Since that scales as the number of combinations and you need to generate all combinations, I don't think a less 'complex' algorithm is possible here. However, there are some computational tricks that you can use that I did not include here because they complicate the code and didn't significantly increase the performance when I tried them. 

An example is that every even result is 2 times one of the other results. Thus, you can stop once you've found all odd results, then simply multiply those by 2 until you reach >= 2^N to get the remaining results. You can also use bit shifts and logical operations, but they also complicate the code more than I wanted for an overview here.

Conclusion

This was a fun problem. It feels like a complex interview question, and it seems like these only pop up a couple of times a year in normal work so it's always a bit exciting when they do.




      edit

0 comments:

Post a Comment