SQL Server, Analytics, .Net, Machine Learning, R, Python
Mitch Wheat has been working as a professional programmer since 1984, graduating with a honours degree in Mathematics from Warwick University, UK in 1986. He moved to Perth in 1995, having worked in software houses in London and Rotterdam. He has worked in the areas of mining, electronics, research, defence, financial, GIS, telecommunications, engineering, and information management. Mitch has worked mainly with Microsoft technologies (since Windows version 3.0) but has also used UNIX. He holds the following Microsoft certifications: MCPD (Web and Windows) using C# and SQL Server MCITP (Admin and Developer). His preferred development environment is C#, .Net Framework and SQL Server. Mitch has worked as an independent consultant for the last 10 years, and is currently involved with helping teams improve their Software Development Life Cycle. His areas of special interest lie in performance tuning
Monday, March 07, 2011
Binary Search: A Cautionary Tale!
Practically every developer knows what binary search is: a simple (indeed fundamental) searching algorithm which is an example of a natural divide-and-conquer strategy. Basically: starting with an already sorted array, compare the middle element of the array with the value we want to find. If the values are the same we are done, else if the array element is larger, repeat in the remaining lower half of the array, else if the array element is smaller, repeat in the remaining upper half of the array.
Unsurprisingly, binary search is an often used interview question (although perhaps less so, due to increasing complexity and the proliferation of programming frameworks; I must admit I’ve never personally been asked about it in an interview). It is a basic technique that you can reasonably expect every reasonable candidate to know and can be implemented in just a few lines of code.
Despite binary search’s simplicity, it is easy to implement it incorrectly!
Jon Bentley, in his book Programming Pearls (a programming classic), wrote that in a course he ran for professional programmers, he asked the participants to code binary search and found that 90% failed to implement it correctly. If you asked 100 programmers to write an implementation of binary search a large number of them would be incorrect and many of those those that apparently worked would actually contain subtle flaws. Indeed, many published implementations of binary search are wrong:
Indeed, Jon Bentley’s own implementation of binary search, published in the ‘Writing Correct Programs’ chapter of Programming Pearls, contained a subtle bug that remained undetected for over twenty years. Here is an iterative C# implementation, adapted from Bentley’s pseudocode, that contains the bug (can you spot it?):
public static int BinarySearch(int sortedArray, int valueToFind)
int lower = 0;
int upper = sortedArray.Length - 1;
while (lower <= upper)
m = (lower + upper) / 2;
if (sortedArray[m] < valueToFind)
lower = m + 1;
else if (sortedArray[m] == valueToFind)
upper = m - 1;
The bug is in the mid-point assignment
m = (lower + upper)/2
which performed as a direct sum could lead to an integer overflow. It should be replaced by the identically equivalent safe expression
m = lower + (upper – lower)/2
This bug was highlighted in Chapter 1 of Algorithms for Interviews, IMO, a slightly misnamed book which could perhaps be better described as ‘Algorithms to make programmers think’, where the emphasis is on describing problems and getting the reader to attempt to solve them.
[Side Note: A quick check with Reflector shows that this is implemented correctly without the possible overflow flaw in .NET 4, in fact the developers have gone put of their way to make sure someone doesn’t introduce this bug by encapsulating the mid-point division in its own Method GetMedian()]
MSN, Email: mitch døt wheat at gmail.com