SQL Server, Analytics, .Net, Machine Learning, R, Python
Mitch Wheat has been working as a professional programmer since 1984, graduating with an honours degree in Mathematics from Warwick University, UK in 1986. He moved to Perth in 1995, having worked in software houses in London and Rotterdam. He has worked in the areas of mining, electronics, research, defence, financial, GIS, telecommunications, engineering, and information management. Mitch has worked mainly with Microsoft technologies (since Windows version 3.0) but has also used UNIX. He holds the following Microsoft certifications: MCPD (Web and Windows) using C# and SQL Server MCITP (Admin and Developer). His preferred development environment is C#, .Net Framework and SQL Server. Mitch has worked as an independent consultant for the last 10 years, and is currently involved with helping teams improve their Software Development Life Cycle. His areas of special interest lie in performance tuning
Saturday, June 24, 2006
The Pitfalls of Bubble Sort
Approximately 15 years ago, a few months after joining a new company, I was approached by a programmer who had a problem. He knew that I had some experience in algorithm design and implementation. He told me that an application that had been working fine in testing was now running so poorly in production that it had practically come to a standstill. Although I had not seen the source code, I hazarded an educated guess as to the cause of the problem. I came right out and said “You’re using Bubble Sort, aren’t you?” He looked at me a little perplexed, and said “…er, yes. But how did you know? It was working fine during testing”.
The problem only showed up in production because they were using a few hundred items in testing, but production had tens of thousands of items. This comparison table shows the time taken to solve some problem of size N using various algorithms of differing complexity. The actual times are not as important as the way in which the time increases:
(Ignoring constants of proportionality, which in some cases can cause higher order complexity algorithms to perform better than lower complexity ones when N is small.)
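The naive Bubble Sort is an O(N²) algorithm in both the best and worst cases (an early-exit check improves the best case on already sorted input to O(N), but the average and worst cases remain O(N²)). So why does anyone continue to teach the use of Bubble Sort in Colleges and Universities? For just a slight increase in code complexity, you can implement Shell sort (named after its creator Donald Shell), which in practice outperforms Bubble Sort on all but trivially small inputs and, with a suitable gap sequence such as Knuth’s or Hibbard’s, has a worst case performance of O(N^1.5) compared with Bubble Sort’s O(N²) behaviour. Shell sort is very fast for small data sets (fewer than 1000 items).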
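To show how little extra code Shell sort needs, here is a minimal sketch of both algorithms side by side. Shell sort's worst case depends on the gap sequence; this sketch assumes Knuth's sequence (1, 4, 13, 40, ...), which is one of the sequences known to give O(N^1.5). The function names are my own and the code is illustrative rather than tuned.

```python
def bubble_sort(items):
    """Naive Bubble Sort (no early exit): always O(N^2) comparisons."""
    n = len(items)
    for end in range(n - 1, 0, -1):
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items


def shell_sort(items):
    """Shell sort, in place, using Knuth's gap sequence (an assumption)."""
    n = len(items)

    # Find the largest Knuth gap (1, 4, 13, 40, ...) below roughly n / 3.
    gap = 1
    while gap < n // 3:
        gap = 3 * gap + 1

    while gap >= 1:
        # Gapped insertion sort: afterwards every slice items[k::gap] is sorted.
        for i in range(gap, n):
            current = items[i]
            j = i
            while j >= gap and items[j - gap] > current:
                items[j] = items[j - gap]
                j -= gap
            items[j] = current
        gap //= 3  # step down: 40 -> 13 -> 4 -> 1 -> 0
    return items
```

Both functions sort the list in place and return it, so `shell_sort(data)` is a drop-in replacement for `bubble_sort(data)`. The only real addition is the outer loop over gaps; the final pass with a gap of 1 is an ordinary insertion sort over data that is already nearly sorted.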
If you want one of the fastest general purpose sorting algorithms, then implement Sedgewick’s median of three Quicksort, with insertion sorting of small subsets (this implementation avoids vanilla Quicksort’s pathological O(N²) behaviour in the presence of sorted or almost sorted data).
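The idea can be sketched roughly as follows. This is a simplified illustration of median of three pivot selection with an insertion sort cutoff, not Sedgewick's exact code: the cutoff value of 16, the function names, and the choice to insertion sort each small subrange as it is reached (rather than in one final pass) are all assumptions on my part.

```python
CUTOFF = 16  # assumed threshold below which insertion sort takes over


def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] (inclusive) in place."""
    for i in range(lo + 1, hi + 1):
        current = a[i]
        j = i - 1
        while j >= lo and a[j] > current:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = current


def median_of_three(a, lo, hi):
    """Order a[lo], a[mid], a[hi] and return mid, which now holds their median."""
    mid = (lo + hi) // 2
    if a[mid] < a[lo]:
        a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]:
        a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]:
        a[mid], a[hi] = a[hi], a[mid]
    return mid


def quicksort(a, lo=0, hi=None):
    """Median of three Quicksort with an insertion sort cutoff, in place."""
    if hi is None:
        hi = len(a) - 1
    while hi - lo + 1 > CUTOFF:
        pivot = a[median_of_three(a, lo, hi)]
        # Partition so that a[lo..j] <= pivot <= a[i..hi].
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Recurse into the smaller part and loop on the larger one,
        # which keeps the recursion depth at O(log N).
        if j - lo < hi - i:
            quicksort(a, lo, j)
            lo = i
        else:
            quicksort(a, i, hi)
            hi = j
    insertion_sort(a, lo, hi)
    return a
```

On already sorted input the median of the first, middle and last elements is the true median of the range, so each partition splits the data in half instead of peeling off one element at a time. Carefully constructed inputs can still provoke quadratic behaviour, so treat this as a sketch rather than a hardened library routine.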
Perhaps this is a candidate for one of those ‘negative’ interview questions: can you write down the Bubble Sort algorithm in code? This is a bit like asking a candidate if they can write down the code to describe the use of cursors in T-SQL. In my view, it is definitely a plus for those who can’t and prefer to rely upon (wherever possible) set based constructs instead.