Saturday, June 24, 2006

 

The Pitfalls of Bubble Sort

Approximately 15 years ago, a few months after joining a new company, I was approached by a programmer who had a problem. He knew that I had some experience in algorithm design and implementation. He told me that an application that had been working fine in testing was now running so poorly in production that it had practically come to a standstill. Although I had not seen the source code, I hazarded an educated guess as to the cause of the problem. I came right out and said “You’re using Bubble Sort aren’t you?” He looked at me a little perplexed, and said “…er Yes. But how did you know! It was working fine during testing”.

The problem only showed up in production because they were using a few hundred items in testing, but production had tens of thousands of items. This comparison table shows the time taken to solve some problem of size N using various algorithms of differing complexity. The actual times are not as important as the way in which the time increases:









































Problem SizeNlogNN

100

3.5 secs

0.19 secs

0.05 secs

0.003 secs

1000

1 hour

10 secs

0.46 secs

0.033 secs

10000

38 days

25 minutes

6 secs

0.33 secs

100000

100 years

1.5 days

1 minute

3.3 secs

1000000

100 million years!

5 months

13 minutes

33 secs


(Ignoring constants of proportionality, which in somes cases can cause higher order complexity algorithms to perform better than lower complexity ones when N is small)

BubbleSort is an O(N²) algorithm (best and worst cases). So why does anyone continue to teach the use of Bubble Sort in Colleges and Universities? For just a slightly increased complexity, you can implement Shell sort (named after its creator Donald Shell) which will always outperform BubbleSort and has a worst case performance of O(N^1.5) compared with BubbleSort’s O(N²) behaviour. Shellsort is very fast for small data sets (less than 1000 items).

If you want the fastest possible general purpose sorting algorithm then implement Sedgewick’ s median of three Quicksort, with insertion sorting of small subsets (this implemention removes vanilla Quicksort’s pathological O(N²) behaviour in the presence of almost sorted data).

Perhaps this is a candidate for one of those ‘negative’ interview questions: can you write down the bubblesort algorithm in code. This is a bit like asking a candidate if they can write down the code to describe the use of cursors in T-SQL. In my view, it is definitely a plus for those who can’t and prefer to rely upon (wherever possible) set based constructs instead.


    

Powered by Blogger