Mitch Wheat - Treat the cause, not the symptoms!: March 2011

About

SQL Server, Analytics, .Net, Machine Learning, R, Python

Monday, March 28, 2011

Useful Codeplex Libraries

I recently came across two useful codeplex libraries. The first enables you to create advanced Excel 2007/2010 spreadsheets without Office installed, EppPlus:

EPPlus is a .Net library that reads and writes Excel 2007/2010 files using the Open Office Xml format (xlsx).

The second is a .NET wrapper of the ImageMagick image manipulation API, ImageMagick.NET

# posted by Mitch Wheat @ 8:41 am 0 comments

Sunday, March 27, 2011

Quotes: When there is no…

When there is no wind ... ROW! - Latin Proverb

When there is no peril in the fight there is no glory in the triumph. - Pierre Corneille

When there is no enemy within, the enemies outside cannot hurt you. - African Proverb (often attributed to Winston Churchill)

Real love is a pilgrimage. It happens when there is no strategy, but it is very rare because most people are strategists. – Anita Brookner

Perfection is finally attained not when there is no longer anything to add, but when there is no longer anything to take away - Antoine de Saint Exupéry

Laws and principles are not for the times when there is no temptation: they are for such moments as this, when body and soul rise in mutiny against their rigour; stringent are they; inviolate they shall be. - Jane Eyre in Charlotte Bronte.

When there is no vision the people perish. - Franklin Delano Roosevelt (from his first inaugural address)

It is hard to apply oneself to study when there is no money to pay for food and lodging. - Zora Neale Hurston

# posted by Mitch Wheat @ 8:57 am 0 comments

Saturday, March 26, 2011

Microsoft Virtual Academy

Microsoft (and just about everyone else) has a cloud focus these days, so it's no surprise that Microsoft Learning has also ventured there. Last week saw the official debut of the Microsoft Virtual Academy.

The Microsoft Virtual Academy is targeted at the cloud, with self-paced courses for learning more about private and public clouds, to more specific coverage of Windows Azure and SQL Azure.

MVA requires membership, but it's free. Check out the FAQ here to find out more and get started.

# posted by Mitch Wheat @ 3:02 pm 0 comments

Wednesday, March 09, 2011

Visual Studio 2010 Service Pack 1 Released

Visual Studio 2010 SP1 RTM has been released for MSDN subscribers! Get it here

It will be made generally available over the next day or so.

# posted by Mitch Wheat @ 12:24 pm 0 comments

Monday, March 07, 2011

Binary Search: A Cautionary Tale!

Practically every developer knows what binary search is: a simple (indeed fundamental) searching algorithm which is an example of a natural divide-and-conquer strategy. Basically: starting with an already sorted array, compare the middle element of the array with the value we want to find. If the values are the same we are done, else if the array element is larger, repeat in the remaining lower half of the array, else if the array element is smaller, repeat in the remaining upper half of the array.

Unsurprisingly, binary search is an often used interview question (although perhaps less so, due to increasing complexity and the proliferation of programming frameworks; I must admit I’ve never personally been asked about it in an interview). It is a basic technique that you can reasonably expect every reasonable candidate to know and can be implemented in just a few lines of code.

Despite binary search’s simplicity, it is easy to implement it incorrectly!

Jon Bentley, in his book Programming Pearls (a programming classic), wrote that in a course he ran for professional programmers, he asked the participants to code binary search and found that 90% failed to implement it correctly. If you asked 100 programmers to write an implementation of binary search a large number of them would be incorrect and many of those those that apparently worked would actually contain subtle flaws. Indeed, many published implementations of binary search are wrong:

“a study reported that it is correctly implemented in only five out of twenty textbooks.”

Indeed, Jon Bentley’s own implementation of binary search, published in the ‘Writing Correct Programs’ chapter of Programming Pearls, contained a subtle bug that remained undetected for over twenty years. Here is an iterative C# implementation, adapted from Bentley’s pseudocode, that contains the bug (can you spot it?):

public static int BinarySearch(int[] sortedArray, int valueToFind)
{
    int lower = 0;
    int upper = sortedArray.Length - 1;
    int m;
 
    while (lower <= upper)
    {
        m = (lower + upper) / 2;
        if (sortedArray[m] < valueToFind)
        {
            lower = m + 1;
        }
        else if (sortedArray[m] == valueToFind)
        {
            return m;
        }
        else
        {
            upper = m - 1;
        }
    }
 
    return -1;
}

…Scroll down…

The bug is in the mid-point assignment

m = (lower + upper)/2

which performed as a direct sum could lead to an integer overflow. It should be replaced by the identically equivalent safe expression

m = lower + (upper – lower)/2

This bug was highlighted in Chapter 1 of Algorithms for Interviews, IMO, a slightly misnamed book which could perhaps be better described as ‘Algorithms to make programmers think’, where the emphasis is on describing problems and getting the reader to attempt to solve them.

[Side Note: A quick check with Reflector shows that this is implemented correctly without the possible overflow flaw in .NET 4, in fact the developers have gone put of their way to make sure someone doesn’t introduce this bug by encapsulating the mid-point division in its own Method GetMedian()]

# posted by Mitch Wheat @ 10:03 am 0 comments

Thursday, March 03, 2011

SQL Server: Optimizing tempdb Performance : Moving tempDB to a New Location

The size of tempdb can affect the performance of a system. For example, if the tempdb size is too small, the system processing could be too occupied with autogrowing the database to support your workload requirement every time that you start SQL Server. You can avoid this overhead by pre-sizing tempdb. For more information, see Optimizing tempdb Performance and Capacity Planning for tempdb.

Optimizing tempdb Performance

Preallocate space for all tempdb files by setting the file size to a value large enough to accommodate the typical workload for the environment. This prevents tempdb from expanding too frequently, which can adversely affect performance. The tempdb database should be also set to autogrow, just in case of unplanned growth. But ideally, any tempDB expansions should be rare.
Put tempdb on a fast I/O subsystem. Use RAID 10 if there are sufficient directly attached disks available. Consider using directly attached Solid State Disks (SSD), such as a FusionIO drive.
Create multiple data files to maximize disk bandwidth. Using multiple data files can reduce tempdb contention and yields significantly better scalability. However, do not create too many files because this can reduce performance and increase management overhead. As a general guideline, create as many data files as 1/2 – 1/4 times the number of CPU cores. [Note that a quad-core CPU is considered to be four CPUs.]
Make each data file exactly the same size; this allows for optimal proportional-fill performance. Consider tuning on Trace Flag -T1116
Put the tempdb database on disks (physical spindles) separate from those used by user databases.

If you're seeing PAGELATCH (not PAGEIOLATCH) waits on tempdb, then you can mitigate these using trace flag 1118 and creating multiple tempdb data files. Paul Randal wrote a blog post debunking some myths around this trace flag and why it's still potentially required in SQL 2005 and 2008 - Misconceptions around TF 1118.

-- 1.Determine the logical file names of the tempdb database and current location on disk. 

SELECT name, physical_name AS CurrentLocation
FROM sys.master_files
WHERE database_id = DB_ID(N'tempdb');
GO


-- 2.Change the location of each file using ALTER DATABASE.

USE master;
GO
ALTER DATABASE tempdb 
MODIFY FILE (NAME = tempdev, FILENAME = 'D:\Data\tempdb.mdf', SIZE = 512MB, FILEGROWTH = 128MB);
GO
ALTER DATABASE tempdb 
MODIFY FILE (NAME = templog, FILENAME = 'D:\Logs\templog.ldf', SIZE = 64MB, FILEGROWTH = 64MB);
GO


-- 3.Stop and restart the instance of SQL Server. 
net stop "SQL Server" 
-- or 
net stop "SQL Server (namedinstance)"


-- 4.Verify the file change.

SELECT name, physical_name AS CurrentLocation, state_desc
FROM sys.master_files
WHERE database_id = DB_ID(N'tempdb');

# posted by Mitch Wheat @ 7:49 am 0 comments

Wednesday, March 02, 2011

How to Write Without Writing

I’ve long admired (perhaps even envied) Jeff Atwood’s ability to craft interesting, informative and thought provoking articles on his ‘Coding Horror’ blog. His recent post How to Write Without Writing is a great example why.

Over the last 6 years, I've come to believe deeply in the idea that that becoming a great programmer has very little to do with programming. Yes, it takes a modicum of technical skill and dogged persistence, absolutely. But even more than that, it takes serious communication skills:

The difference between a tolerable programmer and a great programmer is not how many programming languages they know, and it's not whether they prefer Python or Java. It's whether they can communicate their ideas. By persuading other people, they get leverage. By writing clear comments and technical specs, they let other programmers understand their code, which means other programmers can use and work with their code instead of rewriting it. Absent this, their code is worthless. [Joel Spolsky]

The ability to inform, entertain and present connected ideas is a wonderful skill.

Personally, I’ve always found writing difficult (though perhaps less so in the last 5 years).The main reason I started blogging was to improve my writing (and also to record things I find useful and tend to misplace!). If you set yourself a goal of writing a short blog post once a week, you are forcing yourself to write something. The more often you write, the more likely your writing skills will improve.

Some time ago I answered a question on StackOverflow: Should programmers be able to write clearly? I firmly believe the answer to that question is “Yes”. If you can’t communicate and express yourself clearly, how can you write correct code?

# posted by Mitch Wheat @ 5:52 pm 0 comments

Contact

MSN, Email: mitch døt wheat at gmail.com

Mitch Wheat - Treat the cause, not the symptoms!