SQL Server, Analytics, .Net, Machine Learning, R, Python
Mitch Wheat has been working as a professional programmer since 1984, graduating with an honours degree in Mathematics from Warwick University, UK in 1986. He moved to Perth in 1995, having worked in software houses in London and Rotterdam. He has worked in the areas of mining, electronics, research, defence, financial, GIS, telecommunications, engineering, and information management. Mitch has worked mainly with Microsoft technologies (since Windows version 3.0) but has also used UNIX. He holds the following Microsoft certifications: MCPD (Web and Windows) using C# and SQL Server MCITP (Admin and Developer). His preferred development environment is C#, .Net Framework and SQL Server. Mitch has worked as an independent consultant for the last 10 years, and is currently involved with helping teams improve their Software Development Life Cycle. His areas of special interest lie in performance tuning
Friday, June 30, 2006
Patterns and Practices: Guidance Explorer
The Patterns and Practices Team continue to have a major impact on software development both inside and outside Microsoft. Their latest offering is Guidance Explorer.
Guidance Explorer allows you to create and distribute a set of standard performance and security best-practices that your team can adhere to.
From J.D. Meier's blog: "Guidance Explorer is a new, experimental tool from the patterns & practices team that radically changes the way you consume guidance as well as the way we create it. If you’ve felt overwhelmed looking across multiple sources for good security or performance guidance then Guidance Explorer is the tool for you."
It's currently aimed at ASP.NET, but Windows guidelines are apparently in the pipeline. I've just downloaded it, and I might blog my experiences later...
Thursday, June 29, 2006
Visual Studio 2005 Icon Library
Did you know that Visual Studio 2005 ships with a library of standard Windows bitmaps, cursors, icons and metafiles which can be freely used in your Windows and web applications? It contains Windows, Office, and Visual Studio icons that are licensed for reuse.
You can find it here: C:\Program Files\Microsoft Visual Studio 8\Common7\VS2005ImageLibrary\VS2005ImageLibrary.zip
In addition, the .ico files are in multi-icon format, with the 16x16, 32x32 and 48x48 images (and color depths of 256 colors, 16bpp and 24bpp) contained in a single file.
If you're an architect or an aspiring architect, check out skyscrapr. The site was recently launched by Microsoft (May, 2006), and plans to cover all aspects of architecture.
Introduction to Test-Driven Development
This is old news, but worth mentioning: if you haven't already seen the Introduction to Test-Driven Development webcast by Peter Provost, Scott Densmore, Brad Wilson, Brian Button and Ron Jacobs, and you would like to know more about Test-Driven Development (or even if you are a sceptic!), then download and watch it. Not only is it a gentle introduction to Test-Driven Development, it's also quite funny!
Ron Jacobs also hosts ARCast which has some excellent content. Ron is “...Someone who understands what you are thinking… someone who can tell a good joke.” He also seems to have an infectious sense of humor!
Tuesday, June 27, 2006
The .NET Developer's Guide to Identity
Keith Brown of pluralsight.com has published a must-read security guide for all .Net developers here on MSDN: The .NET Developer's Guide to Identity.
Monday, June 26, 2006
Simian: A tool for Detecting Similar Code
Simian is a code similarity analyser that can be used to identify duplication in “…any human readable files…”. Simian runs natively in any .NET 1.1 or higher supported environment and on any Java 1.4 or higher virtual machine.
Howard van Rooijen shows how to integrate Simian into Visual Studio here: Detecting duplicate code with Simian, and also how to make it more usable here: MonkeyWrangler - Making Simian more usable in Visual Studio.
To incorporate it into your NAnt automated build scripts, create a simian target:
<property name="Exec.Simian" value="C:\BuildTools\simian-2.2.8\bin\simian-2.2.8.exe"/>
<target name="runSimian" description="Runs Simian to find duplicate code">
The latest version of CruiseControl.Net already contains the necessary .XSL formatter to display the results in the CC.Net dashboard; just point it at the simian.xml output file.
Software Development Must Haves
If you are starting a career in software development, the choice you make for your first job is extremely important. It can make the difference between an average career and one that stands out from the crowd. When you go for an interview, you have to remember that the interview is a two-way process: you need to interview them as well. Finding an environment that will nurture your skills and direct your development is often more important than simply finding the company that will pay you the most money. The Guerrilla Guide to Interviewing by Joel Spolsky is well worth reading.
The last point requires some explanation: when you are designing code and deciding ‘what the code should look like’ there is no better way than writing down how you envisage consumers (whoever they are) calling your methods. If you put yourself in the place of the consumer of your methods, you will invariably find the best way to phrase the interface of those methods. This is an important design principle when creating software frameworks.
Sunday, June 25, 2006
Long and Short Variable Naming
Darren Neimke has been talking about variable naming and how long variable names should be: Debunking popular myths. I agree that long variable naming can and has been abused but would also like to throw in the following points (this is an edited version of my comments):
I have seen the situation many times where a programmer constructs a poor abbreviation just because a rigid coding standard insists that variable names be at most N characters, and the fuller, more descriptive name would have gone over by a few characters (say five too many). So you end up with a 10 character cryptic (or ambiguous) name as opposed to an 18 character descriptive name. I'd definitely prefer to see and read the latter.
In my view, an even bigger giveaway of regions of code that warrant closer inspection is a mixture of very terse and very verbose variable naming, either because it’s the work of more than one programmer or just one who was unsure of what they were doing.
I agree that really long names are bad for the reasons Darren mentioned, but also for the reason that they make code harder to read, and therefore slower to understand, and therefore harder to maintain.
I guess in the end it’s about common sense; I obviously try to keep variables as short as possible whilst maximising their meaning. My 32-character maximum length rule of thumb is slightly longer than Darren’s, although in practice it would be extremely rare that I would ever name anything that long.
Saturday, June 24, 2006
The Pitfalls of Bubble Sort
Approximately 15 years ago, a few months after joining a new company, I was approached by a programmer who had a problem. He knew that I had some experience in algorithm design and implementation. He told me that an application that had been working fine in testing was now running so poorly in production that it had practically come to a standstill. Although I had not seen the source code, I hazarded an educated guess as to the cause of the problem. I came right out and said “You’re using Bubble Sort, aren’t you?” He looked at me a little perplexed, and said “…er Yes. But how did you know? It was working fine during testing”.
The problem only showed up in production because they were using a few hundred items in testing, but production had tens of thousands of items. This comparison table shows the time taken to solve some problem of size N using various algorithms of differing complexity. The actual times are not as important as the way in which the time increases:
(Ignoring constants of proportionality, which in some cases can cause higher order complexity algorithms to perform better than lower complexity ones when N is small.)
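To get a feel for how quickly these curves diverge, here is a small Python sketch. It does not reproduce the original table or its timings; it just counts abstract "operations" for a few complexity classes, and the sizes and the function name `growth_table` are my own choices for illustration.

```python
import math


def growth_table(sizes=(10, 100, 1_000, 10_000)):
    """Return rows of (N, log2 N, N log2 N, N^2) as rounded operation counts.

    These are abstract counts, not measured times; constants are ignored.
    """
    rows = []
    for n in sizes:
        log_n = math.log2(n)
        rows.append((n, round(log_n), round(n * log_n), n * n))
    return rows


if __name__ == "__main__":
    print(f"{'N':>8} {'log N':>6} {'N log N':>10} {'N^2':>12}")
    for n, log_n, n_log_n, n_squared in growth_table():
        print(f"{n:>8} {log_n:>6} {n_log_n:>10} {n_squared:>12}")
```

Going from a few hundred items to tens of thousands multiplies N by roughly 100, but multiplies the N² column by roughly 10,000, which is exactly the testing-versus-production trap described above.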
The textbook Bubble Sort is an O(N²) algorithm in both the best and worst cases (adding an early-exit check improves only the best case). So why does anyone continue to teach the use of Bubble Sort in colleges and universities? For just a slightly increased complexity, you can implement Shell sort (named after its creator Donald Shell), which in practice outperforms Bubble Sort on all but trivially small inputs and, with a suitable gap sequence, has a worst case performance of O(N^1.5) compared with Bubble Sort’s O(N²) behaviour. Shell sort is very fast for small data sets (less than 1000 items).
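For reference, a minimal Shell sort sketch in Python. The gap sequence here is Knuth's 1, 4, 13, 40, ... increments, which is one common choice known to give the O(N^1.5) worst case; that choice is my assumption rather than anything prescribed above, and the function name is just for illustration.

```python
def shell_sort(items):
    """Return a new sorted list using Shell sort with Knuth's 3h+1 gaps."""
    data = list(items)
    n = len(data)

    # Find the largest gap in the sequence 1, 4, 13, 40, ... below n/3.
    gap = 1
    while gap < n // 3:
        gap = 3 * gap + 1

    while gap >= 1:
        # Gapped insertion sort: each pass leaves the list "gap-sorted".
        for i in range(gap, n):
            value = data[i]
            j = i
            while j >= gap and data[j - gap] > value:
                data[j] = data[j - gap]
                j -= gap
            data[j] = value
        gap //= 3  # step back down the sequence: 40 -> 13 -> 4 -> 1
    return data
```

The whole thing is barely longer than Bubble Sort: it is an insertion sort wrapped in one extra loop over the gaps.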
If you want the fastest possible general purpose sorting algorithm then implement Sedgewick’s median of three Quicksort, with insertion sorting of small subsets (this implementation removes vanilla Quicksort’s pathological O(N²) behaviour in the presence of almost sorted data).
Perhaps this is a candidate for one of those ‘negative’ interview questions: can you write down the Bubble Sort algorithm in code? This is a bit like asking a candidate if they can write down the code to describe the use of cursors in T-SQL. In my view, it is definitely a plus for those who can’t and prefer to rely upon (wherever possible) set based constructs instead.
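A rough Python sketch of the idea follows. It is not Sedgewick's actual code: the cutoff of 16, the helper names and the Hoare-style partition loop are my own assumptions, chosen to keep the example short.

```python
CUTOFF = 16  # assumed threshold: ranges this small go to insertion sort


def _insertion_sort(data, lo, hi):
    """Sort data[lo..hi] in place."""
    for i in range(lo + 1, hi + 1):
        value = data[i]
        j = i - 1
        while j >= lo and data[j] > value:
            data[j + 1] = data[j]
            j -= 1
        data[j + 1] = value


def _median_of_three(data, lo, hi):
    """Order data[lo], data[mid], data[hi] and return the median value."""
    mid = (lo + hi) // 2
    if data[mid] < data[lo]:
        data[lo], data[mid] = data[mid], data[lo]
    if data[hi] < data[lo]:
        data[lo], data[hi] = data[hi], data[lo]
    if data[hi] < data[mid]:
        data[mid], data[hi] = data[hi], data[mid]
    return data[mid]


def _quicksort(data, lo, hi):
    while hi - lo + 1 > CUTOFF:
        pivot = _median_of_three(data, lo, hi)
        i, j = lo, hi
        while i <= j:
            while data[i] < pivot:
                i += 1
            while data[j] > pivot:
                j -= 1
            if i <= j:
                data[i], data[j] = data[j], data[i]
                i += 1
                j -= 1
        # Recurse into the smaller part and loop on the larger one,
        # which keeps the recursion depth logarithmic.
        if j - lo < hi - i:
            _quicksort(data, lo, j)
            lo = i
        else:
            _quicksort(data, i, hi)
            hi = j
    _insertion_sort(data, lo, hi)


def quicksort(items):
    """Return a new sorted list."""
    data = list(items)
    _quicksort(data, 0, len(data) - 1)
    return data
```

Taking the median of the first, middle and last elements means already sorted (or reverse sorted) input picks the true middle value as the pivot, so the partitions stay balanced where a naive first-element pivot would degrade.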
Detecting and Removing Malware
I updated my virus scanner recently and it occurred to me that I haven’t heard anything in the news about a new virus for ages. Have they gone out of fashion or are new ones simply variants of old ones? Or is Microsoft’s security initiative having an effect?
So I had a trawl, and came across a webcast by Mark Russinovich on detecting and removing malware using three of the many Sysinternals tools: SigCheck, AutoRuns and ProcessExplorer. These are great tools and are free (as are all of the Sysinternals offerings, such as FileMon and RegMon), and knowing how to use them is a valuable addition to any programmer’s toolkit.
You can find the webcast here: Understanding and Fighting Malware: Viruses, Spyware and Rootkits.
Wednesday, June 21, 2006
Recommended Computing Books
I was just about to order Jeffrey Richter’s book “CLR via C#” to supplement my copy of his previous book “Applied .Net Framework”, when I saw the announcement about the new version of the .Net framework, .Net 3.0. At this rate of change, buying platform specific books is becoming less and less appealing and relevant.
I can’t recall who said it but “you can avoid technical obsolescence by choosing timeless books” is great advice. Here’s a list of recommended reading for all software developers:
Code Complete, Second Edition: Steve McConnell. If you’re in the software industry and you only ever read one book, then this is the book you should read. Every developer, regardless of language, platform or domain, should have read this book at least once. There is no single work that contains so much of relevance to developers. At the last count, I’m on my fifth re-read, cover to cover.
Rapid Development: Steve McConnell. If you only ever read two books on software development, make this the second! Keep this on your desk at all times. Buy two copies; one for work and one for home. It will pay for itself many, many times over. If you are beginning a career in software development, this book could short-circuit 5 years of lessons learned on the job.
The Pragmatic Programmer: Andrew Hunt and Dave Thomas. If you are only going to read one book and you want something a little shorter than either Code Complete or Rapid Development, then this is the one. If you loan it to another developer, do not expect to see it again! The first line of the book states “This book will help you become a better programmer”. It will.
Don't Make Me Think: A Common Sense Approach to Web Usability. Steve Krug. Great for web, and equally applicable to Windows. Short, easy read, but valuable. A little gem of a book. If you design web sites, this is required reading.
The Inmates are Running the Asylum: Alan Cooper. Discusses real world examples of usability, and is a highly enjoyable read. You probably won't agree with everything (I didn't), but it certainly gets you thinking.
The Medical Detectives: Berton Roueche. Not a computing book, but a great book on the approach to debugging. A good read to boot, although the prose can be a little laboured at times.
Refactoring: Martin Fowler. A great book that takes the reader on a journey through the process of refactoring actual code.
Head First Design Patterns: Elizabeth Freeman and Eric Freeman. This is a truly amazing book. If you want to learn about design patterns and more importantly how to apply the underlying OO design concepts, this is the best book available on the subject. I recently recommended this to several people.
Patterns of Enterprise Application Architecture by Martin Fowler. Coupled with “Head First Design Patterns” this is a superb reference to have to hand.
UML Distilled: Martin Fowler. If you seriously want to learn UML (and do it quickly without struggling) then this is the book to read.
Behind Closed Doors, Secrets of Great Management: Rothman and Derby. Practical advice on managing a software team. Excellent.
Test-Driven Development: Kent Beck. A slim, very readable, hands-on book that introduces and builds upon the concepts of the ‘write tests first’ development approach. Some would say that this is a natural evolution in the way that software should be created.
SQL Tuning: Dan Tow. A new approach to platform independent tuning of SQL queries. Took a while to get into, but well worth the effort.
The Mythical Man Month: Fred Brooks. Perhaps the classic work on managing software development projects. “How does a project slip its schedule? One day at a time”
Programming Pearls: Jon Bentley. An oldie, but a goldie! Insights into how algorithms are conceived and implemented. Introduces the concept of ‘back-of-the-envelope’ calculations. Very useful.
Writing Solid Code: Steve Maguire. Aimed at C programmers but full of insights equally applicable to other languages. This book had a profound effect on the way I write code and the approach I take.
The Psychology of Computer Programming: Silver Anniversary Edition by Gerald Weinberg. An insight into the mind of the programmer, also described as “computer programming as a human activity”.
Design Patterns: by Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides. This classic work is currently being updated.
The Guru’s Guide to Transact SQL: Ken Henderson. If you write T-SQL as part of your day-to-day job, then this should be the first of several Ken Henderson books you read.
Programming Windows Security: Keith Brown. Everything you wanted to know about Windows security but were afraid to ask.
The last two are platform specific, but excellent nonetheless.
I started this list of books some time ago, but was prompted to finish and post it by a colleague whose son is studying computer science, and was concerned about what books he should read.
Persona Coding Patterns…
Darren Neimke has posted a blog entry ‘Persona Patterns’, listing coder types; I love the code examples! I’ve seen all of them in practice, including a ‘day coder’ who wrote this (I kid you not):
' INCREMENT I BY 1
i = i + 1
DateTime and ISO Date Format
Do you use ISO date format for transferring dates?
Dates and DateTimes still cause an awful lot of bugs and grief when they should not.
This is old news for those who already know, but if you don't, you should visit this link: The ultimate guide to the datetime datatypes.
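As a quick illustration of why the format matters, here is a small Python sketch (Python 3.7 or later for `fromisoformat`). It is a generic example rather than anything taken from the linked guide, and the helper names `to_iso` and `from_iso` are made up for the purpose.

```python
from datetime import date, datetime, timezone


def to_iso(value):
    """Serialise a date or datetime as an ISO 8601 string."""
    return value.isoformat()


def from_iso(text):
    """Parse an ISO 8601 datetime string back into a datetime."""
    return datetime.fromisoformat(text)


# A locale-style string is ambiguous: the same text parses to two dates.
ambiguous = "03/04/2006"
as_day_first = datetime.strptime(ambiguous, "%d/%m/%Y").date()    # 3 April
as_month_first = datetime.strptime(ambiguous, "%m/%d/%Y").date()  # 4 March

# The ISO form has exactly one reading: year, then month, then day.
unambiguous = to_iso(date(2006, 4, 3))

# Datetimes round-trip too, including the UTC offset.
stamp = datetime(2006, 6, 21, 14, 30, 0, tzinfo=timezone.utc)
round_tripped = from_iso(to_iso(stamp))
```

The same text giving two different dates depending on who parses it is precisely the class of bug that a year-month-day format avoids.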
[Thanks to Vaughan De Vos, and Greg Low who enlightened me some time ago]
Microsoft Announces Robotics Studio
Interested in Robotics? Does seeing ‘Robby the Robot’ wave his arms bring a tear of nostalgia to your eye? NO? Well this might be for you. (Incidentally, that Wikipedia poster image is definitely not the way I remember him!)
The MSDN link is here, and the blog is here.
Tuesday, June 20, 2006
Free e-Book: SharePoint 2007
Eli Robillard has an interesting post for SharePoint developers: a free download of a SharePoint 2007 book.
Sunday, June 18, 2006
Was John Gall the pioneer of agile development? His little-known book Systemantics (published in 1977 and currently out of print) has been influential in shaping the views of several prominent practitioners of software development:
"A complex system that works is invariably found to have evolved from a simple system that worked…A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system."
— Systemantics: How Systems Really Work and How They Fail. John Gall
Gall's Law has strong affinities to the practice of agile software development, where under-specification rather than over-specification is the key to success.
‘I’ is for…Interface
Roy Osherove has posted an excellent blog entry Interface Naming - Anything But Java's Standard, Please, on his thoughts on the use of ‘I’ for interfaces. I agree with Roy that using ‘I’ to prefix interfaces is a good idea because it makes code clearer. This is in line with the .Net Framework Design Guidelines by Brad Abrams and Krzysztof Cwalina. You should name interfaces to describe the behaviours they bestow on the implementing class, for example, IPersistable.
I personally believe that the ultimate aim of a programmer is to write code that reads like prose and should be clearly understandable by the reader. Any naming convention or coding standard that leads directly to more understandable code has to be a good thing.
As a young, naïve programmer writing C code over 20 years ago, I, like many others, took delight in writing convoluted, hard-to-understand code! Writing code with others in mind is not only more productive, but also a sign of maturity:
Any fool can write code that a computer can understand.
Good programmers write code that humans can understand. – Martin Fowler
Another interesting point that Roy mentions is the reliance on an IDE to understand code through the use of ‘hover’ tooltips. Do you think it is inevitable that we should require tools to understand code, or should language syntax and the printed page be sufficient?
Wednesday, June 14, 2006
Reporting Best Practices: 10 Things Every Report Should Have
Every report should have a footer containing the following:
Monday, June 12, 2006
Intellisense for SQL Server – Free!
I meant to post this a few weeks ago: Red-Gate have made SQL Prompt, their IntelliSense for SQL Server editors, available for free download until September 1st 2006.
"SQL Prompt works with Microsoft Query Analyzer, SQL Server 2005 Management Studio, Visual Studio 2005, Visual Studio .NET 2003, SQL Server 2000 Enterprise Manager, UltraEdit32"
No time-bombs, no restrictions!
Go get it!
Sunday, June 11, 2006
Bits and er, Bans
As most people are aware, the term bit is short for ‘binary digit’. The coining of this word is credited to the mathematician John Tukey, in a Bell Labs memo written in 1947.
Claude Shannon, the father of information theory, was a colleague of John Tukey at Bell Labs, and his way of defining the bit was the amount of information required to distinguish between two equally probable outcomes. Around the same time, the British cryptographer, Alan Turing, had also come up with an idea which represented the amount of evidence that made a guess ten times more likely to be true. He called this unit the ban. (Although I suppose that had this been the ‘winning’ formulation, the dit or ‘decimal digit’ might have been coined!)
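The two units differ only in the base of the logarithm: a bit distinguishes between 2 equally likely outcomes and a ban between 10, so one ban is log2(10), or about 3.32, bits. A tiny Python sketch of the conversion (the function names are mine, purely for illustration):

```python
import math

BITS_PER_BAN = math.log2(10)  # roughly 3.32 bits in one ban


def information_bits(equally_likely_outcomes):
    """Bits needed to single out one of N equally probable outcomes."""
    return math.log2(equally_likely_outcomes)


def bans_to_bits(bans):
    """Convert an amount of information from bans to bits."""
    return bans * BITS_PER_BAN


def bits_to_bans(bits):
    """Convert an amount of information from bits to bans."""
    return bits / BITS_PER_BAN
```

So a coin toss carries one bit, while picking one decimal digit out of ten carries one ban.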
These historical insights into the information age and many more can be found in “Fortune’s Formula” by William Poundstone. This is a great read featuring gamblers, mathematicians and gangsters with some classic one liners: “In 1974…A computer was something you saw in a movie (often it went berserk and killed people).”
SQL Server 2005 Database Snapshots
I recently had one of those “aha!” moments with the new SQL Server 2005 Database Snapshot feature (not to be confused with the new transaction Snapshot Isolation mode). Dr Greg Low gave an overview of this great feature at the Perth .Net User Group last year, and I was going over some notes and e-learning material.
When you create a snapshot of a database, SQL Server 2005 efficiently creates an NTFS sparse file that initially contains effectively no data.
When you read data from the snapshot, SQL Server checks to see if the page the data resides upon exists in the snapshot. If it does, it serves the page from the snapshot; otherwise it serves the page from the original database.
How do pages appear in the snapshot? Each time a write is made to the original database, SQL Server checks if the page is already in the snapshot; if not, it copies the page into the snapshot BEFORE the write is made to the original, thus preserving the point in time snapshot of the data. For some reason, I had wrongly assumed the page was copied after the write, but that just did not make much sense.
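The read and write rules above can be modelled in a few lines of Python. This is only a toy sketch of the copy-on-write idea, with dictionaries standing in for pages; the class and method names are invented, and it says nothing about how SQL Server actually implements the feature internally.

```python
class Snapshot:
    """Toy point-in-time view; starts empty, like the sparse file."""

    def __init__(self, source):
        self._source = source
        self._pages = {}  # page_id -> contents preserved before a write

    def preserve(self, page_id):
        # Copy the original page only the first time it is about to change.
        # (Pages created after the snapshot are out of scope for this toy.)
        if page_id not in self._pages and page_id in self._source.pages:
            self._pages[page_id] = self._source.pages[page_id]

    def read(self, page_id):
        # Serve from the snapshot if the page was preserved there,
        # otherwise fall through to the (unchanged) original page.
        if page_id in self._pages:
            return self._pages[page_id]
        return self._source.pages[page_id]

    def preserved_count(self):
        return len(self._pages)


class Database:
    """Toy database: a dict of page_id -> contents."""

    def __init__(self, pages):
        self.pages = dict(pages)
        self.snapshots = []

    def create_snapshot(self):
        snapshot = Snapshot(self)
        self.snapshots.append(snapshot)
        return snapshot

    def write(self, page_id, contents):
        # Preserve the old page BEFORE overwriting it.
        for snapshot in self.snapshots:
            snapshot.preserve(page_id)
        self.pages[page_id] = contents
```

If the copy happened after the write, `preserve` would capture the new contents and the snapshot would no longer be a point in time view, which is why the order matters.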
Database snapshots can be applied wherever you want to preserve a point in time state of a database. An excellent example of using a snapshot is point in time reporting. This is just one of many reasons why you should consider upgrading to SQL Server 2005.
Saturday, June 10, 2006
Vault NAnt task
If you are using NAnt for your build scripts and using SourceGear Vault as your SCC repository, please be aware that the default behaviour of the Vault NAnt task
I have a single generic build script which I run on the build server and use on developer machines, so that a developer can simply type 'nant' at the command prompt in a solution folder to perform a complete build. (This is much quicker than opening the Visual Studio IDE to perform a build.)
See the SourceGear Vault support site for details:
This will be addressed in the next release 3.5 (ETA approx. July 2006)
More on NAnt soon...
Tuesday, June 06, 2006
User Interface Design Books
A while ago on the Stanski ausdotnet list, someone posted a question asking for book recommendations on User Interface design. Here's my recommended list in no particular order:
The Design of Everyday Things: excellent common sense reference, great for getting into the mindset of good design, especially the ideas of visual clues (like push door with plate, pull door with handle - reminds me of that Far Side cartoon of the "school for the gifted"!)
Don't Make Me Think: A Common Sense Approach to Web Usability: Great for web, equally applicable to Windows. Short, easy read, but valuable. A little gem of a book. If you design web sites, this is required reading.
About Face 2.0 and The Inmates are Running the Asylum, both by Alan Cooper: I would highly recommend "Inmates", which discusses some real world examples of usability, and is a highly enjoyable read. You probably won't agree with everything (I didn't), but it gets you thinking...
Joel Spolsky's book User Interface Design for Programmers (most of which, if not all, is available free on his web site Joel on Software)
Good News Everyone!
OK I admit it. I'm a recently converted Futurama nut!
Last week I sat and passed 70-443, which completes my MCITP (Database Admin SQL Server 2005). Thanks to everyone who gave me encouragement. As exams go it wasn't the easiest or hardest exam I've ever sat, but it did require some serious concentration for a couple of hours.
Considering I'm a developer, there is a slight irony that I should complete the DB Admin side first; just 70-441 remaining for developer certification.
If you're in two minds about certification, my advice would be if you know your stuff, try to get your employer to pay for an exam or two and just sit them. What have you got to lose?
Monday, June 05, 2006
One of the concepts I firmly believe in is continuous improvement. If you’re a software developer, you need to constantly improve and keep up with the software industry’s direction. One of the things I strive to do is read one book a month; they are not always software books and I don’t always manage one a month, but the point is the principle. I’m currently reading Bob Walsh’s “Micro ISV: From Vision To Reality”. I heard about this book from Joel Spolsky’s site and Eric Sink’s blog, both of which are worth reading if you’re serious about software development.
Whilst this book is primarily aimed at someone about to take the plunge and go it alone in the software industry, it has excellent advice for developers in general, and it is well written. It covers a range of topics from branding and selling to development infrastructure.
In the past I’ve applied the idea of branding even to in-house software applications and I think people take in-house applications more seriously when they can identify them easily.
MSN, Email: mitch døt wheat at gmail.com