SQL Server, Analytics, .Net, Machine Learning, R, Python: Archives
About Me
Mitch Wheat has been working as a professional programmer since 1984, graduating with an honours degree in Mathematics from Warwick University, UK in 1986. He moved to Perth in 1995, having worked in software houses in London and Rotterdam. He has worked in the areas of mining, electronics, research, defence, financial, GIS, telecommunications, engineering, and information management. Mitch has worked mainly with Microsoft technologies (since Windows version 3.0) but has also used UNIX. He holds the following Microsoft certifications: MCPD (Web and Windows) using C#, and SQL Server MCITP (Admin and Developer). His preferred development environment is C#, the .Net Framework and SQL Server. Mitch has worked as an independent consultant for the last 10 years, and is currently involved with helping teams improve their Software Development Life Cycle. His areas of special interest lie in performance tuning.
Saturday, May 26, 2007
There's RAD and There's RAD!
I came across this quote in a post by Scott Allen:
"It doesn't matter what constraints were in place when you cobbled the code Wednesday, May 23, 2007VB6 Interop Forms Toolkit 2.0 Released
If you are still in the VB6 world and have a little .NET envy, a new version of the VB6 Interop Forms Toolkit has been released, which you can download from here.
SQL Server: Fastest Way to Create a T-SQL datetime?
Saw this technique to create datetime values over at SQLTeam:
DECLARE @Year smallint
DECLARE @Month smallint
DECLARE @Day smallint

SET @Year = 2007
SET @Month = 5
SET @Day = 24

SELECT DATEADD(month, 12 * @Year - 22801 + @Month, @Day - 1)

(Due to Michael Valentine Jones and Peter Larsson.) It works because 12 * @Year - 22801 + @Month equals 12 * (@Year - 1900) + @Month - 1, the number of whole months since the base date 0 (1900-01-01), and @Day - 1 supplies the day offset.
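As a quick sanity check of the month arithmetic, here is the same calculation in C# (not from the original post; purely illustrative):

using System;

class DateAddCheck
{
    static void Main()
    {
        int year = 2007, month = 5, day = 24;

        // 12 * year - 22801 + month == 12 * (year - 1900) + (month - 1)
        DateTime result = new DateTime(1900, 1, 1)
            .AddMonths(12 * year - 22801 + month)
            .AddDays(day - 1);

        Console.WriteLine(result.ToString("yyyy-MM-dd"));   // 2007-05-24
    }
}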
Tuesday, May 22, 2007
Blogs Blogs Blogs!
Yesterday I passed a mini-milestone: it's one year since I started this blog, and over 300 posts later I'd like to say thank you to everyone who drops by occasionally. OK, I know it's nothing groundbreaking, but personally I find blogging is a great way to force yourself to write things up, and also a central place to store snippets.
Quite a few people have offered me encouragement and help along the way (you know who you are!), so in the same spirit I'd like to mention that fellow Perth'ite Keith Woods has started blogging. He already has some good info on his blog, so why not click and have a look!

SQL Server: Compare Date Part of DateTime Value
This is not new, but I wanted to record it for my own future reference.
Developers are sometimes caught out when writing WHERE clauses that compare SQL Server datetime column values that have a time portion, such as '2007-05-20 21:32:33.437' as opposed to '2007-05-20 00:00:00'. For instance, if you were to select all orders with an OrderDate >= '2007-05-10 21:32:33.437', then all rows having that date but a smaller time portion would fail the test. There are quite a few ways to extract just the date portion of a datetime value, but several suffer from the drawback that they apply a formula to the column being tested, which greatly reduces the possibility that an index can be used (i.e. the predicate is no longer a searchable argument). The following converts @SomeDateParam to have a zero time portion, in a test for all order dates greater than or equal to a given datetime:

...
WHERE DateColumn >= DATEADD(day, DATEDIFF(day, 0, @SomeDateParam), 0)

Because the formula is applied to the parameter rather than the column, the predicate remains SARGable. Of course, not every comparison criterion can be handled that way, but many can; how you apply such formulas depends on the type of comparison.
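If the query is issued from .NET, you can also floor the parameter on the client, before it ever reaches SQL Server. A minimal ADO.NET sketch (the table, column and connection string are hypothetical):

using System;
using System.Data.SqlClient;

class DateFloorExample
{
    static void Run(string connectionString, DateTime someDateParam)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand(
            "SELECT OrderID FROM Orders WHERE OrderDate >= @FromDate", conn))
        {
            // DateTime.Date strips the time portion: the client-side
            // equivalent of the DATEADD/DATEDIFF floor shown above
            cmd.Parameters.AddWithValue("@FromDate", someDateParam.Date);
            conn.Open();
            using (SqlDataReader rdr = cmd.ExecuteReader())
            {
                while (rdr.Read())
                {
                    Console.WriteLine(rdr["OrderID"]);
                }
            }
        }
    }
}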
Wednesday, May 16, 2007
Save and Restore Files/Images to SQL Server Database
The usual consensus is that it is a bad idea to store large files or images in a database; instead, it is recommended that you store the file system path to the image in the database and store each image as a separate file in a suitable folder structure. This helps to reduce the size of the database and keeps it performing efficiently. Despite the disadvantages, there are some advantages to storing images in the database, such as keeping all data in a single place.
Storing a file system path is the preferred technique in many situations, but if you have to store images (such as icons) in a SQL Server database, you should create a separate table containing the image/binary data column (and ideally place it in a different filegroup residing on a separate physical disk), then reference the images from your main table using a foreign key. This will help to reduce wasted space and improve performance.

There are quite a few partial solutions and code snippets around to solve this problem, and I’ve helped answer this question a few times, both on and off the forums. So I decided to write an easy-to-reuse solution with simple semantics and the ability to automatically compress data going in and out of the database. [The download link is towards the end of this post.]

The goal was to be able to simply store a file in the database and receive back an ID so it can be used to reference and retrieve the file:

// Create a FileStore object
DBFileStore dbf = new DBFileStore(connectionString, CompressMode.Compressed);

// Add file to DB
Stored storeID = dbf.Store(filename);

// ...OR, since the returned Stored object is never null (but Id might be zero)
int fileID = dbf.Store(filename).Id;

// ...

// Restore file from DB to file system with original filename
string localfilename = dbf.Restore(fileID, @"C:\temp");

Originally, I had the Store() method just returning a plain int, but then decided that I wanted the GUID value returned as well. It should be easy to revert the code if you wish…

At the heart of the Store() methods is the following private method that actually does the write to the database:

private Stored Store(byte[] bytearray, int origLength, string name,
                     int? parentID, int? refID, CompressMode compress)
{
    int fileID = 0;
    Guid guidID = new Guid();
    Stored retID = new Stored(fileID, guidID);

    // Execute stored proc to add data to repository
    using (SqlConnection conn = new SqlConnection(this.connectionString))
    {
        using (SqlCommand cmd = conn.CreateCommand())
        {
            cmd.CommandText = "AddToRepository";
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@filename", Path.GetFileName(name));
            cmd.Parameters.AddWithValue("@length", origLength);
            cmd.Parameters.AddWithValue("@compressedlength", bytearray.Length);
            cmd.Parameters.AddWithValue("@binarydata", bytearray);
            // Nullable IDs must be sent as DBNull.Value, not a null reference
            cmd.Parameters.AddWithValue("@refID", (object)refID ?? DBNull.Value);
            cmd.Parameters.AddWithValue("@parentID", (object)parentID ?? DBNull.Value);
            cmd.Parameters.AddWithValue("@iscompressed", compress == CompressMode.Compressed);

            // Add output parameters
            SqlParameter p = cmd.Parameters.Add("@ID", SqlDbType.Int);
            p.Direction = ParameterDirection.Output;
            p = cmd.Parameters.Add("@GUID", SqlDbType.UniqueIdentifier);
            p.Direction = ParameterDirection.Output;

            // No retry logic; any Exceptions bubble up to caller...
            conn.Open();
            cmd.ExecuteNonQuery();
            conn.Close();

            // Retrieve output values
            fileID = (int)cmd.Parameters["@ID"].Value;
            guidID = (Guid)cmd.Parameters["@GUID"].Value;
            retID = new Stored(fileID, guidID);
        }
    }
    return retID;
}

Similarly, the corresponding method that decompresses and writes to the file system:

/// <summary>
/// Restores a saved file from Repository to file system.
/// </summary>
/// <param name="fileID">Integer ID of the file to restore</param>
/// <param name="folderfilepath">Either the folder which the original file
/// is to be restored to, OR the "full file path" + "\" + "new name"</param>
/// <returns>The full filename of the restored file/data</returns>
public string Restore(int fileID, string folderfilepath)
{
    string filename = String.Empty;

    // Execute stored proc to read data from repository
    using (SqlConnection conn = new SqlConnection(this.connectionString))
    {
        using (SqlCommand cmd = conn.CreateCommand())
        {
            cmd.CommandText = "LoadFromRepository";
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@ID", fileID);

            conn.Open();
            using (SqlDataReader rdr = cmd.ExecuteReader(CommandBehavior.CloseConnection))
            {
                if (rdr.Read())
                {
                    filename = SaveToFileSystem(rdr, folderfilepath);
                }
            }
        }
    }
    return filename;
}

private string SaveToFileSystem(SqlDataReader rdr, string folderfilepath)
{
    byte[] binaryData = (byte[])rdr["binarydata"];
    string dbFilename = rdr["filename"].ToString();
    int length = (int)rdr["length"];
    bool isCompressed = (bool)rdr["iscompressed"];

    DateTime CreationTime;
    DateTime LastWriteTime;
    DateTime.TryParse(rdr["FileCreatedDate"].ToString(), out CreationTime);
    DateTime.TryParse(rdr["FileModifiedDate"].ToString(), out LastWriteTime);

    string filename;
    if (Directory.Exists(folderfilepath))
    {
        // Folder only passed: restore as folder + original name...
        filename = Path.Combine(folderfilepath, dbFilename);
    }
    else
    {
        // Caller has specified exact folder and filename we should restore as...
        filename = folderfilepath;
    }

    using (FileStream fs = new FileStream(filename, FileMode.CreateNew, FileAccess.Write))
    {
        if (isCompressed)
        {
            byte[] decompressedData = new byte[length];
            using (MemoryStream ms = new MemoryStream(binaryData))
            using (GZipStream cgz = new GZipStream(ms, CompressionMode.Decompress))
            {
                // GZipStream.Read may return fewer bytes than requested,
                // so loop until the full original length has been read
                int offset = 0;
                while (offset < length)
                {
                    int read = cgz.Read(decompressedData, offset, length - offset);
                    if (read == 0) break;
                    offset += read;
                }
            }
            fs.Write(decompressedData, 0, length);
        }
        else
        {
            fs.Write(binaryData, 0, length);
        }
    }

    // Set file creation/modified date and time if these exist...
    FileInfo fi = new FileInfo(filename);
    if (CreationTime != DateTime.MinValue)
    {
        fi.CreationTime = CreationTime;
    }
    if (LastWriteTime != DateTime.MinValue)
    {
        fi.LastWriteTime = LastWriteTime;
    }

    return filename;
}

In addition, I’ve created several Store() method overloads that allow a parent ID and a reference ID to be stored, allowing grouping and linking of stored items respectively. The solution provided here is 100% working, and includes several unit tests and all the C# source code, which is fairly well commented. Click on this link to download the zipped solution files. [Standard disclaimer applies: use at your own risk; no liability or responsibility taken for any loss or damage; whatever you do with the downloaded files, you do at your own risk, etc.] If you have questions, comments, suggestions or bug fixes, please leave a comment. Hope you find this useful.

Points to note:
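One point worth noting: the excerpt above shows only the decompression path. The compression counterpart inside Store() isn't reproduced here, but a minimal sketch of such a helper (the method name is assumed) might look like:

using System.IO;
using System.IO.Compression;

// Assumed helper: GZip-compress a byte array before it is written to the database
private static byte[] Compress(byte[] data)
{
    using (MemoryStream ms = new MemoryStream())
    {
        using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress))
        {
            gz.Write(data, 0, data.Length);
        }
        // Safe to call ToArray() here: closing the GZipStream flushes its footer,
        // and MemoryStream.ToArray() works even on a closed stream
        return ms.ToArray();
    }
}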
Suggestions for Further Enhancement:
Monday, May 14, 2007
Alik Levin’s Blog
I love it when I discover a new blog with great content. I came across Alik Levin’s blog quite by accident after he left a comment on my blog. As far as I can tell, he has not been blogging long, but he is really pumping out some excellent articles. Several of his recent posts stand out for me: How to Use Outlook 2007 RSS To Effectively Aggregate And Distill Information, ARCast With Ron Jacobs - Defending the Application (kudos!), and Why I Blog. It seems we both practice My Pipeline Is My Inbox! I definitely recommend checking out his blog.
Sunday, May 13, 2007
Regular Expression Resources: Update
In February, I posted a short regular expression tool roundup, and I have just updated the list of resources with a link to this site, Regular-Expressions.info, which has great explanations of the various regular expression constructs. The examples and tutorials pages are also worth checking out.
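For a taste of the constructs those tutorials cover, here's a tiny illustrative C# snippet (the pattern and input are my own):

using System;
using System.Text.RegularExpressions;

class RegexExample
{
    static void Main()
    {
        // Anchored pattern with capturing groups: extract year, month, day
        Match m = Regex.Match("2007-05-13", @"^(\d{4})-(\d{2})-(\d{2})$");
        if (m.Success)
        {
            Console.WriteLine("Year = " + m.Groups[1].Value);   // Year = 2007
        }
    }
}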
If you have any links to good regular expression examples, please feel free to share them.

Friday, May 11, 2007
Perth .NET User Group, June Meeting...
Nick has mentioned that the Perth .NET user group has a soon-to-be-revealed mystery presenter coming in June (Thursday, 7th June). I can reveal a few more clues (any poor puns are entirely my fault!):
The venue and more details will be announced via the UG feed, so to make sure you don't miss it, add it to your RSS reader...

Wednesday, May 09, 2007
Visual Basic 10: VBx and the Dynamic Language Runtime
Since the introduction of .NET, I’ve thought there was an empty niche just waiting for another VB. With Microsoft’s recent resurgence of interest in dynamic languages and notable hirings in this area (John Lam for instance), it was not really a surprise to see this post “Visual Basic 10 is in the works”.
The working name is VBx. Personally I think they should call it “VB on Rails”! Joking aside, I think this could be the next big thing for Microsoft. Who knows, maybe the .NET Framework will go the way of MFC… I do hope they pick up the Ruby on Rails mantra of “Convention over Configuration”.

Steve McConnell has Started Blogging
Steve McConnell's blogging debut must have slipped quietly under the radar back in March. The software industry would be the poorer without Steve's many contributions. I hope he finds time to keep it up.
Sunday, May 06, 2007
Notes on Memory Management in the .NET Framework
Nothing new in this article, but it's explained very clearly and I need a quick reference for the next time someone asks me about this topic and I've forgotten it!
Friday, May 04, 2007
Multi-Cores and .NET Threading
My first job after university involved designing and writing parallel algorithms, and over the intervening 20 years I’ve always taken a keen interest in the subject. I’d always thought that by now, desktop PCs would contain upwards of 32 processors, whereas 2 processors are only just becoming commonplace. At the March 2007 MVP summit, Bill Gates said that parallel programming will be one of the big new challenges facing the .NET development programming community:
“…the ability to take multiple processors and use them in parallel has been a …”

The May 2007 issue of MSDN Magazine has an excellent article on Reusable Parallel Data Structures and Algorithms by Joe Duffy, a renowned developer in the .NET threading arena. He has an upcoming book, “Concurrent Programming on Windows”, due to be released by Addison Wesley sometime in 2007. One to watch out for...

Over at Michael Suess’s ThinkingParallel blog, he’s been running a series of interviews with parallelism industry leaders in different environments, including Ten Questions with Joe Duffy about Parallel Programming and .NET Threads.

The following are all good resources on how to get started for developers new to .NET threading:
There is also an index of past MSDN Magazine articles on .NET concurrency.
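Until higher-level libraries like the ones Joe describes are mainstream, fanning work out by hand is the norm. Here is a minimal .NET 2.0-style sketch (all names are illustrative, not from any of the articles above) that partitions a summation across the ThreadPool and waits for all the chunks to complete:

using System;
using System.Threading;

class ParallelSumDemo
{
    static void Main()
    {
        int[] data = new int[1000000];
        for (int i = 0; i < data.Length; i++) data[i] = 1;

        int chunks = Environment.ProcessorCount;
        int chunkSize = data.Length / chunks;
        long total = 0;
        int pending = chunks;

        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            for (int c = 0; c < chunks; c++)
            {
                // Fresh locals per iteration so each delegate captures its own range
                int start = c * chunkSize;
                int end = (c == chunks - 1) ? data.Length : start + chunkSize;

                ThreadPool.QueueUserWorkItem(delegate
                {
                    long sum = 0;
                    for (int i = start; i < end; i++) sum += data[i];
                    Interlocked.Add(ref total, sum);

                    // Last worker to finish signals the main thread
                    if (Interlocked.Decrement(ref pending) == 0) done.Set();
                });
            }
            done.WaitOne();
        }
        Console.WriteLine(total);   // 1000000
    }
}

Even this toy example needs careful capture semantics, interlocked arithmetic and a completion signal, which is exactly why reusable parallel data structures are welcome.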
Wednesday, May 02, 2007
Logging and ASP.NET Health Monitoring
In the comments to my post of a few days ago on logging with log4net, Alik Levin raised a good question:
"Why would I actually use log4net for ASP.NET 2 (VS2005) instead of the built inI'm sure I must have seen ASP.NET 2.0 Health Monitoring before but it had completely slipped my mind! I followed the link Alik supplied and it does look interesting. Now, I obviously have not used it in a live application, so I'm shooting from the hip, but to answer Alik's question, the only reasons that spring to mind are to have a coherent logging stragey in place across all application types, and perhaps the number of different 'appenders' that are available. Also, the API seems a bit heavy if I have to instantiate a class for every event that is raised. That said, if you are starting out on a new project it's definitely worth evaluating. Tuesday, May 01, 2007WinDBG Update Released
Tuesday, May 01, 2007
WinDBG Update Released
An update of the Windows debugger, WinDBG, has been released (6.7.5.0). You can download it here.
Contact: mitch døt wheat at gmail.com