SQL Server, Analytics, .Net, Machine Learning, R, Python
Mitch Wheat has been working as a professional programmer since 1984, graduating with a honours degree in Mathematics from Warwick University, UK in 1986. He moved to Perth in 1995, having worked in software houses in London and Rotterdam. He has worked in the areas of mining, electronics, research, defence, financial, GIS, telecommunications, engineering, and information management. Mitch has worked mainly with Microsoft technologies (since Windows version 3.0) but has also used UNIX. He holds the following Microsoft certifications: MCPD (Web and Windows) using C# and SQL Server MCITP (Admin and Developer). His preferred development environment is C#, .Net Framework and SQL Server. Mitch has worked as an independent consultant for the last 10 years, and is currently involved with helping teams improve their Software Development Life Cycle. His areas of special interest lie in performance tuning
Tuesday, August 02, 2011
SQL Server: Is it OK to use a uniqueidentifier (GUID) as a Primary Key?
I was recently discussing this topic with a colleague and less than a day later a similar was question was asked on StackOverflow. Then a few days later I was asked a related question by another colleague.
Let’s start with the answer to the question “Is it OK to use a uniqueidentifier (GUID) as a Primary Key?”: Yes, a uniqueidentifier (GUID) column can be fine as a Primary Key, BUT it is not a particularly good choice for the clustered index. In many cases, you will be better off creating the clustered index on a column (or columns) that are likely be used in range searches, and create a non-clustered index on the GUID column.
OK, that’s the answer out of the way! Now for the reasons.
A common reason for using a GUID column as a primary key (over an int or bigint) is the ability to create and assign these in the middle application tier without making a round trip to the database, while still avoiding key clashes between distributed clients.
A primary key is a column (or set of columns) that uniquely identifies each row in a table (or entity). When a column (or a set of columns) is defined as a primary key, SQL Server checks that ALL of the columns that make up the PRIMARY KEY meet the requirement of a PRIMARY KEY constraint which is that they must not contain any NULLs. If the table contains data, a check is made to ensure that this existing data meets the uniqueness constraint. If there are any duplicate rows, the addition of the constraint will fail, as will the creation of the primary key.To enforce this for new rows, SQL Server builds a UNIQUE index. If you do not specify the index type when adding the constraint, SQL Server creates a UNIQUE CLUSTERED index. Which is almost certainly why people sometimes confuse the clustered index with the primary key.
It’s important, so it’s worth repeating: If you don't specify the index type when designating a Primary Key, SQL Server creates a UNIQUE CLUSTERED index by default.
What’s the significance of the clustering key?
Notice point (1): Each index row in the non-clustered index contains the non-clustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value (Nonclustered Index Structures). So a wide clustering key adds overhead to every non-clustered index. This overhead can be significant.
What are good choices for the clustering key?
A uniqueidentifier (GUID) column requires 16 bytes of storage (4 times wider than an int identity column), so the space used per row is higher (and less rows fit on a database page), and bandwidth consumed by data transfer is higher.
If all you need is a key range larger than that supported by an int (4 bytes, 231 – 1 rows) then consider using a bigint (8 bytes), which supports 263 - 1 rows (which should be sufficient for most applications) and saves 8 bytes per row compared to using a GUID.
Are there any performance considerations? Yes. In addition to the increased storage issues, a clustered index based upon a non-ordered GUID will experience fragmentation due to page splits during inserts. If you absolutely have to have a (non-ordered) GUID as a primary key with a clustered index, make sure you have a regular index rebuild as part of your database maintenance plans, and maybe even consider leaving extra space per page by decreasing the Fill Factor for the clustered index.When an index is created or rebuilt, the fill-factor value determines the percentage of space on each leaf-level page to be filled with data, reserving the remainder on each page as free space for future growth, and reducing the number of page splits. [Note: The fill-factor setting applies only when the index is created, or rebuilt.]
MSN, Email: mitch døt wheat at gmail.com