Tuesday, December 27, 2016


R: Evaluating a classifier using standard performance evaluation metrics

The Azure ML team have released a useful Custom R Evaluator script for computing standard classifier performance metrics. The module expects as input a dataset containing the actual and predicted class labels (i.e. a confusion matrix). The R code is available at GitHub.

Example output:

Actual  a  b  c
     a 27  2  5
     b  1 24  2
     c  1  5 33

                                     a         b         c
Accuracy                     0.8400000 0.8400000 0.8400000
Precision                    0.9310345 0.7741935 0.8250000
Recall                       0.7941176 0.8888889 0.8461538
F1                           0.8571429 0.8275862 0.8354430
MacroAvgPrecision            0.8434093 0.8434093 0.8434093
MacroAvgRecall               0.8430535 0.8430535 0.8430535
MacroAvgF1                   0.8400574 0.8400574 0.8400574
AvgAccuracy                  0.8933333 0.8933333 0.8933333
MicroAvgPrecision            0.8400000 0.8400000 0.8400000
MicroAvgRecall               0.8400000 0.8400000 0.8400000
MicroAvgF1                   0.8400000 0.8400000 0.8400000
MajorityClassAccuracy        0.3900000 0.3900000 0.3900000
MajorityClassPrecision       0.0000000 0.0000000 0.3900000
MajorityClassRecall          0.0000000 0.0000000 1.0000000
MajorityClassF1              0.0000000 0.0000000 0.5611511
Kappa                        0.7581986 0.7581986 0.7581986
RandomGuessAccuracy          0.3333333 0.3333333 0.3333333
RandomGuessPrecision         0.3400000 0.2700000 0.3900000
RandomGuessRecall            0.3333333 0.3333333 0.3333333
RandomGuessF1                0.3366337 0.2983425 0.3594470
RandomWeightedGuessAccuracy  0.3406000 0.3406000 0.3406000
RandomWeightedGuessPrecision 0.3400000 0.2700000 0.3900000
RandomWeightedGuessRecall    0.3400000 0.2700000 0.3900000
RandomWeightedGuessF1        0.3400000 0.2700000 0.3900000

Wednesday, December 21, 2016


Editions and Supported Features for SQL Server 2016

Just posting this link so I can find it easily: Editions and Supported Features for SQL Server 2016

SQL Server 2016 SP1 onwards now supports Data Compression in Standard Edition (every edition in fact, including Express!)

Wednesday, November 30, 2016


Storage Benchmarking with diskspd plus a LINQPad Script for Generating diskspd Batch Scripts

It is always a good idea to measure a performance baseline when commissioning (or choosing) new storage hardware or a new server, particularly for SQL Server. It is not uncommon for SAN’s to be non-optimally configured, so knowing how close the storage’s performance comes to the vendor’s advertised numbers is important. You should also benchmark when you make any hardware/configuration changes to storage.
In the past, SQLIO was one of the commonly used tools to perform I/O testing, but SQLIO has now been superceded. diskspd.exe is Microsoft’s replacement for SQLIO, with a more comprehensive set of testing features and expanded output. Like SQLIO, Diskspd is also a command line tool which means it can easily be scripted to perform reads and writes of various I/O block sizes including random and sequential access patterns to simulate different types of workloads.

Where can I download diskspd?

diskspd is stand-alone executable with no dependencies required to run it. You can download diskspd from Microsoft TechNet – Diskspd, a Robust Storage Testing Tool.
Download the executable and unzip it into an appropriate folder. Once unzipped you will see 3 subfolders with different executable targets: amd64fre (for 64-bit systems: the most common server target), x86fre (for 32-bit systems) and armfre (for ARM systems). The source code is hosted on Github here.

Analyzing I/O Performance: What Metrics should I measure?

The three main characteristics that are used to describe storage performance are (from Glenn Berry’s post: Analyzing I/O Performance for SQL Server):


Latency is the duration between issuing a request and receiving the response. The measurement begins when the operating system sends a request to the storage and ends when the storage completes the request. Reads are complete when the operating system receives the data; writes are complete when the drive signals the operating system that it has received the data.
For writes, the data may still be in a cache on the drive or disk controller, depending on your caching policy and hardware. Write-back caching is much faster than write-through caching, but it requires a battery backup for the disk controller. For SQL Server usage, you want to make sure you are using write-back caching rather than write-through caching if at all possible. You also want to make sure your hardware disk cache is actually enabled: some vendor disk management tools disable it by default.

IOPS (Input/Output Operations per Second )

The second metric is Input/Output Operations per Second (IOPS). A constant latency of 1ms means that a drive can process 1,000 IOs per second with a queue depth of 1. As more IOs are added to the queue, latency will increase. One of the key advantages of flash storage is that it can read/write to multiple NAND channels in parallel, along with the fact that there are no electro-mechanical moving parts to slow disk access down. IOPS actually equals queue depth divided by the latency, and IOPS by itself does not consider the transfer size for an individual disk transfer. You can translate IOPS to MB/sec and MB/sec to latency as long as you know the queue depth and transfer size.
The majority of storage vendors report their IOPS performance using a 4k block size, which is largely irrelevant for SQL Server workloads, since the majority of the time SQL Server reads data in 64k chunks. IOPS Are A Scam. To convert 4k block IOPS into 64k block IOPS simply divide by 16 or to convert IOPS into MB/s measurements multiply IOPS * block transfer size.


Sequential throughput is the rate that you can transfer data, typically measured in megabytes per second (MB/sec) or gigabytes per second (GB/sec). Your sequential throughput metric in MB/sec equals the IOPS times the transfer size. For example, 556 MB/sec equals 135,759 IOPS times a 4096 bytes transfer size, while 135,759 IOPS times a 8192 bytes transfer size would be 1112 MB/sec of sequential throughput. Despite its everyday importance to SQL Server, sequential disk throughput often gets short-changed in enterprise storage, both by storage vendors and by storage administrators. It is also actually fairly common to see the actual magnetic disks in a direct attached storage (DAS) enclosure or a storage area network (SAN) device be so busy that they cannot deliver their full rated sequential throughput.
Sequential throughput is critical for many common database server activities, including full database backups and restores, index creation and rebuilds, and large data warehouse-type sequential read scans (when your data does not fit into the SQL Server buffer pool).

How do I use diskspd?

Ideally, you should perform DskSpd testing when there is no other activity on the server and storage.
WARNING: You could be generating a large amount of disk IO, network traffic and/or CPU load when you run DiskSpd. If you’re in a shared environment, you might want to talk to your administrator before running such a test. This could negatively impact anyone else using other VMs in the same host, other LUNs on the same SAN or other traffic on the same network.
Ensure the user used to run diskspd has been granted the ‘Perform volume maintenance tasks’ right: run secpol.msc -> Local Policies -> User Rights Assignment -> ‘Perform volume maintenance tasks’
NOTE: You should run diskspd from an elevated command prompt (by choosing “Run as Administrator”). This will ensure file creation is fast. Otherwise, diskspd will fall back to a slower method of creating files.

diskpsd parameters

You can get a complete list of all the supported command line parameters and usage by entering the following at a command prompt:
> diskspd.exe
The most common parameters are:
Parameter Description
-d Test duration in seconds. Aim for at least 60 seconds
-W Test warm up time in seconds
-b I/O Block size (K/M/G). e.g. –b8K means an 8KB block size, -b64K means a 64KB block size: both are relevant for SQL Server
-o Number of outstanding I/Os (queue depth) per target, per worker thread
-t Worker threads per test file
-Su Disable software caching
-Sw Enable writethrough (no hardware write caching). Normally used together (-Suw) to replace deprecated -h (or equivently use -Sh)
-L Capture latency info
-r Random data access tests
-si Thread coordinated Sequential data access tests
-w Write percentage. For example, –w10 means 10% writes, 90% reads
-Z<size>[K|M|G|b] Workload test write source buffer size. Used to supply random data (entropy) for writes, which is a good idea for SQL Server testing and for testing de-duping behaviour on flash arrays
-c<size>[K|M|G|b] Create workload file(s) of specified size
diskspd.exe -Suw -L -W5 –Z1G -d60 –c440G -t8 -o4 -b8K -r -w20 E:\iotest.dat > output.txt

This will run a 60 second random I/O test using a 440GB test file located on the E: drive, with a 20% write and 80% read ratio, using an 8K block size and a 5 second warm up. It will use eight worker threads, each with four outstanding I/Os and a write entropy buffer of 1GB, and save the results to a text file named output.txt. This set of parameters is representative of a SQL Server OLTP workload.
Note: The test file size (you can have multiple test files) should be larger than the SAN’s DRAM cache (and ideally not an exact multiple of it).

LINQPad Script

To automate the creation of a bunch of testing scenarios, rather than manually editing (which is tedious and error prone), I’ve written a simple C# LINQPad script:

  const string batchScriptFilename = @"c:\temp\diskspd.bat";

  // Flags used in each run and do not vary
  string disableHardwarecaching = "-Suw";       // -Suw: Disable both hardware and software buffering. SQL Server does this.
                                                // Su = disable software caching, Sw = enable writethrough (no hardware write caching)
  string captureLatency    = "-L";              // capture disk latency numbers
  string warmWorkLoad      = "-W5";             // Warm up time in seconds
  string entropyRandomData = "-Z1G";            // Used to supply random data (K/M/G) for writes, which is good for SQL Server testing.
  string testduration      = "-d120";           // Test duration in seconds NB: At least 60 seconds, 2-3 minutes is good 
  string testFileSize      = "-c440G";          // Nothing smaller than the SAN's cache size (and not an exact multiple of it)
  string testFileFullPath  = @"E:\iotest.dat";  // Test file name (goes at the end of the command)
  string resultsFilename   = @"output.txt";     // File to output the text results

  // prefix results file name with date
  resultsFilename = DateTime.Now.Date.ToString("yyyyMMdd") + "_" + resultsFilename;
  // Lists of varying params to use
  var randomOrSequential = new List<string> { "-r", "-si" };                 // -r = Random, -si = Sequential
  var writepercentage = new List<string> { "-w0", "-w10", "-w25", "-w100" }; // -w0 means no writes: -w10 = 90%/10% reads/writes           
  var blocksize = new List<string> { "-b8K", "-b64K", "-b512K", "-b2M" };    // 2M represents SQL Server read ahead, 512K backups
  var overlappedIOs = new List<string> { "-o2", "-o4", "-o8", "-o16"};       // This is queue depth
  var workerthreads = new List<string> { "-t4", "-t8", "-t16", "-t32" };     // Worker threads

  int runTimeSeconds = randomOrSequential.Count() * writepercentage.Count() * blocksize.Count() * 
                       overlappedIOs.Count() * workerthreads.Count() * 
                       (Int32.Parse(testduration.Substring(2)) + Int32.Parse(warmWorkLoad.Substring(2)));

  using (StreamWriter fs = new StreamWriter(batchScriptFilename))
      fs.WriteLine("REM Expected run time: {0} Minutes == {1:0.0} Hours", runTimeSeconds / 60, runTimeSeconds / 3600.0);

      string cmd = string.Format("diskspd.exe {0} {1} {2} {3} {4} {5} ",
                                 disableHardwarecaching, captureLatency, warmWorkLoad,
                                 entropyRandomData, testduration, testFileSize);
      // Yes, LINQ could be used!
      for (int i1 = 0; i1 < writepercentage.Count(); i1++)
          for (int i2 = 0; i2 < randomOrSequential.Count(); i2++)
              for (int i3 = 0; i3 < blocksize.Count(); i3++)
                  for (int i4 = 0; i4 < overlappedIOs.Count(); i4++)
                      for (int i5 = 0; i5 < workerthreads.Count(); i5++)
                          fs.WriteLine(string.Format("{0} {1} {2} {3} {4} {5} {6} >> {7}",


Short Test Batch Script

A batch script to perform an initial (relatively) quick test would look something like the following:
REM Expected run time: 98 Minutes == 1.6 Hours
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o2 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o2 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o2 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o4 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o4 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o4 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o8 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o8 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o8 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o16 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o16 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o16 -b8K -r -w0 E:\iotest.dat >> output.txt

diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o2 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o2 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o2 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o4 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o4 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o4 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o8 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o8 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o8 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o16 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o16 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o16 -b64K -r -w0 E:\iotest.dat >> output.txt

diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o2 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o2 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o2 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o4 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o4 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o4 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o8 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o8 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o8 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o16 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o16 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o16 -b8K -r -w20 E:\iotest.dat >> output.txt

diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o2 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o2 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o2 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o4 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o4 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o4 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o8 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o8 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o8 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o16 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o16 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o16 -b64K -r -w20 E:\iotest.dat >> output.txt


Interpreting the diskspd results

diskspd produces quite a bit of output per run. The first section is a recap of the parameters that were used in the command line:
Command Line: diskspd.exe -Suw -L -W5 -Z1G -d120 -c440G -t16 -o4 -b64K -r -w10 E:\iotest.dat

Input parameters:

    timespan:   1
    duration: 120s
    warm up time: 5s
    cool down time: 0s
    measuring latency
    random seed: 0
    path: 'E:\iotest.dat'
        think time: 0ms
        burst size: 0
        software cache disabled
        hardware write cache disabled, writethrough on
        write buffer size: 1073741824
        performing mix test (read/write ratio: 90/10)
        block size: 65536
        using random I/O (alignment: 65536)
        number of outstanding I/O operations: 4
        thread stride size: 0
        threads per file: 16
        using I/O Completion Ports
        IO priority: normal

This is a great improvement over sqlio which did not echo the run parameters or provide a readable summary of the parameters, making it hard to decipher runs at a later date.
Next is a summary of CPU information. This information can help determine if your storage test is CPU bottlenecked:
actual test time:   120.00s
thread count:       16
proc count:     32

CPU |  Usage |  User  |  Kernel |  Idle
   0|  10.21%|   1.09%|    9.11%|  89.79%
   1|  10.31%|   1.09%|    9.22%|  89.69%
   2|  10.14%|   1.08%|    9.06%|  89.86%
   3|  18.26%|   0.94%|   17.32%|  81.74%
   4|   7.86%|   1.12%|    6.74%|  92.14%
   5|   7.79%|   0.91%|    6.87%|  92.21%
   6|   7.55%|   1.15%|    6.41%|  92.45%
   7|   7.71%|   1.13%|    6.58%|  92.29%
   8|   0.00%|   0.00%|    0.00%|   0.00%
avg.|   2.49%|   0.27%|    2.23%|  22.51%

The results for each thread should be very similar in most cases.
After the CPU summary is the I/O summary, split into total (read + write), followed by separate read and write statistics:
Total IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
     0 |     10107486208 |       154228 |      80.33 |    1285.23 |    3.109 |     3.640 | E:\iotest.dat (440GB)
     1 |     10038870016 |       153181 |      79.78 |    1276.50 |    3.130 |     4.082 | E:\iotest.dat (440GB)
     2 |     10062594048 |       153543 |      79.97 |    1279.52 |    3.123 |     4.048 | E:\iotest.dat (440GB)
     3 |     10012590080 |       152780 |      79.57 |    1273.16 |    3.138 |     3.954 | E:\iotest.dat (440GB)
     4 |     10169417728 |       155173 |      80.82 |    1293.10 |    3.090 |     3.909 | E:\iotest.dat (440GB)
     5 |     10148446208 |       154853 |      80.65 |    1290.44 |    3.096 |     4.159 | E:\iotest.dat (440GB)
     6 |     10158669824 |       155009 |      80.73 |    1291.74 |    3.093 |     4.024 | E:\iotest.dat (440GB)
     7 |     10205724672 |       155727 |      81.11 |    1297.72 |    3.079 |     3.901 | E:\iotest.dat (440GB)
     8 |     10096607232 |       154062 |      80.24 |    1283.85 |    3.112 |     3.896 | E:\iotest.dat (440GB)
     9 |     10057023488 |       153458 |      79.93 |    1278.81 |    3.124 |     4.187 | E:\iotest.dat (440GB)
    10 |     10092347392 |       153997 |      80.21 |    1283.30 |    3.113 |     3.951 | E:\iotest.dat (440GB)
    11 |      9996730368 |       152538 |      79.45 |    1271.15 |    3.143 |     3.894 | E:\iotest.dat (440GB)
    12 |     10157883392 |       154997 |      80.73 |    1291.64 |    3.093 |     4.040 | E:\iotest.dat (440GB)
    13 |     10157424640 |       154990 |      80.72 |    1291.58 |    3.093 |     3.934 | E:\iotest.dat (440GB)
    14 |     10177937408 |       155303 |      80.89 |    1294.19 |    3.087 |     3.978 | E:\iotest.dat (440GB)
    15 |     10223681536 |       156001 |      81.25 |    1300.00 |    3.073 |     3.642 | E:\iotest.dat (440GB)
total:      161863434240 |      2469840 |    1286.37 |   20581.94 |    3.106 |     3.955

Remember: The I/Os are recorded in whatever blocksize the test specified. In the case above, the I/Os are 64K I/Os.
Last, but not least are the latency measurements:
  %-ile |  Read (ms) | Write (ms) | Total (ms)
    min |      0.535 |      0.729 |      0.535
   25th |      2.531 |      3.446 |      2.565
   50th |      2.796 |      3.792 |      2.849
   75th |      3.088 |      4.227 |      3.211
   90th |      3.439 |      4.743 |      3.745
   95th |      3.763 |      5.179 |      4.169
   99th |      4.818 |      6.761 |      5.274
3-nines |     38.694 |     42.926 |     39.374
4-nines |    207.585 |    209.483 |    207.734
5-nines |    208.562 |    210.939 |    209.483
6-nines |    209.058 |    211.330 |    210.939
7-nines |    209.256 |    211.330 |    211.330
8-nines |    209.256 |    211.330 |    211.330
9-nines |    209.256 |    211.330 |    211.330
    max |    209.256 |    211.330 |    211.330

This last section shows the latency percentile distribution of the test results from the minimum to the maximum value in milliseconds, split into reads, writes and total latency. It’s essential to know how the storage will perform and respond under load, so this section should examined carefully. The “n-nines” in the ‘%-ile’ column refers to the number of nines, where 3-nines means 99.9%, 4-nines means 99.99% etc. If you want to accurately measure the higher percentiles, you should run longer duration tests that generate a larger number of I/O operations.
What you want to look for in the latency results is the point at which the values make a large jump. In this test, 99% of the reads had a latency of 4.818 milliseconds or less, but if we go higher, 99.9% of the reads had a latency of 38.694 milliseconds or less.

Thursday, June 16, 2016


SQL Server: Create a linked server from an on-premise SQL Server* to SQL Azure

* SQL Server 2008 onwards

The use of SQL Server’s sp_addlinkedserver can be a little arcane. I recently needed to connect from an on-premise SQL Server to SQL Azure for the purpose of querying and downloading data to a reporting server, and there a couple of things to note.

One is that you should ensure that data is encrypted on the wire (* and also when connecting to remote servers from SSMS).


The other is that you have to may specify the server name as a DNS name (depending on the client connection library you are using). This is the case if you receive this error message:

Server name cannot be determined.  It must appear as the first segment of the server's dns name (servername.database.windows.net).  Some libraries do not send the server name, in which case the server name must be included as part of the user name (username@servername).  In addition, if both formats are used, the server names must match. “

Here’s the working TSQL script I used:


-- If linked server already exists, drop it and any associated logins.

begin try

    exec sp_dropserver 'LocalLinkedServername', 'droplogins'

end try

begin catch

end catch


-- Create the linked server: 

EXEC sp_addlinkedserver

    @server     = 'LocalLinkedServername',

    @srvproduct = N'Any',

    @provider   = 'SQLNCLI',

    @datasrc    = '??????????.database.windows.net', -- Azure server name

    @location   = '',

    @provstr    = N'Encrypt=yes;',       -- * Important!

    @catalog    = 'RemoteDatabaseName';  -- remote(Azure) database name



-- Create the login credentials for the linked server 

EXEC sp_addlinkedsrvlogin

    @rmtsrvname  = 'LocalLinkedServername',

    @useself     = 'false',

    @rmtuser     = 'remotesqlusername@??????????.database.windows.net',

    @rmtpassword = 'remote password';




EXEC sp_serveroption 'LocalLinkedServername', 'rpc out', 'true';



-- Don’t elevate to distributed transactions

EXEC sp_serveroption 'LocalLinkedServername', 'remote proc transaction promotion', 'false';



---- Finally, check you can access remote data via your linked server:

select top 100 *

from [LocalLinkedServerName].[RemoteDatabaseName].[RemoteSchemaName].[RemoteTableName];





Another issue you might run into is if you have SQL Database auditing turned on in Azure, Azure wants all connections to be from a secure connection string, and if you run the above script from certain versions of SSMS (I believe SQL Server 2012 and below but not verified) then you might get an error saying only connections with secure connection string are allowed (despite a secure connection being specified). The fix is easy, change




Keep in mind that prior to SQL Server 2012 SP1, remote server statistics won’t be able to be used to determine query plans unless admin credentials are used (not a good idea!). Prior to SQL Server 2012 SP1 there is a workaround shown here: SQL Server Linked Servers and Remote Statistics




Thursday, June 02, 2016


SQL Server and SQL Azure: Clear Plan Cache

For on-premise SQL Servers you can run

dbcc freeproccache

which clears the entire server cache (provided you have the ALTER SERVER STATE permission). Many online resources and SQL Server Books Online give scary warnings about running this, but running DBCC FREEPROCCACHE will cause very few problems, even on a busy OLTP system. It will cause a small CPU spike for a few seconds as query plans get recompiled. It can be a useful tool when base-lining expensive queries or stored procedures.

If that’s not selective enough, you can free the cached plans for a single database using an undocumented DBCC command, FLUSHPROCINDB:

-- Flush all plans from the plan cache for a single database  
declare @dbid int;
select @dbid = dbid from master.dbo.sysdatabases where name = 'MyDatabaseName';

dbcc flushprocindb(@dbid);

If you want to remove a single plan from the cache:

-- Get the plan handle (varbinary(64)) for a cached query plan
    sys.dm_exec_cached_plans cp 
    cross apply sys.dm_exec_sql_text(plan_handle) st
    text LIKE N'%GetJournal%';

-- Remove a specific plan from the cache using its plan handle
dbcc freeproccache (0x060050000C267C1030CE4EC70300000001000000000000000000000000000000000000000000000000000000);

DBCC FREEPROCCACHE is not supported in SQL Azure as that wouldn’t be practical in a multi-tenanted database environment. SQL Azure (and SQL Server 2016) has introduced a new mechanism for clearing the query plans for a single database:



This statement enables the configuration of a number of database configuration settings at the individual database level, independent of these settings for any other database. This statement is available in both SQL Database V12 [SQL Azure] and in SQL Server 2016. These options are:

  • Clear procedure cache.

  • Set the MAXDOP parameter to an arbitrary value (1,2, ...) for the primary database based on what works best for that particular database and set a different value (e.g. 0) for all secondary database used (e.g. for reporting queries).

  • Set the query optimizer cardinality estimation model independent of the database to compatibility level.

  • Enable or disable parameter sniffing at the database level.

  • Enable or disable query optimization hotfixes at the database level.

Database Scoped Configuration

Labels: ,

Tuesday, March 04, 2014


SSAS: The Measures cube either does not exist or has not been processed

If you are attempting to deploy your existing SQL Server Analysis Services cube with calculation changes and get the following (not very helpful!) error:

The Measures cube either does not exist or has not been processed

then you probably forgot a CURRENTCUBE prefix on one or more of your MDX Create Member statements:


Sunday, December 01, 2013


SQL Server Agent Job Visualisation: SQLjobvis

SQL Server must be older than 20 years and yet it still has no agent job schedule visualiser, despite the fact it has been asked for many times. How can that be? Go figure…

I needed one at work recently to identify schedule clashes, so I thought “I wonder how hard it would be to write something fairly basic”. Then programmer laziness swiftly kicked in and the next thought was “If this problem is a pain for so many, then someone must have written something already…?"

And of course, they have: SQLjobvis

It’s basic, simple, works great, and it’s free!

It instantly identifies:

  • Failures
  • Long-running jobs
  • Multiple jobs running simultaneously


  • Monday, May 13, 2013


    Office 2013 Rant: Not wearing any Clothes!

    Having not been able to automatically upgrade to the 64bit version of Office (having the 32 bit version of 2010 installed), I installed the 32bit version of Office 2013, so I could try out a few of Excel’s new features (Yeah, I know I should have VM’ed, but I trust those programmers at Microsoft…)

    Unfortunately, just like VS2012, Microsoft have broken just about every well established and tried and tested UI design rule: and not for the better. Office 2013 looks as if the folks at Microsoft have confused the concepts of reducing too much ‘Chrome’ with proper visual ‘Demarcation’. Having everything blend into everything else is not a good UI, by any measure. I’m gobsmacked that they could have produced something so poor. It’s simply, bloody awful! Microsoft: The Emperor isn’t wearing any clothes!!

    I don’t believe this design ‘strategy’ was done to benefit users. It was done so that Microsoft can seamlessly (supposedly) integrate their mobile and desktop markets. This is not a strategy –>  “We need to shift our emphasis to the mobile market (we sure as hell aren’t selling many versions of Windows 8 for the desktop), lets change our UI to ‘fit’ tablets. Bugger the existing desktop users…”. I’m sorry but that’s a Fail.

    BTW, looks like some Microsoft Teams didn’t get the memo about using All CAPS!! (Excel 2013 [Vanilla] Screenshot)


    Was making Windows 8 share a similar interface to the phone and tablets done to benefit end users? Seems unlikely. Windows 8 is shaping up to be another Vista (Microsoft have been very shy the last 2 quarters and not released the sales figures for Windows 8). Looking forward to having the Start button back? Windows codenamed ‘Blue’ is coming soon…Hurrah!

    Sad thing is, I really want to like Microsoft products. SQL Server is by far the best software I’ve ever used. It is awesome.

    So having said all that, I thought I’d give 2013 ago to see if I could subdue my bias and get used to the rather poor UI. First email I try to send, Outlook 2013 crashes. Restart. Try again, it crashes. I’ve used Outlook 2010 every day for over 2 years and haven’t had a single crash.

    Uninstalled Office 2013. HELLO Office 2010 !

    Tuesday, May 07, 2013


    Office 2013 64 bit Setup

    I’ve used every version of Microsoft Office pretty much since it came out. I currently have the 32 bit version of Office Pro 2010 installed. Attempting to install the 64 bit version of Office 2013 to evaluate Excel’s Power View results in:


    Wouldn’t it be nice if it had the option to do this for me and migrate all my settings?!? Especially since I don’t recall there being a 64 bit version of Office 2010 on release (or was it just that it had issues with drivers and add-ins?).

    Friday, April 12, 2013


    SQL Diagnostic Runner Updated

    Several users reported a bug running SQLDiagCmd; I’d introduced a silly error in the command line options. [SQLDiagUI was unaffected]

    I’ve uploaded an updated version (v1.0.5.13101) which you can download from the links below (or from any of the previous posts):



    Sunday, March 10, 2013


    SQL Fiddle

    No, it’s not Uncle Ernie!

    SQL Fiddle is a very useful online tool for evaluating TSQL against a plethora of database engines (like jsFiddle does for javascript / jQuery), written by Jake Feasel

    I found JS Fiddle to be a great tool for answering javascript / jQuery questions, but I also found that there was nothing available that offered similar functionality for the SQL questions. So, that was my inspiration to build this site. Basically, I built this site as a tool for developers like me to be more effective in assisting other developers.

    It currently supports these database engines (in various versions): MySQL, PostgreSQL, MS SQL Server, Oracle, and SQLite

    Wednesday, March 06, 2013


    .NET Regex: Character Class Subtraction

    You’ve heard of a positive match character group [] and a negative match character group [^]. But did you know there is also a Character Class Subtraction? I didn’t. It’s supported in .NET but not in the majority of RegEx flavours.

    A character class subtraction expression has the following form:

              [base_group - [excluded_group]]

    The square brackets ([]) and hyphen (-) are mandatory. The base_group is a positive or negative character group as described in the Character Class Syntax table. The excluded_group component is another positive or negative character group, or another character class subtraction expression (that is, you can nest character class subtraction expressions).

    For example, suppose you have a base group that consists of the character range from "a" through "z". To define the set of characters that consists of the base group except for the character "m", use [a-z-[m]]. To define the set of characters that consists of the base group except for the set of characters "d", "j", and "p", use [a-z-[djp]]. To define the set of characters that consists of the base group except for the character range from "m" through "p", use [a-z-[m-p]].

    Using this format, the pattern


    can be used in .NET to match all alphanumeric characters (any word character) excluding the letter v and numbers 123. 

    The MSDN page for the .NET Regex definitions doesn’t seem to appear high in the search indexes, so bookmarking here for my future reference: Character Classes in Regular Expressions

    This is useful for comparing Regex capabilities in different languages: regex flavor comparison chart

    Tuesday, February 26, 2013


    SQL Diagnostic Runner Updated

    Thanks to Todd who reported a bug when connecting with username and password (I messed up the connection string).

    I’ve uploaded an updated version (v1.0.4.13057) which you can download from the links below (or from any of the previous posts):



    [Servername will now take a semi-colon separated list of servers to run against, but with the limitation of using the same credentials and diagnostic script.]

    Sunday, February 24, 2013


    Parsing Command Line Arguments

    If you want a full blown Command Line Parser then there are several good options available:

    [I used the Command Line Parser Library recently in the SQL Diagnostic Runner I wrote.]

    If you just want a very basic parser, supporting simple options in the format  /argname:argvalue  then you could use this:

    /// <summary>
    /// Very basic Command Line Args extracter
    /// <para>Parse command line args for args in the following format:</para>
    /// <para> /argname:argvalue /argname:argvalue ...</para>
    /// </summary>
    public class CommandLineArgs
    private const string Pattern = @"\/(?<argname>\w+):(?<argvalue>.+)";
    private readonly Regex _regex = new Regex(Pattern, RegexOptions.IgnoreCase|RegexOptions.Compiled);
    private readonly Dictionary<String, String> _args = new Dictionary<String, String>();

    public CommandLineArgs()

    public string this[string key]
    get { return _args.ContainsKey(key) ? _args[key] : null; }

    public bool ContainsKey(string key)
    return _args.ContainsKey(key);

    private void BuildArgDictionary()
    var args = Environment.GetCommandLineArgs();
    foreach (var match in args.Select(arg => _regex.Match(arg)).Where(m => m.Success))
    _args.Add(match.Groups["argname"].Value, match.Groups["argvalue"].Value);
    // Ignore any duplicate args
    catch (Exception) {}


    Powered by Blogger