How often are SQL Server database index rebuilds required on XtremIO?

A question came up recently from Shridhar Deuskar (@ioguru on Twitter), one of my colleagues who works in XtremIO engineering. We were discussing best practice for SQL Server on an XtremIO array and got on to talking about whether SQL Server index rebuilds are necessary on XtremIO. My first thought was that yes, they are still necessary, but there will be fewer use cases for index rebuilds than on traditional spinning-disk arrays, so I decided to test the hypothesis. Using our lab environment I devised a number of tests, introducing fragmentation to database indexes and measuring the impact on performance and array capacity. In the testing scenarios used, even at high levels of fragmentation there was little impact on performance (1%–2.7%), and the disk capacity impact was negated by XtremIO inline deduplication and compression. In this blog post I share the results and explain the following:


  • Why index rebuilds are required for SQL Server
  • How to determine when they are required
  • What is different about XtremIO
  • Finally the technical details of how the tests in the lab were carried out.



Why perform index rebuilds in SQL Server?

A good way to think about index rebuilds is to think about getting your car serviced. When a car has been driven for a while it requires servicing, and during this time the driver is inconvenienced. Servicing has advantages: the car will last longer, depreciate less and run smoothly. Conversely, if a car is serviced too often you could just be wasting money and unnecessarily inconveniencing the driver. Scale this up to a global fleet of cars run by an enterprise and the figures become alarming. Databases are similar: once they have been in use for a while on traditional arrays, performance can decrease and more capacity may be consumed than is strictly necessary. This is because data activity can cause fragmentation within the database. Specifically, in SQL Server there are two types of fragmentation:


  • Internal fragmentation: wasted space within database pages, resulting in increased capacity
  • Logical fragmentation: pages are physically stored out of order with respect to a table's index, which impacts performance


Index rebuilds service the database by compacting wasted space and sorting the data pages, matching the physical order to the logical order. During the process, users are inconvenienced as the operation competes for resources such as CPU cycles, memory and storage, limiting performance for the database. It is important to make intelligent decisions about when to run index rebuilds; otherwise you may be wasting company resources and time, and inconveniencing users.
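For readers who want to see what these operations look like, here is a minimal T-SQL sketch; the table and index names are hypothetical:

```sql
-- Hypothetical table and index names, for illustration only.
-- REBUILD recreates the index, compacting pages and restoring logical order.
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
REBUILD WITH (ONLINE = ON);  -- ONLINE rebuilds require Enterprise Edition

-- REORGANIZE is a lighter-weight alternative that defragments
-- the leaf level in place and is always an online operation.
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
REORGANIZE;
```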



Why SQL Server on EMC XtremIO is different

  • Capacity – XtremIO inline deduplication and compression work on database pages before they are written to disk, helping to mitigate the wasted space caused by internal fragmentation at the disk level.



Testing SQL Server on XtremIO, a data load of 54GB was inserted into the target database, generating ~17,000 page splits per SQL Server batch request. This resulted in internal fragmentation averaging 32% wasted space per data page, as well as additional writes to the transaction log. XtremIO inline compression and deduplication negated this by achieving a 42% disk space saving over the logical space allocated.


*A page split occurs when a new row needs to be added to a data page that is full. SQL Server moves around half of the existing page rows to a new page to free up space.
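Page split activity can be sampled from the Access Methods performance object; a quick way to do it in T-SQL is below. Note that, despite the "/sec" in its name, the raw value is cumulative since instance start, so sample it before and after a data load and take the difference:

```sql
-- Cumulative page split count since instance start
SELECT cntr_value AS page_splits
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page Splits/sec'
  AND object_name LIKE '%Access Methods%';
```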


  • Performance – For testing on XtremIO with SQL Server, even at high levels of logical fragmentation, query response times for both DSS and OLTP workloads showed results similar to the same query run after removing index fragmentation with an index rebuild.


| Test | Before Duration (Seconds or TPS) | After Duration (Seconds or TPS) | % Performance Difference | % (MAX) Fragmentation |
| --- | --- | --- | --- | --- |
| Query Test 1 (run a DSS query before and after a shrinkfile operation) | | | | |
| Query Test 2 (data load with high page split ratio; run DSS query before and after index rebuild) | | | | |
| Query Test 3 (data load with high page split ratio; run complex DSS query before and after index rebuild) | | | | |
| OLTP Workload Test (run before and after a shrinkfile operation) | 18163 (TPS) | | | |

On EMC XtremIO how should we approach index maintenance?

Microsoft has documented an approach based on the level of fragmentation, the details of which you can read here: Reorganize and Rebuild Indexes and Microsoft SQL Server 2000 Index Defragmentation Best Practices.
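As a sketch of how fragmentation is measured before applying that guidance, the sys.dm_db_index_physical_stats DMV reports both logical and internal fragmentation per index:

```sql
-- Logical fragmentation and page density for each index in the current database
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent,   -- logical fragmentation
       ips.avg_page_space_used_in_percent  -- internal fragmentation (page fullness)
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.index_id > 0;  -- skip heaps
```

Microsoft's documented guidance keys off avg_fragmentation_in_percent: reorganise for roughly 5%–30% and rebuild above 30%.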

A more applicable test for XtremIO is to take a backup of your database while it is fragmented, then use it as a baseline in a test environment to determine whether the current index maintenance strategy for your application workload is actually improving performance and reclaiming capacity on XtremIO.

Depending on business requirements, testing should incorporate a number of iterations to determine the best maintenance plan, for example:


| Scenario | Performance Impact | Capacity Impact |
| --- | --- | --- |
| Baseline – fragmented indexes (record level of fragmentation) | | |
| Index Rebuilds | | |
| Index Reorganise | | |
| Statistics Update Only | | |


Just keep in mind that statistics maintenance remains an important consideration, particularly in scenarios where index rebuilds are no longer required or where index reorganise is implemented.


How was testing carried out?

Using an industry-standard DSS benchmark database, we data-loaded the main table using a random cluster key. We then ran queries under these conditions:

  • Index order
  • Allocation order
  • Rebuild index to remove fragmentation.


Taking an industry standard OLTP benchmark database and running workloads before and after shrinking the SQL Server data files.



  • For each test we captured data from XtremIO, perfmon, ETW and extended events.
  • The databases ranged in size from 250GB to 1TB.
  • The data loads ranged in size from 4GB – 75GB
  • The disk space savings were measured using XtremIO show-clusters command
  • For each query, the SQL Server transaction log and the buffer pool were cleared prior to running the test, using the commands CHECKPOINT and DBCC DROPCLEANBUFFERS
  • Tested on:
    • SQL Server 2014 Enterprise Edition CU4
    • XtremIO Version: 3.0.0 build 44
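The pre-test cache clearing mentioned above is just these two statements, run against the test database (never in production, as DBCC DROPCLEANBUFFERS empties the buffer pool for the whole instance):

```sql
-- Run before each timed query so reads come from disk rather than cache
CHECKPOINT;             -- flush dirty pages so DROPCLEANBUFFERS can evict everything
DBCC DROPCLEANBUFFERS;  -- empty the buffer pool (test environments only)
```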



Using SQL Server and Windows event tracing, a repeatable pattern was observed in the workload I/O characteristics during testing:

  • With fragmented indexes, SQL Server generated large numbers of small I/O requests
  • With defragmented indexes, SQL Server generated small numbers of large I/O requests.

Thanks to sub-millisecond latency, XtremIO performed equally well servicing both I/O types.





Conclusion: Testing showed that SQL Server on an XtremIO array may require fewer index rebuilds, increasing the availability of the array for the end user. Even for heavily fragmented indexes, testing demonstrated that XtremIO's combination of sub-millisecond latency with inline compression and deduplication resulted in none of the performance or capacity penalties normally associated with fragmentation in SQL Server databases. Before implementing an expensive index maintenance plan on an XtremIO array, it is worth taking the time to understand if or when your workload actually meets the criteria for it.

An Introduction to SQL Server I/O Performance Monitoring

Optimal SQL Server performance relies on the I/O subsystem. For example, before a transaction can commit, the I/O subsystem has to acknowledge that all the associated log records have been flushed to disk. Similarly, to service transactions SQL Server will read pages from disk into memory and then, at a later stage, write them back out again to meet recovery intervals.



When working with SQL Server, the primary measurements I use for I/O are:


  • IOPS – The number of disk Input/Output operations per second
  • Latency – The time taken to complete an I/O request
  • Queue Length – The number of outstanding I/O requests
  • IO Size – The size of each I/O request
  • Bandwidth – The rate of bytes sent and retrieved from disk


The tools we use to measure I/O dictate the level of granularity and accuracy we bring to performance analysis.

There is a trade-off in the time it takes to analyse the performance data, and we also have to take into account any overhead that may be incurred when running collection routines. The key is to understand from experience what we need to collect for any given scenario and only go to that level. Having said that, the primary tools I use when working with SQL Server at the OS level are:


  • Perfmon
  • SQL Server DMVs
  • Extended Events / ETW




In the figure above, taking the example of a simple SQL Server file layout (2 data files and 1 log file), we can see how each tool can be used to collect an ever more granular level of information.


Using Perfmon we can collect data at the volume level; it is easy to configure and has built-in analysis. This may be sufficient for a lot of scenarios, especially when there is a 1-1 relationship between the SQL Server files and the disk volumes.

But what about the scenario where we have multiple files on the same volume?

What about when we want to look at the response times for a particular operation in SQL Server and the IO size associated with it?
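The sys.dm_io_virtual_file_stats DMV answers the first of these questions, because it reports cumulative I/O statistics per database file rather than per volume. A sketch:

```sql
-- Per-file I/O statistics: distinguishes files sharing the same volume
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.name AS file_name,
       vfs.num_of_reads,
       vfs.num_of_bytes_read / NULLIF(vfs.num_of_reads, 0) AS avg_read_size_bytes,
       vfs.io_stall_read_ms / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id;
```

The values are cumulative since instance start, so take two samples and diff them to measure a specific interval. The second question, tying I/O sizes to a specific SQL Server operation, is where Extended Events and ETW come in.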


In the table below I have taken the primary I/O measurements and tools described above, and plotted the counters, DMVs and traces associated with each at the various levels of granularity.




In this post we looked at the measurements I use to monitor SQL Server I/O performance, and at how various tools can be used to collect those metrics. Understanding what we need to collect, and then using the appropriate tools to collect it, is key to successful performance analysis.


SQL Server Indirect Checkpoint feature an initial look

The indirect checkpoint feature introduced in SQL Server 2012 can deliver more reliable and faster recovery times. It can also reduce the I/O spikes often observed with automatic checkpoints. However, for OLTP workloads indirect checkpoints can affect system performance, as they sometimes increase the amount of write I/O. In this blog post we are going to explore this in more detail.


Understanding business requirements and how they translate into technical decision making is a key part of any IT professional's job. In this example we have an OLTP workload, part of which involves a hot table that is frequently updated by the business. The table is small and easily fits in buffer pool memory. Since the table's data pages are frequently accessed they are less likely to be evicted by the Lazywriter process; however, they will still need to be flushed to disk as part of a checkpoint operation.


To monitor how often these pages are flushed to disk in a test environment, I am going to set up an Extended Events trace that tracks physical writes and stores them in a histogram. I applied a filter to the trace definition to track only data pages from the hot table in our database.
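A sketch of what such a session might look like is below. Note the assumptions: physical_page_write is a Debug-channel event whose availability and fields can vary by build, and the database_id value is a placeholder for the hot table's database (a tighter filter on the table's pages would be added in the same WHERE clause):

```sql
-- Sketch: count physical page writes per page in a histogram target
CREATE EVENT SESSION [TrackHotPageWrites] ON SERVER
ADD EVENT sqlserver.physical_page_write
    (WHERE (database_id = 7))                 -- placeholder test database
ADD TARGET package0.histogram
    (SET filtering_event_name = N'sqlserver.physical_page_write',
         source = N'page_id',                 -- bucket writes by page id
         source_type = 0);                    -- 0 = event field, 1 = action
GO
ALTER EVENT SESSION [TrackHotPageWrites] ON SERVER STATE = START;
```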




The Test

To compare automatic and indirect checkpoints I completed two test runs of 10 minutes each. In the first run I used automatic checkpoints, and in the second I switched to indirect checkpoints, setting the target recovery time for the database to 60 seconds. I took the data from the two histograms and analysed it in Excel.
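Switching between the two checkpoint modes is a single database setting; a sketch of the commands used for the two runs (the database name is a placeholder):

```sql
-- Run 2: indirect checkpoints with a 60 second target recovery time
ALTER DATABASE [TestDB] SET TARGET_RECOVERY_TIME = 60 SECONDS;

-- Run 1 / revert: 0 restores the default automatic checkpoint behaviour
ALTER DATABASE [TestDB] SET TARGET_RECOVERY_TIME = 0 SECONDS;
```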



With automatic checkpoints, data pages were written an average of 9 times over 10 minutes.

With indirect checkpoints, data pages were written an average of 29 times over 10 minutes.




In our example above, the indirect checkpoint feature significantly increased the number of times our hot table was written to disk. This of course has to be weighed against overall system performance and the potential gains from indirect checkpoints.

In a future post we will look at some of the advantages of indirect checkpoints

What do the different SQL Server Drivers mean?


You have been asked to create an ODBC data source to connect an application to SQL Server. Most of the time you may not have a choice: the driver you use to create a connection to SQL Server is dependent on your application requirements. But what exactly does each driver do, and why do I see different drivers depending on the server I connect to?

Note: while you can choose between 32-bit and 64-bit drivers, for the purposes of this post I am going to concentrate on 64-bit SQL Server drivers.

If we create a DSN on a machine without SQL Server installed we will get the following driver option:


This is the Windows WDAC/MDAC driver, which can be used to connect to SQL Server. Typically this is used by generic applications not taking advantage of SQL Server-specific features. This driver is distributed as part of the Windows install. The roadmap for this driver can be viewed here.

For SQL Server 2005 through to 2012, Microsoft distributes and maintains a SQL Server driver called SQL Server Native Client (SNAC). Different versions of SQL Server have different SNAC version numbers.

Taking the example of  SQL Server 2008R2 we can see version 10.0 of SNAC


The SNAC driver will be installed as part of the SQL Server install. On client machines where SQL Server will not be installed a standalone package is available as part of the Microsoft SQL Server 2008 R2 Feature Pack.

Why install SNAC?

When your application wishes to take advantage of SQL Server features like read-only routing in availability groups, you will need to install SNAC 11 from the SQL Server 2012 feature pack.

With an install of SQL Server 2014 the new ODBC driver for SQL Server is installed. It also includes an install of the  SQL Server 2012 SNAC driver.
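To make the driver names concrete, here are illustrative ODBC connection-string fragments showing which driver each name selects; everything other than the driver names is a placeholder:

```text
Driver={SQL Server};Server=myServer;Database=myDB;...                     (Windows WDAC/MDAC driver)
Driver={SQL Server Native Client 10.0};Server=myServer;...                (SNAC for SQL Server 2008/2008 R2)
Driver={SQL Server Native Client 11.0};Server=myServer;...                (SNAC for SQL Server 2012)
Driver={ODBC Driver 11 for SQL Server};Server=myServer;...                (ODBC driver shipped with SQL Server 2014)
```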

To summarise


SQL Server – Microsecond Storage Series – Buffer Pool Extension

Solid-state flash storage has brought about a fundamental shift in disk response times, crossing the threshold from millisecond to microsecond latency. This has opened up a host of new possibilities, covering everything from application and system design to brand new features within SQL Server. In this first post of a series I will take a look at a new feature in SQL Server 2014 called Buffer Pool Extension.


Figure 1

Processing database transactions requires reading and writing data. SQL Server processes all changes to data structures in an area of memory called the SQL buffer pool. For data that needs to be read into or written from the buffer pool, the further the data is from the CPU core, the more time and CPU cycles it takes to retrieve it. A response time of 20 milliseconds is considered acceptable for spinning disks in most applications. Now, with flash storage taking off in the enterprise arena, we are able to achieve sub-millisecond latency as the norm for flash disk response times.

Serving different business requirements, flash storage comes in two basic implementations: 1) PCIe flash cards such as EMC XtremSF, and 2) enterprise storage arrays such as EMC's XtremIO all-flash arrays. The PCIe flash card is closer to the CPU core but does not offer the enterprise data management capabilities of a SAN array, so it is best suited to situations where data persistence is not a business goal, such as tempdb or, as we are going to explore, Buffer Pool Extension.

Overview of Buffer Pool Extensions

SQL Server 2014 introduced a feature that can leverage a PCIe flash card to extend the buffer pool introduced earlier. This technology, termed Buffer Pool Extension or BPE, creates a multi-level caching hierarchy, which Microsoft calls L1 and L2 cache, optimising data retrieval by using internal algorithms to place the most relevant clean pages on faster storage. You can find more details about how it works here. Once the BPE feature has been set up it is primarily self-managing; however, it does change the way SQL Server processes data pages, with corresponding changes to performance tools such as Extended Events, DMVs and Performance Monitor. For a DBA it is important to understand the impact on performance baselines and troubleshooting. In this first post I will cover how these changes affect both the Buffer Manager and the Lazywriter process. In the next post of the series I will go into more detail on performance management.
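For reference, enabling BPE is a single server-level command; a sketch, with the file path and size as placeholder values:

```sql
-- Enable Buffer Pool Extension on a (hypothetical) PCIe flash volume
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
    (FILENAME = 'F:\BPE\ExtensionFile.BPE', SIZE = 64 GB);

-- Check the current state and size of the extension file
SELECT path, state_description, current_size_in_kb
FROM sys.dm_os_buffer_pool_extension_configuration;

-- Disable it again
ALTER SERVER CONFIGURATION SET BUFFER POOL EXTENSION OFF;
```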

Please note that for editorial conciseness I have really simplified the diagrams and charts below. The intention is to show BPE in action. SQL Server has many optimisations and additional components to optimise data requests. If you want additional information this presentation is a great place to start.

Reading Pages

  1. Using a simple example in figure 2 below, once the relational engine has determined which access method is needed to service a request (in our example a single 8K page read), it is the responsibility of the buffer manager to read and retrieve the relevant data page.
  2. Depending on the location of the page at the time of the request, this can be either a logical read (memory, 8K) or
  3. a physical read (disk, 64K, since the buffer manager will usually read an entire extent).
  4. When BPE is enabled and the data page is resident in the BPE file on the PCIe flash card, the buffer manager classifies the request as an extension page read.

Figure 2

When SQL Server requires data pages from either a PCIe flash card or a SAN, it posts an asynchronous request to the operating system, where the request is observed as a physical disk read.


Figure 3

In the table below I have associated the various read events with their relevant performance monitor counters

| Action | Relevant SQL Server Buffer Manager counters | Relevant OS Logical (or Physical) Disk counters |
| --- | --- | --- |
| 2. If the page exists in the buffer pool, this is called a logical read. | Page lookups/sec; Readahead pages/sec; Buffer Cache Hit Ratio | |
| 3. If the page has to be retrieved from disk, this is called a physical read. | Page reads/sec | Logical Disk: Disk Reads/sec |
| 4. When BPE is enabled, if the page has to be retrieved from BPE, this is called an extension page read. | Extension page reads/sec; Buffer Cache Hit Ratio | Logical Disk: Disk Reads/sec |

Figure 4

Page Writes

SQL Server has two main reasons to write pages from the buffer pool: the first is to relieve memory pressure, and the second is to maintain the recovery interval of the database.

Clean and Dirty pages

A dirty page is a page that has been changed since it was read in from disk, or was just created in memory. It will eventually be persisted to disk.

A clean page is a page that has not been modified since it was read from disk; therefore it does not need to be written back out to disk.

Memory pressure occurs when SQL Server does not have enough unused space in the buffer pool to service pages needed for a request, for example when pages must be read in from disk to service a read request. When this situation occurs SQL Server evicts pages. With a standard buffer pool configuration, dirty pages get written out to disk while clean pages get evicted from memory. When BPE is enabled, clean pages may instead get spilled out to faster storage. In the figure below I have plotted page writes for a memory pressure event, with the additional steps for BPE highlighted in red.


Figure 5

In the table below I have associated the various buffer pool write events and their associated performance monitor counters

| Action | Buffer Manager counters | Relevant OS Logical (or Physical) Disk counters |
| --- | --- | --- |
| Write a dirty page to disk | Buffer Manager: Page writes/sec; Lazy writes/sec | Logical Disk: Disk Writes/sec |
| BPE write | Buffer Manager: Extension page writes/sec | Logical Disk: Disk Writes/sec |

Figure 6

I took an xperf  trace during some SQL Server activity on an instance where BPE is enabled. In the xperf table below we can observe the Lazywriter process writing to the SQL data file and the buffer pool extension file.


Pros and Cons of BPE

Optimising the placement of database pages in memory has long been refined by Microsoft and others. For example, SQL Server is smart enough to prevent operations such as CHECKDB from flushing the buffer pool. With the advent of super-fast flash-based storage, Microsoft has extended the buffer pool onto SSD disk. From a performance management perspective, DBAs need to take into account that the extended buffer pool is now accounted for in performance counters such as SQL Server Buffer Manager: Buffer Cache Hit Ratio, which counts both main memory and BPE when calculating the cache hit ratio.
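In SQL Server 2014 you can also see this split directly: sys.dm_os_buffer_descriptors exposes an is_in_bpool_extension column, so a quick sketch of counting pages in main memory versus in the extension file looks like this:

```sql
-- Pages and approximate size in main memory (0) versus the BPE file (1)
SELECT is_in_bpool_extension,
       COUNT(*)            AS pages,
       COUNT(*) * 8 / 1024 AS size_mb   -- 8K pages
FROM sys.dm_os_buffer_descriptors
GROUP BY is_in_bpool_extension;
```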

So is BPE the right option for your environment?

  • One of the intended use cases for BPE is to improve performance for OLTP workloads by offloading small random I/Os from platter disks to flash-based PCIe cards. There are other ways to improve performance in these scenarios; if possible, adding additional memory to the server may produce better performance results. Indeed, according to Microsoft, on systems with more than 64 GB of RAM BPE may not yield the expected performance gains.
  • For Data Warehouse type workloads large block sequential IO can be optimally serviced from traditional spinning disks.

As when considering any new technology, you should review your current environment and choose the approach that best balances all your requirements. Importantly, whichever solution you decide to use, always test it out prior to implementing it in production.


In this post I introduced flash-based storage and how it is utilised by SQL Server for the Buffer Pool Extension feature. I explained how data access is faster the closer the data is to the CPU core, and how SQL Server uses this to optimise data page access and placement. BPE is designed to further enhance this by taking advantage of super-fast flash storage close to the CPU. I introduced new performance monitor counters and showed how BPE operations are exposed through both the buffer manager and OS disk counters. Finally, I discussed the advantages and disadvantages of BPE and highlighted alternative approaches to improving I/O performance. In the next post of the series I will show you how to correlate the various performance monitor counters to help determine if BPE is right for your workload.

If you are currently using or considering BPE please drop me a line I would really like to hear your experiences with it.



The checkpoint process writes dirty pages from the buffer pool area of memory to physical disk. It does not remove the pages from the buffer pool; instead it marks them as clean again. Its purpose is to reduce the active portion of the log necessary to perform recovery, so the frequency at which the checkpoint process runs determines the length of time it will take to recover the database when it is started up. This needs to be balanced against the overhead required to run the checkpoint process. Similarly to the lazy writer process, checkpoint runs as a background task. We can query the sysprocesses system table to view its status:

SELECT cmd, spid, status, dbid, login_time, waittime, lastwaittype,
       loginame, cpu, memusage, physical_io
FROM sysprocesses;


By default SQL Server determines the interval at which the checkpoint process runs; this is usually every 1 minute. This behaviour can be changed by setting the recovery interval, which is an advanced configuration option.

Generally I don't change the default value of 0, which allows SQL Server to control the checkpoint process.
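For completeness, the recovery interval is changed through sp_configure; as an advanced option it has to be made visible first (the sketch below sets it back to the default of 0):

```sql
-- 'recovery interval (min)' is an advanced option, so expose it first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- 0 (the default) lets SQL Server manage checkpoint frequency itself
EXEC sp_configure 'recovery interval (min)', 0;
RECONFIGURE;
```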

Checkpoint and the buffer pool

I want to see exactly what the checkpoint process does and how it impacts the buffer pool. I have set up a test database called DBA (DB_ID = 7) with a table called tblTest. This is a small table with a couple of rows, and I have already determined that it is using page 222. The first thing I want to do is clear the page from cache. To do this I am going to issue a CHECKPOINT statement to clean the page, and then call DBCC DROPCLEANBUFFERS to remove all clean pages from the buffer pool. Since I have set the recovery interval to 1000 minutes, the automatic checkpoint process should not kick in. Just to be sure, I am also going to turn on trace flag 3502, which logs all checkpoint activity in the SQL Server log.

DBCC TRACEON (3502, -1);

I can verify that the page is not in the buffer pool by using the dm_os_buffer_descriptors  dmv.

select * from sys.dm_os_buffer_descriptors 
where page_id = 222 and database_id = 7


Next I am going to bring tblTest into the buffer pool and dirty it. I am going to do this by updating all the existing rows in the table.


I can verify that the page is now in the buffer pool and that it has been modified (dirty). The is_modified column in sys.dm_os_buffer_descriptors indicates whether the page is dirty (1) or clean (0).

select * from sys.dm_os_buffer_descriptors
where page_id = 222 and database_id = 7


When we run a checkpoint again the page should remain in the buffer pool but it should now be clean

select * from sys.dm_os_buffer_descriptors
where page_id = 222 and database_id = 7


This shows that the checkpoint process cleaned the page and left it in the buffer pool

Checkpoint and the transaction log

I am not going to get into transaction log management here. For the purposes of the demo I set up the following:

  • The database is in the simple recovery model
  • There were no outstanding active transactions
  • No other transaction-log-dependent features, such as replication or database mirroring, were set up on the box.

I have a database set up and have run a number of transactions against it. The recovery interval has been set to 1000 minutes. I am going to use DBCC LOGINFO and the undocumented fn_dblog function:

SELECT [Current LSN], [Previous LSN],
       [Operation], [Context], [Checkpoint Begin], [Checkpoint End],
       [Page ID], [Minimum LSN], [Dirty Pages], [SPID], [Begin Time],
       [Transaction Name], [End Time], [Lock Information], [Description]
FROM ::fn_dblog(NULL, NULL);


Finally, when does checkpoint run?

A checkpoint is run against each database when the database is shut down. The only exception is when shutdown is called with the NOWAIT option, or when the server crashes.

The checkpoint process is also run before each backup.

A checkpoint is also triggered when the transaction log exceeds the maximum allowed for its active portion.

SQL Server Clustering 2 – Windows Setup

We currently run a couple of production SQL Server clusters. The trouble, for me, is that they are very stable and I do not have to do much to maintain availability. To improve my clustering knowledge I am taking a two-phase approach: since I learn best by doing, on top of reading up on the theory I am going to build a two-node Windows 2003 cluster (virtualised) and install SQL Server 2005 on top of it.


I was lucky that a few things went wrong, and I learned a lot doing the install. The most important lesson: do not attempt to install a SQL Server cluster over RDP. Make sure all RDP connections are closed before starting the install, and install from the console. I do not know why this is an issue and must investigate further. The errors returned included a message saying the native client would not install, and "No process is on the other end of the pipe". After troubleshooting both of these, in the end it was RDP that was causing the issues.


I used VMware to build a 2-node Windows 2003 cluster and installed a SQL Server 2005 Developer Edition cluster on top of it. The main documents I consulted during the install were:

SQL 2005 Clustering White Paper
Windows 2003 Cluster White Paper


Following this documentation, these are the notes I made to assist me with future installs.

  • Set up a single virtual machine using a standard configuration, but add an additional NIC for the private heartbeat
  • Install Microsoft Windows Enterprise Edition with all applicable updates on both nodes, keeping both configurations identical
  • When the setup completes, configure networking for both the LAN and the private heartbeat NICs
  • Under Control Panel > Network Connections > Advanced Settings, configure the binding order; the LAN NIC should come first
  • Configure the private NIC:
  •      Disable everything except TCP/IP
  •      Set the duplex to half / 10 Mb/s – reliability over performance
  •      Use a separate network class from the public network
  •      For TCP/IP set no default gateway and no DNS, and disable NetBIOS
  • Power off Node1
  • Within VMware Clone Node1
  • Turn on the cloned Node1 which becomes Node2
  • Change the server name and the assigned IP addresses
  • Add Node2 to a  domain
  • Turn on Node1
  • On both nodes run ipconfig /all and complete a quick comparison between nodes
  • Add Node1 to domain
  • A minimum of 3 shared disks is required (Quorum, MSDTC, SQL)
  • Add the shared disks to Node1; to do this the disks must first be zeroed out.


  • Use PuTTY to log on to the server console and run vmkfstools -w diskname.vmdk
  • When adding the disk, a new SCSI controller is required, set to LSI Logic, with SCSI bus sharing set to Virtual.


  • Add the disks to Node2 following the same procedure
  • Power off both nodes
  • Power on Node1 and add the disks through Disk Management and DISKPART, setting the partition alignment
  • Verify the disks in Windows Explorer – create, edit and delete a text document
  • Power off Node1
  • Power on Node2 and Test the disks
  • Power off Node2
  • Power on Node1
  • Ensure to have
    • IP address for Cluster and a cluster name
    • A domain account which has local Admin rights on Node1 and 2
  • from Cluster Administrator launch cluster setup wizard



  • When the setup completes, review the cluster configuration log file C:\WINDOWS\system32\LogFiles\Cluster\clcsfsrv.log
  • Power on Node2
  • From Node1 add Node2 using Cluster Administrator add node
  • Test that failover works between both nodes.
  • From Cluster Administrator
    • Set the Network Priority. Give private priority
    • Configure Private (heartbeat) for Private only
    • Public (LAN) for mixed

    On both node1 and node2