How often are SQL Server database index rebuilds required on XtremIO?

A question came up recently from Shridhar Deuskar (@ioguru on twitter), one of my colleagues in XtremIO engineering. We were discussing best practice for SQL Server on an XtremIO array and got on to whether SQL Server index rebuilds are necessary on XtremIO. My first thought was that yes, they are still necessary, but there will be fewer use cases for index rebuilds than on traditional spinning-disk arrays; so I decided to test the hypothesis. Using our lab environment I devised a number of tests, introducing fragmentation to database indexes and measuring the impact on performance and array capacity. In the scenarios tested, even at high levels of fragmentation there was little impact on performance (1%-2.7%), and the disk capacity impact was negated by XtremIO inline deduplication and compression. In this blog post I share the results and explain the following:

 

  • Why index rebuilds are required for SQL Server
  • How to determine when they are required
  • What is different about XtremIO
  • Finally, the technical details of how the lab tests were carried out.

 

 

Why perform index rebuilds in SQL Server?

A good way to think about index rebuilds is to think about getting your car serviced. When a car has been driven for a while it requires servicing, and during this time the driver is inconvenienced. Servicing has advantages: the car will last longer, depreciate less and run smoothly. But conversely, if a car is serviced too often you could just be wasting money and unnecessarily inconveniencing the driver. Scale this up to a global fleet of cars run by an enterprise and the figures become alarming. Databases are similar: once they have been in use for a while on traditional arrays, performance can decrease and more capacity may be consumed than strictly necessary. This is because data activity causes fragmentation within the database. Specifically, in SQL Server there are two types of fragmentation:

 

  • Internal fragmentation – wasted space within database pages, resulting in increased capacity usage
  • Logical fragmentation – the pages are physically stored out of order relative to a table's index, which impacts performance

 

Index rebuilds service the database by compacting wasted space and sorting the data pages, matching the physical order to the logical order. During the process users are inconvenienced, as the operation competes for resources such as CPU cycles, memory and storage, limiting performance for the database. It is important to make intelligent decisions about when to run index rebuilds; otherwise you may be wasting company resources and time, and inconveniencing users.

 

 

Why SQL Server on EMC XtremIO is different

  • Capacity – XtremIO inline deduplication and compression work on database pages before they are written to disk, helping to mitigate the wasted space caused by internal fragmentation at the disk level.


 

Testing SQL Server on XtremIO, a data load of 54GB was inserted into the target database, generating ~17,000 page splits per SQL Server batch request. This resulted in internal fragmentation averaging 32% wasted space per data page, as well as additional writes to the transaction log. XtremIO inline compression and deduplication negated this by achieving a 42% disk space saving over the logical space allocated.

 

*A page split occurs when a new row needs to be added to a data page that is full. SQL Server moves around half of the existing page rows to a new page to free up space.

 

  • Performance – For testing on XtremIO with SQL Server, even at high levels of logical fragmentation, query response times for both DSS and OLTP workloads were similar to those for the same queries after removing fragmentation with an index rebuild.

 

| Test | Before Duration (Seconds or TPS) | After Duration (Seconds or TPS) | % Performance Difference | % (MAX) Fragmentation |
|---|---|---|---|---|
| Query Test 1 (Run a DSS query before and after a shrinkfile operation) | 348 | 357 | 2.7% | 93 |
| Query Test 2 (Data load with high page split ratio. Run a DSS query before and after an index rebuild) | 50 | 48 | 2.0% | 75 |
| Query Test 3 (Data load with high page split ratio. Run a complex DSS query before and after an index rebuild) | 344 | 334 | 2.6% | 75 |
| OLTP Workload Test (Run before and after a shrinkfile operation) | 18163 (TPS) | 17971 (TPS) | 1.1% | 99 |

On EMC XtremIO, how should we approach index maintenance?

Microsoft has documented an approach based on the level of fragmentation, the details of which you can read here: Reorganize and Rebuild Indexes and Microsoft SQL Server 2000 Index Defragmentation Best Practices.
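As a minimal sketch of that fragmentation-based approach (the table name below is illustrative, and the 5%/30% thresholds are the commonly cited guideline rather than hard rules):

```sql
-- Run in the target database. Check logical fragmentation and page density.
SELECT i.name,
       ps.avg_fragmentation_in_percent,   -- logical fragmentation
       ps.avg_page_space_used_in_percent  -- internal fragmentation (page density)
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.MyTable'),
                                    NULL, NULL, 'SAMPLED') AS ps
JOIN sys.indexes AS i
  ON ps.object_id = i.object_id AND ps.index_id = i.index_id;

-- Guideline: 5-30% fragmentation -> reorganise; above 30% -> rebuild
ALTER INDEX ALL ON dbo.MyTable REORGANIZE;
-- ALTER INDEX ALL ON dbo.MyTable REBUILD;
```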

A more applicable test for XtremIO is to take a backup of your database while it is fragmented, then use it as a baseline in a test environment to determine whether the current index maintenance strategy for your application workload is actually improving performance and freeing capacity on XtremIO.

Depending on business requirements, testing should incorporate a number of iterations to determine the best maintenance plan, for example:

 

| Scenario | Performance Impact | Capacity Impact |
|---|---|---|
| Baseline – fragmented indexes (record level of fragmentation) | | |
| Index Rebuilds | | |
| Index Reorganise | | |
| Statistics Update Only | | |

 

Just keep in mind that statistics maintenance remains important and will have to be factored into scenarios where index rebuilds are no longer required or where index reorganise is implemented.

 

How was testing carried out?

We used an industry-standard DSS benchmark database, loading the main table using a random cluster key. We ran queries under these conditions:

  • Index order
  • Allocation order
  • Rebuild index to remove fragmentation.

 

We also took an industry-standard OLTP benchmark database and ran workloads before and after shrinking the SQL Server data files.

 

Methodology

  • For each test we captured data from XtremIO, perfmon, ETW and extended events.
  • The databases ranged in size from 250GB to 1TB.
  • The data loads ranged in size from 4GB to 75GB.
  • The disk space savings were measured using the XtremIO show-clusters command.
  • For each query, the SQL Server transaction log and the buffer pool were cleared prior to running the test using the commands CHECKPOINT and DBCC DROPCLEANBUFFERS (see the sketch after this list).
  • Tested on:
      • SQL Server 2014 Enterprise Edition CU4
      • XtremIO Version: 3.0.0 build 44
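For reference, a minimal sketch of the pre-test reset run before each timed query (nothing here is workload-specific):

```sql
-- Flush dirty pages to disk (and, under the simple recovery model, clear the log)
CHECKPOINT;
-- Evict clean pages from the buffer pool so the query reads from storage
DBCC DROPCLEANBUFFERS;
```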

 

 

Using SQL Server and Windows event tracing, a repeatable pattern was observed in the workload I/O characteristics during testing:

  • With fragmented indexes, SQL Server generated large numbers of small I/O requests.
  • With defragmented indexes, SQL Server generated small numbers of large I/O requests.

Thanks to sub-millisecond latency, XtremIO performed equally well servicing both I/O types.

 

 

 


Conclusion: Testing showed that SQL Server on an XtremIO array may require fewer index rebuilds, increasing the availability of the array for the end user. Even with heavily fragmented indexes, testing demonstrated that XtremIO's combination of sub-millisecond latency with inline compression and deduplication avoided the performance and capacity penalties normally associated with fragmentation in SQL Server databases. Before implementing an expensive index maintenance plan on an XtremIO array, it is worth taking the time to understand if, or when, your workload actually requires it.

3 Topics I Learned More About In 2014 With 10 Links I Discovered As A Result

Reflecting back on 2014, it has been a great year for my own personal development. I went on courses and attended lots of webcasts, but above all I learned a ton of stuff on the job; we are lucky to have a truly great collaborative environment to work in. Some of this I shared in blog posts throughout the year, and more of it I will publish in 2015. For now, I have picked my favourite links discovered during the year.

Hybrid Cloud

Deploying SQL Server on EHC will become so much easier

https://www.youtube.com/watch?v=jgdZX-XHBLs

With more to come in the future

https://www.youtube.com/watch?v=RlH9B_XDgNs

With some of the great business benefits explained here

https://www.youtube.com/watch?v=d2_-wnDDVuM

vCloud – I attended this course, a great learning experience

https://mylearn.vmware.com/mgrreg/courses.cfm?ui=www_edu&a=one&id_subject=43513

Storage

Software Defined Storage was new to me

http://www.emc.com/collateral/technical-documentation/h13420-vspex-private-cloud-hyper-v-scaleio-pig.pdf

(actually my boss did a great job at white boarding SDS but I don’t have a link to this)

The path to storage can be more complex than I ever knew

http://virtualgeek.typepad.com/virtual_geek/2009/06/vmware-io-queues-micro-bursting-and-multipathing.html

SQL Server

I learnt a lot about running SQL Server on XtremIO

http://www.emc.com/collateral/white-papers/h13163-xtremio-microsoft-sql-server-wp.pdf

I had to get deeper with spinlocks

http://download.microsoft.com/download/D/A/A/DAAD63AF-B06F-4F29-AF1D-68A78102ABF4/SQLServerSpinlockContention.pdf

Columnstore keeps getting better

http://blogs.msdn.com/b/jimmymay/archive/2014/03/24/sql-server-2014-columnstore-indexes-the-big-deck.aspx

Finally …

REST APIs are awesome and EMC has a great community

http://virtualgeek.typepad.com/virtual_geek/2014/12/first-emccode-devhigh5-winners.html

An Introduction to SQL Server I/O Performance Monitoring

Optimal SQL Server performance relies on the I/O subsystem. For example, before a transaction can commit, the I/O subsystem has to acknowledge that all the associated log records have been flushed to disk. Similarly, to service transactions SQL Server will read pages from disk into memory and at a later stage write them back out again to meet recovery intervals.

 

 

When working with SQL Server, the primary measurements I use for I/O are:

 

  • IOPS – The number of disk Input/Output operations per second
  • Latency – The time taken to complete an I/O request
  • Queue Length – The number of outstanding I/O requests
  • IO Size – The size of each I/O request
  • Bandwidth – The rate of bytes sent to and retrieved from disk

 

The tools we use to measure I/O dictate the level of granularity and accuracy we bring to performance analysis.

There is a trade-off in the time it takes to analyse the performance data, and we also have to take into account any overhead that may be incurred when running collection routines. The key is to understand from experience what we need to collect for any given scenario and only go to that level. Having said that, the primary tools I use when working with SQL Server at the OS level are:

 

  • Perfmon
  • SQL Server DMVs
  • Extended Events / ETW

[Figure: a simple SQL Server file layout (2 data files, 1 log file) and the level of granularity each tool collects at]

 

 

In the figure above, taking the example of a simple SQL Server file layout (2 data files and 1 log file), we can see how each tool can be used to collect an ever more granular level of information.

 

Using Perfmon we can collect data at the volume level; it is easy to configure and has built-in analysis. This may be sufficient for a lot of scenarios, especially when there is a 1-1 relationship between the SQL Server files and the disk volumes.

But what about the scenario where we have multiple files on the same volume?

What about when we want to look at the response times for a particular operation in SQL Server and the I/O size associated with it?
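When several files share a volume, SQL Server's own file statistics can fill that gap. As a minimal sketch (the latency figures are the usual stall-per-operation averages the DMV exposes):

```sql
-- Per-file read/write latency, independent of how files map to volumes
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.io_stall_read_ms  * 1.0 / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
       vfs.num_of_writes,
       vfs.io_stall_write_ms * 1.0 / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
ORDER BY avg_read_latency_ms DESC;
```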

 

In the table below I have taken the primary I/O measurements and tools, described the level of granularity, and mapped the various counters, DMVs and traces associated with them.

 

[Table: I/O measurements mapped to the Perfmon counters, DMVs and Extended Events/ETW traces that capture them]

Summary

In this post we looked at the measurements I use to monitor SQL Server I/O performance, and I explained how various tools can be used to collect performance metrics. Understanding what we need to collect, and then using the appropriate tools to collect it, is key to successful performance analysis.

References

http://blogs.technet.com/b/askcore/archive/2012/03/16/windows-performance-monitor-disk-counters-explained.aspx

http://technet.microsoft.com/en-us/library/cc938959.aspx

http://blogs.msdn.com/b/psssql/archive/2012/01/23/how-it-works-sys-dm-io-pending-io-requests.aspx

http://blogs.msdn.com/b/sql_pfe_blog/archive/2013/03/14/dissecting-sql-server-physical-reads-with-extended-events-and-process-monitor.aspx

http://msdn.microsoft.com/en-us/library/ff878413.aspx

http://blogs.technet.com/b/robertsmith/archive/2012/02/07/analyzing-storage-performance-using-the-windows-performance-toolkit.aspx

SQL Server Indirect Checkpoint feature: an initial look

The indirect checkpoint feature introduced in SQL Server 2012 can deliver more reliable and faster recovery times. It can also reduce the I/O spikes often observed with automatic checkpoints. However, for OLTP workloads indirect checkpoints can affect system performance, as they sometimes increase the amount of write I/O. In this blog post we are going to explore this in more detail.

 

Understanding business requirements, and how they translate into technical decision making, is a key part of any IT professional's job. In this example we have an OLTP workload, part of which contains a hot table that is frequently updated by the business. The table is small and easily fits in buffer pool memory. Since the table's data pages are frequently accessed they are less likely to be evicted by the Lazywriter process; however, they will still need to be flushed to disk as part of a checkpoint operation.

 

To monitor how often these pages are flushed to disk in a test environment, I set up an Extended Events trace that tracks physical writes and stores them in a histogram. I applied a filter to the trace definition to track only data pages from the hot table in our database.
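A sketch of how such a session could be shaped is below. Note this is an assumption about the form of the trace rather than the exact definition used: the physical_page_writes event name, the page_id bucketing field and the database_id value are all illustrative.

```sql
-- Hedged sketch: count physical page writes per page in a histogram target
CREATE EVENT SESSION [HotTablePageWrites] ON SERVER
ADD EVENT sqlserver.physical_page_writes(
    WHERE (sqlserver.database_id = 5))          -- illustrative database_id
ADD TARGET package0.histogram(
    SET filtering_event_name = N'sqlserver.physical_page_writes',
        source = N'page_id',                    -- bucket write counts by page
        source_type = 0);                       -- 0 = event field, 1 = action
ALTER EVENT SESSION [HotTablePageWrites] ON SERVER STATE = START;
```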

 


 

The Test

To compare automatic and indirect checkpoints I completed two test runs of 10 minutes each. In the first run I used automatic checkpoints, and in the second run I switched to indirect checkpoints, setting the target recovery time for the database to 60 seconds. I took the data from the two histograms and analysed it in Excel.
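Switching the database between the two checkpoint modes is a one-line change (the database name below is illustrative):

```sql
-- Enable indirect checkpoints with a 60-second recovery target
ALTER DATABASE [TestDB] SET TARGET_RECOVERY_TIME = 60 SECONDS;
-- Setting TARGET_RECOVERY_TIME back to 0 reverts to automatic checkpoints
```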

 

Analysis

With automatic checkpoints, data pages were written an average of 9 times over 10 minutes.

With indirect checkpoints, data pages were written an average of 29 times over 10 minutes.

 


Conclusion

In our example above, the indirect checkpoint feature significantly increased the number of times our hot table was written to disk. This of course has to be weighed against overall system performance and the potential gains from indirect checkpoints.

In a future post we will look at some of the advantages of indirect checkpoints.

Monitoring Microsecond Storage – Perfmon with a little help from Powershell

I do a lot of performance analysis, and I use perfmon as part of my toolkit; it is great for performance data collection. I can use it to watch live performance data, or I can save performance data for later analysis.

One limitation I have come across with perfmon is that the tool will only display performance data for disk latency (LogicalDisk(*)\Avg. Disk sec/Read) in milliseconds.


I can take a saved perfmon file and export the data to a csv file or a database, and then view the data at a more granular level.


Opening the csv file in Excel, I can now analyse the disk latency in microseconds.


 

This still leaves me with the limitation of not being able to view microsecond latency data in the toolkit. This is especially limiting when I want to view live performance data. Enter PowerShell.

PowerShell has a cmdlet called Get-Counter which will display live performance data, and for disk latency this is displayed at microsecond granularity. I can simply type something like the following example:

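(A minimal sketch of such a command; the counter path and sampling options shown are illustrative.)

```powershell
# Sample average disk read latency every second; Get-Counter returns the value
# in seconds, so sub-millisecond detail is preserved
Get-Counter -Counter '\LogicalDisk(*)\Avg. Disk sec/Read' -SampleInterval 1 -Continuous
```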

Live performance data for disk latency will be returned to the screen.


Summary

Perfmon is a great tool for collecting performance data. One limitation I have come across is that, although disk latency is recorded at sub-millisecond level, it is displayed in the tool rounded to the nearest millisecond. PowerShell offers the ability to display live disk latency at sub-millisecond granularity; this could easily be scripted to average latency over a time period or to graph live performance data, something we will explore in a later post.

What do the different SQL Server Drivers mean?

Scenario

You have been asked to create an ODBC Data Source to connect an application to SQL Server. Most of the time you may not have a choice, and the driver you use to connect to SQL Server is dependent on your application requirements. But what exactly does each driver do, and why do I see different drivers depending on the server I connect to?

Note: while you can choose between 32-bit and 64-bit drivers, for the purposes of this post I am going to concentrate on 64-bit SQL Server drivers.

If we create a DSN on a machine without SQL Server installed we will get the following driver option:

[Screenshot: ODBC driver list showing the "SQL Server" driver]

This is the Windows WDAC/MDAC driver, which can be used to connect to SQL Server. Typically this would be used by a generic application not taking advantage of SQL Server-specific features. This driver is distributed as part of the Windows install. The roadmap for this driver can be viewed here.

For SQL Server 2005 through to 2012, Microsoft distributes and maintains a SQL Server driver called SQL Server Native Client (SNAC). Different versions of SQL Server have different SNAC version numbers.

Taking the example of SQL Server 2008 R2, we can see version 10.0 of SNAC:


The SNAC driver is installed as part of the SQL Server install. For client machines where SQL Server will not be installed, a standalone package is available as part of the Microsoft SQL Server 2008 R2 Feature Pack.

Why install SNAC?

When your application wishes to take advantage of SQL Server features like read-only routing in availability groups, you will need to install SNAC 11 from the SQL Server 2012 Feature Pack.

With an install of SQL Server 2014, the new ODBC Driver for SQL Server is installed. It also includes an install of the SQL Server 2012 SNAC driver.
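For illustration, a hedged sketch of how each driver is named in an ODBC connection string (the server and database values are placeholders):

```
Driver={SQL Server};Server=myServer;Database=myDB;Trusted_Connection=Yes;
Driver={SQL Server Native Client 11.0};Server=myServer;Database=myDB;Trusted_Connection=Yes;
Driver={ODBC Driver 11 for SQL Server};Server=myServer;Database=myDB;Trusted_Connection=Yes;
```

The first uses the Windows WDAC/MDAC driver, the second SNAC 11 and the third the standalone ODBC Driver for SQL Server.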

To summarise

[Table: summary of the SQL Server drivers and the SQL Server versions they ship with]

SQL Server – Microsecond Storage Series – Buffer Pool Extension

Solid-state flash storage has brought about a fundamental shift in disk response times, crossing the threshold from millisecond to microsecond latency. This has opened up a host of new possibilities, covering everything from application and system design to brand new features within SQL Server. In this first post of a series, I will take a look at a new feature in SQL Server 2014 called Buffer Pool Extension.

[Figure 1: relative storage response times, from spinning disk to flash]

Processing database transactions requires reading and writing data. SQL Server processes all changes to data structures in an area of memory called the SQL buffer pool. For data that needs to be read into or written from the buffer pool, the further the data is from the CPU core, the more time and CPU cycles it takes to retrieve. A response time of 20 milliseconds is considered acceptable for spinning disks in most applications. Now, with flash storage taking off in the enterprise arena, we are able to achieve sub-millisecond latency as the norm for flash disk response times.

Serving different business requirements, flash storage comes in two basic implementations: 1) PCIe flash cards, such as EMC XtremSF, and 2) enterprise storage arrays, such as EMC's XtremIO all-flash arrays. The PCIe flash card is closer to the CPU core but does not offer the enterprise data management capabilities of a SAN array, so it is best suited for situations where data persistence is not a business goal, such as tempdb or, as we are going to explore, Buffer Pool Extension.

Overview of Buffer Pool Extensions

SQL Server 2014 introduced a feature that can leverage a PCIe flash card to extend the buffer pool that I introduced earlier. This technology, termed Buffer Pool Extension or BPE, creates a multi-level caching hierarchy which Microsoft calls L1 and L2 cache, optimising data retrieval by using internal algorithms to place the most relevant clean pages on faster storage. You can find more details about how it works here. Once the BPE feature has been set up it is primarily self-managing; however, it does change the way SQL Server processes data pages, with corresponding changes to performance tools such as Extended Events, DMVs and Performance Monitor. For a DBA it is important to understand the impact on performance baselines and troubleshooting. In this first post I will cover an overview of how these changes affect both the Buffer Manager and the Lazywriter process. In the next post of the series I will go into more detail on performance management.
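Enabling BPE is a single server-level configuration change; a minimal sketch is below (the file path and size are illustrative, and the file should live on the PCIe flash device):

```sql
-- Enable Buffer Pool Extension on a flash volume
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
    (FILENAME = N'F:\BPE\SqlBpe.bpe', SIZE = 64 GB);

-- Confirm the configuration
SELECT path, current_size_in_kb, state_description
FROM sys.dm_os_buffer_pool_extension_configuration;
```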

Please note that for editorial conciseness I have simplified the diagrams and charts below; the intention is to show BPE in action. SQL Server has many optimisations and additional components for servicing data requests. If you want additional information, this presentation is a great place to start.

Reading Pages

  1. Using a simple example in figure 2 below, once the relational engine has determined what access method is needed to service a request (in our example a single 8K page read), it is the responsibility of the buffer manager to read and retrieve the relevant data page.
  2. Depending on the location of the page at the time of the request, this can either be a logical read (an 8K read from memory) or
  3. a physical read (a 64K read from disk, since the buffer manager will usually read an entire extent).
  4. When BPE is enabled and the data page is resident in the BPE file on the PCIe flash card, the buffer manager classifies the request as an extension page read.
[Figure 2: buffer manager read paths – logical reads, physical reads and extension page reads]

When SQL Server requires data pages from either a PCIe flash card or a SAN, it posts an asynchronous request to the operating system. In the operating system these requests are observed as physical disk reads.

[Figure 3: read requests posted asynchronously to the operating system]

In the table below I have associated the various read events with their relevant performance monitor counters:

| Action | Relevant SQL Server Buffer Manager counters | Relevant OS Logical (or Physical) Disk counters |
|---|---|---|
| If the page exists in the buffer pool this is called a logical read. | SQL Server Buffer Manager: Page lookups/sec; Readahead pages/sec; Buffer Cache Hit Ratio | |
| If the page has to be retrieved from disk this is called a physical read. | SQL Server Buffer Manager: Page reads/sec | Logical Disk: Disk Reads/sec |
| When BPE is enabled, if the page has to be retrieved from BPE this is called an extension page read. | SQL Server Buffer Manager: Extension page reads/sec; Buffer Cache Hit Ratio | Logical Disk: Disk Reads/sec |

Figure 4

Page Writes

SQL Server has two main reasons to write pages from the buffer pool: the first is to relieve memory pressure and the second is to maintain the recovery interval of the database.

Clean and Dirty pages

A dirty page is a page that has been changed since it was read in from disk, or that was just created in memory. It will eventually be persisted to disk.

A clean page is a page that has not been modified since it was read from disk; therefore it does not need to be written back out to disk.

Memory pressure occurs when SQL Server does not have enough unused space in the buffer pool to service pages needed for a request, for example when pages have to be read in from disk to service a read request. When this situation occurs SQL Server will evict pages. With a standard buffer pool configuration, dirty pages get written out to disk while clean pages get evicted from memory. When BPE is enabled, clean pages may instead be spilled out to faster storage. In figure 5 I have plotted the page writes for a memory pressure event, with the additional steps for BPE highlighted in red.

[Figure 5: buffer pool write paths under memory pressure, with the additional BPE steps highlighted in red]

In the table below I have associated the various buffer pool write events with their relevant performance monitor counters:

| Action | Buffer Manager counters | Relevant OS Logical (or Physical) Disk counters |
|---|---|---|
| Write a dirty page to disk | Buffer Manager: Page writes/sec; Lazy writes/sec | Logical Disk: Disk Writes/sec |
| BPE write | Buffer Manager: Extension page writes/sec | Logical Disk: Disk Writes/sec |

Figure 6

I took an xperf trace during some SQL Server activity on an instance where BPE is enabled. In the xperf table below we can observe the Lazywriter process writing to both the SQL data file and the buffer pool extension file.

[xperf trace: Lazywriter writes to the SQL Server data file and the buffer pool extension file]

Pros and Cons of BPE

Optimising the placement of database pages in memory has been refined by Microsoft and others; for example, SQL Server is really smart at preventing operations such as CHECKDB from flushing the buffer pool. With the advent of super-fast flash-based storage, Microsoft has extended the buffer pool onto SSD disk. From a performance management perspective, DBAs need to take into account that the extended buffer pool is now accounted for in performance counters such as SQL Server Buffer Manager: Buffer Cache Hit Ratio, which counts both main memory and BPE when calculating the cache hit ratio.
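As a quick way to watch the BPE counters alongside the traditional ones from inside SQL Server, a sketch like this can be used (the counter names are the SQL Server 2014 Buffer Manager counters; the object_name prefix varies between default and named instances):

```sql
-- Snapshot the Buffer Manager counters, including the BPE extension counters
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name IN (N'Buffer cache hit ratio',
                       N'Page reads/sec', N'Page writes/sec',
                       N'Extension page reads/sec', N'Extension page writes/sec');
```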

So is BPE the right option for your environment?

  • One of the intended use cases for BPE is to improve performance for OLTP workloads by offloading small random I/Os from platter disks to flash-based PCIe cards. There are other ways to improve performance in these scenarios; if possible, adding additional memory to the server may produce better results. Indeed, according to Microsoft, on systems with more than 64 GB of RAM BPE may not yield the expected performance gains.
  • For data warehouse type workloads, large-block sequential I/O can be optimally serviced from traditional spinning disks.

As when considering any new technology, you should review your current environment and choose the approach that best balances all your requirements. Importantly, whichever solution you decide to use, always test it out prior to implementing it in production.

Conclusion

In this post I introduced flash-based storage and how it is utilised by SQL Server for the Buffer Pool Extension feature. I explained how data access is faster the closer the data is to the CPU core, and how SQL Server uses this to optimise data page access and placement. BPE is designed to further enhance this by taking advantage of super-fast flash storage close to the CPU. I introduced the new performance monitor counters and showed how BPE operations are exposed through both the buffer manager and OS disk counters. Finally, I discussed the advantages and disadvantages of BPE and highlighted alternative approaches to improving I/O performance. In the next post of the series I will show you how to correlate the various performance monitor counters to help determine if BPE is right for your workload.

If you are currently using or considering BPE, please drop me a line; I would really like to hear your experiences with it.