Prefetch - Querying at the Speed of SAN

Prefetch is a mechanism with which SQL Server can fire up many I/O requests in parallel for Nested Loop join.

The SAN administrator says your data volume has a throughput capacity of 400 MB/sec. But your long running query is waiting for I/Os (PAGEIOLATCH_SH) and Windows Performance Monitor shows your data volume is doing 4 MB/sec. Where lies the problem?

When SQL Server does a Nested Loop join, it might enable Prefetch. Prefetch is a functionality where SQL Server fires up many I/O requests in parallel (many outstanding I/Os). Prefetch will lead to better performance for Nested Loop join queries when there are lots of rows in the outer input table. 25 is the magic number. When SQL Server estimates less than 25 rows for the outer input table Prefetch will be disabled. And when more than 25 rows are estimated Prefetch will be enabled.

This is a perfect design (yes, I am an ex-Microsoft employee!); SQL Server wants to avoid Prefetch for light weight queries, especially in an OLTP environment as too many queries doing Prefetch will hurt performance. I remember from my days at SQL Server Development team, OLTP benchmarks including latest TPC-E disables Prefetch with trace flag 8744. Write me an email and I will send you the link.

Why this is a problem: I observe this at many customers; plan is cached with Prefetch disabled because first execution resulted in less than 25 rows estimation for the outer input table. Then the plan is used for parameters that are resulting in more than 25 rows, actually 1000s of rows for the outer input table. In this case since SQL Server cached the plan, Prefetch is disabled and SQL Server is firing up 1 I/O at a time. This leads to poor utilization of SAN. Symptoms: Query is waiting for I/Os (PAGEIOLATCH_SH) for a long time (reading page after page), but disk queue length is never more than1.

--Example provided by www.sqlworkshops.com

--Create procedure that pulls orders based on City

drop procRegionalOrdersProc

go

create procRegionalOrdersProc @Citychar(20)

as

begin

declare@OrderIDint, @OrderDetailschar(200)

select@OrderID = o.OrderID, @OrderDetails = o.OrderDetails

fromRegionalOrders ao inner join Orders oon(o.OrderID = ao.OrderID)

whereCity = @City

end

go  

set statistics time ongo

--Example provided by www.sqlworkshops.com

--Execute the procedure with parameter SmallCity1

execRegionalOrdersProc'SmallCity1'

go  

--Right click on Clustered Index Scan

--to look at the properties

--Estimated number of rows in the

--outer input table is less than 25

 

     

--Right click on Nested Loops

--to look at the properties

--Notice Prefetch is disabled (missing)

--Clear data cache

checkpoint

dbccdropcleanbuffers

go

Think about a business running a billing job looping over thousands of their customers using stored procedure. If the first customer has placed less than 25 orders for the billing period then Prefetch will be disabled and every customer is limited to using 1 spindle at a time.  

--Example provided by www.sqlworkshops.com

--Execute the procedure with parameter BigCity

--We are using cached plan

execRegionalOrdersProc'BigCity'

go  

--Estimated number of rows in the

--outer input table is less than 25

  

--Notice Prefetch is disabled (missing)

--Query execution time 7 seconds

--What was the execution time? 

--Clear procedure cache

--to trigger a new optimization

dbccfreeproccache

go  

--Example provided by www.sqlworkshops.com

--Execute the procedure with parameter SmallCity2

execRegionalOrdersProc'SmallCity2'

go  

--Clear data cache

checkpointdbccdropcleanbuffers

go  

--Example provided by www.sqlworkshops.com

--Execute the procedure with parameter SmallCity

--We are using cached plan

execRegionalOrdersProc'BigCity'

go  

--Estimated number of rows in the

--outer input table is greater than 25

  

--Notice Prefetch is enabled

--Query execution time 3 seconds

--On a SAN it was less than a second

--What was the execution time?

 

--Example provided by www.sqlworkshops.com

--You can fix the issue by using any of the following

--hints

--Create procedure that pulls orders based on City

drop procRegionalOrdersProc

go

create procRegionalOrdersProc @Citychar(20)

as

begin

declare @OrderID int, @OrderDetailschar(200)

select@OrderID = o.OrderID, @OrderDetails = o.OrderDetails

fromRegionalOrders ao inner join Orders oon(o.OrderID = ao.OrderID)

whereCity = @City

--Hinting optimizer to use SmallCity2 for estimation

option(optimizefor(@City ='SmallCity2'))

--Hinting optimizer to estimate for the currnet parameters

--option (recompile)

--Hinting optimize not to use histogram rather

--density for estimation (average of all 3 cities)

--option (optimize for (@City UNKNOWN))

--option (optimize for UNKNOWN)

end

go 

--Example provided by www.sqlworkshops.com

--Execute the procedure with parameter SmallCity1

execRegionalOrdersProc'SmallCity1'

 go  

--Clear data cache

checkpoint

dbccdropcleanbuffers

go  

--Example provided by www.sqlworkshops.com

--Execute the procedure with parameter BigCity

--We are using cached plan

execRegionalOrdersProc'BigCity'

go  

--Notice Prefetch is enabled

--Estimated number of rows in the

--outer input table is greater than 25

--Query execution time 3 seconds

--On a SAN it was less than a second

--What was your execution time?

Some of you might think plan guides or plan freezing might solve this issue, well it won’t help here. The estimation from your plan guide plan will be ignored by the optimizer. Optimizer makes new estimation based on current set of parameter values (compile time parameter values).

I explain these concepts with detailed examples in my webcasts (http://www.sqlworkshops.com/webcasts), I recommend you to watch them. The best way to learn is to practice. To create the above tables and reproduce the behavior,  subscribe to our Newsletter and I will send you the relevant SQL Scripts.

If you search for trace flag 8744 you will end up with a KB Article 920093 titled: Tuning options for SQL Server 2005 and SQL Server 2008 when running in high performance workloads. There it is explained: Trace flag 8744 disables pre-fetching for the Nested Loops operator (with caution). Even though the topic looks interesting ‘Tuning options for high performance workloads’, don’t try these in production. From my point of view this article is about documenting undocumented trace flags used in benchmarks, let’s leave it at that!

This and similar topics are covered in my workshops. If you are interested in learning practical techniques to improve your SQL Server performance, register for the upcoming 3 Day Advanced Microsoft SQL Server 2014, 2012 and 2008 Performance Monitoring & Tuning Hands-on Workshop in  London, United Kingdom, during March 10-12, 2015. These are hands-on workshops with a maximum of 12 participants and not lectures.

SQL Server Performance Tuning Consulting & Health Checks

This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk.

© 2010-2014 www.sqlworkshops.com. All rights reserved.

Back to Top