Parallel Sort and Merge Join - Watch out for Unpredictability in Performance

When SQL Server sorts or does merge join in parallel the query performance highly depends on other concurrent CPU intensive activities taking place on the server.

In the below example you will see that when we execute a query with MAXDOP 0 it completes in less than a second. When we have a CPU intensive query executing over a CPU core, the above query with MAXDOP 0 takes 42 seconds. In reality it need not just be 42 seconds, it could be many minutes or hours depending on data volume. Also possible solutions are discussed in webcast 3.

Enough explanation, let’s see a real world example.

Let’s execute the query with MAXDOP 0:

The reason I am using convert is to trick the optimizer to grant additional memory. The memory estimated by the optimizer without convert is not enough to perform an in-memory sort. The sort spills to tempdb without the convert leading to non optimal performance. You can learn more about this from my webcasts or watching my SQL Bits V session at http://go.microsoft.com/?linkid=9712350 or reading the MSSQL Tip based on our webcast, http://www.mssqltips.com/tip.asp?tip=1955 (written by Matteo Lorini).

set statistics time on

go

--Example provided by www.sqlworkshops.com

declare @order_date datetime, @order_status int, @order_id int, @order_description varchar(4000)

select @order_date = order_date, @order_status = order_status, @order_id = order_id, @order_description = convert(varchar(4000), order_description) from Orders

where order_date between '2005-01-01' and '2005-06-30'

order by order_id

option (maxdop 0)

go

--Query takes less than a second

  

Now let’s execute a CPU intensive query over another session:

--Example provided by www.sqlworkshops.com

--Execute this CPU intensive query on a different session

declare @i int

while 1=1

select @i = count(*) from sys.objects option (maxdop 1)

go

   

Now let’s execute the above query with MAXDOP 0:

--Example provided by www.sqlworkshops.com

declare @order_date datetime, @order_status int, @order_id int, @order_description varchar(4000)

select @order_date = order_date, @order_status = order_status, @order_id = order_id, @order_description = convert(varchar(4000), order_description)

from Orders

where order_date between '2005-01-01' and '2005-06-30'

order by order_id

option (maxdop 0)

go

--Query takes about 42 second

--How many seconds did it take on your server ?

  

The reason for the performance issue is the way SQL Server implements merging or order preserving exchanges and SQL Server’s CPU/thread scheduling architecture.

You can see our ‘Gather Streams’ has ‘Order By’, the sorted output of the parallel thread has to be merged by ‘Gather Streams’ maintaining the sorted order.

   

Every parallel thread supplies a page of sorted rows (packet) and the ‘Gather Streams’ merges these pages. Once ‘Gather Streams’ runs out of rows from a thread it waits for the next page of sorted rows from that thread. If that thread (which has to supply the next page of sorted rows) is executing on a CPU core that is busy executing a CPU intensive query, then the thread has to wait up to 4ms to supply the next page of sorted rows.

Duration depends on how many CPU cores you have, with more cores, each thread sorts less rows.

4ms is the scheduling quantum in SQL Server. Refer to the article www.sqlworkshops.com/cpupressure

I wrote a while ago and the webcasts for additional details.

   

Some of you might think well in that case I will not execute queries in parallel, will set MAXDOP to 1. There are cases where you have to use all the CPU cores (return on investment (ROI), Maximizing your existing hardware CPU, memory and disk); you cannot tell your customers ‘well everyone has to wait while I execute this report using 1 CPU core keeping the rest of the CPU cores idle for the next hour’. You have to find creative ways to resolve these issues. One way to solve the issue and bring back query performance predictability is to partition the workload; I discuss this in my webcast (www.sqlworkshops.com/webcasts).

This issue is just not limited to Sort; below you can see SQL Server chooses to execute the query in parallel using merge join.

The query executes faster with merge join when the CPU intensive query is not executing:

   

The query does parallel merge join:

  

When we force hash join, the query performance is similar:

     

The query does parallel hash join:

Let's execute the CPU intensive query on a CPU core:

   

Now the Merge join query is very slow, takes in my case 19 seconds:

   

In the execution plan you can see merge join:

    

Now when we force a hash join, the query is fast.

  

In the execution plan you can see hash join:

  

This is because hash join does not suffer from CPU/thread scheduling architecture like merge join. Parallel hash join suffers from memory pressure due to uneven row distribution, similar to parallel sort, which I explain in my webcasts.

In a situation like above, when the cost of parallel merge and parallel hash are close, one can choose hash (with an hint) as it gives performance predictability. And again the other possibility is to partition the workload (Soft-NUMA) like I discuss in webcast 3.

I explain these concepts with detailed examples in my webcasts (www.sqlworkshops.com/webcasts), I recommend you to watch them. The best way to learn is to practice. To create the above tables and reproduce the behavior, join the mailing list here and I will send you the relevant SQL Scripts.

This and similar topics are covered in my workshops. If you are interested in learning practical techniques to improve your SQL server performance, register for the upcoming 3 Day Level 400 Microsoft SQL Server 2012 and 2008 Performance Monitoring & Tuning Hands-on Workshop in London, United Kingdom, during November 11-13, 2014. These are hands-on workshops with a maximum of 12 participants and not lectures.

SQL Server performance tuning consulting

This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk.
© 2013 www.sqlworkshops.com. All rights reserved. Terms of Use