Workshops

Instructor
Schedule
Feedbacks
Webcasts
Contacts




Sign up

for Email

Updates


Download Truly Level 400 SQL Server Performance Monitoring and Tuning Webcast 3 Part 1: SQLWebcast3Part1of3.wmv(103MB), Part 2: SQLWebcast3Part2of3.wmv(116MB) & Part 3: SQLWebcast3Part3of3.wmv(69.1MB) - FREE Download. Webcast 3 released on February 10, 2010 (Box.net download counters as of February 14, 2012: 39,568 downloads, sum of all 3 Parts). Firewall might block downloads, if you have issues downloading let us know, we will provide you with alternate download links.

Finding optimal number of CPUs for a given long running CPU intensive DSS/OLAP like queries/workload:

About the article:
This is an excerpt from the 3 Day Level 400 Microsoft SQL Server 2008 Performance Monitoring & Tuning Hands-on Workshop; I explain the concepts in the workshop with examples. I try to do the best while writing this article, but it is not the same!!

This articles discusses a way to find optimal number of CPUs for a given long running CPU intensive DSS/OLAP like queries/workload by measuring CPU pressure. The articles applies to a specific kind of workload, apply the concepts carefully.

If you find this article helpful, give me feedback, this motivates me to write some more articles. I also encourage you to attend the 3 Day Level 400 Microsoft SQL Server 2008 Performance Monitoring & Tuning Hands-on Workshop. Register for the upcoming 3 Day Level 400 Microsoft SQL Server 2008 Performance Monitoring & Tuning Hands-on Workshop, for workshop schedule click here. These are hands-on workshops with a maximum of 12 participants and not lectures. For consulting engagements click here.

To read additional articles I wrote click here.

Introduction:
This article is applicable for finding optimal number of CPUs for long running CPU intensive queries/workload that doesnt frequently wait for other resources (typical DSS/OLAP workload).

This article is not applicable if your queries/workload is often waiting for resources (like I/Os, Locks, Latches etc.) without consuming CPU in a stretch (typical OLTP workload).

This article can also provide information on uneven CPU load across NUMA nodes and uneven CPU load within same NUMA node (load_factor effect).

It is recommended to analyze Windows Performance Monitor Counters for monitoring CPU pressure. Processor utilization greater then 75% to 80% indicates CPU pressure. Using Windows Performance Monitor should be the 1st step, the procedure suggested in this article should be considered as an additional step.

It is very important to find ways to optimize the queries/workload by tuning the database schema before attempting to add additional CPUs.

Lubor Kollar, Customer Advisory Team, SQL Server Development, Microsoft Corporation:
I have attended the first two days of Rameshs Performance Monitoring and Tuning Hands-on Workshop in Prague on Oct. 15 and 16.
My main goal was to learn in detail the contents of Rameshs class so that I can suggest his class to other Microsoft customers Im working with in the role of SQL Server Customer Advisory Team manager. I was very pleased both with the content and delivery of Rameshs class. He is providing useful, deep and very accurate information that will help our customers when developing, testing, tuning and deploying their applications using SQL Server 2005 or upgrading existing applications from previous SQL Server releases. Ramesh showed extraordinary attention to detail and he explained in depth the internals of the engine.

Description:
When a customer asks you: I am running a DSS/OLAP like SQL job and it takes x amount of time, how can I reduce the time so the SQL job completes sooner, can I add more CPUs ? if yes, how many ? this article will help you answer their question.

When you see CPU pressure, there are 2 options: you can either upgrade to faster CPUs or add additional CPUs. Upgrading to faster CPU will always help. Adding additional CPUs may not always help the SQL job to run faster unless that SQL job can take advantage of additional CPUs. If the customer already has the fastest CPUs available in the market then they have to wait for the next release of faster CPUs. More practical way might be to add additional CPUs if it helps, the below procedure will help you identify if this is the case.

This method calculates total user waits for CPU during the SQL workload and suggests additional CPUs if necessary. If CPU usage is at 100%, but no one waited for CPU during the workload, then adding additional CPU will not help; this is the basics of this calculation.

Current recommendations that are available on this topic calculates signal wait time to wait time ratio to suggest CPU pressure but this cannot help one easily identify number of additional CPUs necessary.

Procedure:
When concurrent users apply simultaneous DSS/OLAP like CPU intensive workload, there could be CPU pressure. We can conclude presence of CPU pressure when at any given moment during this time period at least one or more user tasks waited for CPU resource.

In this case one can run the below query to find out how many CPU on an average will help to scale the workload better. It might be more informative to collect the below information in short time intervals (many samples) than just once to understand during which time of the workload application the CPU pressure was the most. Single sample will lead to average additional CPUs necessary for the entire workload duration.

1. Reset Wait Stats
dbcc sqlperf('sys.dm_os_wait_stats', clear --example provided by www.sqlworkshops.com

2. Apply workload (you can find sample workload query at the end of this article, you need to execute the sample workload query simultaneously in many sessions to simulate concurrent user tasks).

3. Run the below query to find Additional CPUs Necessary it is important to run the query right after the workload completes to get reliable information.
select round(((convert(float, ws.wait_time_ms) / ws.waiting_tasks_count) / (convert(float, si.os_quantum) / si.cpu_ticks_in_ms) * cpu_count), 2) as Additional_Sockets_Necessary
from sys.dm_os_wait_stats ws cross apply sys.dm_os_sys_info si where ws.wait_type = 'SOS_SCHEDULER_YIELD'  --example provided by www.sqlworkshops.coms

The SQL Server 2008 query:

select round(((convert(float, ws.wait_time_ms) / ws.waiting_tasks_count) / si.os_quantum * scheduler_count), 2) as Additional_Sockets_Necessary
from sys.dm_os_wait_stats ws cross apply sys.dm_os_sys_info si where ws.wait_type = 'SOS_SCHEDULER_YIELD'  --example provided by www.sqlworkshops.com

The article was written for SQL Server 2005. The concepts are the same in SQL Server 2008, but the view sys.dm_os_sys_info has changed a bit. Also watch out for uneven distribution of workload over CPU cores before calculating additional sockets necessary.

SQL Webcast Feedbacks

Example:
When you have 2 CPUs and you run the sample workload with just 1 or 2 concurrent sessions you will see no recommendation for addition additional CPUs unless there is unbalanced user task distribution across CPUs. On the other hand if you run the workload with 4 concurrent sessions you will notice the query suggests you to add 2 additional CPUs. If you run with 6 concurrent sessions you will notice the query suggests you to add 4 additional CPUs.

If each workload runs in parallel (MAXDOP not 1), then you will notice additional CPU suggestion, you need to be careful in this case. For example with 2 CPUs when you run the workload (in parallel, MAXDOP 0/2) with 2 concurrent sessions, you will notice the suggestion to add 2 additional CPUs this just indicates the workload is more scalable with more CPUs parallel query execution as you can imagine can consume as many CPUs as you have and can consume more!!

The results are not reliable when other applications are running in the system. Also the results might be incorrect on a hyper threading enabled system.

Explanation:
When there are more user tasks concurrently needing CPU than available CPU, the excess user tasks will wait for CPU (there are exceptions when the workload is not evenly distributed across CPUs). In this case each user task uses its quantum, then goes into a wait state (waiting for CPU with wait_type SOS_SCHEDULER_YIELD. sys.dm_exec_requests doesnt show this wait type, probably by design to avoid showing user tasks in wait state when they are waiting for CPU. But sys.dm_os_wait_stats will include these waits) until all other runnable user tasks have used their quantum. If one measures how many tasks went into this wait state and for how long while the workload was applied it is possible to calculate the CPUs necessary to scale the workload better.

runnabkle_task_count from the dm_os_schedulers is also a indication for CPU pressure, but it is just a probe one cannot reasonably conclude the number of CPUs necessary for a given workload.

Download Truly Level 400 SQL Server Performance Monitoring and Tuning Webcast 2 Part 1: SQLWebcast2Part1of3.wmv(35.1MB), Part 2: SQLWebcast2Part2of3.wmv(34.9MB) & Part 3: SQLWebcast2Part3of3.wmv(30.6MB) - FREE Download. Webcast 2 released on January 20, 2010 (Box.net download counters as of February 14, 2012: 23,953 downloads, sum of all 3 Parts).

Download Truly Level 400 SQL Server Performance Monitoring and Tuning Webcast 1  Part 1: SQLWebcast1Part1of3.wmv(29.1MB), Part 2: SQLWebcast1Part2of3.wmv(47.5MB) & Part 3: SQLWebcast1Part3of3.wmv(37.4MB) - FREE Download. Webcast 1 released on November 21, 2009 (Box.net download counters as of February 14, 2012: 36,079 downloads, sum of all 3 Parts).

To receive example SQL Scripts used in webcasts: Sign up for Email Updates. Please take a few minutes and pass on the webcast link to your friends and local user community. Feedback from a Microsoft Employee: Your webcast rocked. Your free webcast was very good and informative.

Exception:
There is an exception(for OLTP like workload) where a user tasks doesnt consume all of its quantum(goes into some other wait state before the quantum expires, waiting for I/Os, Locks, Latches etc.) in a stretch, but continues to run in a loop using CPU without using its full quantum. The method mentioned in this article cannot calculate the necessary additional CPUs in this case.. Most common example is short transactions using part of its quantum and starts WRITELOG waits and continues in a loop inserts using implicit transactions in a loop is a typical example.

Additional Information:
If you have attended my SQL 2005 workshop(s) in the past two and a half years you know how to monitor CPU Pressure under various scenarios.

I encourage you to attend the 3 Day Level 400 Microsoft SQL Server 2008 Performance Monitoring & Tuning Hands-on Workshop, where you will gain real practical knowledge.

This level 400 workshop requires participants to have basic prior SQL Server experience, specifically knowledge about Cluster, Heap and Non-cluster index, Index Tuning Wizard and SQL Profiler - to avoid repetition. The workshop covers specifically SQL Server 2008 Performance Monitoring & Tuning topics - doesn't cover Disaster Recovery topics on the side to keep the focus.

Have you been to a workshop where the instructor explains the difference between Clustered and Heap index (or demonstrates SQL Profiler) and you felt like having an another coffee break ?
Have you been to a workshop where the instructor explains deeply about DBCC CHECKDB and then 2 years later you felt like you could have spent that time on a vacation because it didnt make a dent in your career ?

Well this workshop will be an entirely different experience. This truly level 400 workshop gives you an opportunity to really gain practical knowledge and use that knowledge to advance your career.

Microsoft Employees (including senior members of Microsoft Consulting Services, Support and SQL Server Product Development team) have participated in the 3 day Level 400 Microsoft SQL Server 2005 Performance Monitoring & Tuning Hands-on Workshops delivered by R Meyyappan. SAP AGs IT Department DBAs in Walldorf, Germany have participated in Level 300 Microsoft SQL Server 2005 Administration & Tuning for SAP Customers Hands-on Workshops delivered by R Meyyappan.

If you find this article helpful, give me feedback, this motivates me to write some more articles. I also encourage you to attend the 3 Day Level 400 Microsoft SQL Server 2008 Performance Monitoring & Tuning Hands-on Workshop. Register for the upcoming 3 Day Level 400 Microsoft SQL Server 2008 Performance Monitoring & Tuning Hands-on Workshop, for workshop schedule click here. These are hands-on workshops with a maximum of 12 participants and not lectures. For consulting engagements click here.

Charges: 2100 Euro (VAT excl.). For Agenda click here. For Workshop schedule and registration information click here.

Disclaimer and copyright information:
This article refers to organizations and products that may be the trademarks or registered trademarks of their various owners.

Copyright of this article belongs to R Meyyappan / www.sqlworkshops.com. You may freely use the ideas and concepts discussed in this article with acknowledgement (www.sqlworkshops.com), but you may not claim any of it as your own work.

This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk.

Sample Workload:
Create the below table before running the query to generate CPU intensive workload.

Serial Workload: select max(t1.c2 + t2.c2) from tab7 t1 cross join tab7 t2 option (maxdop 1 --example provided by www.sqlworkshops.com

Parallel Workload: select max(t1.c2 + t2.c2) from tab7 t1 cross join tab7 t2  --example provided by www.sqlworkshops.com

Table:
--example provided by www.sqlworkshops.com
create table tab7 (c1 int primary key clustered, c2 int, c3 char(2000))
go
begin tran
declare
@i int
set
@i = 1
while @i <= 5000
begin
insert
into tab7 values (@i, @i, 'a')
set @i = @i + 1
end
commit
tran
go

Webcast Agenda:

Webcast 1: Explains Memory allocation issues with sort. Demonstrates ways to identify sort spills to tempdb. Provides query rewrite procedure to avoid sort spills to tempdb. Demonstrates cases where 1 tempdb date file per core might not be ideal for all implementations.

Webcast 2: Explains high CXPACKET waits are NOT a direct result of delays associated with inefficiencies of parallel processing. Provides example to scale queries over many CPU cores without reducing MAXDOP settings.

Webcast 3: Recommends not using stored procedure or other plan caching mechanism like using sp_executesql and Prepared Statement using ADO.NET or OLEDB based executions for memory allocating queries. Common memory allocating queries are that perform Sort and do Hash Match operations like Hash Join or Hash Aggregation or Hash Union.

With examples provides ways to identify queries performing Hash match operations that spill to tempdb. Using SQL Profiler: Hash Warnings (Hash Recursion and Hash Bailout). Using sys.dm_exec_query_memory_grants: Granted Memory, Used Memory and Maximum Used memory. Explains how SLEEP_TASK wait type is associated with Hash Warnings.

Explains with examples how sp_recompile can block and bring an application to a standstill and recommends using DBCC FREEPROCCACHE (plan_handle) instead.

Explains with examples how stored procedure or other plan caching mechanism affects queries that sort (perform order by). Explains how IO_COMPLETION wait type is associated with Sort Warnings.

Demonstrates why rollbacks and database restore could wait on IO_COMPLETION wait type and if needed how this wait can be reduced.

Explains how parallel query performance is significantly affected by a CPU intensive query executing on one of the CPU cores. Explains the reason for the performance issue and how to identify the issue with SLEEP_TASK wait type. With example explains the reason for the observed query execution time when the child thread executes on the CPU core where the CPU intensive query executes and also when the coordinator executes on the CPU core where the CPU intensive query executes. Recommends ways to avoid this parallel sort query performance issue and also demonstrates that only certain queries will be affected by this and not all parallel queries.

Explains sys.dm_os_waits_stats.signal_wait_time_ms does not indicate system wide CPU pressure, just at a CPU core level.

Explains the prefetch mechanism and how it can affect query performance. Provides ways to force prefecth. Demonstrates cases where Avg. Disk Sec / read, the disk latency, PAGEIOLATCH_SH, Avg. disk queue length is very high and the query executes fast.

Explains why Plan Guides (Plan Freezing!) cannot help with prefetch or in few other cases.

SQL Workshop Feedbacks

Ramesh's consulting customers include Austria: Allgemeines Krankenhaus Linz, BMD Systemhaus, Bundesrechenzentrum, bwin, C07 IT Operations, CMC Markets, derStandard, ERGO Insurance Service, Fabasoft Austria, Greentube, IAEA (UN), Interwetten, INTERSPORT, Kapsch, Linz AG, Mayr-Melnhof Karton, NÖM, ÖBB, OMV, Österreichische Post, RACON West Software, Raiffeisen Bankengruppe, Raiffeisen Informatik, S.Spitz, Salzburg AG, Siemens, TIWAG, UC4, Vinzenz Gruppe, Wiener Zeitung, Wirtschaftskammer Österreich; Bulgaria: MobilTel; Czech Republic: Česká pojišťovna, ČSOB Bank, GTS Novera, Telefónica O2, Vodafone; Denmark: Danfoss, Egmont, NNIT; Finland: Cap Gemini, Kemira, Pohjola; Germany: JAM Software, SAP AG; Netherlands: De Nederlandsche Bank; Norway: Acta, Aker Solutions, Det Norske Veritas(DNV), Komplett; Russia: Mediatel; Serbia: Tarkett; Slovakia: Adastra Corporation, VÚB Banka; Slovenia: Bankart, Iskratel, KDD, Krka, Mobitel, Pošta Slovenije, Sava, SRC.SI; South Africa: De Beers, Kumba, Sasol; Spain: CIE Automotive, Gas Natural, ITP; Sweden: Sony Ericsson; Switzerland: F. Hoffmann-La Roche, Philip Morris International; UAE: Dubai Customs. For past consulting engagement feedbacks click here.

COPYRIGHT AND TRADEMARK NOTICES:

The copyright in all material provided on this site is held by SQLWorkshops.com or by the original creator of the material. Except as stated herein, none of the material may be copied, reproduced, distributed, republished, downloaded, displayed, posted or transmitted in any form or by any means, including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of SQLWorkshops.com or the copyright owner. Permission is granted to download the webcast on this Site for personal, non-commercial use only, provided you do not modify the webcast and that you retain all copyright and other proprietary notices contained in the webcast. You also may not, without SQLWorkshops.com permission, "mirror" any webcast contained on this Site on any other server. This permission terminates automatically if you breach any of these terms or conditions. Upon termination, you must immediately destroy any downloaded webcast. Any unauthorized use of any webcast contained on this Site may violate copyright laws, trademark laws, the laws of privacy and publicity, and communications regulations and statutes.
Microsoft is a registered trademark of Microsoft Corporation in the United States and/or other countries. SQL Server is a trademark of Microsoft Corporation in the United States and/or other countries. This webcast may refer to organizations and products that may be the trademarks or registered trademarks of their various owners.
You may freely use the ideas and concepts discussed in this webcast, but you may not claim any of it as your own work. You may quote freely, with acknowledgements to SQLWorkshops.com.
This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk.

 

© 2010 www.sqlworkshops.com. All rights reserved. Terms of Use