Finding Optimal Number of CPUs for a Given CPU Intensive Workload

CPU Pressure

Download Truly Level 400 SQL Server Performance Monitoring and Tuning Webcast 3 (3 parts) – FREE Download; Firewall might block downloads, if you have issues downloading let us know, we will provide you with alternate download links.

Finding the optimal number of CPUs for a given, long running CPU intensive DSS/OLAP like query/workload

About the article

This is an excerpt from the 3 Day Level 400 Microsoft SQL Server Performance Monitoring & Tuning Hands-on Workshop; I explain the concepts in the workshop with examples. I try to do the best while writing this article, but it is not the same!

This articles discusses a way to find optimal number of CPUs for a given long running CPU intensive DSS/OLAP like queries/workload by measuring CPU pressure. The articles applies to a specific kind of workload, apply the concepts carefully.

Introduction

This article is applicable for finding optimal number of CPUs for long running CPU intensive queries/workload that doesnt frequently wait for other resources (typical DSS/OLAP workload).

It is not applicable if your queries/ workload is often waiting for resources (like I/Os, Locks, Latches etc.) without consuming CPU in a stretch (typical OLTP workload).

This article can also provide information on uneven CPU load across NUMA nodes and uneven CPU load within same NUMA node (load_factor effect).

It is recommended to analyze Windows Performance Monitor Counters for monitoring CPU pressure. Processor utilization greater then 75% to 80% indicates CPU pressure. Using Windows Performance Monitor should be the 1st step, the procedure suggested in this article should be considered as an additional step.

It is very important to find ways to optimize the queries/workload by tuning the database schema before attempting to add additional CPUs.

Description

When a customer asks you: I am running a DSS/ OLAP like SQL job and it takes x amount of time, how can I reduce the time so the SQL job completes sooner, can I add more CPUs ? if yes, how many ? this article will help you answer their question.

When you see CPU pressure, there are 2 options: you can either upgrade to faster CPUs or add additional CPUs. Upgrading to faster CPU will always help. Adding additional CPUs may not always help the SQL job to run faster unless that SQL job can take advantage of additional CPUs. If the customer already has the fastest CPUs available in the market then they have to wait for the next release of faster CPUs. More practical way might be to add additional CPUs if it helps, the below procedure will help you identify if this is the case.

This method calculates total user waits for CPU during the SQL workload and suggests additional CPUs if necessary. If CPU usage is at 100%, but no one waited for CPU during the workload, then adding additional CPU will not help; this is the basics of this calculation.

Current recommendations that are available on this topic calculates signal wait time to wait time ratio to suggest CPU pressure but this cannot help one easily identify number of additional CPUs necessary.

Procedure

When concurrent users apply simultaneous DSS/OLAP like CPU intensive workload, there could be CPU pressure. We can conclude presence of CPU pressure when at any given moment during this time period at least one or more user tasks waited for CPU resource.

In this case one can run the below query to find out how many CPU on an average will help to scale the workload better. It might be more informative to collect the below information in short time intervals (many samples) than just once to understand during which time of the workload application the CPU pressure was the most. Single sample will lead to average additional CPUs necessary for the entire workload duration.

1. Reset Wait Stats

dbcc sqlperf('sys.dm_os_wait_stats', clear) --example provided by www.sqlworkshops.com

2. Apply workload (you can find sample workload query at the end of this article, you need to execute the sample workload query simultaneously in many sessions to simulate concurrent user tasks).

3. Run the below query to find Additional CPUs Necessary it is important to run the query right after the workload completes to get reliable information.

select round(((convert(float, ws.wait_time_ms) / ws.waiting_tasks_count) / (convert(float, si.os_quantum) / si.cpu_ticks_in_ms) * cpu_count), 2) as Additional_Sockets_Necessary

from sys.dm_os_wait_stats ws cross apply sys.dm_os_sys_info si where ws.wait_type = 'SOS_SCHEDULER_YIELD'

--example provided by www.sqlworkshops.com

Don't miss our SQL Server Performance Tuning Hands-on Workshop! Register

SQL Server Performance Tuning Consulting & Health Check

sp_whopro - SQL Server Activity Monitoring and Logging Stored Procedure - Free Download

SQLTest - SQL Server Performance, Load, Stress and Unit Test Tool - Free Download

SQLVideo - SQL Server Performance Tuning and Troubleshooting - Free Youtube Videos

The SQL Server 2008 query:

select * from sys.dm_os_sys_info

select round(((convert(float, ws.wait_time_ms) / ws.waiting_tasks_count) / si.os_quantum * scheduler_count), 2) as Additional_Sockets_Necessary

from sys.dm_os_wait_stats ws cross apply sys.dm_os_sys_info si where ws.wait_type = 'SOS_SCHEDULER_YIELD'

--example provided by www.sqlworkshops.com

This article was written for SQL Server 2005. The concepts are the same in SQL Server 2012 and 2008, but the view sys.dm_os_sys_info has changed a bit. Also watch out for uneven distribution of workload over CPU cores before calculating additional sockets necessary.

Example:

When you have 2 CPUs and you run the sample workload with just 1 or 2 concurrent sessions you will see no recommendation for addition additional CPUs unless there is unbalanced user task distribution across CPUs. On the other hand if you run the workload with 4 concurrent sessions you will notice the query suggests you to add 2 additional CPUs.

If you run with 6 concurrent sessions you will notice the query suggests you to add 4 additional CPUs. If each workload runs in parallel (MAXDOP not 1), then you will notice additional CPU suggestion, you need to be careful in this case. For example with 2 CPUs when you run the workload (in parallel, MAXDOP 0/2) with 2 concurrent sessions, you will notice the suggestion to add 2 additional CPUs this just indicates the workload is more scalable with more CPUs parallel query execution as you can imagine can consume as many CPUs as you have and can consume more!!

The results are not reliable when other applications are running in the system. Also the results might be incorrect on a hyper threading enabled system.

Explanation:

When there are more user tasks concurrently needing CPU than available CPU, the excess user tasks will wait for CPU (there are exceptions when the workload is not evenly distributed across CPUs). In this case each user task uses its quantum, then goes into a wait state (waiting for CPU with wait_type SOS_SCHEDULER_YIELD. sys.dm_exec_requests doesnt show this wait type, probably by design to avoid showing user tasks in wait state when they are waiting for CPU. But sys.dm_os_wait_stats will include these waits) until all other runnable user tasks have used their quantum. If one measures how many tasks went into this wait state and for how long while the workload was applied it is possible to calculate the CPUs necessary to scale the workload better.

runnabkle_task_count from the dm_os_schedulers is also a indication for CPU pressure, but it is just a probe one cannot reasonably conclude the number of CPUs necessary for a given workload.

Exception:

There is an exception(for OLTP like workload) where a user tasks doesnt consume all of its quantum(goes into some other wait state before the quantum expires, waiting for I/Os, Locks, Latches etc.) in a stretch, but continues to run in a loop using CPU without using its full quantum. The method mentioned in this article cannot calculate the necessary additional CPUs in this case.. Most common example is short transactions using part of its quantum and starts WRITELOG waits and continues in a loop inserts using implicit transactions in a loop is a typical example.

Sample Workload:

Create the below table before running the query to generate CPU intensive workload.

Serial Workload:

select max(t1.c2 + t2.c2) from tab7 t1 cross join tab7 t2 option (maxdop 1) --example provided by www.sqlworkshops.com

Parallel Workload:

select max(t1.c2 + t2.c2) from tab7 t1 cross join tab7 t2 --example provided by www.sqlworkshops.com

Table:

--example provided by www.sqlworkshops.com

create table tab7 (c1 int primary key clustered, c2 int, c3 char(2000))

go

begin tran

declare @i int

set @i = 1

while @i < 200000

begin

insert into tab7 values (@i, @i, 'a')

set @i = @i + 1

end

commit tran

go

This and similar topics are covered in my workshops. If you are not able to attend any of our workshops but would like to learn more performance tuning techniques, I strongly recommend you to watch our Free Webcasts.

To receive example SQL Scripts used in the webcasts:  subscribe to our Newsletter.

This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk.

© 2010-2015 www.sqlworkshops.com. All rights reserved.

Back to Top