Minutes from SQL Server Users Group Meeting

Patrick O’Keeffe, Patrick.okeeffe@quest.com

Product Architect for Quest Software’s Spotlight on SQL Server product family

What problem are we trying to solve?

We are trying to get the best performance and the most efficiency from SQL Server

We are in this situation because…

Reactive – resource contention (CPU, I/O) is causing a problem right now
Proactive – Am I getting the best efficiency from my workload? Will my application scale?

A SQL Server that is idle has no performance problems.

It is only when it executes the SQL in your application (or workload) that problems manifest.

Only by understanding how your application…

Where do I start?

In order to solve this problem we need a simple set of steps to follow:

Optimise application demand
Minimise logical I/O
Optimise physical IO (lather, rinse and repeat)

Most “bang for the buck” effort-wise is to be found in the first two steps. Then, iterate.

Within each of the three steps, there are three practical steps that can be taken…

identify bottlenecks
find the workload that is causing the bottleneck
fix the bottleneck

How do you find the bottlenecks?

A customer phones; or
Monitoring (the better option)

What kind of bottlenecks should I be looking for?

What data do I collect?

Advice like the following is free on the internet – “if counter x says y it means you have memory pressure so you should add more memory”
It’s worth what you paid for it…
Counters, states etc. are a means to an end – what you really care about is how your workload …

Do I have a CPU bottleneck?

Performance Counters
Signal waits > 25% of total waits
2000 – dbcc sqlperf

What is a wait?

In a multithreaded server (like SQL Server), data flows from one subsystem to another and resources like disk, memory and CPU are shared.
When one worker thread (the one that is processing some user’s SQL) is using a shared resource, other threads that want the same resource have to wait.
Signal wait occurs when a thread has been granted access to the resource it was waiting on and is now waiting for CPU time.
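
A minimal sketch of the "signal waits > 25%" check, assuming the wait statistics have already been fetched as rows of (wait_type, wait_time_ms, signal_wait_time_ms), the column names used by sys.dm_os_wait_stats on SQL 2005. The function name and the sample figures are made up for illustration.

```python
def signal_wait_percent(wait_rows):
    """Return signal wait time as a percentage of total wait time.

    wait_rows: iterable of (wait_type, wait_time_ms, signal_wait_time_ms).
    wait_time_ms is treated as the total wait, signal portion included.
    """
    rows = list(wait_rows)
    total_ms = sum(row[1] for row in rows)
    signal_ms = sum(row[2] for row in rows)
    if total_ms == 0:
        return 0.0  # an idle server has nothing to report
    return 100.0 * signal_ms / total_ms


# Illustrative numbers only, not measurements from a real server.
sample_waits = [
    ("PAGEIOLATCH_SH", 6000, 500),
    ("LCK_M_U", 3000, 1500),
    ("SOS_SCHEDULER_YIELD", 1000, 1000),
]
cpu_pressure_suspected = signal_wait_percent(sample_waits) > 25.0
```

With these sample rows the signal share is 3000 of 10000 ms, so the 25% threshold from the notes is exceeded.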

CPU bottleneck Cause #1

Query Execution
We need the Top 5 CPU Consumers on the server
On SQL 2000 use a Profiler trace or a delta on …
On SQL 2005 query sys.dm_exec_query_stats cross apply sys.dm_exec_sql_text(qs.sql_handle)
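
As a rough sketch of the "Top 5 CPU Consumers" ranking, assume the SQL 2005 query results are already in memory as dictionaries. total_worker_time is a real column of sys.dm_exec_query_stats; the query_text key and the function name are hypothetical. The same statement can appear under more than one cached plan, so the sketch sums CPU per statement text before ranking.

```python
from collections import defaultdict


def top_cpu_consumers(query_stats, n=5):
    """Rank statements by total CPU, summing rows that share the same text.

    query_stats: iterable of dicts with keys "query_text" (hypothetical name)
    and "total_worker_time" (cumulative CPU for that cached plan entry).
    """
    cpu_by_query = defaultdict(int)
    for row in query_stats:
        cpu_by_query[row["query_text"]] += row["total_worker_time"]
    # Highest cumulative CPU first, keep only the first n entries.
    ranked = sorted(cpu_by_query.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


# Illustrative rows only.
sample_cpu_stats = [
    {"query_text": "SELECT A", "total_worker_time": 100},
    {"query_text": "SELECT B", "total_worker_time": 300},
    {"query_text": "SELECT A", "total_worker_time": 250},
]
top_cpu = top_cpu_consumers(sample_cpu_stats)
```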

CPU Bottleneck Cause #2

Low Plan Reuse
(Batch Requests – SQL Compilations) / Batch Requests
Hard to pin down specifically – more a general problem. Need to:
- Look at how end users are submitting queries
- Look for applications not using prepared statements (code snippets online set bad examples)
Excessive Recompilation (SQL Server 2005)
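
The plan reuse formula above can be sketched as follows, assuming the two inputs are the Batch Requests/sec and SQL Compilations/sec counter values sampled over the same interval. The function name and sample values are illustrative only.

```python
def plan_reuse_ratio(batch_requests, sql_compilations):
    """Return (Batch Requests - SQL Compilations) / Batch Requests.

    Returns None when there were no batch requests, because the ratio
    is undefined for an idle interval.
    """
    if batch_requests <= 0:
        return None
    return (batch_requests - sql_compilations) / batch_requests


# Illustrative values: 1000 batches, 150 of which needed a compilation.
reuse = plan_reuse_ratio(1000, 150)
```

A value close to 1 means most batches reused an existing plan. The notes give no threshold, so how low is too low is left to judgment.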

Memory bottlenecks

Assuming SQL Server is not starved of physical memory (i.e. no swapping), and we are re-using plans – from an application viewpoint, we are mostly interested in buffer cache behaviour
In OLTP applications, buffer cache bottlenecks are closely related to IO bottlenecks
Buffer Manager/Page Life Expectancy > 300 seconds is good

Memory Bottlenecks

Finding the top objects in the buffer cache

On SQL 2000 query syscacheobjects
On SQL 2005 query sys.dm_os_buffer_descriptors

IO Bottlenecks

Physical Disk Performance Counters
- Avg. Disk Queue Length, Avg. Disk Sec/Read etc
- Don’t forget to adjust for RAID
PAGEIOLATCH_* waits
- 2000 – dbcc sqlperf(waitstats)
- 2005 – select * from sys.dm_os_wait_stats
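
The notes do not say how to adjust for RAID. One common reading, assumed here, is to divide Avg. Disk Queue Length by the number of physical disks behind the volume so that the figure is per spindle. The function name and numbers are hypothetical.

```python
def queue_length_per_spindle(avg_disk_queue_length, spindle_count):
    """Spread the volume-level queue length across its physical disks.

    Assumption: requests are distributed evenly over the spindles,
    which is only an approximation for a real RAID set.
    """
    if spindle_count < 1:
        raise ValueError("spindle_count must be at least 1")
    return avg_disk_queue_length / spindle_count


# Illustrative: a queue length of 12 on an 8-disk array.
per_disk = queue_length_per_spindle(12.0, 8)
```

A raw queue length that looks alarming on a single disk may be unremarkable once spread across the array.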

IO Bottleneck Cause #1 – Query Execution

We need the Top 5 IO Consumers on the server:

On SQL 2000 use a profiler trace – store results to a database table and query
- (Looking for workload that does large average numbers of logical reads).
On SQL 2005 use sys.dm_exec_query_stats
- Look for queries with high average IO
- this means this query is reading lots of rows
- Ask the question – is this required?
Use queries on sys.dm_db_index_operational_stats to identify indexes that, when read, required a physical IO
You can then use XPath queries on sys.dm_exec_cached_plans to find the workload using those indexes
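
A small sketch of "look for queries with high average IO", assuming rows already pulled from the SQL 2005 query statistics. total_logical_reads and execution_count are real columns of sys.dm_exec_query_stats; the query_text key and the function name are hypothetical.

```python
def top_io_consumers(query_stats, n=5):
    """Rank statements by average logical reads per execution.

    query_stats: iterable of dicts with keys "query_text" (hypothetical),
    "total_logical_reads" and "execution_count".
    """
    ranked = []
    for row in query_stats:
        if row["execution_count"] == 0:
            continue  # avoid dividing by zero for entries never executed
        avg_reads = row["total_logical_reads"] / row["execution_count"]
        ranked.append((row["query_text"], avg_reads))
    # Highest average first, keep only the first n entries.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Illustrative rows only.
sample_io_stats = [
    {"query_text": "report", "total_logical_reads": 90000, "execution_count": 3},
    {"query_text": "lookup", "total_logical_reads": 50000, "execution_count": 10000},
    {"query_text": "unused", "total_logical_reads": 0, "execution_count": 0},
]
top_io = top_io_consumers(sample_io_stats)
```

Ranking by the average rather than the total surfaces the rarely run report that reads 30000 pages per execution, which is the kind of query the "is this required?" question is aimed at.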

Lock (and other) Contention

Contention usually manifests as blocking
Two common types
- Environment related (waiting on WRITELOG or PAGEIOLATCH_* for example)
- Application related (waiting on LCK_M_U for example)
Detecting blocking
- On SQL 2000 use a query on the sysprocesses table to find spids that are blocked
- On SQL 2005 use sys.dm_os_waiting_tasks
Patterns to look for
- Single long wait
- Large number of waits on single resource – “hotspot”
- Large numbers of waits on large numbers of resources
- All of the above chained together
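
To illustrate the "chained together" pattern, here is a hedged sketch that walks blocking chains back to the session at the head of each one. It assumes the blocking information has already been read into a mapping of spid to blocking spid, with 0 meaning "not blocked" as in the blocked column of sysprocesses. The function name and sample spids are made up.

```python
def head_blockers(blocked_by):
    """Group blocked spids under the session at the head of their chain.

    blocked_by: dict mapping spid -> spid that blocks it (0 = not blocked).
    Returns a dict mapping head blocker spid -> sorted list of blocked spids.
    """
    def root_of(spid):
        seen = set()
        # Follow the chain upwards; the seen set guards against a cycle
        # so the loop always terminates.
        while blocked_by.get(spid, 0) != 0 and spid not in seen:
            seen.add(spid)
            spid = blocked_by[spid]
        return spid

    victims = {}
    for spid, blocker in blocked_by.items():
        if blocker != 0:
            victims.setdefault(root_of(spid), []).append(spid)
    return {root: sorted(spids) for root, spids in victims.items()}


# Illustrative snapshot: 53 waits on 52, which waits on 51; 54 also waits on 51.
sample_blocking = {51: 0, 52: 51, 53: 52, 54: 51, 60: 0, 61: 60}
chains = head_blockers(sample_blocking)
```

A head blocker with many sessions behind it points at a hotspot. Many head blockers with one victim each looks more like the "large numbers of resources" pattern.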

Takeaways

Spotlight on SQL Server encapsulates all we talked about today…
Dashboard monitoring application

Computer James - A Space for All Things .NET

Tuesday, April 24, 2007