Patrick Okeefee Patrick.firstname.lastname@example.org
Product Architect for Quest Software’s spotlight on SQL Server product family
What problem are we trying to solve?
We are trying to get the best performance and the most efficiency from SQL Server
We are in this situation because…
- Reactive – resource contention (CPU, I/O) is cause problem right now
- Proactive – Aim I getting the best efficient from my workload? Will my application scale?
A SQL Server that is idle has no performance problems.
It is only when it executes the SQL in your application (or workload) that problems manifest.
Only by understanding how your application…
Where do I start?
In order to solve this problem we need a simple set of steps to follow:
- Optimise application demand
- Minimise logical I/O
- Optimise physical IO (lather, rinse and repeat)
Most “bang for the buck” effort-wise is to be found in the first two steps. Then, iterate.
Within each of the three steps, there are three practical steps that can be taken…
- identify bottlenecks
- find the workload that is causing the bottleneck
- fix the bottleneck
How do you find the bottlenecks?
- customer phones; or
- monitoring is better
What kind of bottlenecks should I be looking for?
What data do I collect?
- Advice like the following free on the internet – “if counter x says y it means you have memory pressure so you should add more memory”
- It’s worth what you paid for it…
- Counters, states etc. are a means to an end – what you really care about is how your workload …
Do I have a CPU bottleneck?
- Performance Counters
- Signal waits > 25% total waits
- 2000 – dbcc sqlperf
What is a wait?
- In a multithreaded server (like SQL Server), data flows from one subsystem to another and resources like disk, memory and CPU are shared.
- When one worker thread (the one that is processing some user’s SQL) wants to access some shared resource, other threads have to wait.
- Signal wait occurs when a thread has been granted access to the resource it was waiting on and is now waiting for CPU time.
CPU bottleneck Cause #1
- Query Execution
- We need the Top 5 CPU Consumers on the server
- On SQL 2000 use a profile trace or a delta on …
- On SQL 2005 query sys.dm_exec_query-stats cross apply sys.dm_exec_sql_text(qs.sql_handle)
CPU Bottleneck Cause #2
- Low Plan Reuse
- (Batch Requests – SQL Compilations) / Batch Requests
- Hard to pin down specifically – more a general problem. Need to:
- Look at how end users are submitting queries
- Look for applications not using prepared statements (code snippets online set bad examples)
- Excessive Recompilation (SQL Server 2005)
- Assuming SQL Server is not starved of physical memory (i.e. no swapping), and we are re-using plans – from an application viewpoint, we are mostly interested in buffer cache behaviour
- In OLTP applications, buffer cache bottlenecks are closely related to IO bottlenecks
- Buffer Manager/Page Life Expectancy > 300 seconds is good
Finding the top objects in the buffer cache
- On SQL 2000 query syscacheobjects
- On SQL 2005 query sys.dm_os_buffer_descriptors
- Physical Disk Performance Counters
- Avg. Disk Queue Length, Avg. Disk Sec/Read etc
- Don’t forget to adjust for RAID
- PAGEIOLATCH_* waits
- 2000 – dbcc sqlperf(waitstats)
- 2005 – select * from sys.dm_os_wait_stats
IO Bottleneck Cause #1 – Query Execution
We need the Top 5 IO Consumers on the server:
- On SQL 2000 use a profiler trace – store results to a database table and query
- (Looking for workload that does large average numbers of logical reads).
- On SQL 2005 use sys.dm_exec_query_status
- Look for queries with high average IO
- this means this query is reading lots of rows
- Ask the question – is this required?
- Us equerries on sys.dm_dbindex_operational_stats to identify indexes that when read, required a physical IO
- You can then XPath queries on sys.dm_exec_cached_plans to find workload using those indexes
Lock (and other) Contention
- Contention usually manifests as blocking
- Two common types
- Environment related (waiting on WRITELOG or PAGEIOLATCH_* for example)
- Application related (waiting on LCK_M_U for example)
- Detecting blocking
- On SQL 2000 use a query on the sysprocesses table to find spids that are blocked
- On SQL 2005 use sys.dm_os_waiting_tasks
- Patterns to look for
- Single long wait
- Large number of waits on single resource – “hotspot”
- Large numbers of waits on large numbers of resources
- All of the above chained together
- Spotlight on SQL Server encapsulates all we talked about today…
- Dashboard monitoring application