Tuesday, April 24, 2007

Minutes from SQL Server Users Group Meeting

Patrick Okeefee Patrick.okeeffe@quest.com

Product Architect for Quest Software’s spotlight on SQL Server product family

What problem are we trying to solve?

We are trying to get the best performance and the most efficiency from SQL Server

We are in this situation because…

  • Reactive – resource contention (CPU, I/O) is cause problem right now
  • Proactive – Aim I getting the best efficient from my workload? Will my application scale?

A SQL Server that is idle has no performance problems.

It is only when it executes the SQL in your application (or workload) that problems manifest.

Only by understanding how your application…

Where do I start?

In order to solve this problem we need a simple set of steps to follow:

  • Optimise application demand
  • Minimise logical I/O
  • Optimise physical IO (lather, rinse and repeat)

Most “bang for the buck” effort-wise is to be found in the first two steps. Then, iterate.

Within each of the three steps, there are three practical steps that can be taken…

  1. identify bottlenecks
  2. find the workload that is causing the bottleneck
  3. fix the bottleneck

How do you find the bottlenecks?

  1. customer phones; or
  2. monitoring is better

What kind of bottlenecks should I be looking for?

What data do I collect?

  • Advice like the following free on the internet – “if counter x says y it means you have memory pressure so you should add more memory”
  • It’s worth what you paid for it…
  • Counters, states etc. are a means to an end – what you really care about is how your workload …

Do I have a CPU bottleneck?

  • Performance Counters
  • Signal waits > 25% total waits
  • 2000 – dbcc sqlperf

What is a wait?

  • In a multithreaded server (like SQL Server), data flows from one subsystem to another and resources like disk, memory and CPU are shared.
  • When one worker thread (the one that is processing some user’s SQL) wants to access some shared resource, other threads have to wait.
  • Signal wait occurs when a thread has been granted access to the resource it was waiting on and is now waiting for CPU time.

CPU bottleneck Cause #1

  • Query Execution
  • We need the Top 5 CPU Consumers on the server
  • On SQL 2000 use a profile trace or a delta on …
  • On SQL 2005 query sys.dm_exec_query-stats cross apply sys.dm_exec_sql_text(qs.sql_handle)

CPU Bottleneck Cause #2

  • Low Plan Reuse
  • (Batch Requests – SQL Compilations) / Batch Requests
  • Hard to pin down specifically – more a general problem. Need to:
    • Look at how end users are submitting queries
    • Look for applications not using prepared statements (code snippets online set bad examples)
  • Excessive Recompilation (SQL Server 2005)

Memory bottlenecks

  • Assuming SQL Server is not starved of physical memory (i.e. no swapping), and we are re-using plans – from an application viewpoint, we are mostly interested in buffer cache behaviour
  • In OLTP applications, buffer cache bottlenecks are closely related to IO bottlenecks
  • Buffer Manager/Page Life Expectancy > 300 seconds is good

Memory Bottlenecks

Finding the top objects in the buffer cache

  • On SQL 2000 query syscacheobjects
  • On SQL 2005 query sys.dm_os_buffer_descriptors

IO Bottlenecks

  • Physical Disk Performance Counters
    • Avg. Disk Queue Length, Avg. Disk Sec/Read etc
    • Don’t forget to adjust for RAID
  • PAGEIOLATCH_* waits
    • 2000 – dbcc sqlperf(waitstats)
    • 2005 – select * from sys.dm_os_wait_stats

IO Bottleneck Cause #1 – Query Execution

We need the Top 5 IO Consumers on the server:

  • On SQL 2000 use a profiler trace – store results to a database table and query
    • (Looking for workload that does large average numbers of logical reads).
  • On SQL 2005 use sys.dm_exec_query_status
    • Look for queries with high average IO
    • this means this query is reading lots of rows
    • Ask the question – is this required?
  • Us equerries on sys.dm_dbindex_operational_stats to identify indexes that when read, required a physical IO
  • You can then XPath queries on sys.dm_exec_cached_plans to find workload using those indexes

Lock (and other) Contention

  • Contention usually manifests as blocking
  • Two common types
    • Environment related (waiting on WRITELOG or PAGEIOLATCH_* for example)
    • Application related (waiting on LCK_M_U for example)
  • Detecting blocking
    • On SQL 2000 use a query on the sysprocesses table to find spids that are blocked
    • On SQL 2005 use sys.dm_os_waiting_tasks
  • Patterns to look for
    • Single long wait
    • Large number of waits on single resource – “hotspot”
    • Large numbers of waits on large numbers of resources
    • All of the above chained together

Takeaways

  • Spotlight on SQL Server encapsulates all we talked about today…
  • Dashboard monitoring application

No comments:

Post a Comment