So you’ve got a SQL Server that you want to monitor. What should you monitor? In my opinion, at bare minimum, you want to collect wait statistics. Let’s talk about a basic wait statistics collection.
Our new best friend: dm_os_wait_stats
This is the only place we need to query. It’s a DMV that shows the wait statistics for the entire instance since the instance rebooted.
Setting up a scheduled query to collect wait statistics
We need a table to store the wait statistics. Here’s a basic script for storing wait statistics. I added all the columns from the DMV above, along with a column for server name and the datetime when the data was inserted.
CREATE TABLE wait_statistics_history
(wait_statistics_id BIGINT IDENTITY(1,1),
wait_type NVARCHAR(60),
waiting_tasks_count BIGINT,
wait_time_ms BIGINT,
max_wait_time_ms BIGINT,
signal_wait_time_ms BIGINT,
server_name NVARCHAR(128),
insert_datetime DATETIME DEFAULT(GETDATE()))
GO
CREATE CLUSTERED INDEX cx_wait_statistics_id on
wait_statistics_history(wait_statistics_id);
And here’s a query to insert into that table.
INSERT INTO [dbo].[wait_statistics_history]
([wait_type]
,[waiting_tasks_count]
,[wait_time_ms]
,[max_wait_time_ms]
,[signal_wait_time_ms]
,[server_name]
)
SELECT
wait_type
,waiting_tasks_count
,wait_time_ms
,max_wait_time_ms
,signal_wait_time_ms
,@@SERVERNAME
FROM sys.dm_os_wait_stats
Okay, what now?
Now we have a way to track wait statistics over time. Feel free to wrap that query in a stored procedure, and execute from a SQL agent job. I’d recommend collecting once an hour, but you can always change that threshold.
Also, consider filtering out wait statistics that you don’t care about. There’s a lot of resources online for finding which wait statistics are good vs bad.
Thanks for reading! Stay tuned.
Looks very much what we started collecting a few years ago. This basic framework can be very useful. We’ve added additional history tables for sys.dm_exec_query_stats and useful performance counters like Batch Requests/sec. (Though the counter name says per second, the counter is actually the # of batch requests since the instance started.)
We also created a stored procedure to let us query the deltas between two WSIs (wait stats ID’s) as well as an SSRS report to show them over the past 24 hours. I cant tell you how many times this information has helped us.