Thursday, October 30, 2008

Denormalization

I'm convinced that denormalization is a word used mostly by people who don't know how to normalize a schema in the first place. They generally strike the pose of the "real-world" DBA as opposed to the ridiculous academic or relational purist. Often, these real-world DBAs say they have denormalized a database when they have merely failed to normalize it.

Of course, denormalization is often useful and occasionally necessary. However, my belief is that you should first put the schema into third normal form and only then consider denormalization. Until you have found a normal form for the schema, how can you know whether you have properly understood the process you are trying to model?
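
To make the distinction concrete, here is a made-up sketch; the Orders and OrderLines tables are mine, not from any real system. The first two tables are in third normal form. The ALTER at the end is a deliberate denormalization: you add a redundant order total to save an aggregate at read time, and you know exactly what redundancy you took on and what it costs to keep consistent. Repeating the customer's address on every order row, by contrast, is not denormalization; it is a schema that was never normalized in the first place.

create table dbo.Orders
    (order_id int not null primary key
    ,customer_id int not null
    ,order_date datetime not null)

create table dbo.OrderLines
    (order_id int not null references dbo.Orders(order_id)
    ,line_no int not null
    ,product_id int not null
    ,quantity int not null
    ,unit_price money not null
    ,primary key (order_id, line_no))

-- deliberate denormalization: carry a redundant total on the order header
alter table dbo.Orders add order_total money null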

Often, denormalizers do not understand the implications of what they are doing and build serious long-term performance problems into their design. Typically, you will find that a lot of cursors, colossal union queries, table-valued functions, etc. are needed to patch the data back together in a poorly normalized design.

After a database has been in service for months or years, it is almost impossible to make substantial schema changes to correct the problems. Often, man-years of development have gone into the applications accessing the database, and making all the changes that would be necessary is just not an affordable course of action. Developers are reduced to putting band-aids on band-aids to keep their applications chugging along.

Monday, October 13, 2008

More DMVs - Code Performance

I borrowed the original code from a Microsoft engineer who had posted it on the Microsoft website. I would credit him but I forgot who he was and I can't find the post now. At any rate I have added my touches.

This query helps you identify the code that is giving you the biggest problems. It pulls execution information for every plan in the procedure cache and aggregates totals and averages for the key performance metrics: logical I/O, CPU, and so on. It comes with multiple ORDER BY clauses so you can highlight the biggest consumers of any particular resource; uncomment the one you want to use.

The one I use most is logical I/O. I think it is the truest measure of code efficiency. Execution time varies with a lot of factors that have nothing to do with the efficiency of the code and indexing: if the data is already in cache, the query runs a lot faster than if it has to pull pages off the disk, and blocking or a heavy load on the server will also change the elapsed time of a statement.

Physical I/O is largely an accident of how much of the data happens to be in cache when the statement runs. It fluctuates unpredictably and is not a reliable metric.

But, given the same code, the same data, the same indexing and the same query plan, logical page reads will always be the same. That gives you a baseline to start from when you are optimizing. You can try indexing tricks, code modifications, or forcing different query plans; if logical I/O goes down, you know you are making progress.
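
The DMV query below aggregates those numbers across everything in the cache. When you are tuning a single statement, SET STATISTICS IO will show you the logical and physical reads for each table it touches, which gives you the before-and-after comparison. Here is a minimal sketch; the table and the commented-out index are made up for illustration.

SET STATISTICS IO ON
GO

-- run the statement you are tuning; the Messages tab shows logical reads
-- and physical reads per table. Logical reads are the number to watch.
select customer_id, count(*) as order_count
from dbo.Orders
group by customer_id
GO

-- try an indexing or code change, run the same statement again and compare.
-- if logical reads went down, you are making progress.
-- create index IX_Orders_customer_id on dbo.Orders (customer_id)

SET STATISTICS IO OFF
GO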

Here is the query. I have throttled it down to the top 100 because a big system can have a huge number of plans in cache and you are only interested in the heavy hitters anyway. I'll bet you already know that the procedure cache is flushed at each restart of SQL Server or by running 'DBCC FREEPROCCACHE', so I won't mention it.


SELECT TOP 100 t.[text] AS [Batch]
    , SUBSTRING(t.[text], (qs.[statement_start_offset]/2) + 1,
        ((CASE qs.[statement_end_offset]
              WHEN -1 THEN DATALENGTH(t.[text])
              ELSE qs.[statement_end_offset]
          END - qs.[statement_start_offset])/2) + 1) AS [Statement]
    , qs.[execution_count] AS [Count]
    , qs.[total_worker_time] AS [Tot_CPU]
    , (qs.[total_worker_time] / qs.[execution_count]) AS [Avg_CPU]
    , qs.[total_physical_reads] AS [Tot_Phys_Reads]
    , (qs.[total_physical_reads] / qs.[execution_count]) AS [Avg_Phys_Reads]
    , qs.[total_logical_writes] AS [Tot_Logic_Writes]
    , (qs.[total_logical_writes] / qs.[execution_count]) AS [Avg_Logic_Writes]
    , qs.[total_logical_reads] AS [Tot_Logic_Reads]
    , (qs.[total_logical_reads] / qs.[execution_count]) AS [Avg_Logic_Reads]
    , qs.[total_clr_time] AS [Tot_CLR_Time]
    , (qs.[total_clr_time] / qs.[execution_count]) AS [Avg_CLR_Time]
    , qs.[total_elapsed_time] AS [Tot_Elapsed_Time]
    , (qs.[total_elapsed_time] / qs.[execution_count]) AS [Avg_Elapsed_Time]
    , qs.[last_execution_time] AS [Last_Exec]
    , qs.[creation_time] AS [Creation Time]
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
-- ORDER BY [Tot_CPU] DESC
-- ORDER BY [Tot_Phys_Reads] DESC
-- ORDER BY [Tot_Logic_Writes] DESC
ORDER BY [Tot_Logic_Reads] DESC
-- ORDER BY [Avg_Logic_Reads] DESC
-- ORDER BY [Tot_CLR_Time] DESC
-- ORDER BY [Tot_Elapsed_Time] DESC
-- ORDER BY [Count] DESC
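
One caveat when you read the output: the totals accumulate from the time each plan entered the cache, so the Creation Time column tells you how long a plan has been racking up its numbers. On a test server where you want the counters to start from a clean slate, you can flush the cache yourself with something like the line below; never do it casually on a busy production box, because every plan will have to be recompiled.

-- test servers only: flush the procedure cache so the counters start fresh
DBCC FREEPROCCACHE
GO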

Thursday, October 9, 2008

Using Dynamic Management Views - Blocking

Finally, here's something useful:

Dynamic management views and functions in SQL 2005 give you an easy way to look at critical server information, but there is not a lot of documentation on using them. Here is an example that uses a couple of them to record blocking information, including the blocking code and the blocked code. I've written it to run in an endless loop, taking snapshots of blocking on the server at five-second intervals. The interval is configurable, of course. If you have ever tried digging this information out of a server with sp_blocker_pss80, you will love the simplicity of this method.

In this example I have included code to create a temp table to store the results, but you might want to create a permanent table somewhere; there is a sketch of that below the temp table code.

The loop runs in a query window until you kill it; you then run the SELECT at the end of the script to retrieve and display everything it collected. The disadvantage is that you don't see any results until you are finished.

Here's the code to create the table:

create table #BlockingCode
    (resource_type nvarchar(120)
    ,[database] nvarchar(128)
    ,[blk object] bigint
    ,request_mode nvarchar(120)
    ,request_session_id int
    ,waiter_text text
    ,blocking_session_id int
    ,blocker_text text
    ,batchdate datetime default getdate() )
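
If you want to keep the history around between sessions, the same definition works as a permanent table; here is a sketch, where DBAdmin is just a placeholder for whatever database you keep your DBA data in. If you go that route, point the INSERT in the loop and the final SELECT at the permanent table instead of the temp table.

-- permanent alternative to the temp table; DBAdmin is a placeholder name
create table DBAdmin.dbo.BlockingCode
    (resource_type nvarchar(120)
    ,[database] nvarchar(128)
    ,[blk object] bigint
    ,request_mode nvarchar(120)
    ,request_session_id int
    ,waiter_text text
    ,blocking_session_id int
    ,blocker_text text
    ,batchdate datetime default getdate() )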


-- here's the code to capture and insert blocking information.

WHILE 1=1 -- endless loop, be careful
BEGIN
    insert into #BlockingCode
        (resource_type
        ,[database]
        ,[blk object]
        ,request_mode
        ,request_session_id
        ,waiter_text
        ,blocking_session_id
        ,blocker_text)
    select t1.resource_type
        ,db_name(t1.resource_database_id) as [database]
        ,t1.resource_associated_entity_id as [blk object]
        ,t1.request_mode
        ,t1.request_session_id -- spid of the waiter
        ,(select [text] from sys.dm_exec_requests as r -- get sql for the waiter
            cross apply sys.dm_exec_sql_text(r.sql_handle)
            where r.session_id = t1.request_session_id) as waiter_text
        ,t2.blocking_session_id -- spid of the blocker
        ,(select [text] from sys.sysprocesses as p -- get sql for the blocker
            cross apply sys.dm_exec_sql_text(p.sql_handle)
            where p.spid = t2.blocking_session_id) as blocker_text
    from sys.dm_tran_locks as t1
    join sys.dm_os_waiting_tasks as t2
        on t1.lock_owner_address = t2.resource_address

    WAITFOR DELAY '00:00:05'
END
GO

-- retrieve the results
select * from #BlockingCode

If you want to take a quick look at what kind of blocking is going on right now, without waiting for the loop to finish, just run the SELECT part of the INSERT statement on its own in another query window.
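
Here is that SELECT pulled out of the loop for convenience; it is the same query, just without the INSERT:

-- snapshot of current blocking on the server
select t1.resource_type
    ,db_name(t1.resource_database_id) as [database]
    ,t1.resource_associated_entity_id as [blk object]
    ,t1.request_mode
    ,t1.request_session_id -- spid of the waiter
    ,(select [text] from sys.dm_exec_requests as r -- sql for the waiter
        cross apply sys.dm_exec_sql_text(r.sql_handle)
        where r.session_id = t1.request_session_id) as waiter_text
    ,t2.blocking_session_id -- spid of the blocker
    ,(select [text] from sys.sysprocesses as p -- sql for the blocker
        cross apply sys.dm_exec_sql_text(p.sql_handle)
        where p.spid = t2.blocking_session_id) as blocker_text
from sys.dm_tran_locks as t1
join sys.dm_os_waiting_tasks as t2
    on t1.lock_owner_address = t2.resource_address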

Wednesday, October 8, 2008

Object/Relational Mapping and Performance

One hears a lot about the impedance mismatch between the modern object-oriented languages used for application development and the 40-year-old SQL that still does the heavy lifting in the database. Object/relational mapping tools were developed to fix the mismatch. One day their authors will discover that they are trying to fix it from the wrong end. Until then, we will have to deal with the sometimes huge performance hit these tools impose.

Application programmers love object/relational tools because they can write data manipulation code in the same object-oriented manner as the rest of their application code. They love them so much that they often overlook the performance implications until it is too late, and even then they have a hard time giving them up. Here's an illustration:

A while back, I got a call from a client in Silicon Valley. Their web application had suddenly taken a dramatic downturn in performance. I jumped on a plane. In a very short time I found one statement that was reading about 40 million 8K data pages every time it executed. It took about 12 minutes to run and was paralyzing the server. I took the SQL it was executing and ran it through Query Analyzer. It read about 10,000 pages and ran in a couple of seconds.

I took this information to the head of development and to the developer himself so they could straighten out the code. It had been written with an O/R mapping tool by an evangelistic Java programmer without anyone else's knowledge. It was the only piece of code in a huge application that used O/R mapping, and it had brought the application down.

There's more:

When I am on site with that client they put me in the developers' email group so I can follow the conversations. By chance, later that same day one developer sent an email around mentioning that he had read some things about this cool new O/R tool. The developer who had single-handedly brought down the application with that very tool jumped into the conversation to announce that he was already using it and it worked great. He did not say a thing about performance.

Tuesday, October 7, 2008

Starting Here

Who am I and why do I think my thoughts about SQL Server would be useful enough for anyone to want to read them?

I am Kurt Survance, chief consultant and owner of SQL Consulting, Inc. http://sqlconsulting.com. My Microsoft certification card is dated 1993. It carries certifications for MCSE, MCDBA and MCP and is number 2535 in a series that now stretches past one million. You can see that I have been in this game longer than most. However, I would not recommend getting old just so you can be experienced.

I just got back from Redmond, where I served on a panel of subject matter experts (SMEs) reviewing one of the SQL 2008 exams. I am on the other end of the certification game now.

So much for experience. It is important but it's not everything. Eventually I will get around to writing something useful but for a little while I will be busy redesigning the look and feel of the blog page.