Tuesday, December 23, 2008

SQL 2008 cluster installer bug

I just got back from an on-site assignment in Denver. We were migrating to SQL 2008 on a new set of Windows 2008 clustered servers and ran into an installer bug. We completed the install on the first node, but the install failed on the second node because of an invalid product SKU. The SKU was valid: it was slipstreamed in by Microsoft when the product was purchased and downloaded, and it appeared correctly in the SKU input box just before the error. We tried several times with the same result and eventually called Microsoft.

Microsoft is well aware of the bug and has a hotfix which, unfortunately, didn't work for us either. Microsoft and others have also posted workarounds, all of which involve running setup from the command prompt with a varying number of parameters, depending on whom you ask. Microsoft's example has an enormous number of parameters, which you can find by following links from these pages: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=363036

http://social.msdn.microsoft.com/forums/en-US/sqlsetupandupgrade/thread/13a256b0-b95b-46a7-92d5-17a0b763f8dd/

What finally worked for us was very simple. Execute this at a command prompt while logged in as an administrator:

setup.exe /ACTION=AddNode /INSTANCENAME="MSSQLSERVER"

By the way, the install process for clustered instances has changed significantly. The cluster wizard has been broken into two wizards: the first creates a single-node cluster (whatever that means). Then you must go back to the setup start screen and run the Add Node wizard for each node you want to add. You must run it from the node you are adding, not from the primary node; the SQL Server binaries are not installed on the second node until you do.

Friday, December 5, 2008

Amateurism and SQL Server

The Marketing department drives everything at Microsoft, especially product design and development. A while back this group decided that SQL Server was an "administration-free" database. Their premise is that my grandmother should be able to design, construct and manage an enterprise data platform without having to bother learning much of anything.

For example, the table creation wizard allows a completely unqualified person to actually create a table. It defaults to creating a primary key on an identity column using a clustered index. This does ensure that the table 1) has a PK, 2) has a clustered index, and 3) has a clustered index that won't cause page splitting, even though the table creator has little idea what any of this means. That is a good thing for someone with a small startup who needs a database and can't afford professional help. However, I see it in large corporations, done by people who are considered professionals.
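
Roughly, those defaults amount to something like this in DDL (the table and column names here are hypothetical, just to illustrate the pattern):

CREATE TABLE dbo.Orders
(
    OrderID    int IDENTITY(1,1) NOT NULL,
    CustomerID int NOT NULL,
    OrderDate  datetime NOT NULL,
    -- the wizard-style default: primary key and the one clustered index both on the identity
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
);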

I have had discussions with many of them and found that, because of the wizard's default behavior, they believed a primary key HAD to be clustered. Also because of the wizard's defaults, they have confused an identity with a real primary key. When I ask them what other column or group of columns is unique in the table, they don't know what I'm asking.

While helpful for an amateur application, these defaults virtually assure that the database will not scale as the company grows. The placement of the clustered index is critical to good performance, and placing it on an identity in every case is a big mistake.

A nonclustered index will handle an identity lookup as well as a clustered index will, so why waste the only clustered index on something a nonclustered index can do just as well? Clustered indexes do certain things much better than nonclustered indexes, such as range searches that return many records.

It only makes sense to place the clustered index on a column where it will make a difference. However, deciding which column to place it on requires knowledge and creative thought of which the wizard is not capable, and apparently many database designers are not either. It is not that they don't know how to handle the issue; it is that they don't even know there is an issue.
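
As a sketch of what I mean, and nothing more, here is one way the same hypothetical Orders table might look when most of the heavy queries are date-range scans: the identity keeps a nonclustered primary key, the real business key gets a unique constraint, and the clustered index goes where the range searches are.

CREATE TABLE dbo.Orders
(
    OrderID     int IDENTITY(1,1) NOT NULL,
    OrderNumber varchar(20) NOT NULL,      -- the real, business-meaningful key
    CustomerID  int NOT NULL,
    OrderDate   datetime NOT NULL,
    -- a nonclustered index handles identity lookups just fine
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderID),
    -- enforce uniqueness on the column that is actually unique to the business
    CONSTRAINT UQ_Orders_OrderNumber UNIQUE (OrderNumber)
);

-- spend the one clustered index where range searches returning many rows will benefit
CREATE CLUSTERED INDEX CIX_Orders_OrderDate ON dbo.Orders (OrderDate);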

Next post, why an identity is not a primary key.

Tuesday, November 25, 2008

Vista rant

I finally bit the bullet and bought a new laptop. I have avoided doing this for years because of the tremendous time investment necessary to load software and configure the system to do the things you need to do. Since I only do this every few years, I generally buy on the cutting edge of technology. It's pretty clear that x64 is the future of PCs, so I opted for Vista Business x64 on a loaded Lenovo T61 (4 GB, fast drive, etc.). I finally have it under control, but it has been a struggle.

There are a lot of 32-bit programs that won't run on Vista x64, and x64 versions don't exist yet. For example, we use QuickBooks for accounting, and I use the QuickBooks timer to record billable time when I am working. Or rather, I used to use it; it won't run on Vista at all.

You run into lots of issues like this, especially if you upgrade rather than do a clean install. I am fortunate that mine was a pre-install on the new laptop, because my research has turned up many programs I need that are supposed to be compatible but will only run on a clean install.

If you are thinking about Vista x64, you have to make sure the software you need is available on x64 before you buy.

As a prelude to my rant on Vista itself I want to mention that I am not a knee-jerk Microsoft basher. I make my living supporting a Microsoft product. But I am not an apologist either.

Microsoft has not made Vista an intrinsically more secure operating system. They have merely shifted the responsibility for doing unsafe operations from the OS to the user. Vista was probably designed by their legal department. Every time you start a non-MS program or change any sort of configuration option, you get a pop-up warning; likewise when you open a website.

Now, I am a geek. I have been using computers since the days of 64K CP/M programming. When I can't understand what the message means, how is a casual user supposed to know what to do with the pop-up?

Vista locks your computer down to an unusable degree. If you take all their defaults, you won't be able to do much with your computer.

The item that took the most time to resolve was the change in Remote Desktop. There are a lot of people on the web complaining about it, but I didn't find the answer there. It took trial and error to discover that in many circumstances you must use a fully qualified domain name or IP address for the target machine on remote networks over a VPN connection. Just the hostname doesn't work. The VPN opens just fine, but then you cannot see or connect to any machine on the remote network.

Networking in general is a problem. The GUI interfaces and terminology have changed substantially.

Worst of all is the abstraction they have laid over the file system. It takes some effort to see the actual folder hierarchy. I know there are a lot of users out there who don't understand a hierarchical file system; they save documents and can't find them. However, I disapprove of dumbing down the user interface in a way that makes it harder for an expert user to do what they need to do.

Thursday, October 30, 2008

Denormalization

I'm convinced that denormalization is a word used mostly by people who don't know how to normalize a schema in the first place. They generally strike the pose of being a "real-world" DBA as opposed to a ridiculous academic or relational purist. Often, these real-world DBAs say they have denormalized a database when they have merely failed to normalize it.

Of course, denormalization is often useful and occasionally necessary. However, my belief is that you should first put the schema into third normal form and only then consider denormalization. Until you have found a normal form for the schema, how can you know whether you have properly understood the process you are trying to model?
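
To make the distinction concrete, here is a trivial, made-up example. The first table is not a denormalization of anything; it was simply never normalized. The pair of tables below it is the third-normal-form version you should start from before deciding whether to denormalize.

-- never normalized: customer attributes repeated on every order row
CREATE TABLE dbo.OrdersFlat
(
    OrderID       int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    OrderDate     datetime NOT NULL,
    CustomerName  varchar(100) NOT NULL,   -- depends on the customer, not the order
    CustomerPhone varchar(20)  NULL        -- ditto; update anomalies waiting to happen
);

-- third normal form: every non-key column depends on the key of its own table
CREATE TABLE dbo.Customers
(
    CustomerID    int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CustomerName  varchar(100) NOT NULL,
    CustomerPhone varchar(20)  NULL
);

CREATE TABLE dbo.Orders
(
    OrderID    int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CustomerID int NOT NULL REFERENCES dbo.Customers (CustomerID),
    OrderDate  datetime NOT NULL
);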

Often denormalizers do not understand the long-term implications of what they are doing and build serious performance problems into their design. Typically, you will find that a lot of cursors, colossal union queries, table-valued functions, and so on are necessary to patch together the data in a poorly normalized design.

After a database has been in service for months or years it is almost impossible to make substantial schema changes to correct the problems. Often man-years of development have gone into the applications accessing the database and making all the changes that would be necessary is just not an affordable course of action. Developers are reduced to putting band-aids on band-aids to keep their applications chugging along.

Monday, October 13, 2008

More DMVs - Code Performance

I borrowed the original code from a Microsoft engineer who had posted it on the Microsoft website. I would credit him but I forgot who he was and I can't find the post now. At any rate I have added my touches.

This query helps you identify the code that is giving you the biggest problems. It pulls execution information for every plan in the procedure cache and aggregates totals and averages for key performance metrics: logical I/O, CPU, and so on. It comes with multiple ORDER BY clauses so you can highlight the biggest users of any particular resource. Uncomment the clause you want to use.

The one I use most is logical I/O. I think that is the truest measure of code efficiency. Execution time will vary depending on a lot of factors not directly related to the efficiency of the code and indexing. If the data is in cache, the query will run a lot faster than if it has to page data off the disk. Blocking or a heavy load on the server will also change the execution time of a statement.

Physical I/O is an accident of how much of the data happens to be in cache when the statement runs. It fluctuates randomly and is not a reliable metric.

But, given the same code, the same data, the same indexing and the same query plan, logical page reads will always be the same. That gives you a baseline to start from when you are optimizing. You can try indexing tricks, code modifications, or forcing different query plans; if logical I/O is reduced, you know you are making progress.
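
If you want that same logical-read number for a single statement while you are tuning it, rather than pulling it out of the plan cache, STATISTICS IO will report it in the Messages tab. A quick sketch (the table name and query are just placeholders):

SET STATISTICS IO ON;    -- reports logical, physical and read-ahead reads per table
SET STATISTICS TIME ON;  -- CPU and elapsed time, for comparison

SELECT COUNT(*)          -- placeholder; substitute the statement you are tuning
FROM dbo.SomeLargeTable
WHERE SomeDateColumn >= '20080101';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;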

Here is the code. I have throttled it down to the top 100 because a big system can have a huge number of plans in cache and you are only interested in the heavy hitters anyway. I'll bet you already know that the procedure cache is flushed at each restart of SQL Server or execution of DBCC FREEPROCCACHE, so I won't mention it.


SELECT TOP 100
     t.[text] AS [Batch]                                      -- full text of the cached batch
    ,SUBSTRING(t.[text], (qs.[statement_start_offset]/2) + 1,
        ((CASE qs.[statement_end_offset]
              WHEN -1 THEN DATALENGTH(t.[text])
              ELSE qs.[statement_end_offset]
          END - qs.[statement_start_offset])/2) + 1) AS [Statement]   -- the individual statement
    ,qs.[execution_count]                              AS [Count]
    ,qs.[total_worker_time]                            AS [Tot_CPU]
    ,qs.[total_worker_time] / qs.[execution_count]     AS [Avg_CPU]
    ,qs.[total_physical_reads]                         AS [Tot_Phys_Reads]
    ,qs.[total_physical_reads] / qs.[execution_count]  AS [Avg_Phys_Reads]
    ,qs.[total_logical_writes]                         AS [Tot_Logic_Writes]
    ,qs.[total_logical_writes] / qs.[execution_count]  AS [Avg_Logic_Writes]
    ,qs.[total_logical_reads]                          AS [Tot_Logic_Reads]
    ,qs.[total_logical_reads] / qs.[execution_count]   AS [Avg_Logic_Reads]
    ,qs.[total_clr_time]                               AS [Tot_CLR_Time]
    ,qs.[total_clr_time] / qs.[execution_count]        AS [Avg_CLR_Time]
    ,qs.[total_elapsed_time]                           AS [Tot_Elapsed_Time]
    ,qs.[total_elapsed_time] / qs.[execution_count]    AS [Avg_Elapsed_Time]
    ,qs.[last_execution_time]                          AS [Last_Exec]
    ,qs.[creation_time]                                AS [Creation Time]
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
-- uncomment the ORDER BY you want; only one can be active at a time
-- ORDER BY [Tot_CPU] DESC
-- ORDER BY [Tot_Phys_Reads] DESC
-- ORDER BY [Tot_Logic_Writes] DESC
ORDER BY [Tot_Logic_Reads] DESC
-- ORDER BY [Avg_Logic_Reads] DESC
-- ORDER BY [Tot_CLR_Time] DESC
-- ORDER BY [Tot_Elapsed_Time] DESC
-- ORDER BY [Count] DESC

Thursday, October 9, 2008

Using Dynamic Management Views - Blocking

Finally, here's something useful:

Dynamic management views and functions in SQL 2005 give you an easy way to look at critical server information, but there is not a lot of documentation about using them. Here is an example that uses a couple of them to record blocking information, including the blocker's code and the blocked code. I've written it to run in an endless loop, taking snapshots of blocking on the server at 5-second intervals. The interval is configurable, of course. If you have ever tried digging this information out of a server using sp_blocker_pss80, you will love the simplicity of this method.

In this example I have included code to create a temp table to store results but you might want to create a permanent table somewhere.

The loop will run from a query window until you kill it; then you can retrieve and display what it collected. This has the disadvantage of not letting you see results until you are finished. If you want to view blocking interactively, you can just execute the SELECT part of the INSERT statement in another window.

Here's the code to create the table:

create table #BlockingCode
    (resource_type       nvarchar(120)
    ,[database]          nvarchar(128)
    ,[blk object]        bigint
    ,request_mode        nvarchar(120)
    ,request_session_id  int
    ,waiter_text         nvarchar(max)   -- nvarchar(max) rather than the deprecated text type
    ,blocking_session_id int
    ,blocker_text        nvarchar(max)
    ,batchdate           datetime default getdate())


-- here's the code to capture and insert blocking information.

WHILE 1=1   -- endless loop; be careful
BEGIN
    insert into #BlockingCode
        (resource_type
        ,[database]
        ,[blk object]
        ,request_mode
        ,request_session_id
        ,waiter_text
        ,blocking_session_id
        ,blocker_text)
    select t1.resource_type
          ,db_name(t1.resource_database_id)      as [database]
          ,t1.resource_associated_entity_id      as [blk object]
          ,t1.request_mode
          ,t1.request_session_id                 -- spid of the waiter
          ,(select text from sys.dm_exec_requests as r       -- get sql for the waiter
              cross apply sys.dm_exec_sql_text(r.sql_handle)
            where r.session_id = t1.request_session_id)      as waiter_text
          ,t2.blocking_session_id                -- spid of the blocker
          ,(select text from sys.sysprocesses as p           -- get sql for the blocker
              cross apply sys.dm_exec_sql_text(p.sql_handle)
            where p.spid = t2.blocking_session_id)           as blocker_text
    from sys.dm_tran_locks as t1
    inner join sys.dm_os_waiting_tasks as t2
        on t1.lock_owner_address = t2.resource_address
    WAITFOR DELAY '00:00:05'   -- snapshot interval; adjust to taste
END
GO

--retrieve results
select * from #BlockingCode

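Once you have collected a batch of snapshots, a quick summary like this one (just a sketch against the #BlockingCode table above) shows which blocking statements turned up most often; the CAST is only there so the captured text can be grouped.

select blocking_session_id
      ,cast(blocker_text as nvarchar(4000)) as blocker_text   -- truncated for grouping
      ,count(*)       as snapshots_seen
      ,min(batchdate) as first_seen
      ,max(batchdate) as last_seen
from #BlockingCode
group by blocking_session_id, cast(blocker_text as nvarchar(4000))
order by snapshots_seen desc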

Wednesday, October 8, 2008

Object/Relational Mapping and performance

One hears a lot about the impedance mismatch between the modern object-oriented languages used for application development and 40-year-old SQL, which is still used for the heavy lifting in the database. Object/relational mapping tools were developed to fix the mismatch. One day they will discover that they are trying to fix the mismatch from the wrong end. Until then, we will have to deal with the sometimes huge performance hit these tools impose.

Procedural programmers love object/relational tools because they can write data manipulation code in the same object-oriented manner as the rest of their application code. They love them so much that they often overlook the performance implications until it is too late, and even then they have a hard time giving them up. Here's an illustration:

A while back, I got a call from a client in Silicon Valley. Their web application had suddenly taken a dramatic downturn in performance. I jumped on a plane. In a very short time I found one statement that was reading about 40 million 8 KB data pages every time it executed. It took about 12 minutes to run and was paralyzing the server. I took the SQL it was executing and ran it through Query Analyzer. It read about 10,000 pages and ran in a couple of seconds.

I went to the head of development and to the developer himself with this information so they could straighten out the code. It had been written using an O/R mapping tool by an evangelistic Java programmer without anyone else's knowledge. It was the only piece of code in a huge application that used O/R mapping, and it had brought the application down.

There's more:

When I am on that client's site, they put me in the developers' email group so I can follow the conversations. By chance, later that same day one developer sent an email around mentioning that he had read some things about this cool new O/R tool. The developer who had single-handedly brought down the application with that very tool jumped into the conversation to announce that he was already using it and it worked great. He did not say a thing about performance.

Tuesday, October 7, 2008

Starting Here

Who am I and why do I think my thoughts about SQL Server would be useful enough for anyone to want to read them?

I am Kurt Survance, chief consultant and owner of SQL Consulting, Inc. http://sqlconsulting.com. My Microsoft certification card is dated 1993. It carries certifications for MCSE, MCDBA and MCP and is number 2535 in a series that now stretches past one million. You can see that I have been in this game longer than most. However, I would not recommend getting old just so you can be experienced.

I just got back from Redmond where I served on a panel of Subject Matter Experts (SME) reviewing one of the SQL 2008 exams. I am on the other end of the certification game now.

So much for experience. It is important but it's not everything. Eventually I will get around to writing something useful but for a little while I will be busy redesigning the look and feel of the blog page.