Saturday, May 2, 2009

The Perils of XML Over-Use

Like all technologies, XML has its strong and weak points. It is useful in many situations. However, it is often used in places where it is not appropriate. The performance cost of XML misuse or overuse can be enormous.

I recently looked at the results of a performance comparison between data stored in two relational tables and the same data stored in a single table in XML format. The data consisted of approximately 100,000 orders and 1.2 million order line items. In the relational model they were stored as an Orders table and an OrderDetail table. The XML data was created in element format in one case and in attribute format in another. Both the relational and the XML data had similar indexes.
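To make the two XML layouts concrete, here is a minimal Python sketch that serializes one order both ways. The field names (OrderID, Customer, Product, Quantity, Price) and the sample values are hypothetical, since the actual test schema is not given here.

```python
import xml.etree.ElementTree as ET

# Hypothetical order with two line items; names and values are illustrative only.
order = {"id": "1001", "customer": "ACME", "lines": [
    {"product": "widget", "qty": "3", "price": "9.99"},
    {"product": "gadget", "qty": "1", "price": "24.50"},
]}

def as_elements(o):
    """Element style: every value is the text of its own child element."""
    root = ET.Element("Order")
    ET.SubElement(root, "OrderID").text = o["id"]
    ET.SubElement(root, "Customer").text = o["customer"]
    for line in o["lines"]:
        detail = ET.SubElement(root, "OrderDetail")
        ET.SubElement(detail, "Product").text = line["product"]
        ET.SubElement(detail, "Quantity").text = line["qty"]
        ET.SubElement(detail, "Price").text = line["price"]
    return ET.tostring(root, encoding="unicode")

def as_attributes(o):
    """Attribute style: values are attributes on the Order and OrderDetail elements."""
    root = ET.Element("Order", OrderID=o["id"], Customer=o["customer"])
    for line in o["lines"]:
        ET.SubElement(root, "OrderDetail", Product=line["product"],
                      Quantity=line["qty"], Price=line["price"])
    return ET.tostring(root, encoding="unicode")

element_xml = as_elements(order)
attribute_xml = as_attributes(order)
print(element_xml)
print(attribute_xml)
```

Both strings carry the same information. The only difference is whether a query has to walk child elements or read attributes to get at each value.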

Below is a table summarizing the results. A cold execution means the query was executed after the buffers were flushed and no data was in cache. A hot execution was done after the data was in cache:

Data Format          Cold Execution (ms)   Hot Execution (ms)   Storage (MB)
XML Element                        59373                37054            397
XML Attribute                       3190                   31            407
Relational Tables                    250                    1            214

As you can see, there is a HUGE performance degradation using element-style XML. It is on the order of 37,000 times slower when hot execution times are compared with relational data retrieval. Attribute-style XML fares better, but it is still many times less efficient than data retrieval from relational tables.
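The sketch below hints at where the extra work comes from. It is not the setup used in the test: SQLite and Python's ElementTree stand in for whatever database and XML engine were measured, and the table names, column names and sample rows are made up for illustration. The same total is computed once from typed relational columns and once from stored XML documents.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OrderDetail (OrderID INTEGER, Quantity INTEGER, Price REAL);
    CREATE TABLE OrderXml (OrderID INTEGER, Doc TEXT);
""")

# Hypothetical sample data: 3 orders with 2 line items each.
for order_id in range(1, 4):
    lines = [(order_id, qty, 10.0) for qty in (1, 2)]
    conn.executemany("INSERT INTO OrderDetail VALUES (?, ?, ?)", lines)
    doc = "<Order>" + "".join(
        f"<OrderDetail><Quantity>{q}</Quantity><Price>{p}</Price></OrderDetail>"
        for _, q, p in lines
    ) + "</Order>"
    conn.execute("INSERT INTO OrderXml VALUES (?, ?)", (order_id, doc))

# Relational: the engine aggregates typed columns directly.
relational_total = conn.execute(
    "SELECT SUM(Quantity * Price) FROM OrderDetail"
).fetchone()[0]

# XML: every document is parsed and every value converted from text first.
xml_total = 0.0
for (doc,) in conn.execute("SELECT Doc FROM OrderXml"):
    for detail in ET.fromstring(doc).iter("OrderDetail"):
        xml_total += int(detail.findtext("Quantity")) * float(detail.findtext("Price"))

print(relational_total, xml_total)
```

Both paths return the same answer. The XML path simply does more per row (parsing, navigation, text-to-number conversion), and that overhead is multiplied across 1.2 million line items.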

As well as having performance issues, both forms of XML take almost twice the disk space of the relational data, as can be seen in the Storage column of the table above.
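For anyone who wants to check the arithmetic, here is a small Python sketch that computes each format's ratio to the relational baseline. It assumes the table reads as 59373 / 37054 / 397 for XML Element, 3190 / 31 / 407 for XML Attribute and 250 / 1 / 214 for Relational Tables. Bear in mind that a 1 ms baseline makes the hot-execution ratios very sensitive to rounding.

```python
# Values as read from the results table: (cold ms, hot ms, storage MB).
# This reading of the table is an assumption stated in the text above.
results = {
    "XML Element": (59373, 37054, 397),
    "XML Attribute": (3190, 31, 407),
    "Relational Tables": (250, 1, 214),
}

base_cold, base_hot, base_mb = results["Relational Tables"]

# Ratio of each format to the relational baseline, column by column.
ratios = {
    name: (cold / base_cold, hot / base_hot, mb / base_mb)
    for name, (cold, hot, mb) in results.items()
}

for name, (cold_x, hot_x, storage_x) in ratios.items():
    print(f"{name:18} cold x{cold_x:,.1f}  hot x{hot_x:,.0f}  storage x{storage_x:.2f}")
```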

XML has many legitimate uses in a database environment. It is a universally recognized data interchange format. It is self-documenting. It is ideal for exchanging small packets of information, storing configuration settings, etc. But high-performance manipulation of large datasets is not something it does efficiently.

Here is a URL to the actual test parameters and detailed results:
