In this last post of the XQuery for the SQL programmer series, I would like to spend a few minutes on performance. The previous post listed a dozen data integration use cases, each deploying an XQuery engine on top of your SQL database. The question is, of course, how well such a solution can perform.
If you have a rather naive implementation that retrieves the complete table (or multiple tables) and then runs the query against an in-memory representation, performance will of course be unacceptably slow, if it works at all once you start querying a production database with millions of records.
The tricky part is to have a performant and scalable XQuery engine that is capable of translating XQuery straight into SQL. And we believe DataDirect XQuery is just that. We wrote a white paper about translating XQuery to SQL, showing concrete XQuery queries and the corresponding SQL. I would advise you to read the document, but in short, the SQL generation is based on the following principles:
- Minimize data retrieval
- Leverage the database strengths
- Optimize for each database
- Retrieve data efficiently
- Support incremental evaluation
- Optimize for XML hierarchies
- Give the programmer the last word
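To make the first two principles a bit more tangible, here is a minimal sketch in plain Python with SQLite. It is not DataDirect XQuery and says nothing about how the product generates its SQL; the `orders` table, its columns, and the function names are all made up for illustration. It only contrasts the naive approach (pull every row, then filter and aggregate in memory) with pushing the filter and the aggregation down into the database.

```python
import sqlite3


def build_demo_db(row_count=10_000):
    """Create an in-memory database with a hypothetical 'orders' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    rows = [(i, f"customer{i % 100}", float(i % 500)) for i in range(row_count)]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def total_naive(conn, customer):
    """Naive approach: fetch the whole table, then filter and sum in memory."""
    rows = conn.execute("SELECT id, customer, amount FROM orders").fetchall()
    total = sum(amount for _, name, amount in rows if name == customer)
    return total, len(rows)  # second value: rows transferred from the database


def total_pushed_down(conn, customer):
    """Push-down approach: the database filters and aggregates."""
    rows = conn.execute(
        "SELECT SUM(amount) FROM orders WHERE customer = ?", (customer,)
    ).fetchall()
    return rows[0][0], len(rows)  # a single row comes back


conn = build_demo_db()
naive_total, naive_rows = total_naive(conn, "customer7")
pushed_total, pushed_rows = total_pushed_down(conn, "customer7")

print(f"naive:       total={naive_total}, rows transferred={naive_rows}")
print(f"pushed down: total={pushed_total}, rows transferred={pushed_rows}")
```

Both functions return the same total, but the first one moves every row out of the database while the second moves exactly one. An XQuery engine that translates the query into SQL is aiming for the second shape.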
And when it comes to answering your data integration challenges, it is a matter of joining and aggregating relational data with other formats as efficiently as possible. We have blogged about this topic before, but there is much more to say. It looks like I should spend some more blog time on the performance and scalability aspects of data integration through XQuery.
And remember, performance is only one aspect; developer productivity is important too. Think of all the APIs to master and the Java code to write (and maintain!) to combine multiple data sources, while all of this can be done in a single XQuery.