Wouldn’t it be nice if there was a place to easily find tips and tricks that help developers optimize their XQuery code in easy-to-follow steps? Look no further! Here is a checklist you can use as a go-to resource to help with XQuery.
MarkLogic runs transactions in one of two modes: query and update. The main difference is that query transactions are read-only and cannot perform update operations, while update transactions can perform update operations.
This is my #1 on the list because I myself accidentally run code in update mode often, which can result in significant performance losses—but as we will see, it’s very easy to check.
Before we get into this, please be aware that update statements are necessary to update data in the database. You absolutely want and need to use them.
However, there are times when you are running code that doesn’t update anything. Such query-only code should run in query mode. Here, we are addressing the case where you accidentally run your query in update mode.
When you run in update mode MarkLogic will use reader and writer locks as needed. It can be easy to accidentally run a transaction in update mode. Doing so could have dire consequences to performance. Take for example this query:
for $doc in fn:doc() where $doc/root/selected = true() return $doc
fn:doc(), when given no parameters, returns everything in the database. If this code gets run in update mode, then MarkLogic will assume that you plan to update documents, so it locks every single document returned by fn:doc(). And because write locks are exclusive, all multithreaded requests are forced to fight for locks. Lock contention is normal and expected, but it can degrade performance. Although a naive example, it illustrates the point. If you accidentally run code in update mode when you don’t really want to, then you can cause performance problems.
Lucky for us, there is a nifty trick we can employ to throw an error if we are running in update mode when we want to be in query mode.
let $assert-query-mode as xs:unsignedLong := xdmp:request-timestamp() for $doc in fn:doc() where $doc/root/selected = true() return $doc
They key here is the “let” statement:
let $assert-query-mode as xs:unsignedLong := xdmp:request-timestamp()
xdmp:request-timest returns an xs:unsignedLong
when running in query mode. But it returns empty-sequence()
when running in update mode. This let statement, if run from an update transaction, will fail to cast empty-sequence()
to xs:unsignedLong
, and thus throws an exception.
If you run your query and the exception is thrown, then you know that you are running in update mode. Now you have to track down the reason and try to fix it so your code can run in query mode.
Here are a few scenarios that could lead you to end up accidentally in update mode:
An example of some code that will cause update mode:
let $assert-query-mode as xs:unsignedLong := xdmp:request-timestamp() return if (fn:false()) then xdmp:document-insert("/test.xml", <test>This never gets called</test>) else "query mode stuff"
As you can see, the document-insert is never called, but MarkLogic still sees it right there in the code path and thus runs in update mode.
There is an entire chapter devoted to transactions in the MarkLogic User’s guide; I recommend that everyone read it at least once a year.
Step #2 on my list is very easy to overlook, but luckily, is a quick and simple fix.
Perhaps one of MarkLogic’s most famous APIs is cts:search. Every MarkLogic developer has used it at some point. If you are that developer and you find your search seems to be running slower than you’d like, perhaps you are accidentally running a filtered search.
What’s a filtered search?
Scott Parnell said it well in his blog post A goal without a plan is just a wish from November 2016:
By default,
cts:search
resolves queries in two phases. The first phase performs index resolution on the D-nodes. This initial result may contain false positives depending on the index configuration and the query. The second phase performs filtering of the results on the E-nodes, which examines the matched documents and removes false positives. If a query can be resolved completely from the indexes, then filtering is not required.
Assuming you have your indexes configured properly and you want to run unfiltered, let’s move forward with checking your cts:search
.
Looking at the docs for cts:search you will see this function signature:
cts:search( $expression as node()*, $query as cts:query?, [$options as (cts:order|xs:string)*], [$quality-weight as xs:double?], [$forest-ids as xs:unsignedLong*] ) as node()*
In order for cts:search to run unfiltered, you must specify “unfiltered” as an option in the third parameter.
cts:search(fn:doc(), $query, ("unfiltered"))
What gets most of us is that the default for cts:search is “filtered”. By not supplying the “unfiltered” parameter we are relying on the default, filtered behavior.
When code is not performing to your expectations, you should ALWAYS profile it. Don’t even bother optimizing it until you do. You will most likely waste your time fixing the wrong thing.
Lucky for us, MarkLogic makes it somewhat easy to do this.
From the MarkLogic Docs:
Query Console is an interactive web-based query development tool for writing and executing ad-hoc queries in XQuery, Server-Side JavaScript, SQL and SPARQL. Query Console enables you to quickly test code snippets, debug problems, profile queries, and run administrative XQuery scripts.
QConsole lives at http://yourserver:8000/qconsole/.
It includes a handy profile button to tell you where your query is spending the most time, which is great news if you can easily run your query.
Simply type in the code you wish to run, press the profile tab, then press the run button.
The results in the table are sorted from longest running to shortest (Shallow %). Look for unusually long running statements and see how you can optimize them.
Let’s say that you can’t just run your code in QConsole for whatever reason. Fear not, there are functions to help you.
Simply wrap your code like so:
prof:enable(xdmp:request()), for $x in (1 to 10) let $y := $x + 1 return $y, prof:disable(xdmp:request()), prof:report(xdmp:request())
The output of prof:report() is an xml representation of the profile report. If your code is buried deep within other code, you can either log the report with xdmp:log or insert it into a document with xdmp:document-insert() to look at later.
If you like to get fancy, you can create a reusable module with a function to log it for you.
(: name this file something like /path/to/profiler.xqy :) xquery version "1.0-ml"; module namespace profiler = "http://marklogic.com/profiler"; declare function profiler:profile($func) { let $req := xdmp:request() (: start profiling :) let $_ := prof:enable($req) (: call the code to profile :) let $result := $func() (: stop profiling :) let $_ := prof:disable($req) (: create the profile report :) let $report := prof:report($req) (: save report somewhere :) let $_ := xdmp:document-insert("/profile-output/" || fn:string($req) || ".xml", $report) (: log it too :) let $_ := xdmp:log(xdmp:describe($report, (), ())) return (: return the output, if any, from your profiled code :) $result };
That is fancy! How do I use it?
import module namespace profiler = "http://marklogic.com/profiler" at "/path/to/profiler.xqy"; profiler:profile(function() { (: put your code in here :) for $x in (1 to 10) let $y := $x + 1 return $y })
Now when you run your code, the profiler report will be logged to the ErrorLog and inserted at /profile-output/${requestid}.xml. It’s not as easy to read as the QConsole output, but it contains all the same information.
MarkLogic run queries in two phases. The first and fastest phase uses indexes to narrow the number of results. These initial results may contain false-positives depending on the index configuration and the query. The second phase performs filtering of the results. This second phase involves retrieving the documents from disk and verifying that they match the query. Documents that do not match (false-positives) are removed from the result set. Using the disk I/O when retrieving the documents is what usually causes queries to run slow.
In many cases you can rely solely on indexes to avoid the second phase. MarkLogic exposes many functions in the cts
and xdmp
namespaces for doing exactly this.
A common example is getting the number of documents that match a query.
fn:count(/customer[@paid=fn:false()])
This can perform poorly because it has to retrieve every customer matching the paid=”false” attribute from disk. To speed it up, use indexes instead.
xdmp:estimate(/customer[@paid=fn:false()])
Note that we use xdmp:estimate. The key to this being fast is that it runs unfiltered. It doesn’t perform the second phase of query resolution. That’s why its name is estimate. It’s relying on your indexes to give you a count. To ensure the resulting value is correct, you need to properly configure your indexes.
If after steps 1-4 you find that your performance bottlenecks are in your cts:search code, you will want to start looking at index optimization. I will refer you to Scott Parnell’s blog post A goal without a plan is just a wish from November 2016 which covers this in more detail.
This checklist demonstrates some of the ways that MarkLogic developers go about finding and fixing performance issues. For more information on the topics discussed in this article, read some of these:
Subscribe to get all the news, info and tutorials you need to build better business apps and sites