Walking Among the JSON Trees: How to Recursively Transform JSON

August 11, 2016 Data & AI, MarkLogic

Many of us know how to recursively transform an XML structure using the typeswitch expression. In fact, the Transforming XML Structures With a Recursive typeswitch Expression chapter of the Application Developer’s Guide has a great example. There is even a blog post about recursively transforming XML by Dave Cassel. But what if you want to recursively transform JSON? I set out to write a reusable function to do just that. Allow me to “walk” you through it.

JSON Object Model

First, let’s review some basics. Understanding the JSON object model is crucial to understanding how to transform JSON. The MarkLogic Application Developer’s Guide explains it quite well. It boils down to a handful of node types for JSON: document-node, object-node, array-node, number-node, boolean-node, null-node, and text. I encourage anyone working with JSON to read the Working With JSON chapter for a full understanding of the concepts. These node types will allow us to use the typeswitch expression to recursively walk our JSON objects.

Typeswitch Expression

From the XQuery 3.0 W3C spec: The typeswitch expression chooses one of several expressions to evaluate based on the dynamic type of an input value. A super barebones typeswitch for JSON would look like this:

typeswitch($n)
  case document-node() return
    $n
  case object-node() return
    $n
  case array-node() return
    $n
  case number-node() return
    $n
  case boolean-node() return
    $n
  case null-node() return
    $n
  case text() return
    $n
  default return
    $n

Given a node $n, the typeswitch will execute the code in the case statement matching the node’s type.
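To see the dispatch in action, here is a minimal sketch that labels each property of a small JSON object by its node type. The local:describe helper and the sample data are hypothetical, and the JSON is built with MarkLogic's node constructors rather than loaded from a database:

```xquery
xquery version "1.0-ml";

(: hypothetical helper: names the JSON node type that the typeswitch matched :)
declare function local:describe($n as node()) as xs:string {
  typeswitch($n)
    case document-node() return "document"
    case object-node() return "object"
    case array-node() return "array"
    case number-node() return "number"
    case boolean-node() return "boolean"
    case null-node() return "null"
    case text() return "text"
    default return "other"
};

(: sample data, made up for illustration :)
let $json := object-node {
  "name": "oak",
  "height": 12,
  "evergreen": fn:false(),
  "owner": null-node {},
  "tags": array-node { "a", "b" }
}
(: visit each property value and report its key and node type :)
for $child in $json/node()
return fn:concat(fn:string(fn:node-name($child)), ": ", local:describe($child))
```

Each case clause here returns a label instead of the node itself, which makes it easy to confirm which branch a given value lands in before adding real logic.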

Adding Recursion

Let’s add some recursion to the mix.

declare function local:walk-json($nodes as node()*, $o as json:object?) {

  for $n in $nodes
  (: if this node has a name then turn it into a string. This is the json key.
   : We use the ! operator to conditionally assign the string. If no node name exists
   : then $key will be the empty sequence
   : See https://developer.marklogic.com/blog/simple-mapping-operator for more details on !
   :)
  let $key := fn:node-name($n) ! fn:string(.)
  return
    typeswitch($n)
      case document-node() return
        (: if it's a document node then start with the root node :)
        local:walk-json($n/node(), $o)
      case object-node() return
        (: create an in-memory json object :)
        let $value := json:object()
        (: recursively walk every child of this object and put
           them into our json object :)
        let $_ := local:walk-json($n/node(), $value)
        return
          (: any non-root object will have a name :)
          if ($key and fn:exists($o)) then
            ( map:put($o, $key, $value), $o )
          (: return the new object :)
          else
            $value
      case array-node() return
        (: create an in-memory json array to hold the values :)
        let $value := json:to-array(local:walk-json($n/node(), ()))
        return
          if (fn:exists($o)) then
            ( map:put($o, $key, $value), $o )
          else
            $value
      case number-node() |
           boolean-node() |
           null-node() |
           text() return
        let $value := $n
        return
          if (fn:exists($o) and fn:exists($value)) then
            ( map:put($o, $key, $value), $o )
          else
            fn:data($value)
      (: this is our failsafe in case we missed something :)
      default return
        $n
};

(: invoke it like so :)
for $doc in fn:doc()
return
  local:walk-json($doc, ())

So what exactly is happening here?

We iterate over the node or nodes coming in:

for $n in $nodes

Then we use a typeswitch to branch the code that handles each type of JSON node:

return
  typeswitch($n)

For document-node, we simply pass its child (the root node) back to the function:

case document-node() return
  (: if it's a document node then start with the root node :)
  local:walk-json($n/node(), $o)

For object-node, we construct a new object and then recursively add the key/value pairs back into it:

case object-node() return
  (: create an in-memory json object :)
  let $value := json:object()
  (: recursively walk every child of this object and put
     them into our json object :)
  let $_ := local:walk-json($n/node(), $value)
  return
    (: any non-root object will have a name :)
    if ($key and fn:exists($o)) then
      ( map:put($o, $key, $value), $o )
    (: return the new object :)
    else
      $value

For an array-node, we create a new array and add back the recursively transformed values:

case array-node() return
  (: create an in-memory json array to hold the values :)
  let $value := json:to-array(local:walk-json($n/node(), ()))
  return
    if (fn:exists($o)) then
      ( map:put($o, $key, $value), $o )
    else
      $value
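The array case relies on json:to-array wrapping whatever sequence the recursive call returns. Here is a small sketch, using made-up values, of how nested and empty sequences flatten before they are wrapped:

```xquery
xquery version "1.0-ml";

(: sample values only; in the walker this sequence comes from the recursive call :)
let $values := ( (1, 2), (), "three" )
let $array := json:to-array($values)
return (
  json:array-size($array),    (: 3, because the nested and empty sequences flatten :)
  json:array-values($array)   (: the items, back as a plain sequence :)
)
```

This flattening is why a child that contributes the empty sequence simply disappears from the resulting array.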

For the rest of the value types, we do the same thing in every case: add the value to our object.

case number-node() |
     boolean-node() |
     null-node() |
     text() return
  let $value := $n
  return
    if (fn:exists($o) and fn:exists($value)) then
      ( map:put($o, $key, $value), $o )
    else
      fn:data($value)

And then we have our failsafe, which simply returns the node:

(: this is our failsafe in case we missed something :)
default return
  $n

A few tricky parts you might notice…

( map:put($o, $key, $value), $o )

This is tricky for two reasons. First, the json:object() function really returns a map, so we use the map:* functions to operate on it. Second, the expression is written as a sequence that ends with $o. The map:put call returns the empty sequence, and sequences flatten in XQuery, thus:

( map:put($o, $key, $value), $o )

is equivalent to

( (), $o )

is equivalent to

( $o )

is equivalent to

$o

So this code is effectively returning $o after first adding a value to the map.
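The same behavior can be checked in isolation. This is a minimal sketch with a made-up key and value; it should return 1 followed by 12, since the sequence ends up holding only the map:

```xquery
xquery version "1.0-ml";

let $o := json:object()
(: map:put contributes the empty sequence, so only $o survives in the sequence :)
let $result := ( map:put($o, "height", 12), $o )
return (
  fn:count($result),            (: a single item: the map itself :)
  map:get($result, "height")    (: the value stored by the map:put call above :)
)
```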

The Visitor Pattern

At this point we have a function that walks the tree and simply makes a copy of the node. But how do we make changes? We can use the Visitor Pattern to call into a function to make the changes. This will let us isolate our alteration code from our tree walking code. Here is the code with the visitor pattern in place.

xquery version "1.0-ml";

declare option xdmp:mapping "false";

declare %private function local:_walk-json($nodes as node()*, $o as json:object?, $visitor-func as function(*)) {

  (:
   :  This closure handles some of the boilerplate of calling the visitor
   :  It merely exists to keep the rest of this function cleaner
   :)
  let $call-visitor := function($key, $value) {
    let $response-map := map:new((
      map:entry("key", $key),
      map:entry("value", $value)
    ))
    let $_ := $visitor-func($key, $value, $response-map)
    return $response-map
  }

  for $n in $nodes
  (: if this node has a name then turn it into a string. This is the json key.
   : We use the ! operator to conditionally assign the string. If no node name exists
   : then $key will be the empty sequence
   : See https://developer.marklogic.com/blog/simple-mapping-operator for more details on !
   :)
  let $key := fn:node-name($n) ! fn:string(.)
  return
    typeswitch($n)
      case document-node() return
        (: if it's a document node then start with the root node :)
        local:_walk-json($n/node(), $o, $visitor-func)
      case object-node() return
        (: create an in-memory json object :)
        let $oo := json:object()

        (: recursively walk every child of this object and put
           them into our json object :)
        let $_ := local:_walk-json($n/node(), $oo, $visitor-func)

        (: give our visitor function a chance to alter the key or value :)
        let $r := $call-visitor($key, $oo)
        let $key := map:get($r, "key")
        let $value := map:get($r, "value")
        return
          (: any non-root object will have a name :)
          if ($key and fn:exists($o) and fn:exists($value)) then
            ( map:put($o, $key, $value), $o )
          (: return the new object :)
          else
            $value
      case array-node() return
        (: create an in-memory json array to hold the values :)
        let $aa := json:to-array(local:_walk-json($n/node(), (), $visitor-func))

        (: give our visitor function a chance to alter the key or value :)
        let $r := $call-visitor($key, $aa)
        let $key := map:get($r, "key")
        let $value := map:get($r, "value")
        return
          if (fn:exists($o) and fn:exists($value)) then
            ( map:put($o, $key, $value), $o )
          else
            $value
      case number-node() |
           boolean-node() |
           null-node() |
           text() return

        (: give our visitor function a chance to alter the key or value :)
        let $r := $call-visitor($key, $n)
        let $key := map:get($r, "key")
        let $value := map:get($r, "value")
        return
          if (fn:exists($o) and fn:exists($value)) then
            ( map:put($o, $key, $value), $o )
          else
            $value
      (: this is our failsafe in case we missed something :)
      default return
        $n
};

(: Main entry point for walking the json tree :)
declare function local:walk-json($nodes as node()*, $visitor-func as function(*))
{
  local:_walk-json($nodes, (), $visitor-func)
};


let $doc := fn:doc('/uri/of/some.json')
return
  (: notice how we pass in a closure that takes 3 parameters
   : $key is the key of the json property. it might be the empty sequence
   : $value is the value of the json property. it might be the empty sequence
   : $output is a map to hold the final key and value.
   :
   : The idea is that your visitor function is called when each node is visited.
   : You can...
   : 1. do nothing. that means return ()
   : 2. alter the key
   : 3. alter the value
   : 4. prevent the key/value pair from ending up in the output by putting () in value
   :
   : The following example makes changes to the JSON to illustrate the point
   :)
  local:walk-json($doc, function($key, $value, $output) {
    if ($value instance of json:object) then
      ()
    else if ($value instance of json:array) then
      ()
    else if ($value instance of number-node()) then
      ()
    else if ($value instance of boolean-node()) then
      ()
    else if ($value instance of null-node()) then
      ()
    else
      ()
  })

What’s going on with this new version?

This version adds the $call-visitor closure. $call-visitor is simply a convenience function to allow us to build a map and call the user supplied visitor function.

(:
 :  This closure handles some of the boilerplate of calling the visitor
 :  It merely exists to keep the rest of this function cleaner
 :)
let $call-visitor := function($key, $value) {
  let $response-map := map:new((
    map:entry("key", $key),
    map:entry("value", $value)
  ))
  let $_ := $visitor-func($key, $value, $response-map)
  return $response-map
}

For each node type, we have added a call to $call-visitor. This gives the user-supplied visitor function a chance to alter the key or value in the map. We then assign the values from the map to $key and $value and use them just as we did before.

let $r := $call-visitor($key, $oo)
let $key := map:get($r, "key")
let $value := map:get($r, "value")

The example visitor function above is returning () and thus is merely making a copy of the JSON node. To transform the JSON node you would want to edit the “key” and “value” entries in the $output map.
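The contract between the walker and a visitor can be tried on its own, without walking a document. The sketch below imitates what $call-visitor does for a single key/value pair; the visitor, the "colour" key, and the "green" value are all hypothetical:

```xquery
xquery version "1.0-ml";

(: hypothetical visitor: renames the "colour" key and leaves the value alone :)
let $visitor := function($key, $value, $output) {
  if ($key eq "colour") then
    map:put($output, "key", "color")
  else
    ()
}

(: imitate $call-visitor: pre-fill the map, then let the visitor edit it :)
let $response := map:new((
  map:entry("key", "colour"),
  map:entry("value", "green")
))
let $_ := $visitor("colour", "green", $response)
return ( map:get($response, "key"), map:get($response, "value") )
```

Because the map is pre-filled with the original key and value, a visitor only has to touch the entries it wants to change; here the key comes back as "color" and the value is untouched.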

Here is an example usage that modifies the JSON during the walk.

let $doc := fn:doc('/path/to/some.json')
return
  local:walk-json($doc, function($key, $value, $output) {
    if ($value instance of json:object) then
      (: upcase all object keys :)
      map:put($output, "key", fn:upper-case($key))
    else if ($value instance of json:array) then
      (: reverse every array :)
      map:put($output, "value", json:to-array(fn:reverse(json:array-values($value))))
    else if ($value instance of number-node()) then
      (: negate any number :)
      map:put($output, "value", -$value)
    else if ($value instance of boolean-node()) then
      (: invert any boolean :)
      map:put($output, "value", fn:not(xs:boolean($value)))
    else if ($value instance of null-node()) then
      (: omit nulls by returning the empty sequence for the value :)
      map:put($output, "value", ())
    else
      map:put($output, "value", $value)
  })

Where Do I Get This in a Usable Form?

If you want to play around with this in QConsole, you can grab the QConsole Workspace file. Simply open up QConsole and choose Workspace => Import, then browse to the json-tree-walker.xml file that you downloaded.

I also created a GitHub project containing this library. Alternatively, you can use Joe Bryan’s awesome mlpm utility to install the json-tree-walker module into your code.

Paxton Hare