Avoiding Eval with First-Class Functions

by Justin Makeig Posted on May 24, 2016

The Evils of Eval

Most dynamic languages allow you to evaluate a string of code, for example with eval() in JavaScript or Python. Eval is powerful (and unavoidable if you’re building something like an IDE). However, the benefits are usually greatly outweighed by the risks.

Evaluated code is much more difficult to write than inline code. In JavaScript, you have to escape things like quotes and line breaks, and your editor probably won’t help you with syntax highlighting or type-ahead. Code hidden in strings is also much more difficult to read, not to mention debug. However, these are minor compared to the potential security problems that eval introduces.

An injection attack exploits a security vulnerability in which data supplied by a user is interpreted or executed in a malicious or unexpected way. SQL injection is one of the most common forms (“Little Bobby Tables,” anyone?), but any code that is evaluated is susceptible.

For example, the following code from a naïve calculator application takes a mathematical expression and returns the answer.

function calculate(expression) {
  return eval(expression);
}
'The answer is ' + calculate(request['expression']);

This works great for expressions like 1 + 1 or even Math.cos(3 * Math.PI). However, what if the user passed in System.shutdown() or database.clear() or users.findByID(1234).creditAccount(9999999999, '£')? The calculate() function would blindly execute these as well, with potentially dire consequences. Even if a user does not know what specific functionality is available in the target evaluation context, it is not very difficult to guess, or to get up to no good with just the core language. To implement our calculator safely, we should write our own expression parser that validates inputs to make sure they are math expressions and not arbitrary code.
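As a rough illustration of that last point, here is a minimal sketch of a parser-based calculator in plain JavaScript. The names tokenize() and safeCalculate() are hypothetical, and the grammar is an assumption made for brevity: only numbers, + - * /, parentheses, and unary minus are accepted, so function calls such as Math.cos() are rejected along with everything else that is not arithmetic.

```javascript
// Split the input into numbers, operators, and parentheses.
// Anything else (identifiers, dots, quotes, ...) is a syntax error.
function tokenize(input) {
  var text = String(input);
  var pattern = /\s*(?:(\d+(?:\.\d+)?)|([-+*\/()]))/y;
  var tokens = [];
  while (pattern.lastIndex < text.length) {
    var start = pattern.lastIndex;
    var match = pattern.exec(text);
    if (match === null) {
      if (text.slice(start).trim() === '') { break; } // trailing whitespace
      throw new SyntaxError('Unexpected input at position ' + start);
    }
    tokens.push(match[1] !== undefined ? Number(match[1]) : match[2]);
  }
  return tokens;
}

// Recursive-descent evaluator for the assumed grammar:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '-' factor | '(' expr ')'
function safeCalculate(expression) {
  var tokens = tokenize(expression);
  var pos = 0;

  function parseExpr() {
    var value = parseTerm();
    while (tokens[pos] === '+' || tokens[pos] === '-') {
      var op = tokens[pos++];
      var right = parseTerm();
      value = (op === '+') ? value + right : value - right;
    }
    return value;
  }

  function parseTerm() {
    var value = parseFactor();
    while (tokens[pos] === '*' || tokens[pos] === '/') {
      var op = tokens[pos++];
      var right = parseFactor();
      value = (op === '*') ? value * right : value / right;
    }
    return value;
  }

  function parseFactor() {
    var token = tokens[pos++];
    if (typeof token === 'number') { return token; }
    if (token === '-') { return -parseFactor(); }
    if (token === '(') {
      var value = parseExpr();
      if (tokens[pos++] !== ')') { throw new SyntaxError('Expected ")"'); }
      return value;
    }
    throw new SyntaxError('Unexpected token: ' + token);
  }

  var result = parseExpr();
  if (pos < tokens.length) {
    throw new SyntaxError('Unexpected token: ' + tokens[pos]);
  }
  return result;
}
```

With this in place, safeCalculate('1 + 1') evaluates the arithmetic, while safeCalculate('System.shutdown()') throws a SyntaxError during tokenizing, before anything is executed. The input is only ever treated as data.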

Evaluating Code in a Different Context in MarkLogic

MarkLogic provides built-in APIs to evaluate code. This is most useful as a means to run code in a context different from the request from which it was called, for example in a different transaction, as another user, or asynchronously on the task server.

Running code in another context lets you:

Query, update, or insert documents into another database, for example, to write a schema into the Modules database or move documents from a staging to a separate production database.
Orchestrate multiple transactions in a single request. By default, MarkLogic queues up all database updates and applies them atomically at the end of a request. If you need to store or view intermediate results, you’ll need to execute those in a separate transaction.
Run a query at a particular database timestamp. By specifying an explicit timestamp for a query, you can effectively get a consistent snapshot of the database, even across separate transactions.

Take a look at the options to xdmp.eval() for other ways to affect the context of evaluated code.

Like JavaScript’s built-in eval, xdmp.eval() takes a string of JavaScript plus configuration options and runs the code in the context those options describe. For all of the reasons above, xdmp.eval() is generally to be avoided. A better option is xdmp.invoke(). Unlike xdmp.eval(), with xdmp.invoke() you specify a path to an existing module. As with xdmp.eval(), you can use the vars argument to pass dynamic parameters to the stored module, which is a much safer way to parametrize evaluated code than building strings to eval. Because the module’s code is fixed ahead of time, an invoked module won’t evaluate its input as code unless the module itself calls eval. xdmp.invoke() accepts the same set of context options as xdmp.eval(), so you can invoke a module in a separate transaction or as a different user.
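The difference between splicing input into code and passing it as a variable can be seen in plain JavaScript, without any MarkLogic APIs. The sketch below is only an analogy: modules, invokeModule(), and greetByEval() are hypothetical names, with the modules object standing in for code stored on the server and looked up by path.

```javascript
// Hypothetical stand-in for stored modules: the code is fixed ahead of
// time and looked up by path, never assembled from request data.
var modules = {
  '/greet.sjs': function (vars) {
    return 'Hello, ' + vars.name + '!';
  }
};

// Hypothetical analogue of invoking a stored module with variables.
function invokeModule(path, vars) {
  if (!Object.prototype.hasOwnProperty.call(modules, path)) {
    throw new Error('No such module: ' + path);
  }
  return modules[path](vars);
}

// Unsafe: the input is spliced into the source text before evaluation.
function greetByEval(name) {
  return eval("'Hello, " + name + "!'");
}

// An input crafted to break out of the string literal.
var hostile = "' + (globalThis.pwned = true) + '";

// The payload runs as code and sets a global as a side effect.
var viaEval = greetByEval(hostile);

// The payload stays an ordinary string value.
var viaModule = invokeModule('/greet.sjs', { name: hostile });
```

In the first call the quote characters in the input end the string literal early, so the rest of the input is executed. In the second, the same input can only ever appear as the value of vars.name.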

Enter xdmp.invokeFunction()

Unfortunately, it’s not always feasible or convenient to isolate your dynamic code into its own main module. xdmp.invokeFunction() allows you to invoke any in-context function, even anonymous ones that you build on the fly. Think of it as a MarkLogic-enhanced version of Function.prototype.apply(). Moreover, xdmp.invokeFunction() allows you to separate the concerns of what the function does from the context in which it’s evaluated. This makes for cleaner code and easier testing.

Take, for example, the following trivial illustration. The xdmp.transaction() function gives the ID of the current transaction. Because the xdmp.invokeFunction() call specifies that the second call to xdmp.transaction() be run in a separate transaction, you’ll get a different ID.

[
  xdmp.transaction(),
  xdmp.invokeFunction(xdmp.transaction, { isolation: 'different-transaction' })
]

The first call returns the transaction assigned to the current request. The second, using xdmp.invokeFunction(), explicitly calls the xdmp.transaction() function in a different transaction. Note the use of xdmp.transaction sans parentheses. xdmp.transaction() calls the xdmp.transaction function. xdmp.transaction, no parens, is a reference to the function itself. The actual identifiers in the output below are not important. What matters is that they differ because of the evaluation context.

[
  "4394203566847635840", 
  "8340410512199485627"
]
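The distinction between a function reference and a function call is easy to trip over, so here it is in isolation. This is a plain JavaScript sketch, not MarkLogic code: nextId() and runLater() are hypothetical, with runLater() standing in for any API that accepts a function and calls it itself.

```javascript
var calls = 0;

// A zero-argument function whose result changes on every call.
function nextId() {
  calls += 1;
  return 'id-' + calls;
}

// Hypothetical stand-in for an API that takes a function and runs it
// in a context of its own choosing.
function runLater(fn) {
  if (typeof fn !== 'function') {
    throw new TypeError('Expected a function, got ' + typeof fn);
  }
  return fn();
}

// Passing the reference: runLater decides when (and where) to call it.
var byReference = runLater(nextId);

// Passing the result of a call: nextId() runs first, in the caller's
// context, and runLater receives a string rather than a function.
var error = null;
try {
  runLater(nextId());
} catch (e) {
  error = e;
}
```

Here byReference holds the ID produced inside runLater(), while the second attempt fails with a TypeError because nextId() was already evaluated before runLater() ever saw it.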

Beyond xdmp.invokeFunction()

xdmp.invokeFunction() is the best way to run code in a different context with Server-Side JavaScript in MarkLogic. However, it requires that you pass it a zero-arity function, i.e., one that takes no inputs, and it always returns a ValueIterator, even if the invoked function returns an atomic value. With the magic of first-class functions in JavaScript, we can provide a friendlier version.

/**
 * Return a function proxy to invoke a function in another context.
 * The proxy can be called just like the original function, with the
 * same arguments and return types. Example uses: to run the input 
 * as another user, against another database, or in a separate 
 * transaction. 
 *
 * @param {function} fct     The function to invoke
 * @param {object} [options] The `xdmp.eval` options. 
 *                           Use `options.user` as a shortcut to 
 *                           specify a user name (versus an ID). 
 *                           `options.database` can take a `string` 
 *                           or a `number`.
 * @param {object} [thisArg] The `this` context when calling `fct`
 * @return {function}        A function that accepts the same arguments as
 *                           the originally input function.
 */
function applyAs(fct, options, thisArg) {
  return function() {
    var args = Array.prototype.slice.call(arguments);
    // Bind the arguments by closure rather than passing them directly.
    // `xdmp.invokeFunction` requires that invoked functions have
    // an arity of zero.
    var f = function () {
      // Nested ValueIterators are flattened. Thus if `fct` returns a ValueIterator
      // there’s no way to differentiate it from the ValueIterator that
      // `xdmp.invokeFunction` (or `xdmp.eval` or `xdmp.invoke` or `xdmp.spawn`)
      // returns. However, by wrapping the return value in something else
      // (an array here) we can “pop” the stack to get the actual return value.
      return [fct.apply(thisArg, args)];
    };
    
    options = options || {};
    // Allow passing in database name, rather than id
    if('string' === typeof options.database) { options.database = xdmp.database(options.database); }
    // Allow passing in user name, rather than id
    if(options.user) { options.userId = xdmp.user(options.user); delete options.user; }
    // Allow the functions themselves to declare their transaction mode
    if(fct.transactionMode && !(options.transactionMode)) { options.transactionMode = fct.transactionMode; }

    return fn.head(xdmp.invokeFunction(f, options)).pop();
  }
}

applyAs() takes a function and the same options argument as xdmp.invokeFunction() and returns a new function that behaves just like the input, but will be invoked in the context determined by the options. Thus, downstream consumers don’t need to be aware that the function is being invoked in a different context and can call the function as if it were the original function. For example, the (contrived) insert() function below takes a URI and string message, saves a document to the database, and returns a string.

function insert(uri, message) {
  xdmp.documentInsert(uri, { message: message }, xdmp.defaultPermissions(), xdmp.defaultCollections());
  return message;
}

var myInsert = applyAs(insert, { database: 'Modules', transactionMode: 'update-auto-commit' });

myInsert('/hello.json', 'Hello, world!');

myInsert() has the same “signature” as the insert function but hides its evaluation context, simplifying usage. This is very similar to applying around advice in aspect-oriented programming.

This approach is a lot cleaner and has a clearer separation of the logic and the orchestration than something like the following:

function myInsert(uri, message) {
  return fn.head(
    xdmp.invokeFunction(function() {
      xdmp.documentInsert(uri, { message: message }, xdmp.defaultPermissions(), xdmp.defaultCollections());
      return message;
    }, { database: '3616783675111452341', transactionMode: 'update-auto-commit' })
  );
}
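Another consequence of separating logic from orchestration is that the proxy pattern itself can be exercised outside MarkLogic. The following is a minimal sketch in plain JavaScript under that assumption: proxyVia(), fakeInvoke(), and describeInsert() are hypothetical names, and fakeInvoke() merely records its options and calls the function in place rather than switching context.

```javascript
// Hypothetical stand-in for a context-switching invoker. It accepts a
// zero-argument function plus options, records the options, and calls
// the function immediately.
var invocations = [];
function fakeInvoke(zeroArgFn, options) {
  invocations.push(options);
  return zeroArgFn();
}

// Build a proxy with the same signature as `fct` that routes every call
// through `invoker`.
function proxyVia(invoker, fct, options, thisArg) {
  return function () {
    var args = Array.prototype.slice.call(arguments);
    // Capture the arguments by closure so that the invoker only ever
    // sees a zero-arity function.
    return invoker(function () {
      return fct.apply(thisArg, args);
    }, options);
  };
}

// A pure function standing in for real logic, so that it can run anywhere.
function describeInsert(uri, message) {
  return uri + ' <- ' + message;
}

var proxied = proxyVia(fakeInvoke, describeInsert, { database: 'Modules' });
var result = proxied('/hello.json', 'Hello, world!');
```

Callers use proxied() exactly as they would describeInsert(), and the recorded options show which context the call would have been sent to. Swapping fakeInvoke() for a real invoker would change where the function runs without touching the function or its callers.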

Summary

To summarize, it’s almost always a bad idea to eval strings of code. Doing so leaves you open to injection attacks and makes code more difficult to read and write. Instead, use xdmp.invokeFunction() in MarkLogic Server-Side JavaScript to run a function in another context, such as in a separate transaction, against another database, or as another user. First-class functions in JavaScript can help you write a friendlier wrapper around xdmp.invokeFunction() that proxies existing functions, hiding the change of context from consumers.

Stay safe out there.
