As you may know, Santa’s big operation up North has always been ahead of the game. Santa already adopted NoSQL and semantics, but he is always looking for better ways to run his shop. This year, the big guy wanted to revamp Secret Santa, his early crowdsourcing project.
Santa wanted this to scale out beyond a simple group, so he turned to us — his elf innovators — for answers. The original NoSQL team, Team XML Old Skool, put in an effort. The data experts from Team Javascript, the fastest growing skillset up North, came up with a solution that made sure the data was secure every step of the way. Here is how we reworked Secret Santa.
It all started at the North Pole Christmas party. My elfmates and I had a bit too much Nog-SQL and started doodling on the white-board (because engineers with pointy ears know how to have fun). ACID is the main ingredient of our Nog-SQL, so we wanted to use it in our next Secret Santa release.
Getting to minimum viable product (MVP) was the agile way to test our approach, as we like to keep up our agility here at the North Pole where there are slippery ice patches and frolicking elephants to avoid. So we loaded a small sample set of givers and receivers:
declareUpdate(); const givenNames = ['Peter', 'Paul', 'Harry', 'Jill', 'Misha', 'Brian', 'Morgan', 'George', 'Addison', 'Charlie', 'Bronwin', 'CC', 'Prudence', 'Emily']; const surnames = ['Tanner', 'Tanner', 'Tanner', 'Tanner', 'Tanner', 'Johanssen', 'Johanssen', 'Johanssen', 'Johanssen', 'Smith', 'Smith', 'Smith', 'Smith', 'Jones']; const results = []; for (let i = 0; i < givenNames.length; i++) { const uri = '/person' + (i + 1) + '.json'; const doc = {'person': {'givenName': givenNames[i], 'surname': surnames[i] }}; const metadata = {collections : ['giver','receiver']}; xdmp.documentInsert(uri, doc, metadata); }
Notice that we made a bunch of JSON documents in a loop and inserted each one. JSON is a very familiar hierarchical document format that MarkLogic supports fully, which can be understood by elves all over the world. You might wonder about why we made one document per person, instead of putting all the people into one document, or putting groups of people in a handful of family documents. We could have chosen departments, teams, companies, locations or any other way of organizing them; so why’d we do it this way?
MarkLogic is a document database optimized for managing lots of small documents, and in our Secret Santa, we really want to manage each person in our database as a separate entity with separate security polices, metadata, and replication processes. We wanted to avoid naughty security breaches, so we wanted to ensure only authorized Secret Santas could assign recipients to gift-givers.
Our MVP is going to process all the givers in one pass:
//here we go let's start with the list of givers. That's a stable list we can load once const giverList = fn.subsequence(fn.collection('giver'), 1); //process every gift giver for (const x of giverList) { results.push(workItem(x)); };
We took advantage of the fact that each person document went into two collections, “giver”, and “receiver.” The function fn.collection
lets us grab the list of all the documents in the ‘giver’ collection. Then it’s simple to pass each one to workItem()
.
In workItem()
, we assign each giver a list of eligible recipients:
function workItem( giver ){ //who's a good recipient for giver? const receiverCandidates = queryRecipients(giver); … }
The list of eligible recipients has a constraint. We only allow gift givers to give to people outside their immediate families in our sample set. If you’re a Tanner you can give to a Johanssen or a Jones, but not another Tanner.
//find the candiates from the pool of recipients who aren't in the giver's family group const candidateReceivers = fn.subsequence(cts.search(cts.andNotQuery(cts.collectionQuery('receiver'),
We use a document search function, cts.search()
, with a composable cts.andNotQuery()
to find people in the receiver collection who don’t have the same family name as the giver (i.e., that’s the ‘Not’ part of the andNotQuery) .
Remember that once a giver gets a recipient, that recipient gets moved out of the pool of available recipients:
xdmp.documentRemoveCollections(xdmp.nodeUri(receiver), ['receiver']);
We used this update function to take a secret Santa participant off the recipient list.
So we’re done, right? What a convenient API! How smart we were to insert our people into collections! Each time we go through the loop, the ‘receiver’ collection shrinks by one. In effect, the next document search will never find it.
Uh-oh…This is where Santa’s helpers ran into a deadlock. We wanted to move people documents out of the ‘receiver’ collection so that the next pass through our loop of gift givers had a list of unclaimed recipients to choose from.
The loop was doing a search each time through based on collection membership and we were trying to move people out from under it. We realized that we needed to commit people to ‘receiver’ collection membership before we went through and processed the next person looking to play Santa, but we didn’t know how. So we tried different things. We refactored. We waited. We killed long-running queries. We drank a lot of java.
When you run JavaScript in QConsole, you start with a statement declareUpdate()
. You figure that entitles you to call xdmp.commit
whenever you need to, but unless you do a few other tricks, your QConsole script runs as a single transaction and none of your commits happen until the end.
Since it’s Christmas, I’ll tell you how to deal with this.
We have this code, which is the first thing we have to worry about because it does a read query:
const receiver = findEligibleReceiver(giver);
And this is the other piece we have to worry about because it updates the database:
//remove the recipient so they don't get two gifts removeReceiver(xdmp.nodeUri(receiver));
The remove has to happen before the next findEligibleReceiver
on the next pass through the loop. We remove our receiver by taking it out of the receiver collection, which updates our receiver document. If we don’t, our next gift-giver might think that our receiver is still available.
It turns out that if you’re running under declareUpdate
, you are letting JavaScript decide which timestamps to use for your queries and updates. And because commits aren’t guaranteed until after the last statement, you might query again before your update is committed.
So leave out the declareUpdate()
statement at the top of your code because you’re going to get hands-on. You’re going to manage each statement’s transaction context so that each time you search or update you’re going to make sure it’s on the most current view of the database.
Use this nifty technique to declare some functions inside a variable that are going to do your findEligibleReceiver
, and your collection deletion, and perform the commit explicitly, so following statements will see it. It looks like this:
function secretSantaList(uri,giver,doc,collections) { return { //this has to have a consistent view of the database //search the recipient collection for recipients //whose surname does not match the giver's queryRecipients: function queryRecipients () { return fn.subsequence(cts.search(cts.andNotQuery(cts.collectionQuery('receiver'), cts.jsonPropertyWordQuery('surname', giver.root.person.surname))), 1);}, removeFromCollection: function removeFromCollection() { xdmp.documentRemoveCollections(uri, collections);} }; };
To make this work, you declare it:
let myInvoke = secretSantaList (null,giver,null,null);
You then pass that myInvoke
variable (which is a function block as we used to say, which still blows my mind) to a MarkLogic API, xdmp.invokeFunction
.
//invoke our query using the function block const candidateReceivers = xdmp.invokeFunction(testInvoke.queryRecipients, { update:"true", commit: "auto" });
An important thing to note is that the function our function block’s queryRecipients
member points to is a zero arity fuction — it has no arguments. The ‘giver’ object was bound when you did the let
statement to make the myInvoke
variable.
Similarly, you use that xdmp.invokeFunction to move your person out of the ‘receiver’ collection and now, sadly, they will not get more than one present:
//remove the recipient so they don't get two gifts myInvoke = secretSantaList (xdmp.nodeUri(receiver),null,null,['receiver']); xdmp.invokeFunction(myInvoke.removeFromCollection, { Update:”true", commit: "auto" });
Our Secret Santa MVP demonstrates how to scale out a use case which, when kept small, can be handled with any number of in-memory data-structures and bit-shifts. But when you need to serve large, globally distributed groups, and the question is whether or not someone’s getting a present this year, you can leverage the power of a NoSQL document store like MarkLogic. We have provided some code snippets you can run in QConsole. If you do, you’ll notice that our simple algorithm proves our point about document searches and commits. A finished list of matches would take a more complex computation with multiple passes, like the Stable Marriage algorithm for instance, which produces the happiest arrangement of a set of brides and grooms.
Whatever method you use to pair your givers and receivers, the trick is managing your transactions with the invokeFunction
and the virtual function that does the explicit commit.
My elfmates and I on Team Secret Santa wish you the most consistently joyful holidays with persistent feelings of security and happiness!
Frank first joined MarkLogic in 2006 after a ten year career as a Computer Scientist at Adobe Systems, building collaboration, XML, and data-driven features for Creative Suite. At MarkLogic he was a Senior Principle Consultant, working for customers like Pearson, HMH, Publishers Press, McGraw-Hill and Congressional Quarterly. He left MarkLogic to serve as CTO at Spectrum Chemical & Laboratory Products, where he led an Oracle EBS migration, and an e-commerce website re-architecture that used MarkLogic for content-marketing. After Spectrum, he was Executive Director of Technology and UX at Kaplan Publishing, where he built a mobile content delivery platform for 200,000 students. In 2011, he rejoined MarkLogic and took a Solutions Director role, where he enjoys a mix of development, architecture, and sales projects. He tweets at @xmlnovelist.
Subscribe to get all the news, info and tutorials you need to build better business apps and sites