As you know by now, DataDirect XQuery provides an easy and efficient way to aggregate data available in a variety of data sources and formats, such as relational databases, XML documents, Web service responses, flat files, EDI files and so on.
But, as you also well know, the world of data storage and access is very messy; there are (and always will be) protocols, data stores and formats that are not supported out of the box even by the most sophisticated data integration tool. For example, a few DataDirect XQuery users have recently asked whether they can access LDAP directory services to create reports that combine information stored partly in an RDBMS and partly in LDAP directories. DataDirect XQuery doesn't support LDAP directories out of the box, but it does let you extend the range of supported data sources through several extension mechanisms, such as custom URI resolvers, Java extension functions or custom collection URI resolvers.
So, I started thinking about what would be the best way to expose LDAP directory access from XQuery, and I came up with the following requirements for the example that I wanted to make available to our users:
- Access to the LDAP directories must be highly scalable: the same way we rely on XML streaming processing and sophisticated SQL generation and result sets consumption, we need to make LDAP access work in a streaming fashion (in fact, one of the users interested in this functionality was planning to process hundreds of thousands of LDAP records from within XQuery; better do that in a streaming fashion!)
- Access to the LDAP directories must be available both as a custom URI resolver (it's very natural for the user to think in terms of doc("ldap://localhost:10389?...") URIs when accessing LDAP resources), and as Java extension functions (which can provide more flexibility when parameters are dynamically specified)
The real work was to create a StAX interface able to consume the results returned by LDAP search operations; exposing that interface as either a custom URI resolver or a Java extension function was a very simple job.
The result of this process is attached here. If you want to try it out, you'll just need to make sure your classpath includes the folder where you expand the ZIP file. Then, a simple XQuery like this will start returning data stored in your LDAP directory service (I've tested these examples using Apache Directory Suite):
[cc lang="xquery"]
doc("ldap://localhost:10389?auth=simple&amp;principal=uid=admin,ou=system&amp;pwd=secret&amp;name=ou=users,ou=system&amp;filter=cn=*")
[/cc]
For me, this is the result I get: [cc lang="xquery"]
01803 781 555-555 ipedruzz@datadirect.com ivanpedruzzi organizationalPerson pedruzzi Ivan Pedruzzi
01880 781 555-666 cinnocent@datadirect.com carloinnocenti organizationalPerson innocenti Carlo Innocenti
...[/cc]
As you would imagine, the data looks like XML, and at this point you can handle it as if it were stored in any normal XML document referenced through the doc() function. So, suppose you want to merge personal data stored in LDAP with other information stored in a relational database; for example, suppose you are a telecom company and you want to retrieve all details about your subscribers who own cellphones with GPS capabilities in a specific ZIP code: [cc lang="xquery"]
(: element names in the constructors below are illustrative :)
<subscribers>{
  for $subscribers in collection("subscribers")/subscribers,
      $phones in collection("phones")/phones
  where $phones/id = $subscribers/phone
    and $phones/GPS = "yes"
    and $subscribers/zipcode = "01880"
  return
    <subscriber>
      <email>{
        doc(concat("ldap://localhost:10389?auth=simple&amp;principal=uid=admin,ou=system&amp;pwd=secret&amp;name=ou=users,ou=system&amp;filter=uid=",
                   $subscribers/id))//email/text()
      }</email>
      <phone>{ concat($phones/brand, " ", $phones/model) }</phone>
    </subscriber>
}</subscribers>
[/cc]
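A quick note on those ldap:// URIs: on the Java side, a custom URI resolver has to split them into host, port and search parameters before it can talk to the directory. Here is a minimal sketch of that step. The class name `LdapUriParams` is made up for this post and is not necessarily how the attached sources do it; the parameter names (auth, principal, pwd, name, filter) are the ones used in the URIs above. Note that the values themselves contain `=` characters, so each pair is split at the first `=` only.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not taken from the attached sources): splits an
// ldap:// URI of the form used in this post into host, port and parameters.
final class LdapUriParams {
    final String host;
    final int port;
    final Map<String, String> params;

    private LdapUriParams(String host, int port, Map<String, String> params) {
        this.host = host;
        this.port = port;
        this.params = params;
    }

    static LdapUriParams parse(String uri) {
        URI u = URI.create(uri);
        Map<String, String> params = new LinkedHashMap<>();
        String query = u.getRawQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                // Split at the FIRST '=' only: values such as
                // "uid=admin,ou=system" contain '=' themselves.
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    params.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
        }
        return new LdapUriParams(u.getHost(), u.getPort(), params);
    }
}
```

Calling `LdapUriParams.parse(...)` on the URI from the first example yields host `localhost`, port `10389` and a `principal` parameter of `uid=admin,ou=system`. The sketch does no percent-decoding, so a filter that contains `&` or `|` would need extra handling.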
A similar approach applies if you want to expose an LDAP directory through Java extension functions. The extension function mechanism is more flexible, and it also gives us a chance to cache some factory/connection objects to the LDAP service, so it may also be more efficient depending on the problem you are trying to solve. The mechanism I wrote as an example relies on a single Java class with one constructor and one method; the constructor creates the factory object that the call() method uses to actually perform the LDAP search. From an XQuery point of view, things look like this:
[cc lang="xquery"]
(: declare the namespace implementing the Java extension functions :)
declare namespace ldap = "ddtekjava:com.ddtek.ldap.ldap";

(: declare the constructor and method functions as they are seen from XQuery :)
declare function ldap:ldap($contextFactory as xs:string) as ddtek:javaObject external;
declare function ldap:call($this as ddtek:javaObject,
                           $server as xs:string,
                           $port as xs:string,
                           $pwd as xs:string,
                           $auth as xs:string,
                           $principalName as xs:string,
                           $nameToFilter as xs:string,
                           $filterExp as xs:string)
    as document-node(element(*, xs:untyped)) external;

(: create an "ldap" class object using a specific factory class :)
declare variable $ldap := ldap:ldap("com.sun.jndi.ldap.LdapCtxFactory");

(: execute the call() method specifying the required arguments :)
ldap:call($ldap, "localhost", "10389", "secret", "simple",
          "uid=admin,ou=system", "ou=users,ou=system", "cn=*")
[/cc]
The result of this operation is equivalent to the execution of the doc() function above using the custom URI format. If I wanted to use the Java extension function to merge RDBMS and LDAP data, I would use an approach equivalent to what we saw before:
[cc lang="xquery"]
declare variable $ldap := ldap:ldap("com.sun.jndi.ldap.LdapCtxFactory");

(: element names in the constructors below are illustrative :)
<subscribers>{
  for $subscribers in collection("ldap.dbo.subscribers")/subscribers,
      $phones in collection("ldap.dbo.phones")/phones
  where $phones/id = $subscribers/phone
    and $phones/GPS = "yes"
    and $subscribers/zipcode = "01880"
  return
    <subscriber>
      <email>{
        ldap:call($ldap, "localhost", "10389", "secret", "simple",
                  "uid=admin,ou=system", "ou=users,ou=system",
                  concat("uid=", $subscribers/id))/ldap/item/email/text()
      }</email>
      <phone>{ concat($phones/brand, " ", $phones/model) }</phone>
    </subscriber>
}</subscribers>
[/cc]
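If you are curious what sits behind ldap:ldap() and ldap:call() on the Java side, here is a rough sketch built on plain JNDI. It is not the class from the attached ZIP: the name `LdapSearchSketch`, the `environment()` helper, the subtree search scope and the `NamingEnumeration` return type are all assumptions made for this post (the real call() method has to return something DataDirect XQuery can expose as a document node). The parameter order mirrors the ldap:call() declaration above.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

// Hypothetical sketch, not the attached implementation.
final class LdapSearchSketch {
    private final String contextFactory;

    // Mirrors ldap:ldap($contextFactory): remember which JNDI factory to use.
    LdapSearchSketch(String contextFactory) {
        this.contextFactory = contextFactory;
    }

    // Builds the JNDI environment for one connection; no network access here.
    Hashtable<String, Object> environment(String server, String port, String pwd,
                                          String auth, String principalName) {
        Hashtable<String, Object> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, contextFactory);
        env.put(Context.PROVIDER_URL, "ldap://" + server + ":" + port);
        env.put(Context.SECURITY_AUTHENTICATION, auth);
        env.put(Context.SECURITY_PRINCIPAL, principalName);
        env.put(Context.SECURITY_CREDENTIALS, pwd);
        return env;
    }

    // Mirrors the arguments of ldap:call(); results are pulled lazily by the
    // caller. Subtree scope is an assumption. The context is not closed here,
    // so a real implementation must close it once the enumeration is drained.
    NamingEnumeration<SearchResult> search(String server, String port, String pwd,
                                           String auth, String principalName,
                                           String nameToFilter, String filterExp)
            throws NamingException {
        DirContext ctx = new InitialDirContext(
                environment(server, port, pwd, auth, principalName));
        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
        return ctx.search(nameToFilter, filterExp, controls);
    }
}
```

Keeping the factory class name in the constructor is what makes the object reusable across many ldap:call() invocations in the same query; caching the context itself would be the natural next step.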
Both approaches consume the results returned by the LDAP searches in a streaming fashion, thanks to the StAX interface that connects the custom URI resolver and the Java extension function to DataDirect XQuery. Feel free to dig into the attached sources to learn more about how that works, and/or to improve/change the behavior of this example.
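To give a flavor of the streaming idea without opening the ZIP, here is a simplified sketch. It is not the attached code: it pushes events through a standard `XMLStreamWriter` instead of implementing a pull-style StAX reader, and it takes a plain `Iterator` of attribute maps where a real adapter would walk the JNDI search results. The class name `LdapStaxSketch` is made up; the `ldap` and `item` element names come from the /ldap/item/email path used in the query above.

```java
import java.io.StringWriter;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch, not the attached implementation.
final class LdapStaxSketch {

    // Emits one <item> per record as soon as it is pulled from the iterator,
    // so only the current record is ever held in memory.
    static void stream(Iterator<Map<String, List<String>>> records,
                       XMLStreamWriter out) throws XMLStreamException {
        out.writeStartDocument();
        out.writeStartElement("ldap");
        while (records.hasNext()) {
            out.writeStartElement("item");
            for (Map.Entry<String, List<String>> attr : records.next().entrySet()) {
                // LDAP attributes can be multi-valued: one element per value.
                for (String value : attr.getValue()) {
                    out.writeStartElement(attr.getKey());
                    out.writeCharacters(value);
                    out.writeEndElement();
                }
            }
            out.writeEndElement(); // </item>
        }
        out.writeEndElement(); // </ldap>
        out.writeEndDocument();
        out.flush();
    }

    // Convenience wrapper for small inputs; it buffers the whole result in a
    // string, so it is for demonstration only.
    static String toXml(Iterator<Map<String, List<String>>> records) {
        try {
            StringWriter buffer = new StringWriter();
            XMLStreamWriter out =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            stream(records, out);
            out.close();
            return buffer.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The sketch uses LDAP attribute names directly as element names, which works for simple names like email but would need sanitizing for names that are not valid XML names (for example, attributes carrying options after a semicolon).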
So, an interesting example, but what's the point I'm trying to make here? As I mentioned before, data today is available in such a wide variety of stores and formats, and accessible through such a variety of different protocols and APIs that it's virtually impossible for data integration tools to support all of them out of the box. But as long as the data integration tool allows you to extend its behavior through flexible and scalable mechanisms like the one described here, and as long as the data model internally supported by the tool (XML, in this case) is powerful enough to accommodate virtually any kind of physical data model you need to access, then you can still leverage the power, flexibility, scalability and performance that the aggregation tool offers. And that's certainly true for DataDirect XQuery.