A recent IBM developerWorks article describes using XQuery, JTidy and DOM to query and transform HTML documents. Stylus Studio has provided this functionality for some time and includes JTidy as one of its built-in adapters. Mapping "JTidied" documents is as easy as mapping from XML; just use this URL format:
adapter:HTML?http://www.example.com/cool.html

Stylus Studio provides a range of adapters, including dBase, CSV and many others. It is even possible to leverage these libraries directly from the Java platform by writing a custom adapter.
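To make the structure of that URL concrete, here is a small Java sketch that splits an adapter URL into its parts. It is a hypothetical illustration, not Stylus Studio's own API. The class name `AdapterUrl` is invented here. The sketch also assumes the layout is `adapter:` + adapter name + optional colon-separated settings (for example `adapter:CSV:sep=x2c?...`) + `?` + the source URL being converted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (not part of Stylus Studio) that splits an adapter URL
 * such as "adapter:HTML?http://www.example.com/cool.html" into its parts.
 * Assumed layout: "adapter:" NAME [":" OPTION]* "?" SOURCE_URL
 */
public final class AdapterUrl {
    private static final String PREFIX = "adapter:";

    public final String adapterName;   // e.g. "HTML" (the JTidy-based adapter)
    public final List<String> options; // e.g. ["sep=x2c"]; empty if none given
    public final String sourceUrl;     // the document the adapter reads

    private AdapterUrl(String adapterName, List<String> options, String sourceUrl) {
        this.adapterName = adapterName;
        this.options = options;
        this.sourceUrl = sourceUrl;
    }

    public static AdapterUrl parse(String url) {
        if (!url.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not an adapter URL: " + url);
        }
        // Split only at the FIRST '?': the wrapped source URL may carry its own
        // query string (e.g. "page?id=1"), which must be kept intact.
        int q = url.indexOf('?');
        if (q < 0) {
            throw new IllegalArgumentException("missing '?' before source URL: " + url);
        }
        String head = url.substring(PREFIX.length(), q); // "HTML" or "CSV:sep=x2c"
        String source = url.substring(q + 1);

        String[] parts = head.split(":");
        if (parts[0].isEmpty()) {
            throw new IllegalArgumentException("missing adapter name: " + url);
        }
        List<String> opts = new ArrayList<>();
        for (int i = 1; i < parts.length; i++) {
            opts.add(parts[i]);
        }
        return new AdapterUrl(parts[0], Collections.unmodifiableList(opts), source);
    }

    public static void main(String[] args) {
        AdapterUrl u = AdapterUrl.parse("adapter:HTML?http://www.example.com/cool.html");
        System.out.println(u.adapterName); // prints "HTML"
        System.out.println(u.sourceUrl);   // prints "http://www.example.com/cool.html"
    }
}
```

Splitting only at the first `?` is the key design choice. Everything after it belongs to the wrapped document's URL, so an HTML page with its own query string still resolves correctly.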