Discoverability and Metadata

November 13, 2013 Data & AI, MarkLogic

Why Are STM Pubs Scared?

Last month at the Frankfurt Book Fair, David Worlock chaired a thought-provoking panel. The panelists were Dr Sven Fund, CEO of De Gruyter; Jason Markos, Director of Knowledge Management for Wiley; Andreas Blumauer, CEO of Semantic Web Company; and Dr Timo Hannay, CEO of MacMillan Digital Science.

David’s initial premise was that people in STM publishing today are thinking about knowledge, not books, journals, articles and so on. That includes, he said, knowledge pathways, methods for recording what we think, and the ways in which we develop our originality. The ensuing presentations and discussion were to focus on how this affects metadata and discoverability, and on how important each of them is. The elephant in the room, however, seemed to be what publishers will need to do to survive in the age of open access, free content and digital information flows.

So what are the fear factors affecting STM publishers in their mission to adopt new business models? It was Timo Hannay who used the phrase “Reasons to be Fearful” in his slides. Timo pointed out that there are several types of metadata (e.g. bibliographic, bibliometric, usage, review and index) but that STM publishers are somewhat ambivalent about metadata because they see it as a double-edged sword. It is a necessity, but they don’t want others creating it for their content, and they certainly don’t want those third parties to make money from it. The idea that concerned him more, however, was that “someone else’s metadata might be a substitute for my content.”

Sven Fund picked up the theme of metadata worries from a different angle, focusing on management, organization and complexity. He said that De Gruyter has 5-6 people working on metadata full time, yet some customers are still not happy with it, and that part of the problem is the division in the industry around the management of content and data. How should this be addressed, not merely from a technical feasibility perspective but in terms of getting things done? He suggested that agreeing on a standard for metadata and then using it would be a huge step forward: this is a cultural rather than a technology discussion.

David Worlock asked whether the adoption of semantics would be as slow as the adoption of XML. Jason Markos homed in on the underlying cause of concern, saying that semantic adoption had been slow thus far because no one had figured out how to make money from it. And making money is the central point: publishers are being driven to find new business models that will enable them to monetize knowledge. This is really where the technology comes in. Timo Hannay said publishers needed to get better at IT because it is the driving force of the age, and Andreas Blumauer suggested that publishers should focus on delivering semantic web software clients that enable the end user. The really dire warning was that if publishers don’t do that, then Google will do it for them.

This connected back to Jason Markos’s opening keynote, where he made the point that the world of discoverability and metadata had become much more complex because companies like Google are now encrypting their keywords. This means that you can’t tell what search led your consumer to your content: Google has locked up the metadata.

Despite the concerns expressed during the discussion, the overall feeling was one of cautious optimism. There seemed to be consensus that the publisher’s role is to add value to content through more and better structure and classification, to enhance its usefulness. In addition, the creation of great metadata was seen as core to the publishing process, and publishers should be open to other people adding value to their content as well.

Andreas Blumauer used three themes in his slot that, I believe, also conveniently summarise the areas the session identified for publishers to consider when attempting to minimize the “fear factor”:

  • Discoverability
    • How do you make sure your information gets to where it really needs to go?
  • Linkability
    • How context-aware is your content, and how linkable is it?
  • Serendipity
    • There are limits to what algorithms and machines can do. The human brain is a powerful thing and we shouldn’t stop using it.

Applicability and implementation are key to helping everyone move forward. The ensuing question-and-answer session continued these themes. The conclusion appears to be that there is no reason to be afraid if the industry takes a long-term view and takes small steps towards reinventing itself and opening up the ecosystem.

Kate Tickner