I think, therefore I classify

July 18, 2012

Well, what a fascinating talk on Monday on the cognitive basis of categorization. The occasion was the ISKO UK/BCS IRSG’s day conference “I think, therefore I classify”. Among a whole array of interesting presentations – some ten altogether – I found Stevan Harnad’s the most exhilarating. Stevan is Professor in the Department of Psychology at the Université du Québec à Montréal and also Affiliate Professor in Electronics and Computer Science at the University of Southampton.

Prof. Harnad reminded us that categorization is something we do from the minute we are born. It is our way of understanding and interacting with the world around us and underlies our very existence throughout our lives. It’s true to say that if we did not classify and categorize things we encounter, then we would not survive. I first understood the fundamental importance of classification after reading George Lakoff’s books “Women, Fire and Dangerous Things” and (with Mark Johnson) “Metaphors We Live By”.

Classification is so fundamental to the human condition that it’s understandable that it does not appear in the primary or secondary education syllabi. It’s innate; we do it without thinking. But where this subconscious function needs to be surfaced and examined in more detail (for example in LIS courses at the FE/HE level), why does it tend to be relegated to a mere footnote if indeed it is mentioned at all?

Sandra Knapp’s fascinating talk on biological classification – “Order out of chaos – classification and naming in biology” – helped me to appreciate that, to the man/woman-in-the-street, understanding the detail of classification techniques is irrelevant. Just as he/she has little interest in the detail of biology, so she/he can get along just fine without an appreciation of the detail of labelling things so they can be found when wanted.

So much for those in the street. But the issue remains that there are communities who would undoubtedly benefit from familiarity with the intricacies of classification, namely any individual or organization that needs to receive, store, retrieve and make sense of information. And that’s almost everyone!

It’s ironic therefore that classification techniques seem to be retreating into a small number of specialized domains where the immense value they add is hidden: libraries and other resource collections, e-commerce product filters, and perhaps what will be recognized as the apotheosis of classification/categorization, the semantic web. It is doubly ironic that LIS professionals are minority contributors to all but the first-mentioned.

“Raw Data Now!”

January 28, 2010

‘Data’ is not synonymous with ‘meaning’ – although, in all the recent fuss about Sir Tim Berners-Lee’s attempt to overturn the UK Civil Service’s ingrained culture of secrecy, this might easily be overlooked.

The announcement of data.gov.uk is to be welcomed, but it is only the first step on a long and complex road. The fears expressed by the data custodians, that data might be interpreted differently from the way intended, just show how much we are still governed by vested interests who act ‘in our own good’. Sorry, give us the data, and let us make our own interpretations, good or bad.

So, data.gov.uk is a good thing. But it could turn into the veritable Pandora’s Box without some kind of agreed framework within which data are interpreted and evaluated. I am indebted to the KIDMM community for flagging-up the fact that a European focus group has been working on this very problem for some time.

The all-Europe Comité Européen de Normalisation (CEN) is a rather shadowy organisation which seems to work on standards issues in the background, and then suddenly spring into the limelight with a proposal for a new ISO standard. One of its workshops – Discovery of and Access to eGovernment Resources (CEN/ISSS WS/eGov-Share) – appears to have done precisely this with (I assume) a proposal to the SC34 working group (ISO/IEC JTC1/SC34/WG3). This working group is concerned with producing standard architectures for information management and interchange based on SGML, and its current focus is the Topic Maps standard (ISO/IEC 13250).

Well, you know me. Any mention of Topic Maps and I’m anybody’s. So when I hear of an initiative which has developed a proposal which specifies a protocol for the exchange of information about semantic descriptions which conforms to the Atom Syndication Format and the Topic Maps Data Model, and moreover, which works with semantic descriptions represented in XTM 1.0, XTM 2.0 and RDF/XML, then, well, Nirvana!
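To make the idea concrete, here is a minimal sketch of what consuming such a protocol might look like. The feed below is hypothetical (the entry, URL and media type are invented for illustration, not taken from the SDShare specification), but it shows the general shape: an Atom feed whose entries each link to one updated semantic description, which a client can walk and fetch.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# A hypothetical SDShare-style fragments feed: each Atom entry points
# at one updated semantic description (e.g. an XTM or RDF/XML file).
feed_xml = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example fragments feed</title>
  <updated>2010-01-28T12:00:00Z</updated>
  <entry>
    <title>Updated topic: data.gov.uk</title>
    <updated>2010-01-28T11:59:00Z</updated>
    <link rel="alternate" type="application/x-tm+xml;version=2.0"
          href="http://example.org/fragments/datagovuk.xtm"/>
  </entry>
</feed>"""

def list_fragments(xml_text):
    """Return (title, href, media type) for each entry's payload link."""
    root = ET.fromstring(xml_text)
    fragments = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title")
        link = entry.find(ATOM + "link")
        fragments.append((title, link.get("href"), link.get("type")))
    return fragments

for title, href, mtype in list_fragments(feed_xml):
    print(title, "->", href, f"({mtype})")
```

The appeal of building on Atom is exactly this: any off-the-shelf feed tooling can poll for changed semantic descriptions without knowing anything about Topic Maps or RDF.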

Thanks to KIDMM, if you’re interested (and you should be!), this is where you can find the full specification of the protocol: SDShare: Protocol for the Syndication of Semantic Descriptions.

Let us know what you think of it, and of its potential in making sense of the vast amounts of data due to be released on the Web.

Open Source Software: A Serious Option At Last?

January 24, 2010

I am increasingly impressed by what open source (OS) software communities are offering. Not just in terms of the sheer range of applications, but by their quality too. That’s an observation vindicated by the recent award of DoD 5015.02 Records Management Certification to Alfresco, according it the kudos of being the first open source product to demonstrate compliance with the strict DoD 5015.02 STD specification for records management. That’s a significant achievement even Microsoft can’t match.

If you visit the Mecca of OS developers, Sourceforge, you’ll find hundreds, if not thousands, of little niche applications of the sort often found on computing magazine cover-CDs, which will be of great use to some, but of no interest to most. But bear with them. Like any jumble sale or bric-a-brac market, you have to plough through the dross to find the jewels. One particular jewel I am playing with at the moment is VirtueMart.

VirtueMart is an OS online e-commerce application, allowing anyone to set up an online sales presence at an incredible level of detail and functionality. It runs under the OS Joomla! CMS, which in itself is a jewel. Although one has to give an equal plug to Mambo, the original OS CMS project from which Joomla! forked some five years ago. Both VirtueMart and Mambo utilise the LAMP development and deployment environment – Linux, Apache, MySQL, PHP – although I’m using the Windows variant WAMP.

Why is this relevant to those interested in KO? Well, because I can’t think of any more complex real-world application requiring solid KO expertise than an e-commerce site. VirtueMart has to support and integrate:

  • vendor identity and brand
  • product classes, categories, instances and descriptions
  • manufacturer information
  • site visitors
  • existing customers
  • product reviews by customers
  • multiple payment methods
  • discount & coupon schemes
  • ordering & order status reporting
  • multiple tax regimes
  • shipping methods & rates

All of these entities (if that’s the right term) have numerous attributes which need to be configurable, depending on what you’re selling. The VirtueMart developers, all of whom have given of their time and expertise freely, have done a really impressive job. Might they have done even better, I wonder, if KO professionals had been prepared to donate their expertise?
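To give a flavour of the KO problem hiding inside such a site, here is a toy sketch (all names and the catalogue data are invented, and this is far simpler than VirtueMart’s actual schema): product categories form a tree, and each product carries configurable attributes whose relevance depends on what is being sold.

```python
from dataclasses import dataclass, field

@dataclass
class Category:
    """A node in the product classification tree."""
    name: str
    children: list = field(default_factory=list)

    def path(self, target, trail=()):
        """Return the category path down to `target`, or None if absent."""
        trail = trail + (self.name,)
        if self.name == target:
            return trail
        for child in self.children:
            found = child.path(target, trail)
            if found:
                return found
        return None

@dataclass
class Product:
    sku: str
    name: str
    category: str
    attributes: dict = field(default_factory=dict)  # e.g. size, colour

catalogue = Category("Clothing", [
    Category("Footwear", [Category("Boots"), Category("Sandals")]),
    Category("Outerwear"),
])

boots = Product("SKU-001", "Walking boot", "Boots",
                attributes={"size": "42", "colour": "brown"})

print(catalogue.path(boots.category))
# → ('Clothing', 'Footwear', 'Boots')
```

Multiply this by manufacturers, customers, reviews, tax regimes and the rest of the list above, and the need for disciplined classification becomes obvious.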


In the course of sifting through Sourceforge, I discovered a number of applications relevant to KO. I shall be featuring these over the next few weeks in our KOOLTools section, as and when I have the time to test them. Bookmark it.

Trying to please everyone

September 18, 2009

One of the enduring attractions of our profession (that’s information management, knowledge management, records management, information science, knowledge organization – whatever you want to call it) for me, is that it impacts upon everything. Yes, literally, everything. When we build a taxonomy, relate descriptors in a thesaurus or assign keywords, we are mediators among a multiplicity of points-of-view, creeds and catechisms. But while that heterogeneity, that multicultural dimension, is often the root of our sense of fulfilment, contention can lie just below the surface.

To focus on one problem in particular, how can we know whether a taxonomy we build is ‘true’ – or perhaps ‘authoritative’? Is there such a thing as ‘universal truth’? Do we all see things the same way? Or, to put it another way, how do we distinguish between – and accommodate – the subjective and the objective?

For instance, when we build a taxonomy, or a navigation scheme for a web site, how can we capture the viewpoint of the majority, whilst also allowing for the individual – even idiosyncratic – point-of-view? Thus do philosophy and politics enter an otherwise cosy world.

It’s a problem addressed recently by Fran Alexander of the Department of Information Studies, University College London, who mounted a highly stimulating poster at ISKO UK’s conference on 22-23 June 2009. The poster provides an interesting first-sight of the complex nexus among business sector objectives, attendant socio-economic-environmental constraints, and the influence exerted by the relative subjectivity/objectivity of the domain.

The degree to which a conceptual framework is held in common, the coherence of interpretation of that framework among its stakeholders, and the terminological system designed to represent it, all depend upon a process of intersubjective creation of shared meaning within a defined socio-cultural context. In other words, politics. Taxonomy is therefore partly political, partly individual and partly pragmatic.

Melvil Dewey deserves his place in the history of KO for his balanced accommodation of all three dimensions at the time he devised the DDC. But we’re over 130 years further on now, and the mix of political, personal and practical elements required to reflect current understanding of the world (or organization) has changed immensely. Dewey’s innocent assumptions, drawn from the Weltanschauung of his time, appear at least inappropriate, sometimes biased and often incorrect in a 21st century context.

In a rather adept (and certainly persuasive) essay in the latest issue of Knowledge Organization*, Richard Davies asks ‘Should Philosophy Books Be Treated As Fiction?’. He makes the point that, in the terms used here, the intersubjective creation of meaning in the domain of philosophy has barely occurred; rather the opposite in fact, each philosopher seeming bent upon distinguishing his/her approach from predecessors. This occurs, although to a lesser degree, in most other domains as well, amongst them the 15 or so covered by Fran Alexander’s research.

Fran’s conclusion is that “The mediation of subjectivity/objectivity is becoming increasingly relevant in a ‘user-centric’ age.” So an awareness of the degree of ‘objectivity’ of a taxonomy project is becoming vital to its functional effectiveness – and this is inevitably governed, to some extent, by political considerations and by the degree to which those who fund such projects perceive the taxonomist’s role to have a political dimension.

This is an interesting piece of research and I urge you to take a closer look at Fran’s poster, and to allow it to stimulate your own thoughts on the issues involved.

* Davies, Richard. Should Philosophy Books Be Treated As Fiction? Knowledge Organization, 36(2/3), 121-129.

Death of the document?

June 29, 2009

With not even a soupçon of the quagmire I was entering, I recently looked up the definition of ‘document’. In case you didn’t know, the glib dictionary definitions hide a debate that has, well, not exactly raged, but rather limped on for nearly twenty years now. I don’t know, but I guess that it was the arrival of the digital ‘document’ with the first word processors in the early 1980s which sparked it in the first place.

It turns out that there’s no one definition of ‘document’ that everyone’s happy with. We can all agree what a cup is, or a bus, but not, it seems, a ‘document’. And to cap it all, a recent paper in the Journal of Documentation (Frohmann, Berndt. Revisiting “what is a document?”, JDoc 65(2), 2009) tells us that we shouldn’t bother anyway. Shame, I’d been planning to investigate where the ‘document’ stands in the light of Web 2.0, much as Steve Bailey and James Lappin are doing for records. And then what happens? Google announces the death of the document.

How so? Well, instinctively, we humans don’t welcome change. We are ruled by nostalgia – or rather, inertia. Come any new technology, we always try to replicate the old model within it, failing to see that it offers scope for completely new ways of doing things. Web 2.0 is just the catch-all term for a number of such new ways – new models of communication and interaction – Blogs, Wikis, Facebook, Twitter, LinkedIn and now, Google Wave. All of them are document-agnostic.


The team that developed Google Maps moved on to look at the various ways in which ICT supports the ways we communicate and share information. They range from the historic, fixed snapshot (documents, including email and blogs) through the quasi-dynamic SMS and IM to real-time telephony. In all of them, the concept of the link begins to eclipse the concept of the discrete document.

Google Wave integrates the best features of email and IM, taking a significant step toward the ideals of the Semantic Web. The plus is that discrete, siloed documents are no longer the focus of communication. Rather, documents become just one element in a conversation. And a conversation, one might note, in which any kind of editor function has been eliminated. It remains to be seen how that disintermediation helps or hinders effective information sharing.


Wave offers four main innovative features which take it way beyond conventional email. The first tackles the problem of ‘threading’. A Wave starts with a message, just as in normal email, discussion lists, forums and blogs. However, Wave allows participants’ comments or replies to be embedded in-line in the original message adjacent to the text to which they refer. The logic of the would-be conversation is no longer fragmented across multiple, separate messages, linked only by a tenuous ‘thread’ which is easily broken. The advantages of this consolidation apply to attachments too, which are a pain to find again in anything but the shortest thread. A Wave therefore becomes a multi-participant conversation, complete with associated resources, attached or linked.
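The essence of that first feature is a data-structure change: a reply anchors to a span of the parent’s text rather than starting a fresh message. Here is a toy model of the idea (the class and field names are invented for illustration; they are not Wave’s actual API):

```python
from dataclasses import dataclass, field

@dataclass
class Blip:
    """One contribution to a Wave-style conversation."""
    author: str
    text: str
    replies: list = field(default_factory=list)  # (start, end, Blip) tuples

    def reply(self, start, end, blip):
        """Attach `blip` as a reply anchored at self.text[start:end]."""
        self.replies.append((start, end, blip))
        return blip

root = Blip("alice", "Shall we meet Tuesday at 10 or Wednesday at 2?")
root.reply(14, 21, Blip("bob", "Tuesday works for me."))
root.reply(31, 40, Blip("carol", "I can only do Wednesday."))

for start, end, child in root.replies:
    print(f"{child.author} on '{root.text[start:end]}': {child.text}")
```

Because each reply knows exactly which words it answers, the conversation’s logic lives in one tree rather than being scattered across a mailbox thread.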

Wave’s second key feature builds upon the quasi real-time echoing of participant keyboard input familiar from IM applications. Google’s step forward in this case is to echo updates to all participant screens in as near real-time as current technology allows. No longer do you have to watch that scribbling pencil for seconds that feel like minutes; characters appear virtually as the writer types. This live, as-you-type updating works well with simultaneous multiple editing too.

Thirdly, Wave authors are allowed to specify the scope of participation, from public, to group, to private, and whether each member has read only, authoring or editing rights. The group and private categories can be expanded or contracted at any time.

Lastly, and perhaps the most significant feature of all, participants who join the conversation late don’t lose out. When they join a conversation in progress, they can simply click a button to see each and every change made to the original message up to that point, in a kind of slow-motion automated playback of a wiki page history. The Wave Playback facility could prove to be the silver bullet that records managers have been looking for to bring email under control and to tame the anarchic tendencies of Web 2.0. But it could equally be used as a point-by-point versioning system where that’s useful.

Google have made the most of the opportunities provided by current technology by including further features, such as context-aware corrections as-you-type (‘Spelly’), detection and insertion of links as you type (‘Linky’), and ‘Polly’, a gadget for conducting surveys and polls. Particularly impressive is ‘Rosy’, a robot drawing on Google Translate which can translate in real-time, as you type, from any of 40 languages. There’s easy linking to Google Maps too, as you might expect, and yet more.

The original Wave video (1h20) can be found on YouTube, while Smarterware have chopped it up into eight 30-60 second chunks for those who can’t afford 80 mins. online. Alternatively, there’s an excellent summary of Wave on Mashable.

But by now you’re asking, ‘OK, nice, but so what?’

Changing how we work

Wave combines previously separate communication applications into an integrated space far closer to what third-generation knowledge management sophists revere: the conversation. It enables a whole new level of real-time, disintermediated collaboration in which the document is just one part of a greater whole. What’s more, another of Wave’s robots – ‘Bloggy’ – allows Wave content to be published to blogs or, via the Wave API (Application Programming Interface), whole Waves to be embedded in a blog, or in any Web page come to that.

As if that weren’t enough, Google are making the Wave source code, its XML-based communications protocol and its External API open source. That opens the floodgates for developers around the globe to create extensions and gadgets of any kind imaginable. There is already a Twitter extension – ‘Twave’ – which integrates Twitter feeds within a Wave, incoming or outgoing. Although Google obviously hope that most Wave developments will be hosted by them, they are acknowledging the corporate perspective by allowing anyone to run their own Wave server. How that fits with their advertising-based business model remains to be seen.

Implications for KO

Possibly the single most significant thing about Wave is that Google are recognising the potentially unlimited development resources available through the open source community. And that’s where KO might just find a new lease of life. We’re all familiar with the ongoing debate, a little less polarized now than it was four years ago, on formal taxonomies versus folksonomic tagging à la del.icio.us or Technorati. Wave, it seems, has adopted a flat tagging approach similar to Twitter hashtags. However, there’s lots of room between the two for rapprochement, as evidenced by the emergence of RDF-style machine tags (triple tags) on Flickr a while back, or by Wikipedia’s extensive category tree. Open Intelligence, a knowledge-sharing site set up by ISKO UK member Jan Wyllie, is pioneering a faceted tag system which may just provide some clues to where KO might be going in the Web 2.0 world.
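For readers who haven’t met them, Flickr’s machine tags are ordinary tags with a namespace:predicate=value shape – a folksonomy edging toward RDF triples. A small sketch of parsing them (the regular expression and field names here are my own illustration, not Flickr’s code):

```python
import re

# Flickr-style machine tags ("triple tags") take the shape
# namespace:predicate=value, e.g. "geo:lat=57.64".
MACHINE_TAG = re.compile(r"^([a-zA-Z]\w*):([a-zA-Z]\w*)=(.+)$")

def parse_tag(tag):
    """Split a machine tag into its triple; plain tags get None fields."""
    m = MACHINE_TAG.match(tag)
    if m:
        return {"namespace": m.group(1),
                "predicate": m.group(2),
                "value": m.group(3)}
    return {"namespace": None, "predicate": None, "value": tag}

print(parse_tag("geo:lat=57.64"))
print(parse_tag("sunset"))
```

The point for KO is that a flat tag space can grow structure incrementally: plain tags and triple tags coexist, and tools that understand the triple form can do more with it.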

It would seem not unreasonable, therefore, to ask whether someone (ISKO UK?) might sponsor some research into how established KO techniques may be applied to findability in Google Wave. It could make for a challenging doctoral dissertation. Then, someone with the necessary technical savvy just might develop a Wave extension allowing tags to be selected from a thesaurus. An attractive prospect, methinks.

Let’s not play catch-up yet again. Let’s get involved!


Information Architecture

August 1, 2008

Colleagues have brought the following vacancies to my attention:

  1. Knowledge/Information Architect
  2. Senior Information Architect


SKOS-based Semantic Terminology Services

April 22, 2008

{distilled from a posting by Doug Tudhope (University of Glamorgan) to the SKOS mailing list}
{distilled from a posting by Doug Tudhope (University of Glamorgan) to the SKOS mailing list}
The STAR project has developed a pilot set of semantic web services, based upon the SKOS Core data model for thesauri and related knowledge organization systems. The services currently provide term look-up across the thesauri held in the system, along with browsing and semantic concept expansion within a chosen thesaurus. This allows search to be augmented by SKOS-based vocabulary and semantic resources (assuming the services are used in conjunction with a search system).

In combination with a search system, the services allow queries to be expanded (automatically or interactively) by synonyms or by expansion over the SKOS semantic relationships. Expansion is based on a measure of ‘semantic closeness’. Anyone interested is welcome to inspect the API or download a client demonstrator, which is currently configured to operate with a subset of English Heritage thesauri.
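To illustrate the general idea (this is a toy of my own, not the STAR API – the concepts, relationships and weights below are invented), query expansion over SKOS-style links might walk narrower and related terms, scoring each by a crude ‘semantic closeness’ that decays with distance:

```python
# Invented miniature thesaurus: each concept lists its SKOS-style
# narrower and related neighbours.
THESAURUS = {
    "monument": {"narrower": ["castle", "barrow"], "related": ["site"]},
    "castle":   {"narrower": ["keep"], "related": []},
    "barrow":   {"narrower": [], "related": []},
    "keep":     {"narrower": [], "related": []},
    "site":     {"narrower": [], "related": []},
}

def expand(concept, depth=2, weight=1.0, decay=0.5, scores=None):
    """Collect concepts reachable from `concept`, scored by closeness.

    Each hop multiplies the score by `decay`; related links are
    penalised more heavily than narrower ones.
    """
    if scores is None:
        scores = {}
    for rel, penalty in (("narrower", 1.0), ("related", 0.5)):
        for neighbour in THESAURUS.get(concept, {}).get(rel, []):
            score = weight * decay * penalty
            if score > scores.get(neighbour, 0.0):
                scores[neighbour] = score
                if depth > 1:
                    expand(neighbour, depth - 1, score, decay, scores)
    return scores

print(expand("monument"))
```

A search system could then add the highest-scoring neighbours to the user’s query, either automatically or after showing them for interactive selection.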

The work is ongoing and the researchers welcome any feedback or interest in collaboration on developing an API to support a rich range of SKOS use cases.

Downloads and further details can be found at the Hypermedia Research Unit web site.