How the Semantic Web Will Change Information Management: Three Predictions

October 25, 2008
Semantic Web Stack


I have doubts about “the wisdom of the crowd” as promoted by various writers on social networking tools. But I have no such reservations when it comes to the enhanced wisdom of smaller groups of connected individuals. Yes, two heads are often better than one, and three, four or more can be better still in the right circumstances.

In the Knowledge Architecture workshop which I run for Aslib, I make sure that delegates hear me talk for five minutes or so about the Semantic Web (SemWeb, or Web 3.0 if you must), and in particular about the role which ontologies will play. I do this by showing them a block diagram (above right) of the structure of the SemWeb as devised by the W3C. But I now realise that it’s just not enough to describe the structure, and that I need to explain how the SemWeb will fundamentally change the way we access Web-based information.

ISKO UK member Silver Oliver has recently had an article published in Freepint’s FUMSI network with the title I have used for this post. Despite the millions of words which must have been written to explain what the SemWeb is about, Silver’s explanation is the best account I have yet come across of how SemWeb KO techniques will change our approach to information management.

Read it – or regret it! I’ll certainly be including a reference to Silver’s article in future runs of my workshop.


Wikipedia’s approach to categorization

September 22, 2008

I was intrigued by Silver’s posting asking for information on Wikipedia’s approach to categorization. Since I was busy at the time, I hoped that someone else would respond, but no-one has. So in a brief spare moment, I have tried to work out what they’re doing myself.

Let’s just say that it’s not obvious! There is plenty of documentation here and here and elsewhere. Perhaps the most significant clue to what they’re doing lies in the latter page. They say:

“Each Wikipedia article can appear in more than one category, and each category can appear in more than one parent category. Multiple categorization schemes co-exist simultaneously. In other words, categories do not form a strict hierarchy or tree structure, but a more general directed acyclic graph (or close to it; see below).”

The ‘see below’ refers to an image showing a representative sample of the category structure, and this is where things become somewhat contentious. The image looks to me like a mish-mash of hierarchical and associative relationships (some of which are questionable IMHO), far closer to the realm of ‘real world’ perceptions than to the neat, clinically precise representations of classic KO. Is this perhaps an example of ‘Freely Faceted Classification’ as described to us by our Italian colleague Claudio Gnoli at our Ranganathan Revisited event in November 2007? Or is it something else?

Taking a specific example, I used the CategoryTree tool to explore a section of the Wikipedia category structure. I specified ‘en.wikipedia.org’ as the Wiki and ‘transportation’ as the category in order to examine how ‘trains’ are represented. I made the facile (but not unwarranted) assumption that ‘trains’ would appear somewhere as a lower-level category of ‘Transportation’. Indeed it does, being reported as Transportation > Rail Transport > Trains.

What you notice in passing, though, is interesting. ‘Transportation’ itself has the parent categories ‘Industries’, ‘Technology by type’ and ‘Travel’. Fair enough, I suppose, given that we embrace polyhierarchy and acknowledge the need to provide multiple access routes to specific concepts. However, ‘Rail Transport’ and ‘Public Transport’ occur adjacent to each other at the same level. Hmmm. Some overlap of categories methinks, since I’m not aware of any form of rail transport which isn’t also public (except freight, but that’s out of scope for our purposes). But then, if you examine the sub-categories of ‘Public Transport’, you find that the principle of differentiation is quite different, and operates at a higher level.
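To make the polyhierarchy point concrete, here is a minimal Python sketch that treats the small fragment of the category tree described above as a directed graph in which each category points to its parent categories. The category names come from my CategoryTree exploration; the `PARENTS` dictionary and the helper functions `is_acyclic` and `broader_paths` are my own hypothetical illustration, not anything Wikipedia provides, and the fragment is hand-copied rather than fetched live. The sketch checks that the fragment really is a directed acyclic graph, as the Wikipedia documentation claims, and then lists every broader path leading down to ‘Trains’.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical, hand-copied fragment of the category graph described above
# (child -> set of parent categories). It is NOT fetched from Wikipedia
# and covers only the handful of categories mentioned in this post.
PARENTS = {
    "Trains": {"Rail Transport"},
    "Rail Transport": {"Transportation"},
    "Public Transport": {"Transportation"},
    "Transportation": {"Industries", "Technology by type", "Travel"},
    "Industries": set(),
    "Technology by type": set(),
    "Travel": set(),
}


def is_acyclic(parents):
    """Return True if no category is its own ancestor (i.e. the graph is a DAG)."""
    try:
        # TopologicalSorter expects node -> predecessors; parents serve as predecessors.
        # Consuming static_order() raises CycleError if a cycle exists.
        tuple(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


def broader_paths(category, parents):
    """List every path from a top-level category down to `category`.

    Assumes the graph is acyclic; call is_acyclic() first, otherwise a
    cycle would make this recursion run forever.
    """
    ups = parents.get(category, set())
    if not ups:
        return [[category]]  # a top-level category is its own single path
    paths = []
    for parent in sorted(ups):
        for path in broader_paths(parent, parents):
            paths.append(path + [category])
    return paths


if __name__ == "__main__":
    print(is_acyclic(PARENTS))
    for path in broader_paths("Trains", PARENTS):
        print(" > ".join(path))
```

Because ‘Transportation’ has three parents, ‘Trains’ inherits three distinct access routes (via ‘Industries’, ‘Technology by type’ and ‘Travel’), which is exactly the multiple-access-route behaviour that polyhierarchy is meant to provide. Note that nothing in the graph itself flags the overlap between ‘Rail Transport’ and ‘Public Transport’; that judgement still needs a human classifier.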

Screen shots of the CategoryTree hierarchies I examined are provided below so that anyone interested can peruse them before perhaps investigating the question for real online.

Conclusion? The Wikipedia categorization system reflects but does not consistently apply the principles of KO as expounded in the formal literature. It is nevertheless interesting because it might well represent what results when folksonomy meets formal KO and agrees to a compromise.

If anyone has the time and patience to analyze this interesting phenomenon further and comment on it, then I for one would be grateful. And I’m sure Silver Oliver would too, since he and his colleagues at the BBC have invested considerable effort in building a system which utilizes Wikipedia topics as subject identifiers for their own internal use. Obviously, they would like to know if Wikipedia’s categories can be utilized ‘as is’ or whether they need to embark on a categorization exercise of their own.


Faceted directory of Clipmark clips

September 22, 2007

I have just posted a faceted directory of my Clipmarks posts on http://www.trendmonitor2.wordpress.com. I did this by making up a faceted “meta-language” to describe each clip, which produced a kind of abstract in the most general terms. I tried to be as consistent as I could, working from memory alone. I then analysed my tags into facets. It is now possible to see quickly what I am following (for better or worse). It also gives a small taste of what much bigger faceted directories of this kind of material might look like.
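For anyone curious about the mechanics, the step of analysing tags into facets and then browsing by facet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the facet names, tags and clip identifiers below are all hypothetical (the post does not list the facets I actually used), and `build_directory` and `select` are made-up helper names, not part of Clipmarks or WordPress.

```python
from collections import defaultdict

# Hypothetical mapping of tags to facets; the real facets used for the
# Clipmarks directory are not listed in this post.
TAG_FACETS = {
    "ontologies": "Topic",
    "folksonomy": "Topic",
    "blog post": "Form",
    "news item": "Form",
    "W3C": "Source",
}

# Hypothetical clips, each described by its set of tags.
CLIPS = {
    "clip-1": {"ontologies", "W3C", "news item"},
    "clip-2": {"folksonomy", "blog post"},
    "clip-3": {"ontologies", "blog post"},
}


def build_directory(clips, tag_facets):
    """Group clip ids under facet -> tag -> set of clip ids."""
    directory = defaultdict(lambda: defaultdict(set))
    for clip_id, tags in clips.items():
        for tag in tags:
            # Tags not yet analysed into a facet are parked under "Unassigned".
            facet = tag_facets.get(tag, "Unassigned")
            directory[facet][tag].add(clip_id)
    return directory


def select(clips, *required_tags):
    """Return the ids of clips carrying every requested tag (a facet intersection)."""
    wanted = set(required_tags)
    return sorted(cid for cid, tags in clips.items() if wanted <= tags)


if __name__ == "__main__":
    directory = build_directory(CLIPS, TAG_FACETS)
    for facet in sorted(directory):
        print(facet)
        for tag in sorted(directory[facet]):
            print("   ", tag, sorted(directory[facet][tag]))
    print(select(CLIPS, "ontologies", "blog post"))
```

The point of the design is that each clip is described once, by its tags, and the facets are layered on top; combining values from different facets (here a Topic with a Form) narrows the results without anyone having to build a fixed hierarchy in advance.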