“Raw Data Now!”

January 28, 2010

‘Data’ is not synonymous with ‘meaning’, although in all the recent fuss about Sir Tim Berners-Lee’s attempt to overturn the UK Civil Service’s ingrained culture of secrecy, this might easily be overlooked.

The announcement of data.gov.uk is to be welcomed, but it is only the first step on a long and complex road. The fears expressed by the data custodians, that data might be interpreted differently from the way intended, just show how much we are still governed by vested interests who act ‘for our own good’. Sorry, but give us the data, and let us make our own interpretations, good or bad.

So, data.gov.uk is a good thing. But it could turn into a veritable Pandora’s Box without some kind of agreed framework within which data are interpreted and evaluated. I am indebted to the KIDMM community for flagging up the fact that a European focus group has been working on this very problem for some time.

The all-Europe Comité Européen de Normalisation (CEN) is a rather shadowy organisation which seems to work on standards issues in the background, then suddenly springs into the limelight with a proposal for a new ISO standard. One of their workshops – Discovery of and Access to eGovernment Resources (CEN/ISSS WS/eGov-Share) – appears to have done precisely this with (I assume) a proposal to the SC34 working group (ISO/IEC JTC1/SC34/WG3). This working group is concerned with producing standard architectures for information management and interchange based on SGML, and their current focus is the Topic Maps standard (ISO/IEC 13250).

Well, you know me. Any mention of Topic Maps and I’m anybody’s. So when I hear of an initiative which has developed a proposal specifying a protocol for the exchange of information about semantic descriptions – one which conforms to the Atom Syndication Format and the Topic Maps Data Model and, moreover, works with semantic descriptions represented in XTM 1.0, XTM 2.0 and RDF/XML – then, well, Nirvana!

Thanks to KIDMM, if you’re interested (and you should be!), this is where you can find the full specification of the protocol, SDShare: Protocol for the Syndication of Semantic Descriptions.
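To give a flavour of what this means in practice, here is a minimal sketch (in Python, and purely illustrative) of a client walking an SDShare-style Atom feed whose entries point at semantic description fragments. The feed URL below is hypothetical, and the actual feed layout and link relations are those defined in the specification itself:

    # A rough, illustrative sketch of a client walking an SDShare-style
    # Atom feed. The feed URL is hypothetical; the real feed layout and
    # link relations are defined in the SDShare specification.
    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM = '{http://www.w3.org/2005/Atom}'
    FEED_URL = 'http://example.org/sdshare/fragments'  # hypothetical endpoint

    with urllib.request.urlopen(FEED_URL) as response:
        feed = ET.parse(response).getroot()

    # Each entry is expected to link to a semantic description fragment
    # (e.g. XTM 1.0/2.0 or RDF/XML) which a consumer would fetch and merge.
    for entry in feed.findall(ATOM + 'entry'):
        title = entry.findtext(ATOM + 'title')
        updated = entry.findtext(ATOM + 'updated')
        for link in entry.findall(ATOM + 'link'):
            print(title, updated, link.get('href'), link.get('type'))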

Let us know what you think of it, and of its potential for making sense of the vast amounts of data due to be released on the Web.



David and Goliath 2.0?

May 1, 2009

There is superficial search and there is deep search. While Google is great at the first, it’s not so good at the second. There are some enterprise search applications which can claim the centre-ground between the deep and the superficial, but most of the runners in that particular race fall somewhere along the way and barely even glimpse the finishing line. Not that it matters any more, apparently, because if search analyst Stephen Arnold is right, search is dead.

Stephen Wolfram

Arnold is right that the domain of knowledge discovery is ripe for an orthogonal change – a disruptive intervention, as complexity theorists would call it. Enter US-based British mathematician Stephen Wolfram. Wolfram is no stranger to orthogonal change, having published, in 2002, a monster of a book entitled A New Kind of Science (NKS).

NKS essentially proposed that accepted scientific method be augmented by an inverted approach, whereby hypotheses are not solely tested by experimentation, but experimentation may also generate hypotheses. At 1280 pages, it took me months to read, despite its author writing very lucidly about complex mathematical concepts (maths was never my strong point).

In NKS, Wolfram presents (in narrative and over 1000 illustrations) the results of years of computational experimentation with ‘simple programs’. Simple programs are typified by cellular automata – grids of cells, each of which can exist in one of a finite set of defined ‘states’ (+ or –, on or off, 1-2-3-4-5, etc.) in any number of dimensions, accompanied by rules governing how adjacent cells may interact from one generation to the next. Wolfram devised hundreds of such cellular automata and associated interaction rules, then explored, through his Mathematica computation engine, how each of them developed – or not – over time.
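To make the idea concrete, here is a minimal sketch (in Python rather than Wolfram’s own Mathematica, and purely illustrative) of a one-dimensional, two-state cellular automaton. It uses his Rule 150 – under which a cell’s next state is the XOR of itself and its two neighbours – starting from a single ‘on’ cell:

    # A minimal, illustrative elementary cellular automaton in Python.
    # Rule 150: a cell's next state is the XOR of itself and its two
    # neighbours; the rule number encodes the 8-entry lookup table in binary.

    RULE = 150

    def step(cells, rule=RULE):
        """One generation: look up each 3-cell neighbourhood (wrapping at the edges)."""
        n = len(cells)
        return [
            (rule >> ((cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n])) & 1
            for i in range(n)
        ]

    row = [0] * 31
    row[31 // 2] = 1          # a single 'on' cell in the middle
    for _ in range(15):
        print(''.join('#' if c else '.' for c in row))
        row = step(row)

Even this trivially simple rule, run over many iterations, produces the kind of intricate, self-similar triangles shown in Wolfram’s illustrations.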

Wolfram’s depiction of his Rule 150

Result of running Rule 150 over many iterations

He discovered that a significant proportion of them can produce surprisingly complex and sustainable patterns of results (illustrated in the book, and above), some resembling patterns discovered decades earlier by complexity pioneers such as Lorenz and Mandelbrot.

Wolfram was much criticized at the time NKS was published for not employing ‘proper’ scientific method in his research. That’s a bit like criticizing Einstein for straying outside the boundaries of Newtonian physics, it seems to me. He was also criticized for not having any immediate applications for his discoveries.

Well, seven years on, Wolfram appears to be striking back at his critics with the imminent launch of Wolfram Alpha, a ‘computational knowledge engine’ combining Mathematica with principles he first described in NKS.

What’s a ‘computational knowledge engine’? Well, PCMag (April 29, 2009) in the US reported:

“Wolfram Alpha has trillions of pieces of curated data,” Wolfram said. “We’re getting data from both free data and licensed data – some of it is very static. A lot is data from feeds that come into our system, and we’re running through this partially automated, partially human process, correlating data and verifying data. It’s set up so it’s organized and clean and computable.”

Wolfram says that there are four main components to Wolfram Alpha (WA): data curation, internal algorithm and computation, linguistic understanding, and automated presentation. The first two components sound a bit like what Google does, and some commentators have gone so far as to claim that WA might even out-Google Google. However, WA appears to be a different kind of application altogether – a knowledge aggregator and synthesizer with real-time presentational graphics. The Washington Post (April 24, 2009) said:

When it was first unveiled in March, Wolfram Alpha, a new type of search engine created by computer scientist Stephen Wolfram, got a lot of buzz. Naturally, some people threw out the “Google killer” title; but it seems to be a different beast, as it’s all about knowledge search. That is to say, you ask a question, and you get an answer; with Google, you ask a question and you get a link to a bunch of documents. That may sound a bit bland, and simplistic, but the select few who have seen it, seem to think it works really well and could be a game changer.

There is considerable cynicism surrounding the WA announcement, and perhaps Google deserves to enjoy a brief whiff of schadenfreude before WA launches publicly in May. We’ve also yet to hear what the Semantic Web community thinks about WA and how it relates (if at all) to what they are trying to achieve. Until we know more about how Wolfram Alpha works and what kind of results it can produce over what domains of discourse, it’s difficult to form an opinion. You can find out whether all the fuss is warranted by keeping an eye on the Wolfram Alpha Blog and monitoring the responses in the specialist media.





