The Deepest Beauty of Nature

My past two posts have been firmly ensconced in the “hu” side of the lunghu dyad, so it’s time to venture back into the “lung” domain for a while.  That means commentary about topics other than rowing; in this case, a not-so-random walk from semantic tagging in Web 2.x into the realm of US – international relations.

This post started out to be something quite innocuous –praise for the work of Thomson Reuters in advancing the art of information provision and dissemination– but the particular example of Reuters’ work that I had intended to use as illustration ended up leading me in an entirely unexpected direction.   Serendipity, I guess.

The trigger for all this was a Reuters story that appeared last week concerning the disappearance and eventual repatriation of Iranian scientist Shahram Amiri.   For the past two years there’s been something magical about spooks and summertime, summertime, sum-sum-summertime:  last year at just about this time we were treated to the enchanting saga of the disappearance and re-emergence of the lumber-laden MV Arctic Sea on its cruise to Cyprus/Lebanon/nowhere.  2010’s summer storyline has involved a Russo-American spy swap, closely followed by tales of a Persian man of mystery.   Perhaps it’s precisely because of the way in which 2009’s Arctic Sea story unfolded (or re-folded, if you prefer) that Reuters editors elected to take the approach they adopted with the Amiri case.  Or maybe the copy their reporter filed was just so muddled, confusing, convoluted and incoherent that they decided to superimpose an external structure on the story just to provide some form of organization for an otherwise unruly mess.

The form Reuters selected for the Amiri article was that old journalism standby, the rhetorical Q & A piece — but with an innovative twist necessitated by the murky subject matter.  As you can see here, the response to each rhetorical question about a contentious aspect of the Amiri story is divided into subsections labeled “fact,” “unconfirmed,” “assertion,” or “speculation.”  In this case, Reuters probably also should have added a category labeled “blatant spin,” but that might have been considered to indicate editorial bias of some kind.

Needless to say, I like some parts of this approach to presenting (dis)information.   Not so much that I employ it in my own blogging (here, it’s the reader’s job to separate faction from friction), but I like it.   I’d prefer lighter doses of the Q & A/call-and-response/declamation-and-chorus story superstructure, but I really like the explicit labels denoting the extent to which text statements depart from the centroid point of  “fact.”   Too much mass media “journalism” steers a course diametrically opposed to Reuters’, vigorously stirring fact, assertion, speculation and spin into a frothy, foamy confection with little nutritional value.   Just imagine a(ny) Fox News segment with video subtitles labeling the statements made on-air:  once you’ve finally contained your uncontrollable laughter, you’ll have to admit that Fox couldn’t possibly do it while continuing to claim “fair and balanced” news coverage.   ‘Nuff said:  I look forward to the YouTube mash-up.

But I digress.   The initial point of this post was to use Reuters’ Amiri article as a jumping-off point to discuss text processing and information evaluation, whether by a human reader or by a specialized software system.   Reuters has been at the forefront of Web 2.x innovation in journalism through its OpenCalais project, a free web service that automates entity-identification/semantic tagging of online information (whether it’s Reuters’ content or someone else’s).   The result is structured (meta)data embedded within narrative text, which is a datatype traditionally considered by computer scientists to be “unstructured.”
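To make the “structured metadata embedded in narrative text” idea concrete, here’s a minimal sketch of entity tagging.  A toy gazetteer lookup stands in for the statistical entity recognition a service like OpenCalais actually performs, and the output markup is merely illustrative — none of the names or attributes below are OpenCalais’s real API or output format.

```python
import re

# Illustrative gazetteer: entity name -> entity type.
# A real tagger would use trained models, not a hand-built dictionary.
GAZETTEER = {
    "Shahram Amiri": "Person",
    "Reuters": "Company",
    "Iran": "Country",
}

def tag_entities(text: str) -> str:
    """Wrap each recognized entity in a typed span tag, so that
    structured metadata lives inside otherwise 'unstructured' prose."""
    for name, etype in GAZETTEER.items():
        text = re.sub(re.escape(name),
                      f'<span typeof="{etype}">{name}</span>', text)
    return text

print(tag_entities("Reuters reported on Shahram Amiri."))
```

A downstream system can then query those typed spans directly (e.g., “find every Person mentioned alongside this Country”) instead of re-parsing raw prose — which is the whole point of pushing structure into the text.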

At the moment, semantic tagging is primarily focused on named entities (nouns), with the goal of more precisely identifying which particular instance of an entity is meant by the author (disambiguation, in wiki-speak), and/or on identifying ontological relationships between entities.   Semantic tagging is not yet commonly used to denote qualitative aspects of text.   But momentum is gathering on many fronts.   An interesting recent paper from Stevanak and Carr at the Colorado School of Mines[!] points to methods of semantic text analysis that hold a lot of promise.  Their research focused on differentiating fiction text from non-fiction on the basis of the internal semantic networks within each text, formed by the order and proximity of the individual words themselves.
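The word-network idea can be sketched in a few lines: treat each distinct word as a node, link words that appear near each other, and then compute network statistics that a classifier could compare across fiction and non-fiction.  This is a toy illustration of the general approach under my own simplifying assumptions (a fixed sliding window, mean degree as the lone statistic) — not a reconstruction of Stevanak and Carr’s actual method.

```python
from collections import defaultdict

def word_network(text: str, window: int = 2) -> dict:
    """Build an undirected word-adjacency network: each word links to
    its neighbours within `window` positions of it in the text."""
    words = text.lower().split()
    edges = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:          # skip self-loops
                edges[w].add(words[j])
                edges[words[j]].add(w)
    return edges

def mean_degree(edges: dict) -> float:
    """Average number of distinct neighbours per word -- one crude
    network statistic a fiction/non-fiction classifier might use."""
    return sum(len(v) for v in edges.values()) / len(edges)

net = word_network("the quick brown fox jumps over the lazy dog")
print(mean_degree(net))  # -> 3.75
```

Richer statistics (clustering coefficients, degree distributions, path lengths) fall out of the same network with standard graph tooling; the point is simply that word order and proximity alone induce a measurable structure.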

Similar approaches to deploying complex network theory in the service of information processing/analysis can and will focus on deeper levels of meaning embedded within text and oral narrative:  chains and webs of semantic ontologies; convergence with/divergence from some metric of accuracy; relevance of the text to information goals of the reader, or something else altogether.

The Reuters Amiri example falls far short of any of this, but that’s to be expected given where we are right now.   Distinguishing fact from assertion with a text label isn’t semantic tagging in the Web 2.x sense of the term because it doesn’t entail assigning actual data codes to text elements as a means to an automated information processing end.   And it’s not exactly an information quality metric either, since the Reuters label categories don’t precisely map to any of the twenty or so dimensions of IQ.   But it’s a start, a rough-and-ready attempt at a level of explicit reliability & validity (R & V) attribution that’s rarely present in media reporting … and also missing from quite a bit of intell analysis writing.

It was this last observation that sent me off on an extended examination of the Amiri OsInt itself.  As is often the case with intell analysis, I reflexively began by evaluating available information in the context of what I thought I already knew about the overall operating environment.   Things took an interesting turn …

Next time:  Column … half left … march!

