Information Flow: A Web of Constraints on the Giant Global Graph

It’s been almost two years since December 2, 2008, when I published the first use case for Open Government: Linked Open Data. It’s great to see the widespread interest that has emerged, as well as the early adoption that has begun to take place. There was a time when it wasn’t clear that it would. In those two years both the US and UK governments have incorporated Linked Data into their datagov approaches, RDFa-like languages have been adopted at Google and Facebook, and membership in Semantic Web Meetups has skyrocketed. The broader technology community is getting its first exposure to Linked Data.

When Tim Berners-Lee published the W3C Linked Data design issue in July of 2006, he introduced four rules that foster a “post semantic” Web. The term Semantic Web had become less associated with the value proposition of broad societal adoption and more closely associated with narrowly defined enabling technologies. To make a long story short, some rebranding was in order. The Linked Data meme rebrands the Semantic Web. More importantly, the Linked Data rules redefine the level of commitment necessary to participate. They encourage a wider audience and allow for wider choice in the enabling technologies, with both the upsides and the downsides that wider choice brings.

The rebranding continued in November of 2007 with the introduction of the term Giant Global Graph (GGG). The GGG is a non-technical term that represents the societal value of the enabling technologies in this “post semantic” Web. These technologies are the “tools which allow us to break free of the document layer” while we “cede control for greater benefit.” John Sowa has proposed that these tools will not be limited to RDF and SPARQL. And when the government releases data back to the public, it bootstraps ceding control on the GGG.

In Working Ontologist, Dean Allemang and Jim Hendler introduce the AAA Slogan: “Anyone can say Anything about Any topic.” But that doesn’t mean the technologies underlying the GGG amount to one inconsistent RDF graph. Linked Data encourages information to flow through a web of constraints on the GGG. These constraints preserve semantics locally on the GGG. RDF semantics is one such constraint, but not the only one. SQL semantics might be another; Common Logic another; smaller logics like SHIOND or LJT are others. Together the constraints form a web of their own. And this web of constraints provides the regularities through which information flows. Think of the GGG as WEBS of data, not THE web of data.

So, how will this all happen? There are signs that it’s happening already: Drupal’s rise in popularity. Some signs are good: publishing RDFa straight from a CSV. Some anti-patterns are emerging: extracting otherwise useful data, converting it into RDF, serializing it across the wire in a highly inefficient form, disregarding inference, stripping the RDF out and publishing it as a Web page. Our general understanding of what’s possible remains limited. Is our only choice a Semantic Web Architecture [based on] A Stack or Two Towers? I don’t think so. Are Datalog rules our best option and do we have to drop logic from the Semantic Web Layer Cake? I think not.

It’s all quite early yet, but I think it’s possible to suggest a few recommendations for both the public and private sectors – some social, some technical – to allow information to flow across the GGG based on a web of constraints:

1. Identify publishing and linking roles for Linked Data initiatives. They are not the same. Be able to describe the activities you assign to each of those roles.
2. Acknowledge that we’re working at the edge. Where we need specifications for publishers and linkers, develop edge specifications. Edge specifications reassemble existing vocabularies like Dublin Core and XBRL and provide new terms only when needed. Avoid the urge to develop yet another core specification.
3. Recognize that regularities are what allow information to flow. Model theories, boolean valuations and proof interpretation are what create these regularities. Without regularities we’re just hacking up something on a global scale. And that’ll delay achieving the potential of the GGG.
4. Use RDFS semantics instead of vocabulary- or terminology-based approaches. Those approaches retain the limitations of original intentionality. There are some great patterns for linking open data serendipitously in Working Ontologist. Create your own patterns.
5. Avoid death by layering. There were some known risks to the layer cake approach when it was devised and they’ve proven themselves unavoidable. LBase describes an embedded semantics a lot like what some smart folks are advocating for Common Logic.
6. Avoid anti-patterns like intervention and curation. Enterprise Information Management is a valid discipline, but one separate from Linked Data. The value proposition of Linked Data implies something a little short of serendipity. If you find yourself tempted to do lots of curation, it’s not Linked Data.
7. Study Institutions and functional languages like Haskell. Functional languages are very convenient for transforming among languages, logics, models and theories. Institutions formalize and standardize those transformations.
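
To make recommendations 1 and 4 a little more concrete, here is a minimal sketch in Python of what separate publishing and linking roles might look like when the linking is done with RDFS semantics rather than by forcing everyone onto one vocabulary. Everything in it is an assumption for illustration: the agencyA:, agencyB: and ex: names are hypothetical, triples are plain tuples rather than a real RDF library, and only three of the RDFS entailment rules (rdfs2, rdfs7 and rdfs9) are implemented.

```python
# Triples are (subject, predicate, object) tuples.
# All agencyA:, agencyB: and ex: names below are hypothetical.
TYPE = "rdf:type"
SUBPROP = "rdfs:subPropertyOf"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"


def rdfs_closure(triples):
    """Apply a small subset of RDFS entailment rules to a fixed point."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                # rdfs7: (p subPropertyOf q) and (s p o) entail (s q o)
                if p2 == SUBPROP and s2 == p:
                    new.add((s, o2, o))
                # rdfs2: (p domain c) and (s p o) entail (s type c)
                if p2 == DOMAIN and s2 == p:
                    new.add((s, TYPE, o2))
                # rdfs9: (c subClassOf d) and (s type c) entail (s type d)
                if p == TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


# Publishing role: two agencies release data in their own terms.
published = {
    ("agencyA:grant42", "agencyA:awardedTo", "ex:AcmeCorp"),
    ("agencyB:contract7", "agencyB:paidTo", "ex:AcmeCorp"),
}

# Linking role: a third party relates those terms without touching the data.
links = {
    ("agencyA:awardedTo", SUBPROP, "ex:recipient"),
    ("agencyB:paidTo", SUBPROP, "ex:recipient"),
    ("ex:recipient", DOMAIN, "ex:Award"),
    ("ex:Award", SUBCLASS, "ex:Spending"),
}

closure = rdfs_closure(published | links)

# One question now spans both publishers.
awards = sorted(s for (s, p, o) in closure
                if p == "ex:recipient" and o == "ex:AcmeCorp")
print(awards)
```

Neither publisher had to change a thing, and nobody curated the data. The linker only asserted how the terms relate, and the regularity supplied by RDFS entailment did the rest. A real deployment would presumably use an RDF store with a proper reasoner instead of this quadratic loop.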