Linking Open Data: An Emerging Practice Area for the Semantic Web
I had a month to study Linked Open Data (LOD) recently. The LOD community is working through some really great issues. The overall goal of better information sharing is pretty well known, but what’s different about the LOD approach is that LOD is consistent with Internet design principles. That’s in sharp contrast to the common upper ontology approach with which I’ll assume many folks are already familiar.
There’s an important connection between LOD and President Obama’s Transparency and Open Government Directive (TOGD). Bear with me while I explain. First, the Internet design principles share a common set of values with TOGD. See Tim Berners-Lee’s The World Wide Web and the Web of Life. Second, LOD is the only way to achieve the three goals in the TOGD – transparency, participation and collaboration – at Internet scale. Why Internet scale? Because it will take the Internet to provide a broad enough reach into today’s society to make the TOGD real.
There are some exciting initiatives right now around social production and X-Prizes to enable the President’s goals. These initiatives are really great but less far reaching than LOD. The limitations of these approaches are quickly becoming apparent to members of the communities supporting them. For more background on LOD and Open Government, see the Open Government: Linked Open Data Use Case I wrote for the W3C eGov working group back in November 2008. Also, Tim just released a web design note you might enjoy called Putting Government Data OnLine.
So, I thought I’d share some of what I experienced as I worked through a few key issues in LOD.
Once I got past the CoolURIs and the LOD Design Issues, it became apparent that LOD is an emerging practice area for the Semantic Web. Think methodology and design patterns, or even better: a pattern language.
Although W3C activities recognize best practices, I suspect the perception within W3C is that specifying methodologies is too restrictive. But LOD implies a lot about behavior. I think the subject is better presented as a gerund, Linking Open Data. The term Linked Open Data objectifies and depersonalizes the behavior by describing only the outcome. And the past participle makes it sound like something that has already happened, something one can simply consume: a technology product.
As I worked through the literature, thinking in terms of design patterns became very useful. Data typing by usage, along with type and property propagation through RDFS inferencing, as described in Dean Allemang’s Semantic Web for the Working Ontologist, proved to be essential design patterns for effectively linking open data. See my Mashup Cookbook next month for an example of how I used these design patterns to link two data sets that on face value appear unrelated.
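To make those two patterns concrete, here is a minimal sketch in Turtle. The ex: vocabulary and the individuals are my own illustrative assumptions, not drawn from any real data set; only the rdfs: terms are standard:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/vocab#> .   # hypothetical vocabulary

# Schema: a property's rdfs:domain types any subject that uses it,
# and rdfs:subClassOf propagates that type up the class hierarchy.
ex:performedIn rdfs:domain ex:Musician .
ex:Musician    rdfs:subClassOf ex:Person .

# Instance data: no type is asserted for ex:alice anywhere.
ex:alice ex:performedIn ex:someVenue .

# An RDFS reasoner would infer:
#   ex:alice rdf:type ex:Musician .   (typing by usage, via rdfs:domain)
#   ex:alice rdf:type ex:Person .     (type propagation, via rdfs:subClassOf)
```

The point is that a resource published with no explicit type at all can still be linked into a class hierarchy purely by how it is used.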
A pattern language would serve nicely to clarify some key concepts for LOD practitioners. Just enough structure to formalize a pattern and just enough patterns to reveal an LOD life cycle. A key element of an LOD life cycle implies separation of concerns among publishers and linkers. I believe this separation of concerns greatly increases the likelihood of LOD’s success. Publishers don’t have to know RDF, but LOD needs just about everyone to publish. Publishers would want to know how to create valid RDFa-in-XHTML pages and support the required MIME types when they publish. The Drupal community is already capable of serving as effective publishers. I envision linkers with deeper knowledge and broader responsibilities. Linkers need to plan at Internet scale over a long period of time and have a very good understanding of RDF semantics.
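What a publisher needs to know really is this small. A sketch of an RDFa-in-XHTML fragment, using FOAF for illustration (the names and URIs are made up):

```html
<!-- Served as application/xhtml+xml; about/typeof/property/rel
     are the RDFa attributes a publisher needs to learn. -->
<div xmlns:foaf="http://xmlns.com/foaf/0.1/"
     about="http://example.org/people/alice#me" typeof="foaf:Person">
  <span property="foaf:name">Alice Example</span>
  knows
  <a rel="foaf:knows" href="http://example.org/people/bob#me">Bob</a>
</div>
```

Everything here is ordinary XHTML a Drupal theme could emit; the RDF triples fall out of markup the publisher was writing anyway.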
Another key element of this practice area is the development of edge specifications. Ontologists and modelers are typically drawn to specifying a core specification for each new project. In the case of LOD, I was naturally drawn to specifying at the edge. Although the example I provide is very much incomplete, this edge specification selects from a number of already available vocabularies whose use, in the LOD research I was engaged in, proved much more relevant and useful than specifying yet another core. This also served to better engage stakeholders from the various communities whose data might become linked. The design of RDFa in XHTML provided the structure around which the vocabularies that determined the edge were organized, and the SPARQL queries executed against the dereferenced RDF triples were derived from a set of use cases.
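A sketch of what such a use-case-driven query might look like, assuming an edge specification that reuses FOAF and Dublin Core rather than minting a new core vocabulary (the query shape itself is illustrative, not from my actual research):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>

# "Who wrote what?" across two data sets that share nothing
# except these borrowed edge vocabularies.
SELECT ?name ?title
WHERE {
  ?person foaf:name   ?name .
  ?doc    dc:creator  ?person ;
          dc:title    ?title .
}
```

Because both vocabularies already exist and are widely deployed, stakeholders recognize their own terms in the query instead of having to learn yet another core model.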
I’m concerned that the LOD community is overly optimistic about some fundamental issues in semantics that I’ve written about previously and that just haven’t gone away. See this thread from Pat Hayes on using owl:sameAs too freely. The LOD community is deeply engaged in URI dereferencing and content negotiation. I suspect this community is trying to avoid some very complex issues in interpretation implied by semiotics. But I believe the current approach to URI dereferencing and content negotiation is effectively a network-hop-based approach to recursive representation, one that avoids rather than resolves these hard issues. Dereferencing and content negotiation are overly complex to introduce to the customers I serve.
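To see why I call this a network-hop-based approach to representation, consider the standard dereferencing pattern for a non-information resource (the host and paths below are hypothetical):

```http
GET /id/alice HTTP/1.1
Host: example.org
Accept: application/rdf+xml

HTTP/1.1 303 See Other
Location: http://example.org/data/alice.rdf
```

The 303 redirect says, in effect, "I cannot give you the thing itself, only a document about it" — and the client must make a second hop to fetch that document, which is itself just another representation. Each hop defers, rather than answers, the question of what the URI denotes.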
Just two random thoughts: While other folks were minting URIs, I became a URI counterfeiter. And I believe that LOD will lead us to talk more about WEBS of data instead of THE web of data.
I think Linking Open Data represents a new and exciting practice area for the Semantic Web. And it’s essential for President Obama’s TOGD to permeate society. I’m currently continuing my customer support work on Linking Open Data for data.gov. I’m also checking out the inferencing available in the Conceptual Resource Engine (Corese) Project, based on John Sowa’s Conceptual Graphs and of course Peirce’s Existential Graphs, to better link open data.