1896: The Year We Did Linked Data Right

“The system of expressing propositions which is called Existential Graphs was invented by me late in the year 1896, as an improvement upon another system published in the Monist for January 1897.”

-Charles Peirce 1906

If you didn’t attend this year’s International Semantic Web Conference (ISWC 2009), the good news is that you now have the chance to see the Pat Hayes invited keynote called BLOGIC or Now What’s in a Link?

Pat Hayes ISWC Keynote, Blogic: Now What's in a Link

Pat’s keynote is a must-watch video. His keen insights into Web Portability, Names and Identification, the Horatio Principle, SameAs and Death by Layering will shape the future of the Web for the better. After Pat’s talk I had the pleasure of speaking with Dame Wendy Hall, who believed Pat’s talk was precisely what Web Science is about.

In this post I won’t reiterate the main points of Pat’s talk. Be sure to watch the video. Regular readers of The Phaneron will recognize similar points already made here over the past few years. I will take the opportunity to elaborate on what I believe to be a few important lessons from Pat’s talk that are revealed through his incidental comments. These incidental comments speak volumes about what many of us experience throughout our careers, whether in the workplace or working with standards organizations.

The lessons are these:

  1. The question we’re asking has, in some surprising cases, already been answered. As computer scientists we have little opportunity to dedicate the time required to study the important work of the giants that came before us. Philosophy, logic and mathematics have a very large body of literature that takes years to truly understand. Peirce is just one of many whose writings contain answers to questions that we could otherwise ponder for decades only to arrive at the same answer. As Pat says to Tim in the video, Peirce solved the same problem as RDF with Existential Graphs in 1896. And he did it right!
  2. Our feelings really are useful indicators of when something is or isn’t right. Pat talks about having a sense that Bnodes weren’t quite right during the specification of RDF. In hindsight Pat had the right intuition and he has proposed a backward compatible solution based on Peirce’s Sheets of Assertion. There’s a Myers-Briggs story to be told that’s especially meaningful to me. I’m an INFJ. Over the last few years I’ve come to recognize many occasions where my feelings were good indicators of the truth. Although a cursory reading of Myers-Briggs may lead one to believe Feeling (F) and Thinking (T) are in opposition, they are not. Our feelings are as good an indicator of the truth as logic is.
  3. We often overlook advice only to learn later of its immense value. This happens when we’re just not ready to learn something. What I learned from Pat’s story that “John Sowa showed me Peirce, then he showed me Peirce again, then he showed me Peirce again” is that sometimes we’re just not ready to hear the advice we’re offered. I’ve been overlooking the advice to take Common Logic seriously for a few years now and Pat’s talk convinced me it’s time to take a serious look.

There’s another relevant point from Pat’s talk that should not go unmentioned. Pat’s talk was given at ISWC, but could equally well have been given to the Object Management Group. The OMG recently issued a request for proposals for a MOF to RDF Structural Mapping in Support of Linked Data. The contents of the RFP imply that the OMG faces many of the same challenges as W3C. The OMG would do well to study W3C’s lessons learned from RDF.

Observations from OWL Experiences and Directions 2009

I attended OWL Experiences and Directions (OWLED) at the International Semantic Web Conference (ISWC). The sessions were very good and I had the opportunity to meet some of the greats in the OWL 2 community. What a treat!

The RDF/OWL community continues to work through important issues in realism and nominalism and I thought it useful to relate an important conversation during one of the sessions. I asked Peter Patel-Schneider how the description logic community came to use the terms Concept and Role. Peter told me there was a perceived need in the 80s within the community to differentiate the language of description logics from the language of object oriented programming. The sentiment was simply that the terms used by the community should seem more related to logic. It was nothing deeper than that. The object oriented programming community had already chosen the terms Class and Property. So there’s no evidence the description logic community understood the significance of these terms as they relate to realism and nominalism.

I was also part of a conversation in which someone was asked to reflect on the philosophical implications of ontology imports. Of course I was intrigued and listened carefully to the response. A question this general is difficult to answer, but it, as well as the very nice presentation on the Simple Knowledge Organization System (SKOS) given by Rinke Hoekstra, confirmed my intuition that government, industry and the Semantic Web community are seeking a deeper understanding of how realism and nominalism apply to implementation and interoperability. A better understanding of model theory can help, but controlled vocabularies, ontology importation and, more generally, ontology alignment within and across domains can also be better informed by semiotics.

Figure 1.

Since OWLED and the International Semantic Web Conference (ISWC) 2009, there’s been a useful exchange on the SKOS archive called Using DBPedia Resources as skos:Concepts. Pat Hayes brings to light some important issues in his comments, which begin here. You can find my comments related to Figure 1, the interpretant triangle, here.

The evidence from OWL Experiences and Directions as well as the SKOS archive indicates that better understanding the significance of realism and nominalism will remain important for many in 2010.

After some reorientation, I am becoming more comfortable working with higher order logic and Haskell. The Isabelle/HOL distribution is very nicely done and the setup on my OpenSolaris machine was really quite painless. Contrary to some opinions about Haskell literature, there’s a lot of great information available. Unlike Visual Basic, Java or C#, you won’t be spoon-fed by a publisher or vendor. I highly recommend A. J. T. Davie’s Introduction to Functional Programming Systems Using Haskell. It’s one of those little books that give the reader enough perspective on the subject to develop their own approach and understanding. Thanks to Mr. Goodchord, one of my guitar teachers, who many years ago helped me realize how useful these little books can be. The real joy is in developing the deep understanding that comes from educating oneself on a rich and fruitful subject like functional programming, the value of which can last a lifetime.

Open Government Linked Open Data

It was about this time a year ago when it became clear to me that the Obama campaign’s fact sheet called Connecting and Empowering All Americans Through Technology and Innovation implied developing a government collaboration platform based on Linked Data. Shortly after the election in November, and while serving as an invited expert on W3C’s electronic government interest group, I wrote the Open Government: Linked Open Data use case that called for “new set of information technology architecture principles that align open government with citizen engagement in a networked society enabled by linked open data.”

Now, a year later, I am excited to recommend the Building Semantic Web Applications for Government half-day tutorial at International Semantic Web Conference (ISWC) 2009. This tutorial builds on the work that Dean Allemang, Chief Scientist at TopQuadrant, led for our team at the General Services Administration as well as TopQuadrant’s emerging Ontologies for Electronic Government. In Dean’s own words, “The charter of government is to be responsive to the people. As such, government information belongs in the hands of the governed.”

Today, Linked Data is the ONLY Internet-scale collaboration platform for open government. As I have written previously, the long-term success of Linked Data for Open Government depends on an information life-cycle where publishing and linking co-evolve with minimal intervention. In the short term, and to paraphrase John Sheridan of the UK’s Office of Public Sector Information, shiny objects and tiny cost allow the groundswell of innovation to reach a tipping point.

While you’re at ISWC09 be sure to attend Dean’s half-day tutorial on Monday October 26 from 2:00pm to 6:00pm.

Linked Data: Interpretants and Interpretation

Linked data got some attention over the past year. Both leading technologists and policy makers are coming to recognize there’s an opportunity to enhance the Web to help achieve society’s goals. As I recently wrote in Linking Open Data: An Emerging Practice Area for the Semantic Web, Web design principles and today’s open government initiatives in both the US and the UK share common values. And when public policy and technology principles align, great things can happen.

At O’Reilly Media’s recent Gov2.0 Summit Beth Noveck explained three key areas of President Obama’s Open Government Directive: transparency, participation and collaboration. What Beth said that was especially relevant to Linked Data was to relate collaboration to platforms. Beth’s examples such as iTunes were compelling, but as we know from Sir Tim Berners-Lee’s appointment to advise policy makers in the UK Cabinet Office on public information delivery, Linked Data is THE platform for internet-scale collaboration.

But this post isn’t about technology policy, it’s about interpretants and interpretation. If we expect Linked Data to be most effective I believe it essential to develop a much richer approach to interpretants and interpretation on the Web. So in this post I’ll: 1) elaborate on the Triangle of Meaning (the Triangle) first clarifying its terminology by introducing the interpretant, then developing its edges; 2) suggest a few refinements to the language used in W3C’s Architecture of the World Wide Web and Cool URIs for the Semantic Web that are especially relevant to Linked Data; 3) explain the effect of the Triangle on interpretation in current RDF Model Theory; and 4) propose further elaboration of the Triangle using Category Theory, Haskell and Higher Order Logic using Isabelle/HOL to advance the state of Linked Data. Sound ambitious? Read on brave traveler, but don’t forget to bring a towel.

You’ll recall my recent post titled RDFS Idioms for the Working Semiotician in which I propose a useful idiom in the semiotic domain using the Typing Data by Usage and Mutual SubPropertyOf patterns from Dean Allemang and Jim Hendler’s Semantic Web for the Working Ontologist to infer that an Icon which is an instance of a Sign of an Object is the equivalent of an Icon which is an instance of its Conception.

Figure 1. Triangle of Meaning with Properties

The term Conception implies interpretation of the Sign by a human or animal, but Linked Data also requires interpretation by machines. In his later work, Peirce uses the term Interpretant: “I define a sign as something, A, which brings something, B, its interpretant, into the same sort of correspondence with something, C, its object, as that in which itself stands to C. In this definition I make no more reference to anything like the human mind than I do when I define a line as the place within which a particle lies during a lapse of time.” Figure 1 illustrates the Triangle based on this definition. It also elaborates on the definition by describing its edges. Each edge comprises two inverse functions. The inverse functions form outer and inner paths. The clockwise outer path traces the metaphysics of the Triangle. The counter-clockwise inner path traces an existent.

I’ll return to the edges of the Triangle shortly. For now I’ll use the elaborated Triangle to suggest refinements to the language used in W3C’s Architecture of the World Wide Web (AWWW) and Cool URIs for the Semantic Web (CUSW). I believe the following analysis will serve to inform a long standing discussion among members of W3C about URIs and resources.

There’s no doubt the URI serves as useful syntax for identification on the Web. But, the term resource does not serve us well. Because URIs serve various purposes on the Web, we need to name them according to their use. AWWW and CUSW already do some of that, but it can be done better. Here are a few important refinements stated in terms of the Triangle:

Information resources are really Objects: bits and bytes that exist in the machine. To precisely express their extent, they would be better called Information Objects. But before I continue here’s what I mean when I use the term extent. Extent defines the boundaries where an interpretant, sign or object can exist. Extent can be either machine, external world or consciousness. So the extent of Information Objects is machine.

Their metaphysics and existence are represented in the machine by both the outer and inner paths of the Triangle. Non-Information Resources are Objects too, but they cannot be materialized inside the machine. Their extent is external world. They can only be represented in the machine, so it’s the inner path of the Triangle that represents their extent.

303 redirects do nothing to change the extent of objects. Their extent is the external world. There’s no way to overcome the inapplicability of the outer path of the triangle to materialize objects in the machine. A redirect to a description of an object is simply another Sign or representation of the Object. Science fiction intentionally blurs this distinction. That makes great entertainment, but fuzzy thinking. You’ll recall Neo in The Matrix. Neo is shown to be reading Baudrillard’s Simulacra and Simulation (SS). Baudrillard warns against what he calls the Precession of the Simulacra in SS. The precession is more dangerous than fuzzy thinking. Failing to understand this distinction disconnects us from reality and truth. More importantly RDF model theory is based primarily on Alfred Tarski’s Semantic Conception of Truth (SCT). SCT defines truth in terms of two criteria: material adequacy and formal correctness. The truth on which material adequacy is based is really quite fragile. Below I describe an approach to extend good old fashioned model theory with a novel approach to interpretation.

The term Information Object should replace Information Resource; then we can call plain old objects just that: Objects. Also, this replacement allows us to drop the awkward term Indirect Reference, which actually means represents, which is precisely what the sign does for the object. There’s nothing indirect about the trash can on your desktop. In our consciousness we interpret the meaning of the trash can icon because of its likeness and similar function to the one outside our house.

The description logic community has a long standing practice of using the term Concept in both constructors and language classification. However, this community differentiates concepts neither from signs nor from objects. Nor does it distinguish concepts from interpretants. Of course the extent of concept is the Consciousness. Signs exist in the machine and the external world. Interpretants exist in the machine and the Consciousness.

Now that I’ve introduced interpretants and suggested how to refine the language in AWWW and CUSW, how does the Triangle apply to an interpretation in RDF? In RDF, as in classical logic, interpretation is well defined. An interpretation for a language L is a structure on a domain and a function that preserves truth between symbols in the language and the objects to which they refer. By this definition interpretation means denotation.
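To make "interpretation means denotation" concrete, here is a minimal sketch in Python. It is not taken from the RDF Semantics specification; the `ex:` names, the domain elements and the function `is_true` are all hypothetical. It shows only the idea: symbols denote things in a domain, a property has an extension of pairs, and a triple is true when the denoted pair lies in that extension.

```python
# A toy interpretation. Every identifier here is made up for illustration.

# Denotation: maps each symbol of the language to an element of the domain.
denotes = {
    "ex:Icon": "icon-thing",
    "ex:Likeness": "likeness-thing",
    "ex:represents": "represents-relation",
}

# Extension: maps each property in the domain to the set of pairs it relates.
extension = {
    "represents-relation": {("icon-thing", "likeness-thing")},
}


def is_true(triple):
    """A triple (s, p, o) is true when (I(s), I(o)) is in the extension of I(p)."""
    s, p, o = triple
    if s not in denotes or p not in denotes or o not in denotes:
        return False  # a symbol with no denotation cannot make a triple true
    return (denotes[s], denotes[o]) in extension.get(denotes[p], set())


print(is_true(("ex:Icon", "ex:represents", "ex:Likeness")))  # True
print(is_true(("ex:Likeness", "ex:represents", "ex:Icon")))  # False
```

Nothing in this structure says what a sign means to an interpretant; it records only what each symbol refers to.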

To advance Linked Data we need to define an interpretation that extends denotation, or truth, with meaning. In first order model theory an interpretation has a set of symbols, which are constants, predicates and functions, called a signature. To extend truth with meaning we define the signature to include these same symbols - constants, predicates and functions - AND we add the nodes of the Triangle - interpretant, sign and object according to their proper extent - AND we add the edges of the Triangle. That will take a LOT of work, but the result will be an updated signature and an interpretation that provides meaning to model theory on the Semantic Web.

To wrap up this post, there are additional characteristics of the Triangle to be explained to satisfy Peirce’s definition. To paraphrase, A brings something B into the same correspondence with something C as that in which A itself stands to C. So, the additional characteristics are likely commutativity of the Triangle and the composition of the relations that make up the edges. Specifically, each edge is the composition of the two complementary edges of the Triangle. Category Theory provides a useful mechanism to explore these characteristics. Over the next few months I will develop exercises in Haskell to demonstrate these characteristics of the Triangle. Isabelle/HOL looks like a great prover and it now comes with a new utility called Haskabelle that translates Haskell into Isabelle/HOL theories.
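As a first, informal look at what commutativity and composition might mean here, the following Python sketch treats each edge as a relation, that is, a set of pairs. It is only a sketch under assumptions. The edge names signifies, represents and resolves are borrowed from the RDFS Idioms post. The individuals `i1`, `s1` and `o1` and the helper functions are hypothetical. Reading "each edge is the composition of the two complementary edges" as "the inverse of each edge equals the composition of the other two" is my interpretation for this toy case.

```python
# Relations as sets of (source, target) pairs over hypothetical individuals.
signifies = {("i1", "s1")}   # interpretant -> sign
represents = {("s1", "o1")}  # sign -> object
resolves = {("o1", "i1")}    # object -> interpretant


def compose(first, second):
    """Relational composition: follow `first`, then `second`."""
    return {(a, c) for (a, b) in first for (b2, c) in second if b == b2}


def inverse(rel):
    """Swap the direction of every pair in the relation."""
    return {(b, a) for (a, b) in rel}


def triangle_commutes(signifies, represents, resolves):
    """Check that composing any two edges yields the inverse of the third."""
    return (
        compose(signifies, represents) == inverse(resolves)
        and compose(represents, resolves) == inverse(signifies)
        and compose(resolves, signifies) == inverse(represents)
    )


print(triangle_commutes(signifies, represents, resolves))  # True
```

Going all the way around the Triangle from the interpretant returns to the interpretant, which is the kind of property a category-theoretic treatment would state as a commuting diagram.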

That pretty much covers interpretants and interpretation. Stay tuned for interim results in Category Theory, Haskell and Isabelle/HOL.

This will take a while.

RDFS Idioms for the Working Semiotician

I’m fresh back from Spain where I enjoyed the last few weeks on my honeymoon. I had some time to read Dean Allemang and Jim Hendler’s Semantic Web for the Working Ontologist (SWWO). Great book, highly recommended! SWWO provides some nice insights into RDFS patterns and I wanted to develop a worked example in RDFS that complements the previous example here in OWL-DL.

Like the previous example, the problem is to specify an ontology in the semiotic domain where the reasoner infers meaning from the model. By introducing the semiotic domain into our model we get more than interpretation based on RDF model theory and classical logic. We get meaning in its fullest sense because the reasoner infers membership of an individual in the class Conception from the representation of an object by a sign. Unlike the prior example, this time I avoid inferring Sign as a sub class of Conception by using a few of Dean’s neat patterns and restricting the reasoning to RDFS.

To introduce the semiotic domain into the model I specify Object, Sign and Conception as sub classes of rdfs:Resource, then I specify relations between pairs of these resources as follows:

sem:Object sem:resolves sem:Conception

sem:Sign sem:represents sem:Object

sem:Conception sem:signifies sem:Sign

This most simple expression of the semiotic domain says that a sign represents an object, the conception signifies the meaning of the sign and the object resolves the conception. For more on what is commonly known as the Triangle of Meaning, see Ogden and Richards here.

To introduce the first idiom, Typing Data by Usage (see page 98), I assert the triples

sem:represents rdf:type rdf:Property

sem:represents rdfs:domain sem:Sign

then assert the individual

sem:Icon sem:represents sem:Likeness

from which the reasoner infers that

sem:Icon rdf:type sem:Sign

We know from Peirce’s What is a Sign? that Icons are just one example of a Sign. Icons resemble, or represent, the likeness of an object. As is appropriate, I assert that Icon has the property represents and the RDFS reasoner appropriately infers that Icon is an instance of Sign. Nice!
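The inference can be reproduced by hand with a few lines of Python. This is a minimal sketch of the domain rule only (if a property has a domain, every subject using that property is typed with it), not a full RDFS reasoner. The function name `infer_types_from_domain` is hypothetical, and triples are modeled as plain tuples of strings.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"

# The asserted triples from the example above.
triples = {
    ("sem:represents", RDF_TYPE, "rdf:Property"),
    ("sem:represents", RDFS_DOMAIN, "sem:Sign"),
    ("sem:Icon", "sem:represents", "sem:Likeness"),
}


def infer_types_from_domain(graph):
    """Return the rdf:type triples implied by rdfs:domain assertions."""
    # Collect (property, class) pairs declared with rdfs:domain.
    domains = {(p, c) for (p, pred, c) in graph if pred == RDFS_DOMAIN}
    # Any subject that uses the property is an instance of its domain class.
    return {
        (s, RDF_TYPE, c)
        for (s, p, _o) in graph
        for (dp, c) in domains
        if p == dp
    }


print(infer_types_from_domain(triples))
# {('sem:Icon', 'rdf:type', 'sem:Sign')}
```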

The next idiom uses the Mutual SubPropertyOf pattern (see page 118). I justify the use of this pattern below, but first, here are the triples themselves.

sem:represents rdfs:subPropertyOf sem:signifies

sem:signifies rdfs:subPropertyOf sem:represents

So how do I justify Mutual SubPropertyOf? Consider Peirce’s relate and correlate. Peirce describes an interpretant as a mediating representation of correspondence between the relate and correlate where correspondence includes both concurrence and opposition. To keep the inferences within RDFS, here I make the strong commitment of equivalence, just one of the possible relations implied by Peirce’s interpretant. For more on interpretants, see On a New List of Categories Sections 9 - 14.

All we need to do now is apply the Typing Data by Usage pattern again. This time I assert Conception as the domain of signifies.

sem:signifies rdfs:domain sem:Conception

As expected the RDFS reasoner infers the following new triples:

1. Through Mutual SubPropertyOf the equivalence relation expressed as owl:equivalentProperty.

sem:represents owl:equivalentProperty sem:signifies

sem:signifies owl:equivalentProperty sem:represents

2. Also through Mutual SubPropertyOf the reasoner infers the domains of the respective subProperties:

sem:signifies rdfs:domain sem:Sign

sem:represents rdfs:domain sem:Conception

3. Most importantly, the reasoner infers both that Icon signifies Likeness and that Icon is an instance of Conception.

sem:Icon sem:signifies sem:Likeness

This is our first glimpse of an important result. Icon, an instance of Sign by the property represents, now signifies the meaning of Likeness.

And finally

sem:Icon rdf:type sem:Conception

This inference demonstrates a fundamental result: the meaning of a Sign is inferred when Icon, an instance, is inferred to be a member of the class Conception.
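For readers without TopBraid Composer at hand, the chain of inferences can be approximated in Python. This is a minimal forward-chaining sketch under stated assumptions: it implements only two RDFS rules (typing by rdfs:domain and propagation along rdfs:subPropertyOf), it writes the mutual pattern with rdfs:subPropertyOf as the pattern’s name indicates, and the function name `rdfs_closure` is hypothetical. It does not produce the owl:equivalentProperty or inherited rdfs:domain triples listed above, only the instance-level results.

```python
def rdfs_closure(graph):
    """Apply the domain and subPropertyOf rules until nothing new appears."""
    closure = set(graph)
    while True:
        sub = {(p, q) for (p, pred, q) in closure if pred == "rdfs:subPropertyOf"}
        dom = {(p, c) for (p, pred, c) in closure if pred == "rdfs:domain"}
        new = set()
        for (s, p, o) in closure:
            # If p is a sub-property of q, then s q o also holds.
            new |= {(s, q, o) for (p1, q) in sub if p1 == p}
            # If p has domain c, then s is an instance of c.
            new |= {(s, "rdf:type", c) for (p1, c) in dom if p1 == p}
        if new <= closure:
            return closure
        closure |= new


asserted = {
    ("sem:represents", "rdfs:domain", "sem:Sign"),
    ("sem:signifies", "rdfs:domain", "sem:Conception"),
    ("sem:represents", "rdfs:subPropertyOf", "sem:signifies"),
    ("sem:signifies", "rdfs:subPropertyOf", "sem:represents"),
    ("sem:Icon", "sem:represents", "sem:Likeness"),
}

# Print only the triples that were inferred rather than asserted.
for triple in sorted(rdfs_closure(asserted) - asserted):
    print(triple)
# ('sem:Icon', 'rdf:type', 'sem:Conception')
# ('sem:Icon', 'rdf:type', 'sem:Sign')
# ('sem:Icon', 'sem:signifies', 'sem:Likeness')
```

The loop needs two passes: the first derives that Icon signifies Likeness, and the second uses that new triple to type Icon as a Conception.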

This example was developed in TopBraid Composer. After a brief orientation Composer has become a very important part of my modeling toolkit. I highly recommend it! Because these results are standard RDFS inferencing, though, Sesame and other open source tools should do the trick as well. You can find the sample ontology here.

So, how can this be used? Recall from SWWO that the Simple Knowledge Organization System models meaning by simply asserting a preferred and alternate symbol for a concept. The idioms defined above offer an advantage over SKOS: they use RDFS inferencing to infer the meaning of a sign through the semiotic domain, which SKOS does not. In fact, most knowledge representation approaches today do not effectively differentiate Sign and Conception. So, these RDFS idioms can be used to advance the state of vocabulary and thesaurus management. Also, information technology architects are often called on to manage models in different languages at Enterprise scale. The idioms above are also useful in Enterprise-wide model management and the discovery of relationships among disparate models throughout the enterprise. This result is also significant to mashups in general and Linked Data specifically, so I have added a recipe to my mashup cookbook. Check the mashup cookbook for recipe #2!

Linking Open Data: An Emerging Practice Area for the Semantic Web

I had a month to study Linked Open Data (LOD) recently. The LOD community is working through some really great issues. The overall goal of better information sharing is pretty well known, but what’s different about the LOD approach is that LOD is consistent with Internet design principles. That’s in sharp contrast to the common upper ontology approach with which I’ll assume many folks are already familiar.

There’s an important connection between LOD and President Obama’s Transparency and Open Government Directive (TOGD). Bear with me while I explain. First, the Internet design principles share a common set of values with TOGD. See Tim Berners-Lee’s The World Wide Web and the Web of Life. Second, LOD is the only way to achieve the three goals in the TOGD - transparency, participation and collaboration - at Internet scale. Why Internet scale? Because it will take the Internet to provide a broad enough reach into today’s society to make the TOGD real.

There are some exciting initiatives right now around social production and X-Prizes to enable the President’s goals. These initiatives are really great but less far reaching than LOD. The limitations of these approaches are quickly becoming apparent to members of the communities supporting them. For more background on LOD and Open Government, see the Open Government: Linked Open Data Use Case I wrote for the W3C eGov working group back in November 2008. Also, Tim just released a web design note you might enjoy called Putting Government Data OnLine.

So, I thought I’d share some of what I experienced as I worked through a few key issues in LOD.

Once I got past the CoolURIs and the LOD Design Issues, it became apparent that LOD is an emerging practice area for the Semantic Web. Think methodology and design patterns, or even better: a pattern language.

Although W3C activities recognize best practices, I suspect the perception in W3C is that specifying methodologies is too restrictive. But LOD implies a lot about behavior. I think the subject is better presented as a gerund, Linking Open Data. The term Linked Open Data objectifies and depersonalizes the behavior by describing the outcome. And the use of past tense makes it sound like something that already happened that one can consume: a technology product.

As I worked through the literature, thinking in terms of design patterns became very useful. Data typing by usage as well as type and property propagation through RDFS inferencing as described in Dean Allemang’s Semantic Web for the Working Ontologist were essential design patterns to effectively link open data. See my Mashup Cookbook next month for an example of how I used these design patterns to link two data sets that on face value appear unrelated.

A pattern language would serve nicely to clarify some key concepts for LOD practitioners. Just enough structure to formalize a pattern and just enough patterns to reveal an LOD life cycle. A key element of an LOD life cycle implies separation of concerns among publishers and linkers. I believe this separation of concerns greatly increases the likelihood of LOD’s success. Publishers don’t have to know RDF, but LOD needs just about everyone to publish. Publishers would want to know how to create valid rdf-in-xhtml pages and support required mime types when they publish. The Drupal community is already capable of serving as effective publishers. I envision linkers with deeper knowledge and broader responsibilities. Linkers need to plan at Internet scale over a long period of time and have a very good understanding of RDF semantics.

Another key element of this practice area is the development of edge specifications. Ontologists and modelers are typically drawn to specifying a core specification for each new project. In the case of LOD, I was naturally drawn to specifying at the edge. Although the example I provide is very much incomplete, this edge specification selects from a number of already available vocabularies whose use in the LOD research in which I was engaged was much more relevant and useful than specifying yet another core. This also served to better engage stakeholders from the various communities whose data might become linked. The design of RDFa in XHTML provided the structure around which the vocabularies that determined the edge were organized, and the SPARQL queries executed against the dereferenced RDF triples were derived from a set of use cases.

I’m concerned that the LOD community is overly optimistic about some fundamental issues in semantics that I’ve written about previously and that just haven’t gone away. See this thread from Pat Hayes on using owl:sameAs too freely. The LOD community is deeply engaged in URI dereferencing and content negotiation. I suspect this community is trying to avoid some very complex issues in interpretation implied by semiotics. But I believe the current approaches to URI dereferencing and content negotiation are effectively a network-hop-based approach to recursive representation, which avoids rather than resolves these hard issues. Dereferencing and content negotiation are overly complex to introduce to the customers I serve.

Just two random thoughts: While other folks were minting URIs, I became a URI counterfeiter. And I believe that LOD will lead us to talk more about WEBS of data instead of THE web of data.

I think Linking Open Data represents a new and exciting practice area for the semantic web. And it’s essential for President Obama’s TOGD to permeate society. I’m currently continuing my customer support work on Linking Open Data for data.gov. I’m also checking out the inferencing available in the Conceptual Resource Engine (Corese) Project based on John Sowa’s Conceptual Graphs and of course Peirce’s Existential Graphs to better link open data.

Recipes from the Mashup Cookbook

Cookbooks are a tradition in software and mashups are all the rage today, so I couldn’t resist starting a mashup cookbook. But, before I show some of my recipes and what I have in the oven, I thought I’d explain the role mashups and social production play in solving a problem too hard and too expensive to solve under conditions controlled by corporations or the government.

An issue we face today is producing a sufficiently complex mirror world to match the needs of an information society. By complex I mean a mirror world that nicely reflects the world in which we live. Consider the problem of ontology which has perplexed even the smartest of us since as far back as history can tell. Our need to understand the world causes us to attribute order to our experience. Whether it’s through Alexander’s Nature of Order, or Peirce’s Order of Nature, it’s our nature to understand the world through order. We use ontologies to represent that order. More times than not, we forget that the order through which we perceive the world diverges from our experience. When we forget this we mistake the order through which we understand the world for the world itself. We confuse mechanism with organism and models with the world. Alan Watts says it best in this 1969 lecture to a group of engineers from IBM. Tim Berners-Lee says Fractal Web, Fractal Society. Today, the mirror world distorts the world in which we live in some fundamental ways that mashups and social production are changing. Where corporations and governments typically provide large applications or large data sets with historical data, mashups combine information already available in unique ways. And social producers are autonomous resources that complement the capabilities of controlled resources. By complementing controlled resources with social producers, the mirror world better reflects the world in which we live by including otherwise missing resources from an information society as a whole.

Before I get out the ingredients and check the oven, I want to offer a few words of caution about some assumptions underlying power laws, crowdsourcing and long tail approaches in vogue today. Social producers will have to get quite a bit smarter than they are today so that coordination costs of social production do not exceed coordination costs under control. The cost of social producers is not free and the cost of low-knowledge producers in an information society is high because the information resources they produce have higher coordination costs. And the maintenance of these resources represents an opportunity cost. As I have previously written in Information Sharing and Tomorrow’s Knowledge Workers, knowledge transfer from the academic research community is essential in an information society. The assumption that the smart folks sit by themselves over at places like DARPA and universities and that technology is transferred to passive consumers with limited ability to understand the hard stuff doesn’t foster the growth of knowledge workers in an information society. An information society implies both technology transfer and knowledge transfer. Today, slow knowledge transfer distorts the mirror world and there’s no indication that social producers have an understanding of the hard problems of information sharing. Only a handful of experts have demonstrated they truly understand the hard problems. For many of the hard problems the solutions are even harder. And where knowledge is a differentiator, there’s no indication that autonomy and altruism as values that drive production will outweigh the innate selfishness in our genes. There’s plenty more to explore here, but that’s enough for now.

Anyway, back to the cookbook and mashups. Mashups are the right scale for social production to make a difference in the short term. So what makes a mashup yummy? My stomach’s growling for three ingredients: 1) Combining information resources that reveal information that is not otherwise available and that allow for more informed decision making. This can be at the level of indication, determination or inference. Note that John W. Tukey introduced this important distinction in his work on Exploratory Data Analysis and this work has been nicely extended by Edward Tufte. 2) Meaning, or semantics of various flavors. Start with the semantics of truth as cooked up by Alfred Tarski in his Semantic Conception of Truth, which serves as the bouillon of RDF model theory. Gently fold in some of Quine’s Two Dogmas of Empiricism and just a dash of Peirce’s On a New List of Categories. 3) Fresh ingredients. Only the most current information makes for suitable ingredients in a tasty mashup. Feeds and Tweets are all the rage these days, and the quantities in which these ingredients are served on the Internet (20 items) are just right for where they’ll taste best in the recipe. So, what does Rick have in the oven? Here’s a screen shot of the data visualization.

data visualization screen shot

Cookbooks don’t seem to fit well in the blog format. Hopefully the screenshot will whet your appetite enough to check out the cookbook which you can find here.

Bon Appétit!

Tweets: New Vehicles for Meme Replication

Are you, like most newbies, asking yourself what’s up with Twitter? There’s no doubt that Twitter has captured everyone’s imagination, but what’s really happening on Twitter? And what’s really different about Twitter? The short answer is very fast replication of culture in the form of memes through a highly efficient vehicle called Tweets. Think Tweeme.

Richard Dawkins introduced the term meme in his book The Selfish Gene in 1976. A meme is an atomic unit of cultural information that is imitated and changed. Dawkins, a biologist, recognized that the evolution of a species takes place both across generations and within cultures. Within cultures, evolution takes place when memes, like genes, replicate. Dawkins writes that cultural evolution takes place at a much faster time scale than genetic evolution. And three characteristics of a meme that affect its survival are its fecundity, or cultural richness, its copying fidelity and its longevity.

So what are some recognizable memes? Well, that depends on your culture. Gamers and hackers would recognize pwn. Pwn is thought to be derived from the word own and rhymes with poon. Gamers pwn, or own, their opponent when they’ve compromised their opponent’s system. Pwn is suspected to have been created through a typo because the letters p and o are adjacent on the QWERTY keyboard. Pwn is a meme where imperfect copying fidelity led to cultural evolution. Whereas in biology the template for replication is the gene itself, with memes, the template for replication is cultural understanding. Occurrences of a meme are its replicas.

Since it was introduced in 1976, meme has fulfilled its own promise and has become, yes, you guessed it, a meme in itself. A culture has formed around memes, the word is used in a variety of senses, and its definition has evolved from the few examples Dawkins provides. Most notably, Susan Blackmore, in Artificial, Self Replicating Meme Machines, writes about techno-memes, or temes. Blackmore, like Dawkins, claims that temes, like all replicators, are selfish. By selfish Dawkins and Blackmore impute a behavior to temes, memes and genes that personifies the characteristics of survival in a competitive environment where survival implies replication among a pool of competitors.

I’ll return to temes shortly, but first two more definitions: Pools, as in gene pool, would be the set of all genes available to a species. Vehicles are individual units within the species that serve as a mechanism for the survival of genes and memes. Dawkins’ notable contribution to biology, some say the most significant since Darwin, is that humans are vehicles, or survival machines, for genes. And genes are the atomic units of natural selection.

Blackmore’s temes do not seem well defined and a close examination reveals some fuzzy singularity-like thinking, but a careful examination of memes on Twitter will lead us to understand Tweemes and why Tweets are highly efficient vehicles for cultural replication. Blackmore’s temes are, or are becoming, self replicating. Although we can readily accept that digital technologies are highly efficient for copying and have high copying fidelity, self replication implies more than copying. There’s no evidence that memes change themselves and here we need to separate technology from culture. And it’s essential to remember that Dawkins admittedly imputes behavior to genes and memes as a rhetorical trick to best communicate survival under competition.

Humans change memes, like the typo that created pwn, and there’s no evidence that technology does so of its own accord. Singularity theories are valid in that under induction we are unwise to exclude possibilities, but the various technologies lumped under artificial intelligence operate according to design. As an example of design constraints that would limit the capabilities proposed by Blackmore, follow the recent email thread here, by Tim Berners-Lee on design and use in the Semantic Web. Finally, the degenerate case of design by permutation does not imply that technology self-replicates; without intelligence, it implies a combinatorial explosion.

So if humanity is a gene pool and the internet is a meme pool, then Twitter is a meme pool with special characteristics that accelerate cultural evolution. Tweets are vehicles that transmit embedded memes that we can call Tweemes. Let’s compare these characteristics with Dawkins’ fecundity, copying fidelity and longevity: Tweets are transmitted on mobile devices. That means Crackberry addiction is way up and there’s no time at all before followers get your interrupt signal and their Pavlovian response triggers an immediate check for undirected messages. @replies allow followers to copy or mutate a Tweeme, but as above there’s no self-replication, only replication by your followers’ design. Think semi-intelligent design, although that is clearly giving the benefit of the doubt in most cases. @replies for all allows one to see the @replies of one’s followers to Twits who you don’t even follow thereby accelerating the transmission and reflection of Tweemes across the cultural equivalent of evolution across three generations within just minutes. This means that surviving Tweemes are very selfish, given an environment that tends towards very low longevity given the attention span of most Twits.

Even newbs can have fun with Tweemes by using Twemes. Note that Tweemes differ from Twemes. Twemes use the hash tag (#) convention that allows you to search Twitter so you can narrow your search for Tweemes. Search a bit, mutate a few Tweemes, turn off your @replies only setting and then follow any Twit you don’t know and whose @reply shows up in your feed. There’s nothing geekier n00bies could do to celebrate Darwin’s 200th birthday!

Ok noob, now you know what’s up with Twitter. Twitter’s a meme pool that accelerates cultural evolution. Tweets are highly efficient vehicles for transmission of cultural information. And Tweemes are very selfish memes that survive on n00bs.

Peirce’s Semiotics in the Alignment of Formal Specifications Using Shared Concepts

In To Pragmaticism and Beyond I describe an ambitious plan to develop an emergent theory of meaning. I begin that plan here by describing how introducing semiotics into the alignment and unification of domain specific ontologies, also called local ontologies, serves to better specify implicit relationships among those ontologies.

Post-hoc ontology alignment has recently gained prominence, and will continue to do so, in response to design principles of complex systems. Approaches to post-hoc alignment include automated mapping using a variety of techniques, most notably the IF-Map approach and the alignment and unification approach described in Robert Kent’s Information Flow Framework. Other approaches include alignment of the object and metalanguages based on matching, mapping, distance measures and Galois connections.

All very cool stuff; however, the results of ontology alignment and unification are often perceived as underwhelming for two reasons: 1. there’s insufficient information in domain ontologies on which to base alignment because they were conceived and specified separately; and 2. the model theory on which the alignment is based is grounded in a semantic theory of truth, not meaning. I already describe #2 in State of the Semantic Web: Representation and Realism and I’ll return to #2 in a later post. Regarding #1, standardization may cause our representations to converge locally. Public initiatives such as the Dublin Core Metadata Initiative are a good example, as are the ongoing Linked Data efforts related to Semantic Web based repositories such as DBpedia and Science Commons, and private initiatives such as Metaweb’s Freebase. Given this list of standardization initiatives even casual observers ask the obvious question: How then would one align ontologies from Dublin Core, DBpedia, Science Commons and Freebase? Unfortunately, any standardization approach reveals a paradox: because universal scope and reach are unachievable in standardization, especially in ontology specification, standardization implies opposition and mediation.

Faced with this paradox of standardization, ontologists can perform post-hoc annotations in which humans interpret the meaning of local terms and specify global ontologies that establish relations between local ontologies. This approach can be very effective presuming the availability of resources and cash. A casual observer would recognize again that this approach does not scale, but post-hoc alignment offers benefits over certain limitations implied by a Common Upper Ontology where local ontologies can only be defined once the global ontology has been defined.

One approach that has gained wide interest recently is the introduction of a global semiotic ontology based on the work of Charles Sanders Peirce. Peirce’s semiotics includes a) a system of signs that describes how language functions in human understanding; b) a system of triadic relations which exist within the system of signs; and c) an approach to categorization through which one can abstract the function of tokens in an object language into useful types in a metalanguage. By wide interest I mean interest from leading thinkers such as Joseph Goguen, John Sowa, Robert Kent and many others. Presumably this approach offers no advantages or disadvantages in automation. The value is in Peirce’s specification of the semiotic domain, its function in explaining how humans understand the world and its corollary in building more sophisticated models of machine understanding.

To explain this approach I’ll work out an example in OWL, but there’s no dependency on OWL in using this approach. I just so happened to be working with OWL a few years ago when I first developed the example. I encourage you to work out an example in your favorite language and post a comment.

Shared Concept Alignment of Functional Decomposition

The figure illustrates the approach. Consider two specifications: first, the Federal Enterprise Architecture Business Reference Model defined by the Office of Management and Budget and second, the Business Enterprise Architecture defined by the Office of the Secretary of Defense. The naive Enterprise Architect asks the question: How are these specifications aligned?

Simply put, BusinessArea and MissionArea are both functional areas specific to the domain of discourse. The green circle FunctionalArea acts as a symbol that represents the blue circle FunctionalDecomposition which is a concept shared between the specifications. In the example, the concept serves as an interpretant, the symbol as its representation and the terms BusinessArea and MissionArea as objects. The large blue ellipse indicates the boundary that separates the terminological space from the conceptual space.
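As a rough illustration of the triad just described, here is a minimal Python sketch. It is not the OWL encoding used in the files below; the `Sign` record and the `aligned_terms` helper are hypothetical names introduced only for this example. Each local term is recorded as an object, together with the symbol that represents the shared concept and the concept that serves as interpretant. Two terms count as aligned when the same symbol and concept mediate them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sign:
    """One triadic sign relation (hypothetical record, for illustration only)."""
    obj: str             # local term, e.g. "BusinessArea"
    representation: str  # symbol that stands for the shared concept
    interpretant: str    # the shared concept itself


# The two local terms from the figure, both mediated by the same symbol and concept.
signs = [
    Sign("BusinessArea", "FunctionalArea", "FunctionalDecomposition"),
    Sign("MissionArea", "FunctionalArea", "FunctionalDecomposition"),
]


def aligned_terms(signs):
    """Group local terms by the (symbol, concept) pair that mediates them."""
    groups = {}
    for s in signs:
        groups.setdefault((s.representation, s.interpretant), []).append(s.obj)
    # Only a pair shared by two or more terms counts as an alignment.
    return {key: sorted(terms) for key, terms in groups.items() if len(terms) > 1}


alignment = aligned_terms(signs)
print(alignment)
```

The point of the sketch is that the alignment is never asserted directly between the two terms. It falls out of the mediating symbol and concept that they share.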

The astute observer will recognize that using OWL SameAs is a degenerate case of the approach defined here. The SameAs relation established between BusinessArea and MissionArea serves as a clear example of Peirce’s thirdness or mediation. In fact, Tim Berners-Lee describes an approach that he calls shared concept here. Tim’s example focuses strongly on the value of the URI - what a surprise - and importing Dublin Core. Feel free to share how you think the approaches compare.

The files that I specified to prove this approach are:

http://www.rickmurphy.org/fea-osd.owl

http://www.rickmurphy.org/fea-brm.owl

http://www.rickmurphy.org/dod-bea.owl

http://www.rickmurphy.org/categories.owl

I used Swoop and Pellet as the reference implementation, but you can probably use tools like Top Braid Composer and others. In terms of instructions, just point Swoop here. This is the fea-osd ontology, one of two that contains assertions and axioms derived from Peirce’s semiotics. The other is the categories ontology based on Peirce’s On a New List of Categories. Before you run Pellet, notice the categories import and the assertion that FunctionalDecomposition is a subclass of UniversalConception from categories. Then select FunctionalArea and notice the axiom that says that FunctionalArea is defined as the intersection of FunctionalDecomposition and the property called representation restricted to the value symbol. As above, this axiom means that FunctionalArea is the symbol representing the concept FunctionalDecomposition. When you turn on the Pellet reasoner the standard reasoning services (classification, realization, subsumption and consistency checks) begin to execute. When execution terminates, the type of FunctionalArea is inferred as a subclass of FunctionalDecomposition. In addition to demonstrating the value of Peirce’s semiotics, this example also serves to illustrate the use of a Curry-Howard correspondence of sorts for the Semantic Web.
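For readers without Swoop at hand, the inference described above can be imitated with a toy Python sketch. This is not Pellet and not a description logic reasoner; the dictionaries and the `superclasses` helper are assumptions made up for illustration. It encodes only the two facts mentioned above: FunctionalArea is defined as an intersection, and FunctionalDecomposition is asserted to be a subclass of UniversalConception. From these it reads off the named superclasses.

```python
# Toy class definitions: each defined class is the intersection of its conjuncts.
# A conjunct is either a named class (str) or a (property, value) restriction (tuple).
definitions = {
    "FunctionalArea": {"FunctionalDecomposition", ("representation", "symbol")},
}

# Asserted subclass axioms, as in the categories import.
asserted = {
    "FunctionalDecomposition": {"UniversalConception"},
}


def superclasses(cls):
    """Collect the named superclasses of cls, following definitions and assertions."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        # An intersection is subsumed by each of its named conjuncts.
        direct = {c for c in definitions.get(current, set()) if isinstance(c, str)}
        direct |= asserted.get(current, set())
        for parent in direct - found:
            found.add(parent)
            stack.append(parent)
    return found


inferred = superclasses("FunctionalArea")
print(sorted(inferred))
```

A real reasoner does far more (classification, realization and consistency checking over the full OWL semantics), but the step that makes FunctionalArea a subclass of FunctionalDecomposition is the same one: an intersection is subsumed by each of its conjuncts.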

So, I hope you enjoy the approach I’ve shown here. It demonstrates the value of Peirce’s semiotics in ontology alignment and unification. It shows that a) OWL SameAs is a degenerate case of using a global semiotic domain ontology based on Peirce’s thirdness or mediation; b) the semiotic domain provides deeper insights into how machine understanding can model human understanding; and c) type inferencing can be performed with the Pellet description logic reasoner. There’s much more to be done than what this small example demonstrates. This example is just the beginning of developing a semiotic domain. The example should be extended and I look forward to your comments.

I’ll be discussing this approach at the Washington, DC Semantic Web Meetup on December 11, 2008 and I hope to see you there!

To Pragmaticism and Beyond: The Emergence of Meaning in Complex Systems

I’ve been active on the ontolog-forum for the past month where I’ve been engaged in dialog with the great folks there discussing, in part, topics related to my recent post called the State of the Semantic Web: Representation and Realism. The upshot of the discussion is that there’s a consensus that a model theory based on Tarski’s Semantic Conception of Truth does not provide a theory of meaning. It’s worth exploring the threads that start here and following through at least to here.

There’s much more to be said about what this means to Tim Berners-Lee and the great work going on in Web Science, but it’s fair to say what was originally and with the best of intention described as the web of meaning should more accurately be described as a web of truth.

Before I continue with the central topic of this post, I’ll reference a secondary point in the ontolog dialog: a) linked data representation of the world as a 303 redirect needs to be further developed and b) the RDF model theory should be revised to account for what’s implied by differentiating information and non-information resources in the linked data initiative. I’ll come back to this in a later post.

The central topic for this post is to sketch out a program for developing a new theory of meaning. This new theory of meaning will, surprisingly, extend Tarski’s Semantic Conception of Truth. Of course, at one post per month in my blog, it will take a few years to elaborate this new theory of meaning. I’ll develop the theory both informally and formally. In short, my hypothesis is this: a) semantics provides a theory of representation and truth; b) semiotics provides a theory of signs and their interpretation; c) pragmaticism defines the effect of a conception on other objects as the whole of the conception; and d) meaning emerges in a complex system through the convergence of relational properties and domain knowledge.

So, the sketch of this program is as follows:

a) develop a structural model of the relational properties in both Tarski’s meta and object languages using the triangle of signification as suggested by Roland Barthes in Mythologies

b) illustrate emergent properties of the complex system between the meta and object language

c) develop domain knowledge based on Peirce’s ten trichotomies

d) illustrate convergence of the complex system over time

e) develop a proof of the hypothesis using inference rules in a deductive system

f) provide a revised definition of interpretation using semiotics

g) extend the relations used in determining meaning from equality to equivalence, isomorphism and adjointness
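
To give a feel for item g), here is a small Python sketch of the gap between equality and isomorphism. It is only a hint at the direction and does not attempt equivalence or adjointness. The two edge sets and the `isomorphic` helper are hypothetical, invented for this example: the structures share no names, so they are not equal, yet a renaming of nodes maps one exactly onto the other.

```python
from itertools import permutations


def isomorphic(edges_a, edges_b):
    """Return a node renaming that maps edges_a onto edges_b, or None if there is none."""
    nodes_a = sorted({n for edge in edges_a for n in edge})
    nodes_b = sorted({n for edge in edges_b for n in edge})
    if len(nodes_a) != len(nodes_b) or len(edges_a) != len(edges_b):
        return None
    # Brute force over all renamings; fine for tiny examples only.
    for perm in permutations(nodes_b):
        renaming = dict(zip(nodes_a, perm))
        if {(renaming[s], renaming[t]) for s, t in edges_a} == set(edges_b):
            return renaming
    return None


# Two hypothetical relational structures with different vocabularies.
structure_a = {("a1", "a2"), ("a2", "a3")}
structure_b = {("x", "y"), ("y", "z")}

equal = structure_a == structure_b
mapping = isomorphic(structure_a, structure_b)
print(equal, mapping)
```

Equality fails because the names differ, while the renaming shows that the relational structure is preserved. That is the sense in which the comparison is weaker than equality.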

Hopefully you’re intrigued by this sketch. There’s much work to be done and the goal is to take the sketch from a thought experiment to a formal theory and worked example or two using lambda calculus, category theory and a computational infrastructure in Haskell.
