Linking Open Data: An Emerging Practice Area for the Semantic Web

I had a month to study Linked Open Data (LOD) recently. The LOD community is working through some really great issues. The overall goal of better information sharing is pretty well known, but what’s different about the LOD approach is that LOD is consistent with Internet design principles. That’s in sharp contrast to the common upper ontology approach with which I’ll assume many folks are already familiar.

There’s an important connection between LOD and President Obama’s Transparency and Open Government Directive (TOGD). Bear with me while I explain. First, the Internet design principles share a common set of values with TOGD. See Tim Berners-Lee’s The World Wide Web and the Web of Life. Second, LOD is the only way to achieve the three goals in the TOGD - transparency, participation and collaboration - at Internet scale. Why Internet scale? Because it will take the Internet to provide a broad enough reach into today’s society to make the TOGD real.

There are some exciting initiatives right now around social production and X-Prizes to enable the President’s goals. These initiatives are really great but less far reaching than LOD. The limitations of these approaches are quickly becoming apparent to members of the communities supporting them. For more background on LOD and Open Government, see the Open Government: Linked Open Data Use Case I wrote for the W3C eGov working group back in November 2008. Also, Tim just released a web design note you might enjoy called Putting Government Data OnLine.

So, I thought I’d share some of what I experienced as I worked through a few key issues in LOD.

Once I got past the CoolURIs and the LOD Design Issues, it became apparent that LOD is an emerging practice area for the Semantic Web. Think methodology and design patterns, or even better: a pattern language.

Although W3C activities recognize best practices, I suspect the perception in W3C is that specifying methodologies is too restrictive. But LOD implies a lot about behavior. I think the subject is better presented as a gerund, Linking Open Data. The term Linked Open Data objectifies and depersonalizes the behavior by describing the outcome. And the use of past tense makes it sound like something that already happened that one can consume: a technology product.

As I worked through the literature, thinking in terms of design patterns became very useful. Data typing by usage, as well as type and property propagation through RDFS inferencing as described in Dean Allemang’s Semantic Web for the Working Ontologist, were essential design patterns to effectively link open data. See my Mashup Cookbook next month for an example of how I used these design patterns to link two data sets that at face value appear unrelated.
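For readers who have not seen these patterns, here is a minimal sketch in Python. It assumes triples are plain tuples rather than objects from an RDF library, and it applies only four RDFS rules. Every ex: name and sample fact is hypothetical, invented for illustration; a real project would hand this work to an RDFS reasoner.

```python
# Minimal RDFS-style forward chaining over (subject, predicate, object) tuples.
# All ex: names below are hypothetical and exist only for this illustration.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Apply a handful of RDFS entailment rules until no new triples appear."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # Property propagation: (p subPropertyOf q), (s p o) => (s q o)
                if p2 == SUBPROP and s2 == p:
                    new.add((s, o2, o))
                # Typing by usage: (p domain C), (s p o) => (s type C)
                if p2 == DOMAIN and s2 == p:
                    new.add((s, RDF_TYPE, o2))
                # Typing by usage: (p range C), (s p o) => (o type C)
                if p2 == RANGE and s2 == p:
                    new.add((o, RDF_TYPE, o2))
                # Type propagation: (s type C), (C subClassOf D) => (s type D)
                if p == RDF_TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, RDF_TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


data = {
    ("ex:locatedIn", DOMAIN, "ex:Facility"),
    ("ex:locatedIn", RANGE, "ex:Place"),
    ("ex:Place", SUBCLASS, "ex:SpatialThing"),
    ("ex:headquarteredIn", SUBPROP, "ex:locatedIn"),
    ("ex:plant42", "ex:headquarteredIn", "ex:fairfax"),
}

closure = rdfs_closure(data)
for triple in sorted(closure - data):
    print(triple)
```

Nothing in the sample data states a type for ex:plant42 or ex:fairfax. Both types are inferred from how the properties are used, which is the sense in which data is typed by usage.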

A pattern language would serve nicely to clarify some key concepts for LOD practitioners. Just enough structure to formalize a pattern and just enough patterns to reveal an LOD life cycle. A key element of an LOD life cycle is separation of concerns between publishers and linkers. I believe this separation of concerns greatly increases the likelihood of LOD’s success. Publishers don’t have to know RDF, but LOD needs just about everyone to publish. Publishers would want to know how to create valid rdf-in-xhtml pages and support required MIME types when they publish. The Drupal community is already capable of serving as effective publishers. I envision linkers with deeper knowledge and broader responsibilities. Linkers need to plan at Internet scale over a long period of time and have a very good understanding of RDF semantics.

Another key element of this practice area is the development of edge specifications. Ontologists and modelers are typically drawn to specifying a core specification for each new project. In the case of LOD, I was naturally drawn to specifying at the edge. Although the example I provide is very much incomplete, this edge specification selects from a number of already available vocabularies, whose use in the LOD research in which I was engaged was much more relevant and useful than specifying yet another core. This also served to better engage stakeholders from the various communities whose data might become linked. The design of RDFa in XHTML provided the structure around which the vocabularies that determined the edge were organized, and the SPARQL queries executed against the dereferenced RDF triples were derived from a set of use cases.

I’m concerned that the LOD community is overly optimistic about some fundamental issues in semantics that I’ve written about previously that just haven’t gone away. See this thread from Pat Hayes on using owl:sameAs too freely. The LOD community is deeply engaged in URI dereferencing and content negotiation. I suspect this community is trying to avoid some very complex issues in interpretation implied by semiotics. But I believe the current approach to URI dereferencing and content negotiation is effectively a network-hop-based approach to recursive representation, which avoids rather than resolves these hard issues. Dereferencing and content negotiation are overly complex to introduce to the customers I serve.

Just two random thoughts: While other folks were minting URIs, I became a URI counterfeiter. And I believe that LOD will lead us to talk more about WEBS of data instead of THE web of data.

I think Linking Open Data represents a new and exciting practice area for the semantic web. And it’s essential for President Obama’s TOGD to permeate society. I’m currently continuing my customer support work on Linking Open Data for data.gov. I’m also checking out the inferencing available in the Conceptual Resource Engine (Corese) Project based on John Sowa’s Conceptual Graphs and of course Peirce’s Existential Graphs to better link open data.

Recipes from the Mashup Cookbook

Cookbooks are a tradition in software and mashups are all the rage today, so I couldn’t resist starting a mashup cookbook. But, before I show some of my recipes and what I have in the oven, I thought I’d explain the role mashups and social production play in solving a problem too hard and too expensive to solve under conditions controlled by corporations or the government.

An issue we face today is producing a sufficiently complex mirror world to match the needs of an information society. By complex I mean a mirror world that nicely reflects the world in which we live. Consider the problem of ontology, which has perplexed even the smartest of us since as far back as history can tell. Our need to understand the world causes us to attribute order to our experience. Whether it’s through Alexander’s Nature of Order, or Peirce’s Order of Nature, it’s our nature to understand the world through order. We use ontologies to represent that order. More times than not, we forget that the order through which we perceive the world diverges from our experience. When we forget this we mistake the order through which we understand the world for the world itself. We confuse mechanism with organism and models with the world. Alan Watts says it best in this 1969 lecture to a group of engineers from IBM. Tim Berners-Lee says Fractal Web, Fractal Society. Today, the mirror world distorts the world in which we live in some fundamental ways that mashups and social production are changing. Where corporations and governments typically provide large applications or large data sets with historical data, mashups combine information already available in unique ways. And social producers are autonomous resources that complement the capabilities of controlled resources. By complementing controlled resources with social producers, the mirror world better reflects the world in which we live by including otherwise missing resources from an information society as a whole.

Before I get out the ingredients and check the oven, I want to offer a few words of caution about some assumptions underlying power laws, crowdsourcing and long tail approaches in vogue today. Social producers will have to get quite a bit smarter than they are today so that coordination costs of social production do not exceed coordination costs under control. The cost of social producers is not free and the cost of low-knowledge producers in an information society is high because the information resources they produce have higher coordination costs. And the maintenance of these resources represents an opportunity cost. As I have previously written in Information Sharing and Tomorrow’s Knowledge Workers, knowledge transfer from the academic research community is essential in an information society. The assumption that the smart folks sit by themselves over at places like DARPA and universities and that technology is transferred to passive consumers with limited ability to understand the hard stuff doesn’t foster the growth of knowledge workers in an information society. An information society implies both technology transfer and knowledge transfer. Today, slow knowledge transfer distorts the mirror world and there’s no indication that social producers have an understanding of the hard problems of information sharing. Only a handful of experts have demonstrated they truly understand the hard problems. For many of the hard problems the solutions are even harder. And where knowledge is a differentiator, there’s no indication that autonomy and altruism as values that drive production will outweigh the innate selfishness in our genes. There’s plenty more to explore here, but that’s enough for now.

Anyway, back to the cookbook and mashups. Mashups are the right scale for social production to make a difference in the short term. So what makes a mashup yummy? My stomach’s growling for three ingredients: 1) Combining information resources that reveal information that is not otherwise available and that allow for more informed decision making. This can be at the level of indication, determination or inference. Note that John W. Tukey introduced this important distinction in his work on Exploratory Data Analysis and this work has been nicely extended by Edward Tufte. 2) Meaning, or semantics of various flavors. The semantics of truth as cooked up by Alfred Tarski in his Semantic Conception of Truth which serves as the bouillon of RDF model theory. Gently fold in some of Quine’s Two Dogmas of Empiricism and just a dash of Peirce’s On a New List of Categories. 3) Fresh ingredients. Only the most current information makes for suitable ingredients in a tasty mashup. Feeds and Tweets are all the rage these days and at the quantities these ingredients are served (20 items) on the Internet, these quantities are just right for where they’ll taste best in the recipe. So, what does Rick have in the oven? Here’s a screen shot of the data visualization.

data visualization screen shot

Cookbooks don’t seem to fit well in the blog format. Hopefully the screenshot will whet your appetite enough to check out the cookbook, which you can find here.

Bon Appetit !

Tweets: New Vehicles for Meme Replication

Are you, like most newbies, asking yourself what’s up with Twitter? There’s no doubt that Twitter has captured everyone’s imagination, but what’s really happening on Twitter? And what’s really different about Twitter? The short answer is very fast replication of culture in the form of memes through a highly efficient vehicle called Tweets. Think Tweeme.

Richard Dawkins introduced the term meme in his book The Selfish Gene in 1976. A meme is an atomic unit of cultural information that is imitated and changed. Dawkins, a biologist, recognized that the evolution of a species takes place both across generations and within cultures. Within cultures, evolution takes place when memes, like genes, replicate. Dawkins writes that cultural evolution takes place at a much faster time scale than genetic evolution. And three characteristics of a meme that affect its survival are its fecundity, or cultural richness, its copying fidelity and its longevity.

So what are some recognizable memes? Well, that depends on your culture. Gamers and hackers would recognize pwn. Pwn is thought to be derived from the word own and rhymes with poon. Gamers pwn, or own, their opponent when they’ve compromised their opponent’s system. Pwn is suspected to have been created through a typo because the letters p and o are adjacent on the QWERTY keyboard. Pwn is a meme where imperfect copying fidelity led to cultural evolution. Whereas in biology the template for replication is the gene itself, with memes, the template for replication is cultural understanding. Occurrences of a meme are its replicas.

Since it was introduced in 1976, meme has fulfilled its own promise and has become, yes, you guessed it, a meme in itself. A culture has formed around memes, the word is used in a variety of senses, and its definition has evolved from the few examples Dawkins provides. Most notably, Susan Blackmore, in Artificial, Self Replicating Meme Machines, writes about techno-memes, or temes. Blackmore, like Dawkins, claims that temes, like all replicators, are selfish. By selfish Dawkins and Blackmore impute a behavior to temes, memes and genes that personifies the characteristics of survival in a competitive environment where survival implies replication among a pool of competitors.

I’ll return to temes shortly, but first two more definitions: Pools, as in gene pool, would be the set of all genes available to a species. Vehicles are individual units within the species that serve as a mechanism for the survival of genes and memes. Dawkins’ notable contribution to biology, some say the most significant since Darwin, is that humans are vehicles, or survival machines for genes. And genes are the atomic units of natural selection.

Blackmore’s temes do not seem well defined and a close examination reveals some fuzzy singularity-like thinking, but a careful examination of memes on Twitter will lead us to understand Tweemes and why Tweets are highly efficient vehicles for cultural replication. Blackmore’s temes are, or are becoming, self replicating. Although we can readily accept that digital technologies are highly efficient for copying and have high copying fidelity, self replication implies more than copying. There’s no evidence that memes change themselves and here we need to separate technology from culture. And it’s essential to remember that Dawkins admittedly imputes behavior to genes and memes as a rhetorical trick to best communicate survival under competition.

Humans change memes, like the typo that created pwn, and there’s no evidence that technology does so of its own accord. Singularity theories are valid in that under induction we are unwise to exclude possibilities, but various technologies lumped under artificial intelligence operate according to design. As an example of design constraints that would limit the capabilities proposed by Blackmore, follow the recent email thread here, by Tim Berners-Lee on design and use in the Semantic Web. Finally the degenerate case of design by permutation does not imply that technology self-replicates and without intelligence implies a combinatoric explosion.

So if humanity is a gene pool and the internet is a meme pool, then Twitter is a meme pool with special characteristics that accelerate cultural evolution. Tweets are vehicles that transmit embedded memes that we can call Tweemes. Let’s compare these characteristics with Dawkins’ fecundity, copying fidelity and longevity: Tweets are transmitted on mobile devices. That means Crackberry addiction is way up and there’s no time at all before followers get your interrupt signal and their Pavlovian response triggers an immediate check for undirected messages. @replies allow followers to copy or mutate a Tweeme, but as above there’s no self-replication, only replication by your followers’ design. Think semi-intelligent design, although that is clearly giving the benefit of the doubt in most cases. @replies for all allows one to see the @replies of one’s followers to Twits who you don’t even follow thereby accelerating the transmission and reflection of Tweemes across the cultural equivalent of evolution across three generations within just minutes. This means that surviving Tweemes are very selfish, given an environment that tends towards very low longevity given the attention span of most Twits.

Even newbs can have fun with Tweemes by using Twemes. Note that Tweemes differ from Twemes. Twemes use the hash tag (#) convention that allows you to search Twitter so you can narrow your search for Tweemes. Search a bit, mutate a few Tweemes, turn off your @replies only setting and then follow any Twit you don’t know and whose @reply shows up in your feed. There’s nothing geekier n00bies could do to celebrate Darwin’s 200th birthday!

Ok noob, now you know what’s up with Twitter. Twitter’s a meme pool that accelerates cultural evolution. Tweets are highly efficient vehicles for transmission of cultural information. And Tweemes are very selfish memes that survive on n00bs.

Peirce’s Semiotics in the Alignment of Formal Specifications Using Shared Concepts

In To Pragmaticism and Beyond I describe an ambitious plan to develop an emergent theory of meaning. I begin that plan here by describing how introducing semiotics into the alignment and unification of domain specific ontologies, also called local ontologies, serves to better specify implicit relationships among those ontologies.

Post-hoc ontology alignment has recently gained, and will continue to gain, prominence in response to design principles of complex systems. Approaches to post-hoc alignment include automated mapping using a variety of approaches, most notably the IF-Map approach and the alignment and unification approach described in Robert Kent’s Information Flow Framework. Other approaches include alignment of the object and metalanguages based on matching, mapping, distance measures and Galois connections.

All very cool stuff; however, the results of ontology alignment and unification are often perceived as underwhelming for two reasons: 1. there’s insufficient information in domain ontologies on which to base alignment because they were conceived and specified separately; and 2. the model theory on which the alignment is based is grounded in a semantic theory of truth, not meaning. I already describe #2 in State of the Semantic Web: Representation and Realism and I’ll return to #2 in a later post. Regarding #1, standardization may cause our representations to converge locally. The Dublin Core Metadata Initiative is a good example of a public initiative, as are the ongoing Linked Data efforts related to Semantic Web based repositories such as DBPedia and Science Commons. MetaWeb’s Freebase is an example of a private initiative. Given this list of standardization initiatives even casual observers ask the obvious question: How then would one align ontologies from Dublin Core, DBPedia, Science Commons and Freebase? Unfortunately, any standardization approach reveals a paradox: because the scope and reach of universality is unachievable in standardization, especially ontology specification, standardization implies opposition and mediation.

Faced with this paradox of standardization, ontologists can perform post-hoc annotations in which humans interpret the meaning of local terms and specify global ontologies that establish relations between local ontologies. This approach can be very effective presuming the availability of resources and cash. A casual observer would recognize again that this approach does not scale, but post-hoc alignment offers benefits over certain limitations implied by a Common Upper Ontology where local ontologies can only be defined once the global ontology has been defined.

One approach that has gained wide interest recently is the introduction of a global semiotic ontology based on the work of Charles Sanders Peirce. Peirce’s semiotics includes a) a system of signs that describes how language functions in human understanding; b) a system of triadic relations which exist within the system of signs; and c) an approach to categorization through which one can abstract the function of tokens in an object language into useful types in a metalanguage. By wide interest I mean leading thinkers such as Joseph Goguen, John Sowa, Robert Kent and many others. Presumably this approach offers no advantages or disadvantages in automation. The value is in Peirce’s specification of the semiotic domain, its function in explaining how humans understand the world and its corollary in building more sophisticated models of machine understanding.

To explain this approach I’ll work out an example in OWL, but there’s no dependency on OWL in using this approach. I just so happened to be working with OWL a few years ago when I first developed the example. I encourage you to work out an example in your favorite language and post a comment.

Figure: Shared Concept Alignment of Functional Decomposition

The figure illustrates the approach. Consider two specifications: first, the Federal Enterprise Architecture Business Reference Model defined by the Office of Management and Budget and second, the Business Enterprise Architecture defined by the Office of the Secretary of Defense. The naive Enterprise Architect asks the question: How are these specifications aligned?

Simply put, BusinessArea and MissionArea are both functional areas specific to the domain of discourse. The green circle FunctionalArea acts as a symbol that represents the blue circle FunctionalDecomposition which is a concept shared between the specifications. In the example, the concept serves as an interpretant, the symbol as its representation and the terms BusinessArea and MissionArea as objects. The large blue ellipse indicates the boundary that separates the terminological space from the conceptual space.

The astute observer will recognize that using OWL SameAs is a degenerate case of the approach defined here. The SameAs relation established between BusinessArea and MissionArea serves as a clear example of Peirce’s thirdness or mediation. In fact, Tim Berners-Lee describes an approach that he calls shared concept here. Tim’s example focuses strongly on the value of the URI - what a surprise - and importing Dublin Core. Feel free to share how you think the approaches compare.

The files that I specified to prove this approach are:

http://www.rickmurphy.org/fea-osd.owl

http://www.rickmurphy.org/fea-brm.owl

http://www.rickmurphy.org/dod-bea.owl

http://www.rickmurphy.org/categories.owl

I used Swoop and Pellet as the reference implementation, but you can probably use tools like TopBraid Composer and others. In terms of instructions, just point Swoop here. This is the fea-osd ontology, one of two that contains assertions and axioms derived from Peirce’s semiotics. The other is the categories ontology based on Peirce’s On a New List of Categories. Before you run Pellet, notice the categories import and the assertion that FunctionalDecomposition is a subclass of UniversalConception from categories. Then select FunctionalArea and notice the axiom that says that FunctionalArea is defined as the intersection of FunctionalDecomposition and the property called representation restricted to the value symbol. As above, this axiom means that FunctionalArea is the symbol representing the concept FunctionalDecomposition. When you turn on the Pellet reasoner the standard reasoning services (classification, realization, subsumption and consistency checks) begin to execute. When execution terminates, the type of FunctionalArea is inferred as a subclass of FunctionalDecomposition. In addition to demonstrating the value of Peirce’s semiotics, this example also serves to illustrate the use of a Curry-Howard correspondence of sorts for the Semantic Web.
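If you don’t have Swoop handy, the single inference described above can be imitated in a few lines of Python. This is only a toy sketch and not how Pellet works: Pellet is a tableau reasoner over the full description logic, while the code below just follows asserted subclass links and the named conjuncts of an intersection. The class and property names come from the example; the dictionary layout and the superclasses function are assumptions made for illustration.

```python
# Toy structural subsumption: a class defined as an intersection is a subclass
# of each named conjunct. This does NOT reproduce Pellet's algorithm; it only
# mirrors the one inference described in the text.

# Equivalent-class definitions: name -> conjuncts of the intersection.
# A conjunct is either a class name or a ("hasValue", property, value) tuple.
definitions = {
    "FunctionalArea": [
        "FunctionalDecomposition",
        ("hasValue", "representation", "symbol"),
    ],
}

# Asserted subclass axioms: name -> direct superclasses.
asserted = {
    "FunctionalDecomposition": ["UniversalConception"],
}


def superclasses(name):
    """Return every named superclass reachable from `name`."""
    found = set()
    stack = [name]
    while stack:
        current = stack.pop()
        parents = list(asserted.get(current, []))
        # Each named conjunct of an intersection subsumes the defined class.
        parents += [c for c in definitions.get(current, []) if isinstance(c, str)]
        for parent in parents:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


inferred = superclasses("FunctionalArea")
print(sorted(inferred))
```

The hasValue restriction is carried along in the definition but plays no part in this particular subsumption. A full reasoner would use it when classifying individuals.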

So, I hope you enjoy the approach I’ve shown here. It demonstrates the value of Peirce’s semiotics in ontology alignment and unification. It shows that a) OWL SameAs is a degenerate case of using a global semiotic domain ontology based on Peirce’s thirdness or mediation; b) the semiotic domain provides deeper insights into how machine understanding can model human understanding; and c) type inferencing can be performed with the Pellet description logic reasoner. There’s much more to be done than what this small example demonstrates. This example is just the beginning of developing a semiotic domain. The example should be extended and I look forward to your comments.

I’ll be discussing this approach at the Washington, DC Semantic Web Meetup on December 11, 2008 and I hope to see you there !

To Pragmaticism and Beyond: The Emergence of Meaning in Complex Systems

I’ve been active on the ontolog-forum for the past month where I’ve been engaged in dialog with the great folks there discussing, in part, topics related to my recent post called the State of the Semantic Web: Representation and Realism. The upshot of the discussion is that there’s a consensus that a model theory based on Tarski’s Semantic Conception of Truth does not provide a theory of meaning. It’s worth exploring the threads that start here and following through at least to here.

There’s much more to be said about what this means to Tim Berners-Lee and the great work going on in Web Science, but it’s fair to say that what was originally, and with the best of intentions, described as the web of meaning should more accurately be described as a web of truth.

Before I continue with the central topic of this post, I’ll reference a secondary point in the ontolog dialog: a) linked data representation of the world as a 303 redirect needs to be further developed and b) the RDF model theory should be revised to account for what’s implied by differentiating information and non-information resources in the linked data initiative. I’ll come back to this in a later post.
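For readers unfamiliar with the 303 pattern, here is a minimal sketch in Python of how a server might treat the two kinds of resources. The /id/, /doc/ and /data/ URI layout and the respond function are hypothetical, and the content negotiation is reduced to a substring check on the Accept header with no q-values. It is an illustration under those assumptions, not a description of any particular server.

```python
# Hypothetical URI layout: /id/<name> names a thing (a non-information
# resource), /doc/<name> is an HTML page about it, /data/<name> is RDF about it.
RDF_XML = "application/rdf+xml"


def respond(path, accept="text/html"):
    """Return (status, headers) for a GET on `path` with the given Accept header."""
    if path.startswith("/id/"):
        name = path[len("/id/"):]
        # The thing itself cannot be sent over the wire, so answer with
        # 303 See Other and point at a document about it, chosen by a
        # deliberately simplified form of content negotiation.
        target = ("/data/" if RDF_XML in accept else "/doc/") + name
        return 303, {"Location": target, "Vary": "Accept"}
    if path.startswith("/doc/"):
        return 200, {"Content-Type": "text/html"}
    if path.startswith("/data/"):
        return 200, {"Content-Type": RDF_XML}
    return 404, {}


print(respond("/id/alice", accept=RDF_XML))
print(respond("/id/alice"))
print(respond("/doc/alice"))
```

Note that the client needs two network hops to get from the name of the thing to a representation of a document about it.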

The central topic for this post is to sketch out a program for developing a new theory of meaning. This new theory of meaning will, surprisingly, extend Tarski’s Semantic Conception of Truth. Of course, at one post per month in my blog, it will take a few years to elaborate this new theory of meaning. I’ll develop the theory both informally and formally. In short, my hypothesis is this: a) semantics provides a theory of representation and truth; b) semiotics provides a theory of signs and their interpretation; c) pragmaticism defines the effect of a conception on other objects as the whole of the conception; and d) meaning emerges in a complex system through the convergence of relational properties and domain knowledge.

So, the sketch of this program is as follows:

a) develop a structural model of the relational properties in both Tarski’s meta and object languages using the triangle of signification as suggested by Roland Barthes in Mythologies

b) illustrate emergent properties of the complex system between the meta and object language

c) develop domain knowledge based on Peirce’s ten trichotomies

d) illustrate convergence of the complex system over time

e) develop a proof of the hypothesis using inference rules in a deductive system

f) provide a revised definition of interpretation using semiotics

g) extend the relations used in determining meaning from equality to equivalence, isomorphism and adjointness

Hopefully you’re intrigued by this sketch. There’s much work to be done and the goal is to take the sketch from a thought experiment to a formal theory and worked example or two using lambda calculus, category theory and a computational infrastructure in Haskell.

Signs of the Singularity and Why Chris Anderson and Nicholas Carr Won’t Make the Next Cut

I noticed a similarity recently in posts from Chris Anderson and Nicholas Carr. Over the past few months both of these widely read authors published a thought provoking post that calls into question humanity’s stewardship of knowledge in today’s 2.0 world. And each post contains signs of the singularity. Read on brave traveler, but don’t forget to bring your towel !

Anderson, in The End of Theory: The Data Deluge Makes the Scientific Method Obsolete, postulates a world of technological utopianism without realism. Throughout his post Anderson challenges the scientific method with citations from authorities like Box and Turing. Despite the strength of each of his premises, the absurdity of Anderson’s challenge to the scientific method is surpassed only by his inability to reason. Anderson would do well to watch The Matrix again, where he’ll find Neo reading Baudrillard’s Simulacra and Simulation and hopefully recognize that he advocates a technological utopianism following the precession of the simulacra. Dangerous not only for Anderson, but also to those whose fascination with technology overwhelms their ability to think clearly.

As I mention in my previous post The State of the Semantic Web: Representation and Realism, despite its fragile foundation, model theory implies realism. The relation between a model and the world may be only one of approximation, but without realism, technological utopianism quickly proceeds to simulacra and simulation. For those who are interested, John Sowa in Process and Causality, provides a very useful visualization in Figure 12 of the relation between the world, a model and a theory that Anderson would do well to better understand. Anderson’s claim that “[...] faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete” cannot be correct. Although he leads the reader to believe Google’s success is based solely on statistical induction, Google, a company that measures everything, has a well defined mechanism to validate the realism on which the models they derive from statistical induction are based. And that’s clearly Google’s income statement and its stock price. Currently Google’s PageRank approach is holding up in the short term, but I recently had lunch with Vint Cerf and owe him an email about semantics. Semantics are a pressing issue for Google and the competition is increasing in semantic search with Microsoft’s acquisition of Powerset.

Anderson’s claim that statistical induction on large data sets will replace the scientific method is simply absurd. Induction, deduction and abduction all imply a scientific method through which either observer or participant embrace reality. Drew Conway’s The Hubris of the End of Theory provides useful insights on Anderson’s claims from a statistician’s perspective. It’s no small wonder that Nicholas Carr believes it essential to serve as a skeptic against technological utopians like Anderson.

Carr, in Is Google Making Us Stupid, postulates that there’s a behavior evolving in society: widely available information expressed in binary relations without transitive closure, and the Internet as the medium through which it is conveyed, are leading unwitting individuals to engage in habits that build cognitive pathways which reduce their attention span. And we can’t stop. Carr describes his own experience succumbing to this pernicious affectation as well as his unsettling feeling that he can neither control nor reverse the process already underway in his own life. Ultimately, Carr concludes “That’s the essence of Kubrick’s dark prophecy: as we come to rely on computers to mediate our understanding of the world, it is our own intelligence that flattens into artificial intelligence.” According to Carr, our wills have somehow been overcome by a force stronger than reason or survival.

I’ll admit that I spend a good deal of time engaged in the behavior Carr describes. I call it surveying and I’ll claim that I’ve discovered some amazing things that I would not have otherwise known: Enterprise Architecture and the Information Flow Framework are just two examples. Today, services like StumbleUpon propose to automate that process. Possibly Carr would benefit from a hobby like transcendental meditation or enlisting in the military where he might develop the discipline to walk away from the machine when he feels himself losing control. But most importantly Carr can overcome his condition by developing a complete theory to guide his surveying. And from this complete theory, possibly using one that Anderson has jettisoned, Carr will develop an intuitive sense of closure and put his conscience at ease.

Carr could also develop a sense of being in the long now. By being in the long now I mean a patience that values the experience of knowledge gained over time without fear of loss or the limitation implied by immediacy. A long time ago we called that wisdom. This sense of being in the long now allows someone to have the confidence to develop a multi-year project. Mick Goodrick, an avid follower of Gurdjieff and a guitar teacher of mine in what seems so long ago, understood well what it means to be in the long now. Faced with a lifetime of mastering techniques to communicate emotion through sound, Goodrick advocated well-defined projects with a bounded subject over a fixed time period and no fixed outcome. In music theory, there’s no shortage of subjects from which to choose and the ordering of subjects into patterns is just another project. And part of Goodrick’s technique is retrospective: look back on what you accomplished and allow that experience to further shape your experience. Carr could design a project in which he developed his own theory based on what he surveyed over a two-year period, then retrospectively analyze his theory in the context of information theory, starting with Shannon’s Mathematical Theory of Communication, followed by Barwise and Seligman’s Information Flow: the Logic of Distributed Systems and finally Goguen’s Theory of Institutions.

By building on his premise that behavior builds habit reinforced by cognitive pathways, Carr perpetuates the myth of a technological dystopia: the myth that our intelligence is becoming subservient to that of machines. Carr says “Still, their easy assumption that we’d all “be better off” if our brains were supplemented, or even replaced, by an artificial intelligence is unsettling. It suggests a belief that intelligence is the output of a mechanical process, a series of discrete steps that can be isolated, measured, and optimized. In Google’s world, the world we enter when we go online, there’s little place for the fuzziness of contemplation. Ambiguity is not an opening for insight but a bug to be fixed. The human brain is just an outdated computer that needs a faster processor and a bigger hard drive.”

Those who have invested decades in the advancement of artificial intelligence will attest that we’re not so close to the singularity that we can’t avoid a technological dystopia. And no matter how much money Larry Page and Sergey Brin have in the bank, the proofs left to us by Turing and Gödel remain a considerable challenge to scientists and programmers alike, despite the science fiction of Vernor Vinge or the optimism of Ray Kurzweil. And despite the ongoing work in synthetic biology, the works of Alan Watts stand as a testament to understanding the fundamental challenge of modeling organism with mechanism.

So the similarities in the posts by Anderson and Carr are signs of the singularity. Anderson is a technological utopian who claims that the scientific method no longer has a place in technology. Carr is a technological dystopian who has no theory at the foundation of his surveying. This is not the singularity of Vinge or Kurzweil, but a utilitarian singularity that is here today in the use of technology grounded in scientific discipline, not through the rejection of reason, but in the use of technology that shapes our daily lives and sets us free through signs that we sometimes understand and sometimes don’t. But through deeper study we’ll better understand the signs of the singularity.

The State of the Semantic Web: Representation and Realism

Danny Ayers made a request for comments on the state of the semantic web a few weeks ago. I’ll preface this post by saying the state of the semantic web is very good, by which I mean that some very good design decisions were made early on that ensured a vibrant academic research base, a broad marketplace for technology transfer and an eager community of technology providers to realize the vision of a web of meaning. I’m personally very positive overall on the state of the semantic web. The semantic web is now pretty close to the best of all possible worlds. (ha, ha, ha) All that being said, it’s time to sharpen my pencil a bit and offer a critique on a few of the finer points of the state of the semantic web: representation and realism.

The state of the semantic web depends on the assumptions underlying its model theory. As I describe below in my post titled Why Meaning Comes in 3s, the model theory underlying RDF semantics, and therefore the semantic web, is based on Alfred Tarski’s Semantic Conception of Truth. In the Semantic Conception of Truth, Tarski defines truth in terms of what he calls material adequacy. Material adequacy implies three things: 1) sentences are objects in the world, 2) formal languages fully interpret these sentences, and 3) truth is based on an equivalence between the world and its description. We know these assumptions serve the purpose of establishing a somewhat fragile framework for truth. Sentences are not objects in the world. They describe the world as we understand it, which is already once removed. In general, the interpretation of natural language into formal language remains an open research question. And ironically, the dual meaning of the word model is based in the practice of abstracting details of the world away, such that our models are simpler than the world so we can understand it better. Equivalence between a model and the world requires recreating the world.

All this being said, the Semantic Conception of Truth, on which RDF model theory is based, is consistent with our mechanistic approach to science that we inherit from Descartes, Newton, etc. The general scenario is as follows: We need to understand the world, we observe phenomena, we build a model and we validate whether the model explains the phenomena. So, these assumptions, our Semantic Conception of Truth, are our mechanism for representation and realism.
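To make the truth-based reading concrete, here is a minimal sketch of how a Tarskian interpretation assigns truth to RDF-style triples. It follows the general shape of a simple interpretation in RDF model theory (a universe of resources, a set of properties, a denotation mapping and a property extension), but it leaves out literals and blank nodes. The class name, the function names and the `ex:` identifiers are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    """A toy interpretation; every field name here is an illustrative assumption."""
    resources: set    # the universe: the things the names can refer to
    properties: set   # the things in the universe that act as properties
    denotes: dict     # maps each name in the vocabulary to a thing
    extension: dict   # maps each property to a set of (subject, object) pairs


def is_true(triple, interp):
    """A ground triple is true when the pair it denotes is in the property's extension."""
    s, p, o = triple
    # A triple that uses a name outside the vocabulary is false.
    if any(name not in interp.denotes for name in (s, p, o)):
        return False
    prop = interp.denotes[p]
    if prop not in interp.properties:
        return False
    pair = (interp.denotes[s], interp.denotes[o])
    return pair in interp.extension.get(prop, set())


def satisfies(graph, interp):
    """An interpretation satisfies a graph when every triple in it is true."""
    return all(is_true(t, interp) for t in graph)


names = {"ex:tarski": "TARSKI", "ex:wrote": "WROTE", "ex:paper": "PAPER"}

# The world as we intend it: Tarski wrote the paper.
intended = Interpretation(
    resources={"TARSKI", "PAPER"},
    properties={"WROTE"},
    denotes=names,
    extension={"WROTE": {("TARSKI", "PAPER")}},
)

# Another world with the same names, in which nobody wrote anything.
other = Interpretation(
    resources={"TARSKI", "PAPER"},
    properties={"WROTE"},
    denotes=names,
    extension={"WROTE": set()},
)

graph = [("ex:tarski", "ex:wrote", "ex:paper")]
print(satisfies(graph, intended))
print(satisfies(graph, other))
```

The same graph is true under one interpretation and false under the other. Truth here is only a relation between a description and a structure that stands in for the world, and nothing in the sketch says what `ex:wrote` means beyond its extension.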

Just a few more points on Tarski’s model theory.

1. In addition to material adequacy, Tarski’s model theory also defines truth in terms of what’s formally correct. Formal correctness is the foundation for inference and reasoning on the semantic web. Although this post is mostly about Tarski’s model theory in terms of material adequacy, I’ll comment that it’s important to differentiate the standard description logic reasoning services (classification, realization, subsumption and consistency checking) from the reasoning services of theorem provers: unification, resolution, skolemization and modus ponens. I also hope for additional reasoning services such as defeasible, incremental and probabilistic reasoning to evolve in the description logic community to complement the standard services already available.

2. Tarski uses the term meaning throughout the Semantic Conception of Truth. He properly concludes that his definition of truth informs the theoretical foundation of semantics. But, he also acknowledges the need to better explain other semantic concepts like designates, satisfies, defines, consequence, synonymy and meaning.

3. RDF model theory also defers on establishing the meaning of an RDF term. The introduction reads: “Exactly what is considered to be the ‘meaning’ of an assertion in RDF or RDFS in some broad sense may depend on many factors, including social conventions, comments in natural language or links to other content-bearing documents. Much of this meaning will be inaccessible to machine processing and is mentioned here only to emphasize that the formal semantics described in this document is not intended to provide a full analysis of ‘meaning’ in this broad sense; that would be a large research topic. The semantics given here restricts itself to a formal notion of meaning which could be characterized as the part that is common to all other accounts of meaning, and can be captured in mechanical inference rules.”

4. Scott Soames argues that attempts to derive a theory of meaning from Tarski’s theory of truth have failed. In his paper titled Truth and Meaning in Perspective, Soames reviews the literature of Quine, Chomsky and Davidson and concludes that interpretation, not representation, provides the foundation for a theory of meaning.

For an interesting discussion regarding #3 and #4, see this thread on the ontolog-forum. Follow the thread for comments from Pat Hayes, Chris Menzel and John Sowa.
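To illustrate the distinction drawn in point 1, here is a rough sketch that contrasts two description-logic-style services (subsumption and realization) with a theorem-prover-style rule (modus ponens applied by forward chaining). It is a toy under stated assumptions: the class hierarchy and the rules are made up, subsumption is computed only from explicitly told subclass axioms rather than by a real description logic algorithm, and the rules are propositional, so unification and skolemization do not come into play.

```python
# Hypothetical told subclass axioms: each concept maps to its direct parents.
SUBCLASS_OF = {
    "Guitarist": {"Musician"},
    "Musician": {"Person"},
    "Person": set(),
}


def ancestors(concept, axioms):
    """Every concept reachable from `concept` by following subclass axioms."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in axioms.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumes(general, specific, axioms):
    """Subsumption: is every instance of `specific` also an instance of `general`?"""
    return general == specific or general in ancestors(specific, axioms)


def realize(asserted_types, axioms):
    """Realization: keep only the most specific of an individual's asserted types."""
    return {
        c for c in asserted_types
        if not any(c != d and subsumes(c, d, axioms) for d in asserted_types)
    }


def modus_ponens(facts, rules):
    """Forward chaining: from P and (P implies Q) conclude Q, up to a fixed point."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


print(subsumes("Person", "Guitarist", SUBCLASS_OF))
print(realize({"Person", "Guitarist"}, SUBCLASS_OF))
print(modus_ponens({"rains"}, [("rains", "wet"), ("wet", "slippery")]))
```

The first two functions answer questions about how concepts and individuals sit in a hierarchy, while the third derives new sentences from old ones. Both are formally correct in Tarski’s sense, but they are different services.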

One more point on realism. Barry Smith in his Realism Approach to the Evolution of BioMedical Ontologies argues that concepts should not be used in ontologies, RDF or otherwise, because the intent of ontology is to describe things in the world: the subject of ontology is realism. What Smith argues is that introducing concepts into ontologies tends towards idealism, and that it is not possible to adequately define concepts, but it is possible to define portions of reality. Smith then describes issues in information provenance and effectivity. First, I think defining concepts and defining reality are probably problems of the same order of magnitude. At least they are for me, so I think Smith’s argument against concepts should be seen in that light. Also, contrary to his assertion, there is substantial literature on defining concepts. Joseph Goguen’s What is a Concept stands as a good example, and Smith’s approach directly contradicts the practice of the description logic community in which the term concept is used throughout the literature. Also, if we follow Soames’s reasoning, representation and realism will not support a theory of meaning. A detailed analysis of Tarski’s Semantic Conception of Truth reveals enough challenges with representation and realism. If we follow Smith’s suggestion and disallow concepts in ontology, we disallow interpretation and a theory of meaning for the semantic web. So, I’m concerned that Smith’s realism would limit the development of the semantic web.

So, what does all this mean to the state of the semantic web?

Again, good things in general. Good decisions were made early on and the conditions are right for success. The good news is RDF has a formal model theory through which we can properly understand the implications of hard stuff like truth and meaning. Representation and realism are the order of the day in science as they are on the semantic web. But there’s much work to be done in developing an approach to meaning by adding the notion of interpretation to our model theory, and that’s the work of semiotics, which I’ll describe in an upcoming post!

Complex Systems Road Trip

It was years in the making, but I finally made it to the New England Complex Systems Institute last week for CX201 - Complex Physical, Biological and Social Systems. My interest in complex systems started in or around 2000 when I was living near Cambridge and stumbled upon NECSI. At that time there was an active discussion of Peirce’s firstness, secondness and thirdness on the NECSI listserv and this became my introduction to Peirce’s philosophy.

The science of complex systems provides clarity regarding the meaning of terms now coming into use as memes in the social media and executive leadership communities. For example, complex systems grounds the term collective intelligence, a term currently used at the Aspen Institute to denote the “co-creation of value”, as the behavior of the whole that one cannot observe from the behavior of the parts. As social networks become a larger part of our worklife, it’s essential that the social media community ground its memes in the science of complex systems.

Last week’s lectures were based on Yaneer Bar-Yam’s Dynamics of Complex Systems and Making Things Work. Yaneer made an excellent choice by insisting that laptops went down during the lectures, so we were all engaged in the experience rather than intermediated by technology. I enjoyed getting to know Yaneer over the course of the week. He’s a consummate scientist, teacher and leader. My experience there was of the highest quality and I plan to return next year for CX202 - Complex Systems Modeling and Networks, by which time I hope to have some Haskell chops and be ready to execute on the modeling phase of the curriculum.

Why Meaning Comes in 3s

A few days ago our team reviewed some ongoing work in which we’re developing a better approach to sharing information. Today, model driven architecture and the semantic web are widely accepted approaches to sharing information. Despite the acceptance of these approaches, their underlying model theory is not well understood, as was evident from our review. So, I’ll spend the next few posts (or more) explaining truth and meaning in model theory. I’ll explain why it’s important to differentiate truth from meaning, so you can better understand claims made about the semantic web and model driven architecture as approaches to sharing information. The semantic web has a formal model theory defined here. Giving them the benefit of the doubt, the model driven architecture community works from an implicit model theory. The explanations I provide here can inform the model driven architecture community as it comes to recognize the need to develop a formal model theory.

[Figure: Shared Concept Triangle]

The triangular figure above is a sign that allows us to understand meaning through its structure: in 3s. The figure has three nodes labeled N1, N2 and N3 and three edges labeled E1, E2 and E3. The nodes can indicate either types or tokens in our theory of meaning. The edges represent the relations among the nodes. I’ll show how to incorporate this structure into a truth-based, or Tarskian, model theory to support Goguen’s relational theory of meaning, which I reference in my post entitled Algebraic Semiotics: A Relational Theory of Meaning.

But first I’ll explain the truth-based, or Tarskian, model theory behind the semantic web.

Science of Consciousness Road Trip

I’m just back from Tucson, AZ where I spent the last few days at the 2008 Science of Consciousness conference. The conference is sponsored by the Center for Consciousness Studies (CCS) which is part of the University of Arizona Medical School. CCS presents itself as promoting open, rigorous discussion of all phenomena related to conscious experience.

The structure of the conference includes plenary, concurrent and poster sessions. The plenary sessions were heavily weighted towards normal science. (See Thomas Kuhn’s The Structure of Scientific Revolutions.) Presentations demonstrated a very high standard of scientific method within clinical psychology and neuro-science. Clinical psychology and neuro-science are not my area, but the results of the research I heard presented underwhelmed me.

Conclusion. Normal science does not explain consciousness very well at all.

A week before the conference, I received a recording of a talk given by Alan Watts at IBM way back in 1969. Yes, Alan Watts from The Way of Zen and Tao: The Watercourse Way. In this talk, Watts differentiates mechanism and organism. Watts reminds the IBM researchers that mechanism provides a very limited model of organism and advises that there are limits on what can be achieved through mechanism and normal science. I left the conference with the impression that the science of consciousness needs a new approach based in organism. Our consciousness is an artifact of the human organism and normal science does not explain phenomena related to consciousness well at all.

Overall, a very nice conference. I’ll probably go back as the conference is scheduled to be held every two years, so I’ll offer these suggestions for 2010:

  1. Normal science should not drive the agenda. As suggested by Kuhn, identify anomaly and either known or emerging crises. That’s where the action is!
  2. Develop a pattern language rooted in organism, not mechanism. Structure the pattern language and the science will follow close behind.
  3. Have an open jam session, not a talent show, and encourage everyone to participate. The performer/observer model of a talent show connotes entertainment, a consumer model. Jam sessions imply shared consciousness. Anyone can hit a drum.
