Linked Data: Interpretants and Interpretation

Linked Data has received some attention over the past year. Leading technologists and policy makers alike are coming to recognize that a smarter Web is a better Web. As I recently wrote in Linking Open Data: An Emerging Practice Area for the Semantic Web, today’s open government initiatives in both the US and the UK share common values. When a technology becomes available to advance public policy, great things can happen.

At O’Reilly Media’s recent Gov2.0 Summit, Beth Noveck explained three areas of President Obama’s Open Government Directive: transparency, participation and collaboration. Especially relevant to Linked Data was the way Beth related collaboration to platforms. Beth’s examples, like iTunes, were compelling, but as we know from Sir Tim Berners-Lee’s appointment to advise policy makers in the UK Cabinet Office on public information delivery, Linked Data is THE platform for internet-scale collaboration.

But this post isn’t about technology policy; it’s about interpretants and interpretation. If we expect Linked Data to become highly effective, it will be essential to develop a much richer understanding of interpretants and interpretation on the Web. So in this post I’ll: 1) elaborate on the Triangle of Meaning (the Triangle), first clarifying its terminology by introducing the interpretant and explaining the meaning of each edge of the Triangle; 2) suggest a few refinements to the language used in W3C’s Architecture of the World Wide Web and Cool URIs for the Semantic Web that are especially relevant to Linked Data; 3) explain the relevance of the Triangle to current RDF Model Theory; and 4) propose further elaboration of the Triangle using Category Theory, Haskell and Higher Order Logic in Isabelle/HOL to advance the state of Linked Data. Sound ambitious? Read on, brave traveler, but don’t forget to bring a towel.

For a primer on the following material, read my recent post titled RDFS Idioms for the Working Semiotician. There I propose a useful idiom in the semiotic domain, using the Typing Data by Usage and Mutual SubPropertyOf patterns from Dean Allemang and Jim Hendler’s Semantic Web for the Working Ontologist, to infer that an Icon which is an instance of a Sign of an Object is the equivalent of an Icon which is an instance of its Conception.

Figure 1. Triangle of Meaning with Properties

The term Conception implies the interpretation of a Sign by a human or animal, but Linked Data requires the interpretation of Signs by machines. In his later work instead of the term Conception, Peirce uses the term Interpretant: “I define a sign as something, A, which brings something, B, its interpretant, into the same sort of correspondence with something, C, its object, as that in which itself stands to C. In this definition I make no more reference to anything like the human mind than I do when I define a line as the place within which a particle lies during a lapse of time.” Figure 1 illustrates the Triangle with Interpretant substituted for Conception. It also elaborates on prior illustrations by listing its edges. Each edge is comprised of two inverse functions. The inverse functions form outer and inner paths. The clockwise outer path traces the metaphysics of the Triangle. The counter-clockwise inner path traces an existent.

I’ll return to the edges of the Triangle shortly. For now I’ll use the nodes of the Triangle, substituting Interpretant for Conception, to suggest refinements to the language used in W3C’s Architecture of the World Wide Web (AWWW) and Cool URIs for the Semantic Web (CUSW). The following suggestions will serve to inform a long-standing discussion among members of W3C about URIs and resources.

There’s no doubt the URI serves as useful syntax for identification on the Web. But, the term resource does not serve us well. Because URIs serve various purposes on the Web, we need to understand them according to their purpose. AWWW and CUSW already do some of that, but it can be done better. Calling everything a resource doesn’t help. Here are a few important refinements stated in terms of the Triangle:

Information resources are really Objects: bits and bytes that exist in the machine. To precisely express their extent, they would be better called Information Objects. But before I continue, here’s what I mean when I use the term extent. Extent defines the boundaries where an Interpretant, Sign or Object can exist. Extent can be either machine, external world or consciousness. So the extent of Information Objects is machine. Their metaphysics and existence are represented in the machine by both the outer and inner paths of the Triangle. Non-Information Resources are Objects too, but they cannot be materialized inside the machine. Their extent is external world: they exist in the external world. They can only be represented in the machine.

303 redirects do nothing to change the extent of objects. There’s no way to overcome our inability to materialize non-Information Objects inside the machine. A redirect to a description of an object is simply another Sign or representation of the Object. Science fiction intentionally blurs this distinction and that makes great entertainment, but fuzzy thinking. Remember Neo in The Matrix? Neo is shown to be reading Baudrillard’s Simulacra and Simulation (SS). In SS Baudrillard warns against what he calls the Precession of the Simulacra. The precession is more dangerous than fuzzy thinking. Failing to understand this distinction disconnects us from reality and truth.

The term Information Object should replace Information Resource; then we can call plain old objects just that: Objects. This replacement also allows us to drop the awkward term Indirect Identification, which actually means “represents”, and that is precisely what the Sign does for the Object.

The description logic community has a long-standing practice of using the term Concept in both constructors and language classification. However, this community differentiates concepts neither from signs nor from objects. Nor does it distinguish concepts from interpretants. Of course the extent of a concept is the Consciousness. Signs exist in the machine and the external world. Interpretants exist in the machine and the Consciousness.
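The extents stated so far can be collected into a small lookup table. The following is only a sketch of that bookkeeping in Python: the names `Extent`, `ALLOWED_EXTENTS` and `can_exist_in` are invented for illustration and come from no W3C vocabulary, and the table simply restates the claims made in the text above.

```python
from enum import Enum


class Extent(Enum):
    """The three extents named in the post."""
    MACHINE = "machine"
    EXTERNAL_WORLD = "external world"
    CONSCIOUSNESS = "consciousness"


# Hypothetical table restating the post's claims about where each kind can exist.
ALLOWED_EXTENTS = {
    "InformationObject": {Extent.MACHINE},
    "NonInformationObject": {Extent.EXTERNAL_WORLD},
    "Concept": {Extent.CONSCIOUSNESS},
    "Sign": {Extent.MACHINE, Extent.EXTERNAL_WORLD},
    "Interpretant": {Extent.MACHINE, Extent.CONSCIOUSNESS},
}


def can_exist_in(kind: str, extent: Extent) -> bool:
    """Return True if the post allows this kind of thing to exist in the extent."""
    return extent in ALLOWED_EXTENTS[kind]


# A non-information object can only be represented in the machine by a Sign.
print(can_exist_in("NonInformationObject", Extent.MACHINE))
print(can_exist_in("Sign", Extent.MACHINE))
```

Read this way, the table makes the earlier point about redirects concrete: nothing moves a non-information object into the machine extent; only its Signs can live there.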

Now that I’ve introduced interpretants and suggested refinements to the current language in AWWW and CUSW, how could the Triangle apply to an interpretation in RDF? First, the interpretation in model theory is not the interpretant of the Triangle. In RDF, as in classical logic, interpretation is defined as follows: an interpretation for a language L is a structure in a domain, together with a function that preserves truth between symbols in the language L and the objects to which they refer. For model theory, the objects to which the symbols refer are the objects in the formal system; see Tarski’s system T in his Semantic Conception of Truth. RDF model theory therefore implies a denotational semantics with a boolean valuation.
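To make the boolean valuation concrete, here is a minimal sketch of a model-theoretic interpretation in Python. It is a simplification that assumes only URI references (no literals or blank nodes), and the class name, field names and `http://example.org/` URIs are all hypothetical. A triple is true exactly when the denoted subject and object form a pair in the extension of the denoted property.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    """A simplified interpretation: a domain plus a denotation function."""
    resources: set      # the domain of discourse
    properties: set     # the elements that act as properties
    extension: dict     # property -> set of (subject, object) pairs
    denotation: dict    # URI string -> element of the domain

    def is_true(self, triple) -> bool:
        """Boolean valuation of a ground triple (subject, predicate, object)."""
        s, p, o = (self.denotation.get(term) for term in triple)
        if None in (s, p, o) or p not in self.properties:
            return False
        return (s, o) in self.extension.get(p, set())


EX = "http://example.org/"  # hypothetical namespace, for illustration only

interp = Interpretation(
    resources={"rick", "post"},
    properties={"wrote"},
    extension={"wrote": {("rick", "post")}},
    denotation={EX + "rick": "rick", EX + "wrote": "wrote", EX + "post": "post"},
)

print(interp.is_true((EX + "rick", EX + "wrote", EX + "post")))
print(interp.is_true((EX + "post", EX + "wrote", EX + "rick")))
```

Note that the only output of `is_true` is a truth value. Nothing in the structure records an Interpretant or an extent, which is the gap the next paragraphs address.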

To advance Linked Data we need to extend an interpretation based on a boolean valuation to include meaning. To extend truth with meaning we could use the symbols of logic – connectives, constants, predicates, functions, etc. – AND add the nodes of the Triangle – Interpretant, Sign and Object according to their proper extent – AND add the edges of the Triangle.

However, even with this addition we require further analysis before we conclude that we’ve extended truth, or boolean valuation, with meaning. Does this extension refute Searle’s Chinese Room Argument? Does it provide something new in relation to the Turing Test? Does a signature enhanced with the Triangle allow Semiotic Morphisms under Goguen’s Theory of Institutions? I believe it is too early to answer these questions, but we can recognize the value of a separate semiotic domain and its role in recognizing well defined extents for Linked Data.

To wrap up this post, there are additional characteristics of the Triangle to be explained to satisfy Peirce’s definition. To paraphrase, A brings something B into the same sort of correspondence with something C as that in which A itself stands to C. So, these additional characteristics are most likely commutativity of the Triangle and the composition of the relations that make up the edges. Specifically, each edge is the composition of the two complementary edges of the Triangle. Category Theory provides a useful mechanism to explore these characteristics. Over the next few months I will develop exercises in Haskell to demonstrate these characteristics of the Triangle. Isabelle/HOL looks like a great prover and it now comes with a new utility called Haskabelle that translates Haskell source into Isabelle/HOL theories.
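Ahead of those exercises, the commutativity and composition claims can be tried out on finite data. The sketch below uses Python dictionaries as stand-ins for functions. The edge names `determines` and `refers_to` and the sample values are assumptions made for illustration; only `represents` (Sign to Object) is named in this post. The sketch also assumes each edge is a bijection, so that its inverse exists.

```python
def compose(g: dict, f: dict) -> dict:
    """Return the function 'g after f' for functions given as dicts."""
    return {x: g[f[x]] for x in f}


def inverse(f: dict) -> dict:
    """Invert a dict-encoded function; it must be one-to-one."""
    if len(set(f.values())) != len(f):
        raise ValueError("function is not invertible")
    return {value: key for key, value in f.items()}


# Hypothetical edges of the Triangle over two sample signs.
determines = {"sign:tree": "interp:tree", "sign:rock": "interp:rock"}  # Sign -> Interpretant
refers_to = {"interp:tree": "obj:tree", "interp:rock": "obj:rock"}     # Interpretant -> Object
represents = {"sign:tree": "obj:tree", "sign:rock": "obj:rock"}        # Sign -> Object

# One path: the Sign-to-Object edge equals the composition of the other two edges.
one_way_commutes = compose(refers_to, determines) == represents

# The opposite path: the same check on the inverse functions.
other_way_commutes = (
    compose(inverse(determines), inverse(refers_to)) == inverse(represents)
)

print(one_way_commutes, other_way_commutes)
```

In Category Theory terms, this checks that the triangle diagram commutes in both directions. A Haskell or Isabelle/HOL treatment would state the same property for arbitrary types rather than for two sample signs.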

That pretty much covers interpretants and interpretation. Stay tuned for interim results in Category Theory, Haskell and Isabelle/HOL.

12 Comments

  1. Cory Casanave says:

    Rick,
    Interesting post! I think this relates to a classic data integration issue and points to an approach to addressing it. As you know, we have been working for years on various standards for languages and domains and have been frustrated by the “stove piped” nature of these languages and models. Many people are frustrated by the data integration issues in general.
    The core of the problem is that when we create an information object (at any meta level) we are creating an interpretation of the real world object. This is hard enough when we are creating information objects to represent real physical things but is even more challenging when we are creating information objects to represent abstract concepts such as types, classes or predicates.
    Take UML modeling, for example, where we create the inevitable “person” class. We know in our hearts that this doesn’t really represent a person – it represents some particular viewpoint of a person biased by our own perspectives and needs. It is, essentially, one opinion about “person”. Of course what happens is that we have millions of “person classes” in hundreds of modeling languages, implementation languages and even paper records. We have little in the way of a correlation between these representations or instances of them.
    While we know that these information objects aren’t “really the person”, or even a complete representation of the person, we tend to ignore all the other interpretations and have no way to connect them. That we have our own interpretations is inevitable; that we don’t recognize their connection is unforgivable.
    So what is the connection? Is there a “one true person”? Well, we know that has all kinds of theoretical and practical problems. But what we may be able to agree on is that all of these information objects represent an aspect of person – we would like a connection between the information object and the object. But since the real object exists outside of the system, and may even be an abstraction, what is the connection point?
    Perhaps that connection point is one you mentioned “represents”. The information object represents the object. But if we have stripped away all of the information objects, what is the “object” in the system, what do we know about it? Perhaps all we know is that it has identity. The identity of the object is all that we can talk about – everything else is an “opinion” about one aspect of the object.
    Since we also don’t have any way to know “true identity”, all we can know is identifiers (arbitrary signs). We can have identifiers for the object, identifiers that are different from the identifiers of the information objects. Now we have a hook – we can say “information object identifier” “represents” “object identifier”. That, with one additional assertion “object identifier” sameAs “object Identifier” provides us with the crucial link between information objects and objects in the world.
    So, let’s say that every DBMS row, XML element and model element had a “represents” link. We could then have a handle to know that these very different information objects represent the SAME THING. We have a way to link our data, our models and our abstractions – even if the identity of the object is arbitrary.
    Of course the “represents” assertion is as much an opinion as anything else, but where those assertions are trusted we can connect the dots between information objects about the same thing. We also need to STOP confusing the identity of information objects with the identity of objects.
    So my suggestion, which I think builds on your triangle, is that we recognize both information objects and objects as having identity, but different identities, and that we have a pervasive link type for information objects “represents” that is the pivot point between information objects representing the same thing.
    So with a very simple generic addition, “represents”, and a convention for object identity (perhaps with a new URI type like “id://corycasanave.modeldriven.org”), we can start to build the connections between our stovepipes. With that capability we can also then imagine identity registries popping up that contain nothing but identifiers for these real world objects and even abstract concepts. These would be the subject of “represents” assertions and provide a link between very diverse systems.

  2. Cory,

    You state:
    “Now we have a hook – we can say “information object identifier” “represents” “object identifier”.

    Then you state:
    “So a very simple generic addition “represents” and a convention for object identity (perhaps with a new URI type like “id://corycasanave.modeldriven.org”

    But, what’s wrong with:
    ?

    Where:

    Identifies the Information Object (bearer of constellation of characteristics that coalesce around the Identifier of a Real World Object); Where, the format of the Information Object structure is only constrained by the representation manifestation capabilities of the medium of perception, e.g., mime types re. the Internet medium.

    And:
    is the Real World Object Identifier, the conduit to the Information Object that bears its description (metadata).

    Rick:
    It would be nice if you could place Referent, Identifier, and Representation in the Triangles above, so as to add to the overall clarity of this nice post.

    Kingsley

  5. Kingsley,
    My assumption about the relationship between real world objects and information object is that:
    • There may be any number of information objects about a particular real world object
    • These different information objects may represent different conceptions of the real world object and may not have the same subset of concepts, terms, signs, data or metadata
    • The different information objects come from different sources and different authorities in different graphs and may not agree on the facts about the real world object
    • But, they agree it is the same real world object

    So,
    • We can’t use “the Real World Object Identifier, the conduit to the Information Object that bears its description (metadata)” since they (the information objects) may not share the same metadata
    • We can’t use “constellation of characteristics that coalesce around the Identifier of a Real World Object” because we may or may not agree on those characteristics.
    • We can’t use “?” as the binding point because it can’t be referenced across graphs.

    While there will probably be a subset of characteristics and metadata that we share as the basis for forming our assertion that they represent the same real world object, we should not assume any such correspondence without it being asserted or computed in a manner we trust.

    So what we are recognizing is that any model is just an opinion about the world. What we do, constantly, is compare and negotiate these various opinions. What most current representations & logics have not done is recognize these information objects as such opinions such that we can compare and leverage different opinions about the same thing. “represents” is then a way to link an opinion with the subject of the opinion. This also gives us a hook for understanding the source and trust of each such opinion.

  7. Cory,

    At no point have I implied that an HTTP URI is anything more than a signpost into its creator’s World View :-)

    We are all going to observe aspects of the same object differently, we pack our observations into representations that are associated with the Identifiers that we mint.

    No absolute truths of any kind, just opinions and claims packaged in digital form :-)

    Kingsley

  9. Kingsley,
    Exactly “just opinions and claims packaged in digital form”, I like that way of explaining it.

    So what I am suggesting, above, is a way to relate various opinions about the same things. “SameAs” commits to too much; “represents” is closer to the concept we want: a way to say that the multiple opinions represent the same thing in the world.

    -Cory
