High Assurance Functional RDF in Haskell

As 2011 winds to a close, I want to share the results of my recent coding project. The purpose of the project is to demonstrate each of the capabilities suggested by the title above in a working solution, and to develop a set of practices for delivering such a solution. After quite some time in the Haskell woodshed, I finally have sufficient tacit knowledge to run the interpreter and compiler and productively diagnose their feedback. This post provides the opportunity to get additional feedback from the Haskell community on how to improve both the code and the practices.

The simple use case is to display to the operator of a terminal the elapsed time in seconds to load a file of RDF data. The single assumption is that the data is encoded in JSON according to this Talis specification. The single constraint is that the load time must be less than 10 seconds for a 2.5 MB file.
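A minimal sketch of that timing harness, using only base's `System.CPUTime`; the work being timed below is a hypothetical stand-in for the actual file load, not the project's code.

```haskell
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- Run an action and report the CPU time it took, in seconds.
timed :: IO a -> IO a
timed action = do
  start  <- getCPUTime
  result <- action
  end    <- getCPUTime
  -- getCPUTime reports picoseconds; convert to seconds.
  printf "elapsed: %.3f s\n" (fromIntegral (end - start) / 1e12 :: Double)
  return result

main :: IO ()
main = do
  -- Hypothetical stand-in for loading and parsing the RDF/JSON file;
  -- evaluate forces the work to happen inside the timed window.
  _ <- timed (evaluate (length (show [1 .. 100000 :: Int])))
  return ()
```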

The non-functional capabilities of the solution are more interesting. I use the encodings in Paul Tarau’s Isomorphism library both to encode the data as numbers and later to recover the original text. The function that reduces the data to a directed graph (r2dig), the function that finds the indices (ndcs), and those that recover the text from the encoding (recover and decode) are the most original work. The solution also includes a graph matching function indirectly related to the use case. This solution could be developed further into a triple store with various functions that take advantage of the encodings.
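The flavor of those encodings can be sketched with a much smaller bijection than Tarau's: bijective base-256 numeration between Latin-1 strings and naturals. This is illustrative only; the `encode` and `decode` below are toy stand-ins, not the project's recover and decode functions.

```haskell
import Data.Char (chr, ord)

-- A bijection between Latin-1 strings and naturals via bijective base-256
-- numeration: every string maps to exactly one number, and back.
base :: Integer
base = 256

encode :: String -> Integer
encode = foldr (\c acc -> acc * base + fromIntegral (ord c) + 1) 0

decode :: Integer -> String
decode 0 = ""
decode n = chr (fromIntegral r) : decode q
  where (q, r) = (n - 1) `divMod` base

main :: IO ()
main = print (decode (encode "rdf"))  -- prints "rdf"
```

Because the numeration is bijective (digits run 1..256 rather than 0..255), there is no leading-zero ambiguity, so the round trip holds for every string.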

The high assurance reference speaks to use of the GHC type system as well as the practice of verification through QuickCheck properties. This practice implies more than coding properties: it implies devising a verification approach that both avoids selection bias and proves mathematical properties specific to the solution. No doubt the Haskell type system supports the high assurance claim. With some excellent advice from David McBride, the Talis specification of the JSON encoding provided sufficient guidance to construct, then refactor, data types and associated instances of the FromJSON type class in Bryan O’Sullivan’s Aeson library.
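For readers unfamiliar with the Talis shape, it nests subjects over predicates over arrays of typed values. The types below are an illustrative sketch of what one might write before attaching Aeson FromJSON instances; the names are mine, not the project's.

```haskell
import qualified Data.Map as Map

-- Talis RDF/JSON nests subject -> predicate -> [object], where each object is
-- { "value": ..., "type": "uri" | "literal" | "bnode",
--   "lang": ..., "datatype": ... }.
data ObjType = URI | Literal | BNode deriving (Eq, Show)

data Obj = Obj
  { objValue    :: String
  , objType     :: ObjType
  , objLang     :: Maybe String
  , objDatatype :: Maybe String
  } deriving (Eq, Show)

type Graph = Map.Map String (Map.Map String [Obj])

-- Flatten the nested maps back into subject/predicate/object triples.
triples :: Graph -> [(String, String, Obj)]
triples g =
  [ (s, p, o) | (s, po) <- Map.toList g
              , (p, os) <- Map.toList po
              , o       <- os ]

example :: Graph
example = Map.fromList
  [ ( "http://example.org/s"
    , Map.fromList
        [ ("http://example.org/p", [Obj "hello" Literal (Just "en") Nothing]) ]
    ) ]

main :: IO ()
main = mapM_ print (triples example)
```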

The functional reference speaks to evolving my Haskell style. The functional style became more natural to me as I further developed the solution. Haskell is the fourth programming language I’ve adopted in my career.

As important as the language are the practices for delivering a high assurance, high(er) performing solution. In addition to documentation with HUnit and verification with QuickCheck, those practices include the following: examining symmetric multiprocessor (SMP) parallelism, profiling, exploring heuristics (the -K and -H flags) in the compiler and run time system, inlining, and optimizing for fitness with a genetic algorithm.
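To give a flavor of the verification practice: below is the kind of property one might hand to QuickCheck, checked here exhaustively over a small domain using only base. The encoding is a toy stand-in, not one of the project's functions; the point is that the property is a mathematical fact (injectivity), not a cherry-picked example.

```haskell
import Data.List (nub)

-- Toy bijective base-2 numeration over the alphabet "ab" (digits 1 and 2);
-- a hypothetical stand-in for the encodings actually verified in the project.
enc :: String -> Integer
enc = foldr (\c acc -> acc * 2 + (if c == 'b' then 2 else 1)) 0

-- A mathematical property rather than a selected test case:
-- the encoding is injective over all strings of length <= n.
prop_injective :: Int -> Bool
prop_injective n = length codes == length (nub codes)
  where
    strs  = concatMap (\k -> sequence (replicate k "ab")) [0 .. n]
    codes = map enc strs

main :: IO ()
main = print (prop_injective 8)  -- prints True
```

Under QuickCheck the same predicate would simply be passed to `quickCheck`, which checks it against randomly generated inputs instead of an exhaustive small domain.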

Inlining and heuristics were the most useful practices for reducing execution time. A manual binary search above and below abductive thresholds allowed the use of heuristics to reduce execution time. Inlining the r2dig and ndcs functions reduced execution time from 10s to a little more than 5s on my quad-core i7. Pure SMP was not fruitful: early indications were that an SMP approach would be comparable to a single thread. I did not explore parallel IO. I welcome any suggestion on how to incorporate SMP. I ran a half dozen simply configured Acovea sessions with some optimization. See configuration and reports.
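The inlining itself is just a pragma. A minimal sketch, with a hypothetical hot function standing in for r2dig and ndcs:

```haskell
{-# LANGUAGE BangPatterns #-}

-- An INLINE pragma asks GHC to inline the function at every call site,
-- which can expose further optimisation inside hot loops. square and
-- sumSquares are hypothetical stand-ins for the project's hot functions.
{-# INLINE square #-}
square :: Int -> Int
square x = x * x

sumSquares :: Int -> Int
sumSquares n = go 0 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + square i) (i + 1)

main :: IO ()
main = print (sumSquares 10)  -- prints 385
```

The heuristic side is set at run time rather than in the source, e.g. `./prog +RTS -H512m -K64m -RTS` (sizes illustrative, found by the binary search described above); with GHC 7 the program must be compiled with -rtsopts for those flags to be accepted.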

You will find the files on github/hafr. I welcome feedback on refactoring, style, performance tuning, etc.

Looking ahead to 2012, I will be extending my high assurance capabilities by investing more time in Hets. The functional target is Information Flow. I’m still waffling on the Isabelle vs. Agda decision. You can do that in Washington.

Values, Types and Foundations

I took the opportunity to step out for the first time in quite a while and present Values, Types and Foundations at the Object Management Group’s (OMG) September Technical Meeting. Many thanks to my friends there, especially Larry Johnson and John Butler, Co-Chairs of the GovDTF, for the opportunity to do so.

Values, Types and Foundations is a broad survey presentation and demonstration. The presentation very generally describes a theoretical foundation for OMG standards such as the Meta Object Facility and the Object Constraint Language. By general I mean a computational, logical and philosophical overview through System F, the Calculus of Constructions, Martin-Löf Type Theory and dependent types. My goal was to give the talk that Philip Wadler would give to the OMG. The presentation also contains an informational brief on Vladimir Voevodsky’s Univalent Foundations program at the Institute for Advanced Study.

The first four demonstrations use most of Oleg Kiselyov’s Interpreting Types as Abstract Values, Constructive Law of Excluded Middle, and Impredicativity Bites. I don’t add much of anything new to the demos. I compare and contrast the test cases Oleg provides and dump the stack traces to speed up the demonstration time. That’s way more than enough.

The last demonstration is based on Anton Setzer’s Interactive Theorem Proving for Agda Users.

Other than that I review some background on Types, Classes and Sets based on Peter Aczel’s What is a Set? Thanks to Cory Casanave for pointing out the importance of providing this background.

During the voice-over and at the conference I contrasted the approach in Values, Types and Foundations with that of ISO 24707 Common Logic (CL), which is a type-free logic. Pat Hayes describes CL as first-order syntax with higher-order semantics. Hmmm. I recommend Chris Menzel’s Knowledge Representation, the World Wide Web and the Evolution of Logic for those interested in the design goals that motivated ISO 24707.

I plan to step out a bit more in 2012. I will be connecting Values, Types and Foundations back to some topics I wrote about here on the Phaneron a few years ago, such as the Heterogeneous Tool Set and the Information Flow Framework. The woodshedding has paid off and I’m looking forward to catching up with a few old friends and mixing it up with some new folks.

Quiet Nights and Quiet Stars

Quiet chords on my guitar … and quiet from Rick for the past few months.

Quiet on the Phaneron means there will be much to say. As advertised, the first six months of 2011 have been about thinking small. Yes, I’ve been in the woodshed again and truly enjoy the solitude. Much like an earlier period in my life, when internalizing fingerings, modes and inversions required the quiet to allow careful concentration, so have these past six months allowed for examining nightly releases of the Glasgow Haskell Compiler, its regular, shape-polymorphic parallel array (REPA) library and dependent types in Agda.

After having taken a pretty close look at topics in software development and information technology architecture over the past few decades, I’m very excited to get under the hood with GHC and its back-ends. I hope to share some experiences by year end on optimizing GHC back-end selection and configuration management on my 4-core i7 using genetic algorithms, as described by Don Stewart in Evolving Faster Haskell Programs (now with LLVM). I also hope to get access to some cooler hardware, possibly through some academic connections.

Closely examining REPA seems a good fit for my development in Haskell. Converting Paul Tarau’s list-based implementation of this bijective mapping between Gödel numberings and term algebras to REPA is a valuable exercise, both in learning this library and in building the tacit knowledge I need to become a journeyman on GHC. The next stage of this exercise will include verification with QuickCheck. I’ve also taken the opportunity to do a retrospective on REPA by examining some of the early work on parallel list notation [:e:] and state transformer monads. There’s no shortage of great papers out there on Haskell. See my del.icio.us bookmarks. This retrospective will provide the right foundation for later investigation into Data Parallel Haskell. Seems there’s time before the next release.

It looks like I’ll drop my plans to build expertise in Isabelle/HOL in favor of Agda. I just don’t have the bandwidth to internalize that much. Agda allows me to leverage my Haskell investment.

If you read A Winter of Philosophy and Logic, you’ll also be interested to know Vladimir Voevodsky recently announced a special program to develop the Univalent Foundations of Mathematics along with Steve Awodey and Thierry Coquand. Voevodsky has taken an interest in Martin-Löf type theory. He’s demoed some proofs in Coq, but he’s also active on the Agda listserv. And Coquand & Co. seem to be pro-Agda. Voevodsky’s stature in math circles could do a lot for this subject. I’ll be tracking this all pretty closely.