I gave a talk (video, slides, and paper) at ICFP last month, arguing that it can be easy to build a high-assurance compiler. I gave a similar talk as a keynote a couple weeks later at the very enjoyable Midwest Verification Day, hosted by Kansas University this year (thanks Andy Gill and Perry Alexander for inviting me!). This paper wraps up the Copilot project. I had a great time (I mean, how often do formal methods engineers get to be around NASA subscale jet aircraft?!?).
My Copilot project, sponsored by NASA, was featured in the Aerospace America magazine “Year in Review” edition (December 2011). We recently finished up the project, which focused on solving the problem of failure in complex embedded systems (like flight control systems in commercial aircraft). Our approach was a language called Copilot for runtime monitoring of embedded systems corroborated with experiments on airspeed sensor systems and an autopilot communication link.
Here’s a link to the page from the magazine mentioning us from the author’s page. (The relevant paragraph is in the middle of the right column).
Thanks to Galois and the National Institute of Aerospace where the work was done, the NASA folks we collaborated with, and especially Alwyn Goodloe, Nis Wegmann, Sebastian Niller, and Robin Morisset who worked on the project with me.
Stable names in GHC “are a way of performing fast (O(1)), not-quite-exact comparison between objects.” Andy Gill showed how to use them to extract the explicit graph from writing recursive functions in his Data.Reify package (and associated paper). It’s a great idea and very practical for embedded domain-specific languages—we’ve used the idea in Copilot to recover sharing.
However, consider this example, with three tests executed in GHCI.
For a function with type constraints, stable names fails to “realize” that we are pointing to the same object. As a couple of my colleagues pointed out, the cause is the dictionary being passed around causing new closures to be created. Simon Marlow noted that if you compile with
-O, the explicit dictionaries get optimized away.
Here are the solutions I have to “fixing” the problem, in the context of a DSL:
- Tell your users that recursive expressions must be monomorphic—only “pure functions” over the expressions of your DSL can be polymorphic.
- Implement a check in your reifier to see how many stable names have been created—if some upper-bound is violated, then the user has created an infinite expression, the expression is extremely large (in which case the user should try to use some sharing mechanism, such as let-expressions inside the language), or we’ve hit a stable-names issue.
- Ensure your DSL programs are always compiled.
- Of course, you can always take another approach, like Template Haskell or not using recursion at the Haskell level; also check out Andy Gill’s paper for other solutions to the observable sharing problem.
I don’t see how to use (deep)seq to fix the problem, at least as it’s presented in the example above, but I’d be keen to know if there are other solutions.
I’ve been working on a domain-specific language that is embedded in Haskell (an “eDSL”) that essentially takes a set of Haskell stream (infinite list) equations and turns them into a real-time C program implementing the state-machine defined by the streams. It’s called Copilot, and in fact, it’s built on top of another Haskell eDSL called Atom,1 which actually does the heavy lifting in generating the C code.
For example, here’s the Fibonacci sequence in Copilot:
fib = do let f = varW64 "f" f .= [0,1] ++ f + (drop 1 f)
I’ve been writing Copilot libraries recently, and I’ve had the following realization about eDSLs and meta-programming (let me know if someone has had this idea already!). Many languages treat meta-programming as a second-class feature—I’m thinking of macros used by the C preprocessor, for example (it’s probably generous even to call the C preprocessor ‘meta-programming’). One reason why Lisp-like languages were exciting is that the language is a first-class datatype, so meta-programming is on par with programming. In my experience, you don’t think twice about meta-programming in Lisp. (Haskell is more like C in this regard—you do think twice before using Template Haskell.)
So languages generally treat meta-programming as either second-class or first-class. What’s interesting about eDSLs, I think, is that they treat meta-programming as first-class, but programming as second-class! This isn’t surprising, since an eDSL is a first-class datatype, but the language is not first-class—its host language is.
Practically, what this means is that you spend very little time actually writing eDSL programs but rather generating eDSL programs. It is natural to layer eDSLs on top of other eDSLs.
This is just how Copilot came about: I was writing various Atom programs and realized that for my purposes, I just needed a restricted set of behaviors provided by Atom that are naturally represented by stream equations (and make some other tasks, like writing an interpreter, easier).
But as soon as Copilot was written, we2 started writing libraries implementing past-time linear temporal logic (LTL) operators, bounded LTL operators, fault-tolerant voting algorithms, regular expressions, and so on.
You wouldn’t think about combining the intermediate languages of a compiler into the same program. The idea of a language is more fluid, more organic in the context of eDSLs, where now libraries can be quickly written and different levels can be easily combined.
1Tom Hawkins wrote Atom.
2Credit for Copilot also goes to Sebastian Niller, Robin Morisset, Alwyn Goodloe.