Copilot featured in Aerospace America

My Copilot project, sponsored by NASA, was featured in the Aerospace America magazine “Year in Review” edition (December 2011). We recently finished up the project, which focused on solving the problem of failure in complex embedded systems (like flight control systems in commercial aircraft).  Our approach was a language called Copilot for runtime monitoring of embedded systems corroborated with experiments on airspeed sensor systems and an autopilot communication link.

Here’s a link to the page from the magazine mentioning us from the author’s page. (The relevant paragraph is in the middle of the right column).

Thanks to Galois and the National Institute of Aerospace where the work was done, the NASA folks we collaborated with, and especially Alwyn Goodloe, Nis Wegmann, Sebastian Niller, and Robin Morisset who worked on the project with me.


Stable Names in Haskell

Stable names in GHC “are a way of performing fast (O(1)), not-quite-exact comparison between objects.”  Andy Gill showed how to use them to extract the explicit graph from writing recursive functions in his Data.Reify package (and associated paper).  It’s a great idea and very practical for embedded domain-specific languages—we’ve used the idea in Copilot to recover sharing.

However, consider this example, with three tests executed in GHCI.

For a function with type constraints, stable names fails to “realize” that we are pointing to the same object. As a couple of my colleagues pointed out, the cause is the dictionary being passed around causing new closures to be created. Simon Marlow noted that if you compile with -O, the explicit dictionaries get optimized away.

Here are the solutions I have to “fixing” the problem, in the context of a DSL:

  • Tell your users that recursive expressions must be monomorphic—only “pure functions” over the expressions of your DSL can be polymorphic.
  • Implement a check in your reifier to see how many stable names have been created—if some upper-bound is violated, then the user has created an infinite expression, the expression is extremely large (in which case the user should try to use some sharing mechanism, such as let-expressions inside the language), or we’ve hit a stable-names issue.
  • Ensure your DSL programs are always compiled.
  • Of course, you can always take another approach, like Template Haskell or not using recursion at the Haskell level; also check out Andy Gill’s paper for other solutions to the observable sharing problem.

I don’t see how to use (deep)seq to fix the problem, at least as it’s presented in the example above, but I’d be keen to know if there are other solutions.


FM@Galois Talks

As I posted over on the Galois blog, I recently gave a talk a couple of times on formal methods at Galois over the years (pdf). It’s been fun putting theory into practice!


Making your Ubuntu life better

I’ve had a lot of trouble with Ubuntu 11.04 (Natty Narwhal) on a laptop (ThinkPad 420), and I’ve had problems including:

  • Not being able to use dual monitors,
  • Random logoffs.

I tentatively think I was able to solve them with two easy fixes:

  • Turn off Unity. You can do this in the login screen.
  • In the Update Manager, under “Settings”, check “Proposed updates”.

With the proposed updates, Unity may be working; try that at your own risk. With that, I have a reasonably stable system.


Meta-Programming and eDSLs

I’ve been working on a domain-specific language that is embedded in Haskell (an “eDSL”) that essentially takes a set of Haskell stream (infinite list) equations and turns them into a real-time C program implementing the state-machine defined by the streams. It’s called Copilot, and in fact, it’s built on top of another Haskell eDSL called Atom,1 which actually does the heavy lifting in generating the C code.

For example, here’s the Fibonacci sequence in Copilot:

fib = do let f = varW64 "f" f .= [0,1] ++ f + (drop 1 f) 

I’ve been writing Copilot libraries recently, and I’ve had the following realization about eDSLs and meta-programming (let me know if someone has had this idea already!). Many languages treat meta-programming as a second-class feature—I’m thinking of macros used by the C preprocessor, for example (it’s probably generous even to call the C preprocessor ‘meta-programming’). One reason why Lisp-like languages were exciting is that the language is a first-class datatype, so meta-programming is on par with programming. In my experience, you don’t think twice about meta-programming in Lisp. (Haskell is more like C in this regard—you do think twice before using Template Haskell.)

So languages generally treat meta-programming as either second-class or first-class. What’s interesting about eDSLs, I think, is that they treat meta-programming as first-class, but programming as second-class! This isn’t surprising, since an eDSL is a first-class datatype, but the language is not first-class—its host language is.

Practically, what this means is that you spend very little time actually writing eDSL programs but rather generating eDSL programs. It is natural to layer eDSLs on top of other eDSLs.

This is just how Copilot came about: I was writing various Atom programs and realized that for my purposes, I just needed a restricted set of behaviors provided by Atom that are naturally represented by stream equations (and make some other tasks, like writing an interpreter, easier).

But as soon as Copilot was written, we2 started writing libraries implementing past-time linear temporal logic (LTL) operators, bounded LTL operators, fault-tolerant voting algorithms, regular expressions, and so on.

You wouldn’t think about combining the intermediate languages of a compiler into the same program. The idea of a language is more fluid, more organic in the context of eDSLs, where now libraries can be quickly written and different levels can be easily combined.

1Tom Hawkins wrote Atom.
2Credit for Copilot also goes to Sebastian Niller, Robin Morisset, Alwyn Goodloe.


Haskell and Hardware for the Holidays

Looking to make a statement this holiday season?  You could try to win the office “ugly holiday sweater” contest.  Or, you could play “Jingle Bells” on your Arduino microcontroller, using Haskell.  This post is about the latter.

We’re going to write this small program using the Copilot embedded domain-specific language (on Hackage and the source on Github).  Copilot is a stream language that allows you to generate embedded C code from programs written essentially as Haskell lists (using Atom as a backend for the C code generation).  This post is about how to use Copilot/Haskell (v. 1.0) to make your embedded C programming easier and more likely to be correct.  Here’s what we’re going to do—please don’t look too closely at my soldering, and turn the volume up, since a piezo speaker isn’t loud:

(For the impatient, the Haskell file is here, and the generated .c and .h files are here and here, respectively.)

We’re going to essentially recreate this C/Wiring program, plus flash some LEDs, but hopefully in a easier, safer way.  We need to manage three tasks:

  1. Determine the note and number of beats to play.
  2. Play the piezo speaker.
  3. Flash the LEDs.

We’ll start by defining which pins control what function:

-- pin numbers
speaker, red, green :: Spec Int32
speaker = 13
red     = 12
green   = 11

The type Spec Int32 takes an Int32 and lifts it into a Copilot expression.

We’ll call the program cycleSong. The type of a Copilot program is Streams, which is a collection of Spec a`s, and it resides within the Writer Monad. First, we’ll declare some variables.

cycleSong :: Streams
cycleSong = do
  -- Copilot vars
  let idx       = varI32 "idx"
      notes     = varI32 "notes"
      duration  = varI32 "duration"
      odd       = varI32 "odd"
      even      = varI32 "even"
      playNote  = varB   "playNote"
  -- external vars
      note = extArrI32 "notes" idx
      beat = extArrI32 "beats" idx

There are two classes of variables: Copilot variables that will refer to streams (infinite lists), and external variables, which can refer to data from C (including the return values of functions, global variables, and arrays). The constructors are mnemonics for the type of the variables; for example, varI32 is a variable that will refer to a stream of Int32s. Similarly, extArrI32 is an external variable referring to a C array of Int32s (i.e., int32_t). Notice the idx argument—it is the stream of values from which the index into the array is drawn (constants can also be used for indexes).

Now for the actual program:

 idx      .= [0] ++ (idx + 1) `mod` (fromIntegral $ length notesLst)
 notes    .= note
 duration .= beat * 300
 odd      .= mux (idx `mod` 2 == 1) green red
 even     .= mux (idx `mod` 2 == 1) red green
 playNote .= true
 -- triggers
 trigger playNote "digitalWrite" (odd <> true)
 trigger playNote "digitalWrite" (even <> false)
 trigger playNote "playtone" (speaker <> notes <> duration)

And that’s basically it.  There are two parts to the program, the definition of Copilot streams, which manage data-flow and control, and triggers, which call an external C function when some property is true.  Copilot streams look pretty much like Haskell lists, except that functions are automatically lifted to the stream level for convenience.  Thus, instead of writing,

 x = [0] ++ map (+1) x

in Copilot, you simply write

 x .= [0] ++ x + 1

Similarly for constants, so the Copilot stream definition

playNote .= true

lifts the constant true to an infinite stream of true values. The function mux is if then elsemux refers to a 2-to-1 multiplexer. So that means that the stream odd takes the value of green when idx is odd, and red otherwise, where green and red refer to the pins controlling the respective LEDs.

Just to round out the description of the other defined streams, idx is the index into the C arrays containing the notes and beats, respectively—that’s why we perform modular arithmetic. The stream duration tells us how long to hold a note; 300 is a magic “tempo” constant.

Now for the triggers. Each of our triggers “fires” whenever the stream playNote is true; in our case, because it is a constant stream of trues, this happens on each iteration. So on each iteration, the C functions digitalWrite and playTone are called with the respective function arguments (‘<>‘ separates arguments). The function digitalWrite is a function that is part of the Wiring language, which is basically C with some standard libraries, from which digitalWrite comes. We’ll write playTone ourselves in a second.

The C Code

We need a little C code now.  We could write this directly, but we’ll just do this in Haskell, since there’s so little we need to write—the Copilot program handles most of the work.  But a caveat: it’s a little ugly, since we’re just constructing Haskell strings. Here are a few functions (included with Copilot) to make this easier, and here are some more. (If someone properly writes a package to write ad-hoc C code from Haskell, please leave a comment!)

First, we need more magic constants to give the frequency associated with notes (a space is a rest).

freq :: Char -> Int32
freq note  =
  case note of
    'c' -> 1915
    'd' -> 1700
    'e' -> 1519
         ...

and here are the notes of the song and the beats per note:

jingleBellsNotes = "eeeeeeegcdefffffeeeddedgeeeeeeegcdefffffeeggfdc"
jingleBellsBeats =
  [ 1,1,2  , 1,1,2, 1,1,1,1, 4
  , 1,1,1,1, 1,1,2, 1,1,1,1, 2,2
  , 1,1,2  , 1,1,2, 1,1,1,1, 4
  , 1,1,1,1, 1,1,2, 1,1,1,1, 4
  ]

The other main piece of C code we need to write is the function playtone. The piezo speaker is controlled by pulse width modulation, basically meaning we’ll turn it on and off really fast to simulate an analogue signal. Here is it’s definition (using a little helper Haskell function to construct C functions):

    [ function "void" "playtone" ["int32_t speaker", "int32_t tone", "int32_t duration"] P.++ "{"
    , "#ifdef CBMC"
    , "  for (int32_t i = 0; i < 1; i ++) {"
    , "#else"
    , "  for (int32_t i = 0; i < duration * 1000; i += tone * 2) {"
    , "#endif"
    , "    digitalWrite(speaker, HIGH);"
    , "    delayMicroseconds(tone);"
    , "    digitalWrite(speaker, LOW);"
    , "    delayMicroseconds(tone);"
    , "  }"
    , "}"
    ]

HIGH, LOW, digitalWrite, and delayMicroseconds are all part of the Wiring standard library.  That ifdef is for verification purposes, which we’ll describe in just a bit.

Besides a little more cruft, that’s it!

Test, Build, Verify

“Jersey Shore” may have introduced you to the concept of gym, tan, laundry, but here we’ll stick to test, build, verify.  That is, first we’ll test our program using the Copilot interpreter, then we’ll build it, then we’ll prove the memory safety of the generated C program.

  • Interpret. We define a function that calls the Copilot interpreter:
    interpreter =
      interpret cycleSong 20
        $ setE (emptySM {i32Map = fromList [ ("notes", notesLst)
                                           , ("beats", beatsLst)]})
        baseOpts

    This calls the Copilot interpreter, saying to unroll cycleSong 20 times. Because the Copilot program samples some external C values, we need to provide that data to the interpreter. Fortunately, we have those arrays already defined as Haskell lists. Executing this, we get the following:

    period: 0   duration: 300   even: 11   idx: 0   notes: 1519   odd: 12   playNote: 1
    period: 1   duration: 300   even: 12   idx: 1   notes: 1519   odd: 11   playNote: 1
    period: 2   duration: 600   even: 11   idx: 2   notes: 1519   odd: 12   playNote: 1
    period: 3   duration: 300   even: 12   idx: 3   notes: 1519   odd: 11   playNote: 1
    period: 4   duration: 300   even: 11   idx: 4   notes: 1519   odd: 12   playNote: 1
    period: 5   duration: 600   even: 12   idx: 5   notes: 1519   odd: 11   playNote: 1
    period: 6   duration: 300   even: 11   idx: 6   notes: 1519   odd: 12   playNote: 1
                                                   . . .

    Good, it looks right. (period isn’t a Copilot variable but just keeps track of what round we’re on.)

  • Build. To build, we generate the C code from the Copilot program, then we’ll use a cross-compiler targeting the ATmega328. The easiest way (I’ve found) is via Homin Lee’s Arscons. Arscons is based on Scons, a Python-based build system. Arscons makes three assumptions: (1) the program is written as a Wiring program (e.g., there’s a loop() function instead of a main() function is the main difference), (2) the extension of the Wiring program is .pde, and (3) the directory containing the XXX.pde is XXX. For us, all that really means is that we have to change the extension of the generated program from .c to .pde. So we define
    main :: IO ()
    main = do
      compile cycleSong name
        $ setPP cCode  -- C code for above/below the Copilot program
        $ setV Verbose -- Verbose compilation
        baseOpts
      copyFile (name P.++ ".c") (name P.++ ".pde") -- SConstruct expects .pde

    and then execute

    > runhaskell CopilotSong.hs

    to do this.

    To build the executable, we issue

    > scons

    then

    scons upload

    when we’re ready to flash the microcontroller.

  • Verify. Is the generated C program memory safe?  Wait… What do I mean by ‘memory safe’?  I’ll consider the program to be memory safe if the following hold:
    • No arithmetic underflows or overflows.
    • No floating-point not-a-numbers (NaNs).
    • No division by zero.
    • No array bounds underflows or overflows.
    • No Null pointer dereferences.

    Of course this is an approximates memory-safety, but it’s a pretty good start. If the compiler is built correctly, we should be pretty close to a memory-safe program. But we want to check the compiler, even though Haskell’s type system gives us some evidence of guarantees already. Furthermore, the compiler knows nothing about arbitrary C functions, and it doesn’t know how large external C arrays are.

    We can prove that the program is memory safe. We call out to CBMC, a C model-checker developed primarily by Daniel Kröning. This is whole-program analysis, so we have to provide the location of the libraries. We define

    verifying :: IO ()
    verifying =
      verify (name P.++ ".c") (length notesLst * 4 + 3)
        (     "-DCBMC -I/Applications/Arduino.app/Contents/Resources/Java/hardware/arduino/cores/arduino/ "
         P.++ "-I/Applications/Arduino.app/Contents/Resources/Java/hardware/tools/avr/avr-4/include "
         P.++ "--function cbmc")

    which calls cbmc on our generated C program. Let me briefly explain the arguments. First we give the name of the C program.

    Then we say how many times to unroll the loops. This requires a little thinking. We want to unroll the loops enough times to potentially get into a state where we might have an out of bounds array access (recall that the Copilot stream idx generates indexes into the arrays). The length of the C arrays are given by length notesLst. When compiling the Copilot program (calling the module’s main function, a periodic schedule is generated for the program). From the schedule, we can see that idx is updated every fourth pass through the loop. So we unwind it enough loop passes for the counter to have the opportunity to walk off the end of the array, plus a few extra passes for setup. This is a minimum bound; you could of course over-approximate and unroll the loop, say, 1000 times.

    Regarding loop unrolling, remember that #ifdef from the definition of playtone()? We include that to reduce the difficulty of loop unrolling. playtone() gets called on every fourth pass through the main loop, and unrolling both loops is just too much for symbolic model-checking (at least on my laptop). So we give ourselves an informal argument that the loop in playtone() won’t introduce any memory safety violations, and the #ifdef gives us one iteration through the loop if we’re verifying the system. A lot of times with embedded code, this is a non-issue, since loops can just be completely unrolled.

    The -D flag defines a preprocessor macro, and the -I defines a include path. Finally, the --function flag gives the entry point into the program. Because we generated a Wiring program which generates a while(1) loop for us through macro magic, we have to create an explicit loop ourselves for verification purposes.

    If you’re interested in seeing what things look like when they fail, change the idx stream to be

      idx .= [0] ++ (idx + 1)
    

    and cbmc will complain

    Violated property:
      file CopilotSing.c line 180 function __r11
      array `beats' upper bound
      (long int)__1 < 47
    
    VERIFICATION FAILED
    

So that’s it. Happy holidays!


Backseat Driving: Copilot Updates

A lot has been going on since the announcement of Copilot, a Haskell DSL for generating hard real-time C monitors. We’ve presented Copilot a few times, including at Runtime Verification 2010, at a Galois Technical Seminar (video of the talk is here), and at a recent NASA Technical Interchange.

Copilot has had five releases since we originally open-sourced the project.  Recent work has focused on making the language more straightforward and improving Copilot libraries.  But first, let me remind you how to use Copilot: you can compile specs to hard real-time C code, you can interpret them, you can model-check them, and you can generate specs to test the compiler and interpreter—you can see a bit about usage here.

For example, here’s a Copilot specification that generates the Fibonacci sequence (over Word64s) and tests for even numbers:

fib :: Streams
fib = do
  let f = varW64 "f"
  let t = varB "t"
  f .= [0,1] ++ f + (drop 1 f)
  t .= even f
  where even :: Spec Word64 -> Spec Bool
            even w' = w' `mod` 2 == 0

Notice that lists look almost exactly like Haskell lists.

What about something a little more complicated?  Consider the property:

If the temperature rises more than 2.3 degrees within 2 seconds, then the engine has been shut off.

We might use a Copilot specification like the following to express it, assuming that temp and shutoff are C variables being sampled at phases 1 and 2 respectively, and the period of execution is 1 second:

engine :: Streams
engine = do
  -- external vars
  let temp     = extF "temp" 1
  let shutoff  = extB "shutoff" 2
  -- Copilot vars
  let temps    = varF "temps"
  let overTemp = varB "overTemp"
  let trigger  = varB "trigger"
  -- Copilot specification
  temps    .= [0, 0, 0] ++ temp
  overTemp .= drop 2 temps > 2.3 + temps
  trigger  .= overTemp ==> shutoff

Here’s something that I think shows why you want to write your DSLs in Haskell: Haskell gives you a macro language for your DSL… for free.  For example, consider the following (more complicated) property:

“If the engine temperature exeeds 250 degrees, then the engine is shut off within one second, and in the 0.1 second following the shutoff, the cooler is engaged and remains engaged.”

We can more succinctly specify this property using past-time linear temporal logic (ptLTL).  There’s a Copilot library for writing those kind of specs, which can be interspersed with normal Copilot streams—the ptLTL specs are highlighted in blue below.  Again, assume a period of execution of 1 second:

engine :: Streams
engine = do
  -- external vars
  let engineTemp = extW8 "engineTemp" 1
  let engineOff  = extB "engineOff" 1
  let coolerOn   = extB "coolerOn" 1
  -- Copilot vars
  let cnt        = varW8 "cnt"
  let temp       = varB "temp"
  let cooler     = varB "cooler"
  let off        = varB "off"
  let monitor    = varB "monitor"
  -- Copilot specification
  temp    `ptltl` (alwaysBeen (engineTemp > 250))
  cnt     .=      [0] ++ mux (temp && cnt < 10) (cnt + 1) cnt
  off     .=      cnt >= 10 ==> engineOff
  cooler  `ptltl` (coolerOn `since` engineOff)
  monitor .=      off && cooler

Today, I finished updating another feature of Copilot: the ability to send stream values over ports to other components in a distributed system. We had an implementation of this, but it was a bit hacky. Hopefully, it’s a bit less hacky now. For example, consider the following specification:

distrib :: Streams
distrib = do
  -- Copilot vars
  let a = varW8 "a"
  let b = varB "b"
  -- Copilot spec
  a .= [0,1] ++ a + 1
  b .= mod a 2 == 0 
  -- send commands
  send "portA" (port 2) a 1
  send "portB" (port 1) b 2

The blue commands are send commands.  For example, the first command says, “call the C function portA(str, num), where argument str is the value of stream a and num is port number 1.” The port number says who to send it to.

These are just a few of the recent updates. We’re still working on Copilot, so let me know if you have questions or comments.

Interested? Get Copilot on Hackage or GitHub.


Follow

Get every new post delivered to your Inbox.