Archive for the ‘Uncategorized’ Category

Max: Phase 1 Report

May 18, 2016

I sent the following R&D report to my colleagues, but a few other folks outside Galois have expressed interest in the project, so I’m sharing it more broadly.


Subject: Max: Phase 1 Report

As some of you know, about nine months ago, I started a skunk-works R&D project with Brooke. Autonomous systems are all the rage these days, so we decided to try to create one. First, I have to be honest and say that although I participated in the initial project kickoff, I mostly played a supporting role after that. Brooke did all the major software and hardware development. (If you’ve worked with me on a project, this should sound pretty familiar.) Once Brooke started development, she really threw herself into it. She seemed to be working on things day and night, and it even looked a bit painful at times. She sensed she was getting close to an alpha release a few days ago, and after a final four hour sprint, she made the release at 2:30am on Mothers Day! We are officially out of stealth mode.

We call our project “Machine Awareness X”, or Max for short. The current system is capable of basic knowledge discovery and processing. Now, don’t get too excited; we expect at least a few years before it’s able to do something interesting with the knowledge acquired. I won’t go into the technical details, but the programming model is very biological—think “Game of Life” on steroids. At this point, we’ll have to continue to provide guidance and some rules, but its basically self-programming.

Following a “Johnny Ive” approach to design, Max has basically one notification method. It’s a fairly piercing audio signal used whenever his power supply is running low or there’s a hardware problem. (Frankly, sometimes it seems to just go off for no reason at all.) We designed it to be loud enough to hear across the house, but I wish it had a volume control. Next time! Otherwise, the package is quite attractive, in my opinion, even cute. Unfortunately, at 7lbs. 8oz., the hardware is heavier than even a decade-old laptop, and we expect new versions to require an even larger form factor. Fortunately, we designed the system to be self-propelling, although it’ll take a few years before that hardware is developed (the software isn’t ready for it anyways).

There’s still quite a bit of work to do. Our back-of-the-envelope estimate is that we’ll have to spend just short of two decades caring for Max before he’s fully autonomous. Even more disappointingly, we’re estimating having to spend up to a quarter million (in today’s dollars) in upkeep and maintenance! (Sadly, while others are interested in playing with the system intermittently, nobody seems that interested in joining us as early investors.) Despite all the training we’re planning to provide, the system seems too complicated to guarantee certain behaviors. For example, while more general than an autonomous car, it may take more than 15 years of training before his software is capable of piloting a conventional automobile.

I’m guessing some of you are wondering about commercialization opportunities. The good news: we expect Max to be useful to society (we haven’t found a killer app yet, though) and to generate quite a bit of revenue over its lifetime. The bad news: we don’t expect it to start producing reliable revenue for more than 20 years. What’s more, it has a lot of upkeep expenses that will only grow with time. This might sound like science fiction, but we imagine he might even replicate himself in the distant future, and will likely pour his revenues into his own replicants. In short, we don’t expect to make a dime from the project.

More seriously, we had a kid; mom and baby are doing fine. See you soon.

Papa Pike

15 Minutes

February 9, 2015

Some of the work I lead at Galois was highlighted in the initial story on 60 Minutes last night, a spot interviewing Dan Kaufman at DARPA. I’m Galois’ principal investigator for the HACMS program, focused on building more reliable software for automobiles and aircraft and other embedded systems. The piece provides a nice overview for the general public on why software security matters and what DARPA is doing about it; HACMS is one piece of that story.

I was busy getting married when filming was scheduled, but two of my colleagues (Dylan McNamee and Pat Hickey) appear in brief cameos in the segment (don’t blink!). Good work, folks! I’m proud of my team and the work we’ve accomplished so far.

You can see more details about how we have been building better programming languages for embedded systems and using them to build unpiloted air vehicle software here.

All Hammers are Terrible

December 29, 2014

“All computers are terrible.”

On social media, this turn of phrase is almost a meme. The sentiment is reasonable at some level—computers are complicated, buggy, and insecure. But the general public is largely indifferent, or least, they accept the situation for what it is: Computer freezes? turn it on then off again. Smart phone insecure? Well, the convenience of digital wallets outweighs the risk. My parents, who are not computer scientists, never say that all computers are terrible.

Rather, this is a complaint from within, by computer scientists themselves.  What drives this thinking? (While not specifically addressing the cause of the sentiment, this blog post provides a nice counterpoint.) No immediately obvious hypotheses seem particularly explanatory:

  • Ludditism: perhaps despite (or because of) our work in technology, computer scientists abhor it. But I find this explanation wanting, particularly because the focus of contempt is specifically about computers, and not about technological advancement generally.
  • Pessimism: perhaps there is a correlation between computer scientists and pessimists. Faced with the challenge of building good systems, they focus on the flaws rather than the progress. But it does not adequately explain the focus of the pessimism on computers rather than life in general—I don’t see posts stating that “all relationships are terrible”.
  • Realism: perhaps computers are terrible. Computer science history is measured in decades, and the approach to building systems is immature compared to other engineering disciplines. I tore down my old garage this summer. The city required a blueprint, erosion plan, a drainage plan, and a few hundred dollars for me to do so. They inspected the work afterwards. Contrast this with software: how much oversight does writing security-critical code get? Just about anyone can throw code up that purports to solve a problem. Still, given that programs are among the most complex artifacts built by humans, we’re doing pretty well.
  • Narcissism: finally, perhaps a tinge of narcissism drives these comments, something along the lines of, “where others have failed, I will succeed.” It is given as a tongue-in-cheek put-down regarding the work of fellow professionals. The motivation is there: if we thought that the tools of computer science were “good enough”, then the world wouldn’t need new programming languages, tools, or libraries. On the other hand, these kind of complaints are not just about directly competing work, and there will always be new inventions to make.

I tend not to hear the same kind of complaints from members of other professions. Yes, we all complain about our jobs, but not necessarily about our tools. Carpenters don’t say, “All hammers are terrible,” even if they complain about their company, their coworkers, the architect, the inspectors. Doctors don’t complain about X-ray machines, MRIs, or scalpels (but they might complain about their patients, insurance, and administration). Farmers don’t complain about the design of combines and tractors. Now, they might complain about a specific instance—“My tractor is terrible; it breaks down all the time.” But not about tractors, in general.

Other professions largely take their tools as a given. Indeed, most practitioners know so little about how the tools they use are built, it is difficult to critique them. A carpenter does not know how a skill saw is built and a doctor does not know how an X-rays are taken. Just whether they work or not and if they are useful or not.

Computer science is unique insofar as the tool makers are so closely connected with the tool users. There’s another glib quip in computer science: “patches welcome” meaning that if you (the user) thinks a tool should be improved, then it is your responsibility to make those improvements yourself. In other fields, the saying does not exist; it would be laughable! A taxi driver can’t be expected to invent a more fuel-efficient automobile; an architect can’t be expected to build better modeling software.

Computer science is unique in this respect, and it has philosophical, psychological, and sociological implications. Imagine, if you will, you are a carpenter, and you have some understanding of how the tools that you have are designed and built. Your skill saw is right-handed, but you are left-handed; with a few modifications, you realize its design can be abstracted, to accommodate either handedness. When your drill dies, instead of simply replacing it, you open it up, find the faulty circuit, and realize there’s a weakness in the wiring that allows it to overheat. When each of your tools fails you in some way, you uncover some flaw and realize they could be improved. Finally, you come to expect failure from every tool you might ever use: “All hammers are terrible.”

We all know that technology—whether we understand how it is built or not—has flaws. We know tradeoffs are made between cost, generality, performance, and reliability. There are more and less reliable, performant, and costly cars, saws, X-ray machines.

The general public certainly has opinions and a sense of the tradeoffs between the tools they use, e.g., Windows vs. Mac OS, Android vs. iOS. But the sentiment is that these are tools, with flaws and lifespans, just like a cars, appliances, and power tools.

I don’t have a strong opinion about whether computer scientists should complain or not about their tools, except, obviously, that general complaints don’t effect much change. My point is more philosophical: we find ourselves in an interesting profession, where we can open our very tools up, peer inside, and change them.

It’s a fascinating prospect.

SmartChecking Matt Might’s Red-Black Trees

August 20, 2014

Matt Might gave a nice intro to QuickCheck via testing red-black trees recently. Of course, QuickCheck has been around for over a decade now, but it’s still useful (if underused–why aren’t you QuickChecking your programs!?).

In a couple of weeks, I’m presenting a paper on an alternative to QuickCheck called SmartCheck at the Haskell Symposium.

SmartCheck focuses on efficiently shrinking and generalizing large counterexamples. I thought it’d be fun to try some of Matt’s examples with SmartCheck.

The kinds of properties Matt Checked really aren’t in the sweet spot of SmartCheck, since the counterexamples are so small (Matt didn’t even have to define instances for shrink!). SmartCheck focuses on shrinking and generalizing large counterexamples.

Still, let’s see what it looks like. (The code can be found here.)

SmartCheck is only interesting for failed properties, so let’s look at an early example in Matt’s blog post where something goes wrong. A lot of the blog post focuses on generating sufficiently constrained arbitrary red-black trees. In the section entitled, “A property for balanced black depth”, a property is given to check that the path from the root of a tree to every leaf passes through the same number of black nodes. An early generator for trees fails to satisfy the property.

To get the code to work with SmartCheck, we derive Typeable and Generic instances for the datatypes, and use GHC Generics to automatically derive instances for SmartCheck’s typeclass. The only other main issue is that SmartCheck doesn’t support a `forall` function like in QuickCheck. So instead of a call to QuickCheck such as

> quickCheck (forAll nrrTree prop_BlackBalanced)

We change the arbitrary instance to be the nrrTree generator.

Because it is so easy to find a small counterexample, SmartCheck’s reduction algorithm does a little bit of automatic shrinking, but not too much. For example, a typical minimal counterexample returned by SmartCheck looks like

T R E 2 (T B E 5 E)

which is about as small as possible. Now onto generalization!

There are three generalization phases in SmartCheck, but we’ll look at just one, in which a formula is returned that is universally quantified if every test case fails. For the test case above, SmartCheck returns the following formula:

forall values x0 x1:
T R E 2 (T B x1 5 x0)

Intuitively, this means that for any well-typed trees chosen that could replace the variables x0 and x1, the resulting formula is still a counterexample.

The benefit to developers is seeing instantly that those subterms in the counterexample probably don’t matter. The real issue is that E on the left is unbalanced with (T B E 5 E) on the right.

One of the early design decisions in SmartCheck was focus on structurally shrinking data types and essentially ignore “base types” like Int, char, etc. The motivation was to improve efficiency on shrinking large counterexamples.

But for a case like this, generalizing base types would be interesting. We’d hypothetically get something like

forall values (x0, x1 :: RBSet Int) (x2, x3 :: Int):
T R E x2 (T B x1 x3 x0)

further generalizing the counterexample. It may be worth adding this behavior to SmartCheck.

SmartCheck’s generalization begins to bridge the gap from specific counterexamples to formulas characterizing counterexamples. The idea is related to QuickSpec, another cool tool developed by Claessen and Hughes (and SmallBone). Moreover, it’s a bridge between testing and verification, or as Matt puts it, from the 80% to the 20%.

SmartCheck: Redux

June 16, 2014

A few years ago, I started playing with the idea of making counterexamples from failed tests easier to understand. (If you’re like me, you spend a lot of time debugging.) Specifically, I started from QuickCheck, a test framework for Haskell, which has since been ported to many other languages. From this, a tool called SmartCheck was born.

In this post, I’m not going to describe the technical details of SmartCheck. There’s a paper I wrote for that, and it was just accepted for publication at the 2014 Haskell Symposium (warning: the paper linked to will change slightly as I prepare the final version).

Rather, I want to give a bit of perspective on the project.

As one reviewer for the paper noted, the paper is not fundamentally about functional programming but about testing. This is exactly right. From the perspective of software testing in general, I believe there are two novel contributions (correct me ASAP if I’m wrong!):

  1. It combines counterexample generation with counterexample reduction, as opposed to delta-debugging-based approaches, which performs deterministic reduction, given a specific counterexample. One possible benefit is SmartCheck can help avoid stopping at local minima during reduction, since while shrinking, new random values are generated.  Update: as John Regehr points out in the comments below, his group has already done this in the domain of C programs.  See the paper.
  2. Perhaps the coolest contribution is generalizing sets of counterexamples into formula that characterize them.

I’d be interested to see how the work would be received in the software testing world, but I suppose first it’d need to be ported to a more mainstream language, like C/C++/Java.

Compared to QuickCheck specifically, QuickCheck didn’t used to have generics for implementing shrinking functions; recent versions include it, and it’s quite good. In many cases, SmartCheck outperforms QuickCheck, generating smaller counterexamples faster.  Features like counterexample generalization are unique to SmartCheck, but being able to test functions is unique to QuickCheck. Moreover, I should say that SmartCheck uses QuickCheck in the implementation to do some of the work of finding a counterexample, so thanks, QuickCheck developers!

When you have a tool that purports to improve on QuickCheck in some respects, it’s natural to look for programs/libraries that use QuickCheck to test it. I found that surprisingly few Hackage packages have QuickCheck test suites, particularly given how well known QuickCheck is. The one quintessential program that does contain QuickCheck tests is Xmonad, but I challenge you to think of a few more off the top of your head! This really is a shame.

The lack of (public) real-world examples is a challenge when developing new testing tools, especially when you want to compare your tool against previous approaches. More generally, in testing, it seems there is a lack of standardized benchmarks. What we really need is analogue of the SPEC CPU2000 performance benchmarks for property-based testing, in Haskell or any other language for that matter.  I think this would be a great contribution to the community.

In 1980, Boyer and Moore developed a linear-time majority vote algorithm and verified an implementation of it. It took until 1991 to publish it after many failed tries. Indeed, in the decade between invention and publication, others had generalized their work, and it being superseded was one of the reasons reviewers gave for rejecting it! (The full story can be found here.)

To a small extent, I can empathize. I submitted a (slightly rushed) version of the paper a couple years ago to ICFP in what was a year of record submissions. One reviewer was positive, one luke-warm, and one negative. I didn’t think about the paper much over the following year or two, but I got a couple of requests to put the project on Hackage, a couple of reports on usage, and a couple of inquiries about how to cite the draft paper. So after making a few improvements to the paper and implementation, I decided to try again to publish it, and it finally will be.

As I noted above, this is not particularly a Haskell paper. However, an exciting aspect of the Haskell community, and more generally, the functional programming community, is that it is often exploring the fringes of what is considered to be core programming language research at the time. I’m happy to be part of the fringe.

Embedded Security in the Mainstream

April 1, 2014

Science Friday had an interesting podcast with Bruce Schneier worth a listen.  It discussed security in embedded systems (or as is hip to say now, the Internet of Things).  The idea of security in embedded systems seemed pretty obscure just a few years ago, so it’s nice to see it being featured on a general-audience radio show.

So who’s going to listen to it from their phone, connected to their car’s audio over Bluetooth, on the way to the store to buy a Nest smoke alarm?

Stop Regression Testing

December 15, 2013

Some of my colleagues and I have been building a large software application, the open-source SMACCMPilot autopilot system, using Haskell-based embedded domain-specific languages (EDSLs) to generate embedded C code.  Our thesis is that EDSLs both increase our productivity and increase the assurance of our application; so far, we believe our thesis is holding up: we’ve created a complex 50k lines-of-code application in under two engineer years that is (mostly) free of undefined and memory-unsafe behavior.

Of course, there are always bugs.  Even with a fully formally-verified system, there are likely bugs in the specification/requirements, bugs in the hardware, or bugs in the users.  But our hope is that most of our bugs now are logical bugs rather than, say, arithmetic bugs (e.g., underflow/overflow, division by zero) or resource bugs (e.g., null-pointer dereference or buffer overflows).

However, since the EDSL we are using, Ivory, is relatively young—we’ve been writing SMACCMPilot in the language while developing the language—there have been cases where we have not checked for unsafe programs as stringently as possible.  For example, while we insert arithmetic underflow/overflow checks in general, we (ahem, I) neglected to do so for initializers for local variables, so an initializer might overflow with an unexpected result.

If we were writing our application in a typical compiled language, even a high-level one, and found such a bug in the compiler, we would perhaps file a bug report with the developers.  If it were an in-house language, things would be simpler, but modifying a large stand-alone compiler can be difficult (and introduce new bugs).  Most likely, the compiler would not change, and we’d either make some ad-hoc work-around or introduce regression tests to make sure that the specific bug found isn’t hit again.

But with an EDSL, the situation is different.  With a small AST, it’s easy to write new passes or inspect passes for errors.  Rebuilding the compiler takes seconds.  Fixing the bug with initializers took me a few minutes.

So introducing regression tests about the behavior of initializers isn’t necessary; we’ve changed the compiler to eliminate that class of bugs in the future.

More generally, we have a different mindset programming in an EDSL: if a kind of bug occurs a few times, we change the language/compiler to eliminate it (or at least to automatically insert verification conditions to check for it).  Instead of a growing test-suite, we have a growing set of checks in the compiler, to help eliminate both our bugs and the bugs in all future Ivory programs.

Edit (16 Dec. 2013): a few folks have noted that the title is a bit strong: it absolutely is, that’s probably why you read it. :)  Yes, you should (regression) test your applications.  You might consider testing your compiler, too.  Indeed, the point is that modifying the compiler can sometimes eliminate testing for a class of bugs and sometimes simply supports testing by inserting checks automatically into your code.


Book Review: Automate This

May 27, 2013

A family member gave me a copy of Automate This by Christopher Steiner as a gift a few months ago.  The subtitle of the book is “How algorithms came to rule our world.”  The book is a non-technical, fast, and easy read.  I did read a few interesting stories, such as NASA’s use of personality categorization algorithms in the 70s to predict which potential astronauts would work well together, or a math professor’s algorithmic dissection of Beatles songs.  The book particularly emphasizes algorithmic trading on Wall Street.  The somewhat non sequitur conclusion is that we need more science, engineering, and math graduates.

The main point of the book is that algorithms are pervasive and will become more so.  Toward this point, I think the author could have cast an even wider net, mentioning that algorithms are implemented in everything from elevator controllers to autopilots.  There is a cursory pre-computing history of algorithms at the beginning of the book that is (tenuously) tied to a Wall Street trading.

Rather than focus on algorithms broadly, the focus is patchy, hitting on machine learning, high-speed trading, and game theory.  Some mention of algorithms as might be taught in a “Analysis of Algorithms” course, covering basic topics like time-space complexity and decidability, would help prevent the general public from having a narrow-minded interpretation of algorithms.  Computer scientists invent, analyze, and implement algorithms every day, but much of that work is perhaps not sexy enough for a popular science book.

Incidentally, for the topics Steiner does cover, an excellent companion treating similar topics, but with a different focus, is Nate Silver’s (of 538 fame) book, The Signal and the Noise.

A Dane Goes to NASA

November 3, 2012

Last year, Nis Wegmann came from the University of Copenhagen to work with me on a NASA-sponsored project.  He was recently interviewed about his experiences.

Nis worked at the National Institute of Aerospace in Virginia, just outside the NASA Langley gates.  Since Galois is on the other side of the U.S., we mostly collaborated remotely.  (For a taste of what its like to intern at Galois itself, check out Ed Yang’s post.)  Nis is an outstanding programmer and great writer, so it made working together effortless, despite the distance.

Good luck in your future endeavors, Nis!

Lowering the Bar

October 2, 2012

I gave a talk (video, slides, and paper) at ICFP last month, arguing that it can be easy to build a high-assurance compiler. I gave a similar talk as a keynote a couple weeks later at the very enjoyable Midwest Verification Day, hosted by Kansas University this year (thanks Andy Gill and Perry Alexander for inviting me!). This paper wraps up the Copilot project. I had a great time (I mean, how often do formal methods engineers get to be around NASA subscale jet aircraft?!?).

A Fistful of Dollars

March 7, 2012

A while back I attended an excellent DARPA Proposers’ Day. It got me thinking, “If I had a fistful of (a few million) dollars to provide researchers, what sort of program might I start?” Two ideas struck me:

Getting software right, fast

Software is usually developed in a world with incomplete requirements and with users (and attackers) that try to break your assumptions.  Furthermore, more often than not, developers themselves (perhaps with a QA team, if they’re lucky) largely decide when they’re “done”—when a version is bug-free enough, secure enough, and meets the specifications to release.

There are a dizzying number of approaches to improving quality: testing (e.g., unit testing, penetration testing, black-box and white-box testing), compiler tools (e.g., type-checking, warnings), formal verification (e.g., model-checking, static analysis, decision procedures), programming methodologies (e.g., pair-programming, extreme programming), etc.  The question is, what combinations of approaches provide benefits and by what factor?

A major portion of the challenge would be to fund experiments that are large enough to draw statistical conclusions about different approaches.  Teams would get points for finishing faster, but lose points for bugs, security vulnerabilities, unimplemented features, etc. uncovered by a separate “read team”.  To make the experience more realistic, I’d have career programmers (not students in an undergrad course) to participate in the experiments.  A major portion of the program would be just designing the experiments to ensure statistical conclusions could be drawn from them, but I imagine the results might be enlightening.

A Believe Search Engine

Imagine a “search” engine in which you could search not for facts, but beliefs.  For example, I might query the likelihood that someone believes that the Earth revolves around the Sun in Russia in 1600.  Or I might ask what the likelihood of someone believing in global warming in Japan in 2012.  Or whatever else.  The engine would return some probability, perhaps with a confidence interval.  I’d imagine being able to query some believe based on geography, time period, religion, nationality, sex, and so forth.  Of course this is a hard, hard problem, and would likely involve search as well technology akin to IBM’s Watson.  I’d be curious to know if it were feasible at all.

Meta-Programming and eDSLs

January 30, 2011

I’ve been working on a domain-specific language that is embedded in Haskell (an “eDSL”) that essentially takes a set of Haskell stream (infinite list) equations and turns them into a real-time C program implementing the state-machine defined by the streams. It’s called Copilot, and in fact, it’s built on top of another Haskell eDSL called Atom,1 which actually does the heavy lifting in generating the C code.

For example, here’s the Fibonacci sequence in Copilot:

fib = do let f = varW64 "f" f .= [0,1] ++ f + (drop 1 f) 

I’ve been writing Copilot libraries recently, and I’ve had the following realization about eDSLs and meta-programming (let me know if someone has had this idea already!). Many languages treat meta-programming as a second-class feature—I’m thinking of macros used by the C preprocessor, for example (it’s probably generous even to call the C preprocessor ‘meta-programming’). One reason why Lisp-like languages were exciting is that the language is a first-class datatype, so meta-programming is on par with programming. In my experience, you don’t think twice about meta-programming in Lisp. (Haskell is more like C in this regard—you do think twice before using Template Haskell.)

So languages generally treat meta-programming as either second-class or first-class. What’s interesting about eDSLs, I think, is that they treat meta-programming as first-class, but programming as second-class! This isn’t surprising, since an eDSL is a first-class datatype, but the language is not first-class—its host language is.

Practically, what this means is that you spend very little time actually writing eDSL programs but rather generating eDSL programs. It is natural to layer eDSLs on top of other eDSLs.

This is just how Copilot came about: I was writing various Atom programs and realized that for my purposes, I just needed a restricted set of behaviors provided by Atom that are naturally represented by stream equations (and make some other tasks, like writing an interpreter, easier).

But as soon as Copilot was written, we2 started writing libraries implementing past-time linear temporal logic (LTL) operators, bounded LTL operators, fault-tolerant voting algorithms, regular expressions, and so on.

You wouldn’t think about combining the intermediate languages of a compiler into the same program. The idea of a language is more fluid, more organic in the context of eDSLs, where now libraries can be quickly written and different levels can be easily combined.

1Tom Hawkins wrote Atom.
2Credit for Copilot also goes to Sebastian Niller, Robin Morisset, Alwyn Goodloe.

Copilot: a DSL for Monitoring Embedded Systems

September 25, 2010

In case you missed all the excitement on the Galois blog, what follows is a re-post.

Introducing Copilot

Can you write a list in Haskell? Then you can write embedded C code using Copilot. Here’s a Copilot program that computes the Fibonacci sequence (over Word 64s) and tests for even a numbers:

fib :: Streams
fib = do
  "fib" .= [0,1] ++ var "fib" + (drop 1 $ varW64 "fib")
  "t" .= even (var "fib")
    where even :: Spec Word64 -> Spec Bool
          even w = w `mod` const 2 == const 0

Copilot contains an interpreter, a compiler, and uses a model-checker to check the correctness of your program. The compiler generates constant time and constant space C code via Tom Hawkin’s Atom Language (thanks Tom!). Copilot is specifically developed to write embedded software monitors for more complex embedded systems, but it can be used to develop a variety of functional-style embedded code.


> compile fib "fib" baseOpts

generates fib.c and fib.h (with a main() for simulation—other options change that). We can then run

> interpret fib 100 baseOpts

to check that the Copilot program does what we expect. Finally, if we have CBMC installed, we can run

> verify "fib.c"

to prove a bunch of memory safety properties of the generated program.

Galois has open-sourced Copilot (BSD3 licence). More information is available on the Copilot homepage. Of course, it’s available from Hackage, too.

Flight of the Navigator

Aberdeen Farms entrance

View of the James River.

Pitot tube on the test aircraft.

Our testbed stack: 4 STM32 microcontrollers (ARM Cortex M3s), an SD card for logging data, air pressure sensor, and voltage regulator.

Sebastian installing the stack.

Copilot took its maiden flight in August 2010 in Smithfield, Virginia. NASA rents a private airfield for test flights like this, but you have to get past the intimidating sign posted upon entering the airfield. However, once you arrive, there’s a beautiful view of the James River.

We were flying on a RC aircraft that NASA Langley uses to conduct a variety of Integrated Vehicle Health Management (IVHM) experiments. (It coincidentally had Galois colors!) Our experiments for Copilot were to determine its effectiveness at detecting faults in embedded guidance, navigation, and control software. The test-bed we flew was a partially fault-tolerant pitot tube (air pressure) sensor. Our pitot tube sat at the edge of the wing. Pitot tubes are used on commercial aircraft and they’re a big deal: a number of aircraft accidents and mishaps have been due, in part, to pitot tube failures.

Our experiment consisted of a beautiful hardware stack, crafted by Sebastian Niller of the Technische Universität Ilmenau. Sebastian also led the programming for the stack. The stack consisted of four STM32 ARM Cortex M3 microprocessors. In addition, there was an SD card for writing flight data, and power management. The stack just fit into the hull of the aircraft. Sebastian installed our stack in front of another stack used by NASA on the same flights.

The microprocessors were arranged to provide Byzantine fault-tolerance on the sensor values. One microprocessor acted as the general, receiving inputs from the pitot tube and distributing those values to the other microprocessors. The other microprocessors would exchange their values and perform a fault-tolerant vote on them. Granted, the fault-tolerance was for demonstration purposes only: all the microprocessors ran off the same clock, and the sensor wasn’t replicated (we’re currently working on a fully fault-tolerant system). During the flight tests, we injected (in software) faults by having intermittently incorrect sensor values distributed to various nodes.

The pitot sensor system (including the fault-tolerance code) is a hard real-time system, meaning events have to happen at predefined deadlines. We wrote it in a combination of Tom Hawkin’s Atom, a Haskell DSL that generates C, and C directly.

Integrated with the pitot sensor system are Copilot-generated monitors. The monitors detected

  • unexpected sensor values (e.g., the delta change is too extreme),
  • the correctness of the voting algorithm (we used Boyer-Moore majority voting, which returns the majority only if one exists; our monitor checked whether a majority indeed exists), and
  • whether the majority votes agreed.

The monitors integrated with the sensor system without disrupting its real-time behavior.

We gathered data on six flights. In between flights, we’d get the data from the SD card.

We took some time to pose with the aircraft. The Copilot team from left to right is Alwyn Goodloe, National Institute of Aerospace; Lee Pike, Galois, Inc.; Robin Morisset, École Normale Supérieure; and Sebastian Niller, Technische Universität Ilmenau. Robin and Sebastian are Visiting Scholars at the NIA for the project. Thanks for all the hard work!

There were a bunch of folks involved in the flight test that day, and we got a group photo with everyone. We are very thankful that the researchers at NASA were gracious enough to give us their time and resources to fly our experiments. Thank you!

Finally, here are two short videos. The first is of our aircraft’s takeoff during one of the flights. Interestingly, it has an electric engine to reduce the engine vibration’s effects on experiments.

The second is of AirStar, which we weren’t involved in, but that also flew the same day. AirStar is a scaled-down jet (yes, jet) aircraft that was really loud and really fast. I’m posting its takeoff, since it’s just so cool. That thing was a rocket!

More Details

Copilot and the flight test is part of a NASA-sponsored project (NASA press-release) led by Lee Pike at Galois. It’s a 3 year project, and we’re currently in the second year.

Even More Details

Besides the language and flight test, we’ve written a few papers:

  • Lee Pike, Alwyn Goodloe, Robin Morisset, and Sebastian Niller. Copilot: A Hard Real-Time Runtime Monitor. To appear in the proceedings of the 1st Intl. Conference on Runtime Verification (RV’2010), 2010. Springer.

This paper describes the Copilot language.

Byzantine faults are fascinating. Here’s a 2-page paper that shows one reason why.

At the beginning of our work, we tried to survey prior results in the field and discuss the constraints of the problem. This report is a bit lengthy (almost 50 pages), but it’s a gentle introduction to our problem space.

Yes, QuickCheck can be used to test low-level protocols.

A short paper motivating the need for runtime monitoring of critical embedded systems.

You’re Still Interested?

We’re always looking for collaborators, users, and we may need 1-2 visiting scholars interested in embedded systems & Haskell next summer. If any of these interest you, drop Lee Pike a note (hint: if you read any of the papers or download Copilot, you can find my email).