Author Archive

SmartChecking Matt Might’s Red-Black Trees

August 20, 2014

Matt Might recently gave a nice intro to QuickCheck via testing red-black trees. Of course, QuickCheck has been around for over a decade now, but it’s still useful (if underused; why aren’t you QuickChecking your programs!?).

In a couple of weeks, I’m presenting a paper on an alternative to QuickCheck called SmartCheck at the Haskell Symposium.

SmartCheck focuses on efficiently shrinking and generalizing large counterexamples. I thought it’d be fun to try some of Matt’s examples with SmartCheck.

The kinds of properties Matt checked really aren’t in SmartCheck’s sweet spot, though, since the counterexamples are so small (Matt didn’t even have to define shrink instances!).

Still, let’s see what it looks like. (The code can be found here.)

SmartCheck is only interesting for failed properties, so let’s look at an early example in Matt’s blog post where something goes wrong. A lot of the blog post focuses on generating sufficiently constrained arbitrary red-black trees. In the section entitled, “A property for balanced black depth”, a property is given to check that the path from the root of a tree to every leaf passes through the same number of black nodes. An early generator for trees fails to satisfy the property.
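To make the property concrete, here is a sketch of the black-balance check (my own transcription; I’m assuming the same constructor shape as the counterexamples quoted below: colors R and B, empty tree E, and nodes T color left element right):

```haskell
-- Sketch only: constructor names assumed to match Matt's post
-- (R/B colors, E empty, T color left element right).
data Color = R | B deriving (Eq, Show)

data RBSet a = E | T Color (RBSet a) a (RBSet a)
  deriving (Eq, Show)

-- The number of black nodes on every root-to-leaf path,
-- or Nothing if two paths disagree.
blackDepth :: RBSet a -> Maybe Int
blackDepth E = Just 1
blackDepth (T c l _ r) = do
  dl <- blackDepth l
  dr <- blackDepth r
  if dl == dr
    then Just (dl + if c == B then 1 else 0)
    else Nothing

prop_BlackBalanced :: RBSet a -> Bool
prop_BlackBalanced t = blackDepth t /= Nothing
```

For instance, blackDepth (T R E 2 (T B E 5 E)) is Nothing: the left path passes through one black node (the empty leaf) while the right path passes through two.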

To get the code to work with SmartCheck, we derive Typeable and Generic instances for the datatypes, and use GHC Generics to automatically derive instances for SmartCheck’s typeclass. The only other main issue is that SmartCheck doesn’t support a `forall` function like in QuickCheck. So instead of a call to QuickCheck such as

> quickCheck (forAll nrrTree prop_BlackBalanced)

we change the Arbitrary instance to use the nrrTree generator.

Because it is so easy to find a small counterexample, SmartCheck’s reduction algorithm does a little bit of automatic shrinking, but not too much. For example, a typical minimal counterexample returned by SmartCheck looks like

T R E 2 (T B E 5 E)

which is about as small as possible. Now onto generalization!

There are three generalization phases in SmartCheck, but we’ll look at just one, in which subterms of the counterexample are replaced by universally quantified variables; the quantification is claimed when every tested instantiation of those variables also fails the property. For the test case above, SmartCheck returns the following formula:

forall values x0 x1:
T R E 2 (T B x1 5 x0)

Intuitively, this means that for any well-typed subtrees substituted for the variables x0 and x1, the resulting value is still a counterexample.

The benefit to developers is seeing instantly that those subterms in the counterexample probably don’t matter. The real issue is that E on the left is unbalanced with (T B E 5 E) on the right.

One of the early design decisions in SmartCheck was to focus on structurally shrinking datatypes while essentially ignoring “base types” like Int, Char, etc. The motivation was to improve efficiency when shrinking large counterexamples.

But for a case like this, generalizing base types would be interesting. We’d hypothetically get something like

forall values (x0, x1 :: RBSet Int) (x2, x3 :: Int):
T R E x2 (T B x1 x3 x0)

further generalizing the counterexample. It may be worth adding this behavior to SmartCheck.

SmartCheck’s generalization begins to bridge the gap from specific counterexamples to formulas characterizing counterexamples. The idea is related to QuickSpec, another cool tool developed by Claessen and Hughes (and Smallbone). Moreover, it’s a bridge between testing and verification, or as Matt puts it, from the 80% to the 20%.

SmartCheck: Redux

June 16, 2014

A few years ago, I started playing with the idea of making counterexamples from failed tests easier to understand. (If you’re like me, you spend a lot of time debugging.) Specifically, I started from QuickCheck, a test framework for Haskell, which has since been ported to many other languages. From this, a tool called SmartCheck was born.

In this post, I’m not going to describe the technical details of SmartCheck. There’s a paper I wrote for that, and it was just accepted for publication at the 2014 Haskell Symposium (warning: the paper linked to will change slightly as I prepare the final version).

Rather, I want to give a bit of perspective on the project.

As one reviewer for the paper noted, the paper is not fundamentally about functional programming but about testing. This is exactly right. From the perspective of software testing in general, I believe there are two novel contributions (correct me ASAP if I’m wrong!):

  1. It combines counterexample generation with counterexample reduction, as opposed to delta-debugging-based approaches, which perform deterministic reduction of a given counterexample. One possible benefit is that SmartCheck can help avoid stopping at local minima during reduction, since new random values are generated while shrinking.  Update: as John Regehr points out in the comments below, his group has already done this in the domain of C programs.  See the paper.
  2. Perhaps the coolest contribution is generalizing sets of counterexamples into formulas that characterize them.

I’d be interested to see how the work would be received in the software testing world, but I suppose first it’d need to be ported to a more mainstream language, like C/C++/Java.

Compared to QuickCheck specifically: QuickCheck didn’t use to have generics for implementing shrinking functions; recent versions include them, and they’re quite good. In many cases, SmartCheck outperforms QuickCheck, generating smaller counterexamples faster.  Features like counterexample generalization are unique to SmartCheck, but being able to test functions is unique to QuickCheck. Moreover, I should say that SmartCheck uses QuickCheck in its implementation to do some of the work of finding a counterexample, so thanks, QuickCheck developers!

When you have a tool that purports to improve on QuickCheck in some respects, it’s natural to look for programs/libraries that use QuickCheck to test it. I found that surprisingly few Hackage packages have QuickCheck test suites, particularly given how well known QuickCheck is. The one quintessential program that does contain QuickCheck tests is Xmonad, but I challenge you to think of a few more off the top of your head! This really is a shame.

The lack of (public) real-world examples is a challenge when developing new testing tools, especially when you want to compare your tool against previous approaches. More generally, in testing, there seems to be a lack of standardized benchmarks. What we really need is an analogue of the SPEC CPU2000 performance benchmarks for property-based testing, in Haskell or any other language for that matter.  I think this would be a great contribution to the community.

In 1980, Boyer and Moore developed a linear-time majority vote algorithm and verified an implementation of it. It took until 1991 to publish it after many failed tries. Indeed, in the decade between invention and publication, others had generalized their work, and it being superseded was one of the reasons reviewers gave for rejecting it! (The full story can be found here.)
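(As an aside for the curious, the algorithm itself is tiny. Here is a sketch in Haskell of the idea; this is my own transcription, not Boyer and Moore’s verified implementation. One pass maintains a candidate and a counter; a second pass confirms the candidate really is a majority.)

```haskell
-- A sketch of the linear-time majority vote idea (my transcription,
-- not Boyer and Moore's verified implementation).
majority :: Eq a => [a] -> Maybe a
majority xs = confirm (foldl vote Nothing xs)
  where
    -- First pass: keep a candidate and a count.
    vote Nothing x = Just (x, 1 :: Int)
    vote (Just (c, n)) x
      | x == c    = Just (c, n + 1)
      | n == 1    = Nothing          -- candidate cancelled out
      | otherwise = Just (c, n - 1)
    -- Second pass: check the surviving candidate is a true majority.
    confirm (Just (c, _))
      | 2 * length (filter (== c) xs) > length xs = Just c
    confirm _ = Nothing
```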

To a small extent, I can empathize. I submitted a (slightly rushed) version of the paper a couple years ago to ICFP in what was a year of record submissions. One reviewer was positive, one lukewarm, and one negative. I didn’t think about the paper much over the following year or two, but I got a couple of requests to put the project on Hackage, a couple of reports on usage, and a couple of inquiries about how to cite the draft paper. So after making a few improvements to the paper and implementation, I decided to try again to publish it, and it finally will be.

As I noted above, this is not particularly a Haskell paper. However, an exciting aspect of the Haskell community, and more generally, the functional programming community, is that it is often exploring the fringes of what is considered to be core programming language research at the time. I’m happy to be part of the fringe.

Embedded Security in the Mainstream

April 1, 2014

Science Friday had an interesting podcast with Bruce Schneier worth a listen.  It discussed security in embedded systems (or as is hip to say now, the Internet of Things).  The idea of security in embedded systems seemed pretty obscure just a few years ago, so it’s nice to see it being featured on a general-audience radio show.

So who’s going to listen to it from their phone, connected to their car’s audio over Bluetooth, on the way to the store to buy a Nest smoke alarm?

Stop Regression Testing

December 15, 2013

Some of my colleagues and I have been building a large software application, the open-source SMACCMPilot autopilot system, using Haskell-based embedded domain-specific languages (EDSLs) to generate embedded C code.  Our thesis is that EDSLs both increase our productivity and increase the assurance of our application; so far, we believe our thesis is holding up: we’ve created a complex 50k lines-of-code application in under two engineer years that is (mostly) free of undefined and memory-unsafe behavior.

Of course, there are always bugs.  Even with a fully formally-verified system, there are likely bugs in the specification/requirements, bugs in the hardware, or bugs in the users.  But our hope is that most of our bugs now are logical bugs rather than, say, arithmetic bugs (e.g., underflow/overflow, division by zero) or resource bugs (e.g., null-pointer dereference or buffer overflows).

However, since the EDSL we are using, Ivory, is relatively young—we’ve been writing SMACCMPilot in the language while developing the language—there have been cases where we have not checked for unsafe programs as stringently as possible.  For example, while we insert arithmetic underflow/overflow checks in general, we (ahem, I) neglected to do so for initializers for local variables, so an initializer might overflow with an unexpected result.

If we were writing our application in a typical compiled language, even a high-level one, and found such a bug in the compiler, we would perhaps file a bug report with the developers.  If it were an in-house language, things would be simpler, but modifying a large stand-alone compiler can be difficult (and introduce new bugs).  Most likely, the compiler would not change, and we’d either make some ad-hoc work-around or introduce regression tests to make sure that the specific bug found isn’t hit again.

But with an EDSL, the situation is different.  With a small AST, it’s easy to write new passes or inspect passes for errors.  Rebuilding the compiler takes seconds.  Fixing the bug with initializers took me a few minutes.
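As a toy illustration of what such a pass looks like (a sketch only; Ivory’s real AST and checks are much richer, and these names are hypothetical), imagine walking the statement list and emitting a compiler-generated check in front of every local-variable initializer:

```haskell
-- Toy sketch only; Ivory's actual AST and check insertion differ.
data Expr = Lit Integer | Var String | Add Expr Expr
  deriving (Eq, Show)

data Stmt
  = Local String Expr        -- local variable with an initializer
  | OverflowCheck Expr       -- compiler-inserted runtime check
  deriving (Eq, Show)

-- The pass: guard every initializer, instead of regression-testing
-- for the one initializer that once overflowed.
insertInitChecks :: [Stmt] -> [Stmt]
insertInitChecks = concatMap go
  where
    go s@(Local _ e) = [OverflowCheck e, s]
    go s             = [s]
```

Once the pass exists, every future Ivory program gets the check for free; that is the sense in which a compiler fix subsumes a regression test.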

So introducing regression tests about the behavior of initializers isn’t necessary; we’ve changed the compiler to eliminate that class of bugs in the future.

More generally, we have a different mindset programming in an EDSL: if a kind of bug occurs a few times, we change the language/compiler to eliminate it (or at least to automatically insert verification conditions to check for it).  Instead of a growing test-suite, we have a growing set of checks in the compiler, to help eliminate both our bugs and the bugs in all future Ivory programs.

Edit (16 Dec. 2013): a few folks have noted that the title is a bit strong: it absolutely is, that’s probably why you read it. :)  Yes, you should (regression) test your applications.  You might consider testing your compiler, too.  Indeed, the point is that modifying the compiler can sometimes eliminate testing for a class of bugs and sometimes simply supports testing by inserting checks automatically into your code.


Printf No More

October 23, 2013

I wondered to a colleague if there is a way to get GDB to print the value of a variable when it changes or when a certain program location is reached without breaking; in effect, placing printfs in my program automatically.

His Google-fu was better than mine, and he found a Stack Overflow solution.  For example, something like

  set pagination off
  break foo.c:42
  commands
    silent
    print x
    continue
  end

prints the value of x whenever line 42 in file foo.c is reached, but just keeps running the program.  You can stick a little script like this in your .gdbinit file, which you can reload with

  source .gdbinit

This is particularly useful if you’re debugging a remote target that doesn’t support print statements. It’s a cool trick for embedded systems debugging.


October 7, 2013


Over on the Galois blog is a post about my current project, building a secure high-assurance autopilot called SMACCMPilot.  SMACCMPilot is open-source; the project’s landing page describes the technologies going into the project, with links to the software repositories.  Check it out!

Galois’ main approach to building SMACCMPilot is to use embedded domain-specific languages (EDSLs), embedded in Haskell, that compile down to restricted versions of C.  (If you’re not familiar with the “EDSL approach”, and particularly how it might improve the assurance of your systems, check out this paper we wrote about our experiences on a separate NASA-sponsored project.)  The project is quickly becoming one of the larger EDSL projects I know about, and it’s still relatively early in its development.  The EDSLs we’re developing for SMACCMPilot, Ivory (for embedded C code generation) and Tower (for OS task setup and communication), are suitable for other embedded systems projects as well.

Like many young and active open-source projects, the code is hot off the presses and under flux, and the documentation always lags the implementation, so let me know if you try using any of the artifacts and have problems.  We’re happy to have new users and contributors!

Book Review: Automate This

May 27, 2013

A family member gave me a copy of Automate This by Christopher Steiner as a gift a few months ago.  The subtitle of the book is “How algorithms came to rule our world.”  The book is a non-technical, fast, and easy read.  I did read a few interesting stories, such as NASA’s use of personality categorization algorithms in the 70s to predict which potential astronauts would work well together, or a math professor’s algorithmic dissection of Beatles songs.  The book particularly emphasizes algorithmic trading on Wall Street.  The somewhat non sequitur conclusion is that we need more science, engineering, and math graduates.

The main point of the book is that algorithms are pervasive and will become more so.  Toward this point, I think the author could have cast an even wider net, mentioning that algorithms are implemented in everything from elevator controllers to autopilots.  There is a cursory pre-computing history of algorithms at the beginning of the book that is (tenuously) tied to Wall Street trading.

Rather than focusing on algorithms broadly, the coverage is patchy, hitting on machine learning, high-speed trading, and game theory.  Some mention of algorithms as might be taught in an “Analysis of Algorithms” course, covering basic topics like time-space complexity and decidability, would help prevent the general public from taking away a narrow interpretation of algorithms.  Computer scientists invent, analyze, and implement algorithms every day, but much of that work is perhaps not sexy enough for a popular science book.

Incidentally, for the topics Steiner does cover, an excellent companion treating similar topics, but with a different focus, is Nate Silver’s (of 538 fame) book, The Signal and the Noise.

A Dane Goes to NASA

November 3, 2012

Last year, Nis Wegmann came from the University of Copenhagen to work with me on a NASA-sponsored project.  He was recently interviewed about his experiences.

Nis worked at the National Institute of Aerospace in Virginia, just outside the NASA Langley gates.  Since Galois is on the other side of the U.S., we mostly collaborated remotely.  (For a taste of what it’s like to intern at Galois itself, check out Ed Yang’s post.)  Nis is an outstanding programmer and a great writer, which made working together effortless, despite the distance.

Good luck in your future endeavors, Nis!

The Worst Offenders

October 13, 2012

I listened to a Freakonomics podcast the other day in which a story about hand-washing at Cedars-Sinai Medical Center was told. It turns out that of all the staff, the doctors had the lowest rate of hand-washing (around 65%, if I recall correctly) in the hospital. This was surprising, since the doctors supposedly have the best education and so should know the most about the dangers of bacterial infection.

The hospital’s chief of staff tried different means to increase hygiene. Incentives like gift cards were tried. Then he tried another approach: shame. Doctors had their hands cultured so that they could see the bacteria on them. Images of the cultures were even used as a hospital-wide screensaver on the computers.

I listened to this wondering if there are any parallels to formal verification engineers and researchers. Ostensibly, we are the experts on the perils of software bugs. Academic papers often begin with some story about bugs in the wild, motivating the new formal verification technique described therein.

But to what extent do we practice what we preach? How many of us might write a Python script with no unit tests? C/C++ programs without running Lint/CppCheck? Compile without fixing all the -Wall warnings the compiler emits?

These kinds of activities represent the lowest rung of software assurance; they’re the “hand washing of software assurance”, if you will. I’m certainly guilty myself of not always practicing good “software hygiene”. The justification you might give for failing to do so is that you’re just writing a program for personal use, or that you’re writing prototype software, “just for research”. I could imagine doctors telling themselves something similar: “I always scrub thoroughly before a major surgery, but it’s not such a big deal for a simple office visit.” But this justification can become a slippery slope.

There are a few examples in which verification-tool assurance shoots straight for the top. For example, John Harrison verified a model of HOL Light in HOL Light. Filip Marić verified a SAT solver. But these are more intellectual curiosities than standard operating procedure.

It could be an interesting empirical study to analyze the software quality of open-source formal verification tools to see just how buggy they are (spoiler alert: they are buggy). What I’m interested in are not deep flaws that might be caught via theorem-proving, but simple ones that could be caught with a little better test coverage or lightweight static analysis.

For now, I’m just happy my colleagues’ screensaver isn’t a bug tracker for software that I’ve written.

Lowering the Bar

October 2, 2012

I gave a talk (video, slides, and paper) at ICFP last month, arguing that it can be easy to build a high-assurance compiler. I gave a similar talk as a keynote a couple weeks later at the very enjoyable Midwest Verification Day, hosted by Kansas University this year (thanks Andy Gill and Perry Alexander for inviting me!). This paper wraps up the Copilot project. I had a great time (I mean, how often do formal methods engineers get to be around NASA subscale jet aircraft?!?).


July 26, 2012

I’ve been working on a Haskell library for testing Haskell programs that I call SmartCheck. SmartCheck focuses on testing algebraic datatypes and generalizing the counterexamples it finds. Below is the README for SmartCheck, which is located on GitHub (I haven’t put it on Hackage yet). The following is a high-level explanation that doesn’t go into details about the algorithms or implementation (that’s another post!).

I’d be interested in feedback on

  • Real-world examples to try SmartCheck on.
  • Whether there are other interesting ways to generalize counterexamples.
  • If there’s similar work out there I should know about (in addition to QuickCheck and SmallCheck).
  • Your experiences, if you try the library out.



SmartCheck is a smarter QuickCheck, a powerful testing library for Haskell. The purpose of SmartCheck is to help you more quickly get to the heart of a bug and to quickly discover each possible way that a property may fail.

SmartCheck is useful for debugging programs operating on algebraic datatypes. When a property is true, SmartCheck is just like QuickCheck (SmartCheck uses QuickCheck as a backend). When a property fails, SmartCheck kicks into gear. First, it attempts to find a minimal counterexample to the property in a robust, systematic way. (You do not need to define any custom shrink instances, like with QuickCheck, but if you do, those are used. SmartCheck usually does much better than even custom shrink instances.) Second, once a minimal counterexample is found, SmartCheck attempts to generalize the failed value d by replacing d’s substructures with new values to make d', and QuickChecking each new d'. If the property fails for every new d' generated, we claim the property fails for any substructure replaced here (of course, this is true modulo the coverage of the tests).

SmartCheck executes in a read-eval-print loop. In each iteration, all values that have the “same shape” as the generalized value are removed from the possible tests created. The loop can be iterated until a fixed point is reached, at which SmartCheck is not able to create any new values that fail the property.

A typical example

In the package there is an examples directory containing a number of examples. Let’s look at the simplest, Div0.hs.

> cd SmartCheck/examples
> ghci -Wall Div0.hs

Div0 defines a toy language containing constants (C), addition (A), and division (D):

data M = C Int
       | A M M
       | D M M
  deriving (Read, Show, Typeable, Generic)

Because SmartCheck performs data-generic operations using GHC.Generics, we have to derive Typeable and Generic. To use GHC.Generics, you also need the following pragmas, along with a single automatically-derived instance:

{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE DeriveGeneric #-}

instance SubTypes M 

Let’s say we have a little interpreter for the language that takes care to return Nothing if there is a division by 0:

eval :: M -> Maybe Int
eval (C i) = Just i
eval (A a b) = do
  i <- eval a 
  j <- eval b
  return $ i + j
eval (D a b) = 
  if eval b == Just 0 then Nothing 
    else do i <- eval a 
            j <- eval b
            return $ i `div` j

Now suppose we define a set of values of M such that they won’t result in division by 0. We might try the following:

divSubTerms :: M -> Bool
divSubTerms (C _)       = True
divSubTerms (D _ (C 0)) = False
divSubTerms (A m0 m1)   = divSubTerms m0 && divSubTerms m1
divSubTerms (D m0 m1)   = divSubTerms m0 && divSubTerms m1

So our property tries to state that as long as a value satisfies divSubTerms, then we won’t have division by 0 (can you spot the problem in divSubTerms?):

div_prop :: M -> Property
div_prop m = divSubTerms m ==> eval m /= Nothing

Assuming we’ve defined an Arbitrary instance for M (just like in QuickCheck—however, we just have to implement the arbitrary method; the shrink method is superfluous), we are ready to run SmartCheck.
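Such an instance might look like the following sketch (hypothetical; the Div0.hs example in the package defines its own generator):

```haskell
import Test.QuickCheck

-- A hypothetical Arbitrary instance for M; Div0.hs ships its own.
-- sized bounds the depth of recursion so generation terminates.
instance Arbitrary M where
  arbitrary = sized go
    where
      go 0 = C <$> arbitrary
      go n = oneof [ C <$> arbitrary
                   , A <$> sub <*> sub
                   , D <$> sub <*> sub ]
        where sub = go (n `div` 2)
```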

divTest :: IO ()
divTest = smartCheck args div_prop
  where
  args = scStdArgs { qcArgs   = stdArgs
                   , treeShow = PrintString }

In this example, we won’t redefine any of QuickCheck’s standard arguments, but it’s certainly possible. The treeShow field tells SmartCheck whether you want generalized counterexamples shown in a tree format or printed as a long string (the default is the tree format).

Ok, let’s try it. First, SmartCheck just runs QuickCheck:

*Div0> divTest 
*** Failed! Falsifiable (after 7 tests):
D (D (D (A (C (-20)) (D (D (C 2) (C (-19))) (C (-8)))) (D (D (C (-23)) (C 32)) (C (-7)))) (A (A (C 2) (C 10)) (A (C (-2)) (C 13)))) (D (A (C 12) (C (-7))) (D (A (C (-29)) (C 19)) (C 30)))

Oh, that’s confusing, and for such a simple property and small datatype! SmartCheck takes the output from QuickCheck and tries systematic shrinking for the one failed test-case, kind of like SmallCheck might. We get the following reduced counterexample:

*** Smart Shrinking ... 
*** Smart-shrunk value:
D (C 0) (D (C 0) (C (-1)))

Ok, that’s some progress! Now SmartCheck attempts to generalize this (local) minimal counterexample. SmartCheck has two generalization steps that we’ll explain separately although SmartCheck combines their results in practice (you can turn off each kind of generalization in the flags). First, SmartCheck tries to generalize values in the shrunk counterexample. SmartCheck returns

*** Extrapolating values ...
*** Extrapolated value:
forall x0:

D x0 (D (C 0) (C (-1)))

Aha! We see that for any possible subvalue x0, the above value fails. Our precondition divSubTerms did not account for the possibility of a non-terminal divisor evaluating to 0; we only pattern-matched on constants.

In addition, SmartCheck tries to do something I call constructor generalization. For a datatype with a finite number of constructors, the idea is to see whether, for each subvalue in the counterexample, there are subvalues that also fail the property, using every possible constructor of the datatype. So for example, for our counterexample above:

*** Extrapolating constructors ...
*** Extrapolated value:
forall C0:
  there exist arguments s.t.

D (C 0) (D C0 (C (-1)))

So in the hole C0, SmartCheck was able to build a value using each of the constructors C, A, and D (well, it already knew there was a value using C, namely C 0).

SmartCheck asks us if we want to continue:

Attempt to find a new counterexample? ('Enter' to continue; any character
then 'Enter' to quit.)

SmartCheck will omit any term that has the “same shape” as D (C 0) (D (C 0) (C (-1))) and try to find a new counterexample.

*** Failed! Falsifiable (after 9 tests):  
A (A (D (C (-20)) (A (C (-5)) (C (-32)))) (D (A (C 6) (C 19)) (A (C (-3)) (A (C (-16)) (C (-13)))))) (D (C 29) (D (C (-11)) (D (C 11) (C 23))))

*** Smart Shrinking ... 
*** Smart-shrunk value:
A (C (-1)) (D (A (C 1) (C 1)) (D (C 1) (C 2)))

*** Extrapolating values ...

*** Extrapolating Constructors ...

*** Extrapolated value:
forall values x0 x1:

A x1 (D x0 (D (C 1) (C 2)))

We find another counterexample; this time, the main constructor is addition.

We might ask SmartCheck to find another counterexample:


*** Extrapolating ...
*** Could not extrapolate a new value; done.

At this point, SmartCheck can’t find a newly-shaped counterexample. (This doesn’t mean there aren’t more; you can increase QuickCheck’s standard arguments to allow more failed tests before giving up (maxDiscard) or to increase the size of tests (maxSize), or you could simply keep running the read-eval-print loop.)

A Fistful of Dollars

March 7, 2012

A while back I attended an excellent DARPA Proposers’ Day. It got me thinking, “If I had a fistful of (a few million) dollars to provide researchers, what sort of program might I start?” Two ideas struck me:

Getting software right, fast

Software is usually developed in a world with incomplete requirements and with users (and attackers) that try to break your assumptions.  Furthermore, more often than not, developers themselves (perhaps with a QA team, if they’re lucky) largely decide when they’re “done”—when a version is bug-free enough, secure enough, and meets the specifications to release.

There are a dizzying number of approaches to improving quality: testing (e.g., unit testing, penetration testing, black-box and white-box testing), compiler tools (e.g., type-checking, warnings), formal verification (e.g., model-checking, static analysis, decision procedures), programming methodologies (e.g., pair-programming, extreme programming), etc.  The question is, what combinations of approaches provide benefits and by what factor?

A major portion of the challenge would be to fund experiments large enough to draw statistical conclusions about the different approaches.  Teams would get points for finishing faster, but lose points for bugs, security vulnerabilities, unimplemented features, etc. uncovered by a separate “red team”.  To make the experience more realistic, I’d have career programmers (not students in an undergrad course) participate in the experiments.  A large part of the effort would be just designing the experiments so that statistical conclusions could be drawn from them, but I imagine the results might be enlightening.

A Belief Search Engine

Imagine a “search” engine in which you could search not for facts, but for beliefs.  For example, I might query the likelihood that someone in Russia in 1600 believed that the Earth revolves around the Sun.  Or I might ask the likelihood of someone in Japan in 2012 believing in global warming.  Or whatever else.  The engine would return some probability, perhaps with a confidence interval.  I’d imagine being able to qualify a belief query by geography, time period, religion, nationality, sex, and so forth.  Of course this is a hard, hard problem, and would likely involve search as well as technology akin to IBM’s Watson.  I’d be curious to know whether it is feasible at all.

When Formal Systems Kill

February 18, 2012

What happens when a philosophy professor (with a M.S. in Computer Science) and a formal methods engineer (like me) collaborate?  The outcome in my case is a paper entitled, “When Formal Systems Kill: Computer Ethics and Formal Methods” (pdf). In the paper, we give an ethical treatment of the use of formal methods. Our goal in the paper is not so much to make definitive claims on the subject but to raise awareness of the issue for further philosophical treatment. The subject has been neglected by ethics philosophers.

Darren (my coauthor) and I had a hard time deciding where to try to publish this. We ended up submitting it to the American Philosophical Association’s (APA) newsletter on Philosophy and Computers, where it was accepted for the Fall 2011 edition. (The APA is the “ACM of Philosophy” within the United States.)

The paper was a number of years in the making, being a side-project for me. I can’t say I agree with everything we wrote, but I hope it serves as a useful starting point for the dialogue between ethics and formal methods.

Who’s Afraid of Software?

January 19, 2012

Who’s afraid of software?  I mean viscerally, stomach-knotting afraid.  Afraid like you might be when you come across a snake or a bear, or when you are mugged.  Do you obsess about a phishing attack each time you open your email?  Do you worry there’s an eavesdropper when you join the open wifi access point in a coffee shop?  Do you worry your software will fail in your modern automobile or aircraft?

I listened to a Freakonomics podcast about risk, uncertainty, and beliefs.  One point made during the show was that our fears are shaped by evolution—to our ancestors, it made sense to be afraid of threatening animals.  In modern life, however, our fears don’t match risks—we’d be much better off being afraid of cheeseburgers, as pointed out in the show.  Some people are afraid of modern risks.  I know people afraid of cancer, for example.

That got me thinking about fearing software.  Software is certainly among the most complex artifacts created by humans.  Modern cars contain 100+ million lines of code.  Nearly every day there is a story about a large corporation being hacked and of cyber-warfare between nations.

My question is serious—I really do wonder if people are genuinely afraid of software.  I work in the area of software assurance, and while I take precautions against viruses, phishing attacks, etc., I don’t particularly worry about software failures, even when my life might depend on them.  This is despite issues just last year like this and this in automotive software.  I get to see somewhat how the sausage is made, and in general, we only exercise a small fraction of the state-space of deployed software in validation and in actual usage.  There are legitimate risks, but there seems to be very little fear.

Perhaps like medical doctors who stereotypically neglect their own health, I don’t worry day-to-day about software assurance despite working in the field.  But it seems nobody else really fears software, either.

In the podcast, the topic of polarizing claims, like global warming, is discussed.  Outside of academic circles, one’s views on the risks of software are not so polarizing—your views on the topic won’t cause your friends or colleagues to disparage you (indeed, if anything, the main risk is likely boring others by discussing the topic!).  I wonder just what the “global warming” of software might be in the future.

Copilot featured in Aerospace America

December 10, 2011

My Copilot project, sponsored by NASA, was featured in the Aerospace America magazine “Year in Review” edition (December 2011). We recently finished up the project, which focused on solving the problem of failure in complex embedded systems (like flight control systems in commercial aircraft).  Our approach was a language called Copilot for runtime monitoring of embedded systems corroborated with experiments on airspeed sensor systems and an autopilot communication link.

Here’s a link to the page from the magazine mentioning us from the author’s page. (The relevant paragraph is in the middle of the right column).

Thanks to Galois and the National Institute of Aerospace where the work was done, the NASA folks we collaborated with, and especially Alwyn Goodloe, Nis Wegmann, Sebastian Niller, and Robin Morisset who worked on the project with me.

Stable Names in Haskell

November 26, 2011

Stable names in GHC “are a way of performing fast (O(1)), not-quite-exact comparison between objects.”  Andy Gill showed how to use them to extract the explicit graph from writing recursive functions in his Data.Reify package (and associated paper).  It’s a great idea and very practical for embedded domain-specific languages—we’ve used the idea in Copilot to recover sharing.

However, consider this example, with three tests executed in GHCI.

For a function with type constraints, stable names fail to “realize” that we are pointing to the same object. As a couple of my colleagues pointed out, the cause is the dictionary being passed around, which causes new closures to be created. Simon Marlow noted that if you compile with -O, the explicit dictionaries get optimized away.
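Since the original example isn’t shown above, here is a minimal sketch of the failure mode using GHC’s System.Mem.StableName; the second comparison’s result depends on the optimization level, as noted above:

```haskell
import System.Mem.StableName (makeStableName)

main :: IO ()
main = do
  -- Monomorphic value: both calls see the same heap object,
  -- so the stable names should compare equal.
  let xs = [1 :: Int, 2, 3]
  a <- makeStableName xs
  b <- makeStableName xs
  print (a == b)                     -- expected True: same closure

  -- Class-constrained value: the implicit Num dictionary can create
  -- a fresh closure at each use site, so without -O these may differ.
  let poly :: Num a => a -> a
      poly = (+ 1)
  c <- makeStableName (poly :: Int -> Int)
  d <- makeStableName (poly :: Int -> Int)
  print (c == d)                     -- often False in GHCi / without -O
```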

Here are the solutions I have to “fixing” the problem, in the context of a DSL:

  • Tell your users that recursive expressions must be monomorphic—only “pure functions” over the expressions of your DSL can be polymorphic.
  • Implement a check in your reifier to see how many stable names have been created—if some upper-bound is violated, then the user has created an infinite expression, the expression is extremely large (in which case the user should try to use some sharing mechanism, such as let-expressions inside the language), or we’ve hit a stable-names issue.
  • Ensure your DSL programs are always compiled.
  • Of course, you can always take another approach, like Template Haskell or not using recursion at the Haskell level; also check out Andy Gill’s paper for other solutions to the observable sharing problem.

I don’t see how to use (deep)seq to fix the problem, at least as it’s presented in the example above, but I’d be keen to know if there are other solutions.

FM@Galois Talks

November 10, 2011

As I posted over on the Galois blog, I recently gave a talk (a couple of times) about formal methods work at Galois over the years (pdf). It’s been fun putting theory into practice!

Making your Ubuntu life better

June 18, 2011

I’ve had a lot of trouble with Ubuntu 11.04 (Natty Narwhal) on a laptop (ThinkPad 420), with problems including:

  • Not being able to use dual monitors,
  • Random logoffs.

I tentatively think I was able to solve them with two easy fixes:

  • Turn off Unity. You can do this in the login screen.
  • In the Update Manager, under “Settings”, check “Proposed updates”.

With the proposed updates, Unity may work; try that at your own risk. With these changes, I have a reasonably stable system.

Meta-Programming and eDSLs

January 30, 2011

I’ve been working on a domain-specific language that is embedded in Haskell (an “eDSL”) that essentially takes a set of Haskell stream (infinite list) equations and turns them into a real-time C program implementing the state-machine defined by the streams. It’s called Copilot, and in fact, it’s built on top of another Haskell eDSL called Atom,1 which actually does the heavy lifting in generating the C code.

For example, here’s the Fibonacci sequence in Copilot:

fib = do
  let f = varW64 "f"
  f .= [0,1] ++ f + (drop 1 f)
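The Copilot definition mirrors the ordinary infinite-list definition; here is a plain-Haskell sketch of the same stream, where pointwise (+) on streams becomes zipWith (+) on lists:

```haskell
import Data.Word (Word64)

-- Plain infinite-list analogue of the Copilot Fibonacci stream:
-- prepend [0,1], then add the stream to itself shifted by one.
fibs :: [Word64]
fibs = [0, 1] ++ zipWith (+) fibs (drop 1 fibs)
```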

I’ve been writing Copilot libraries recently, and I’ve had the following realization about eDSLs and meta-programming (let me know if someone has had this idea already!). Many languages treat meta-programming as a second-class feature—I’m thinking of macros used by the C preprocessor, for example (it’s probably generous even to call the C preprocessor ‘meta-programming’). One reason why Lisp-like languages were exciting is that the language is a first-class datatype, so meta-programming is on par with programming. In my experience, you don’t think twice about meta-programming in Lisp. (Haskell is more like C in this regard—you do think twice before using Template Haskell.)

So languages generally treat meta-programming as either second-class or first-class. What’s interesting about eDSLs, I think, is that they treat meta-programming as first-class, but programming as second-class! This isn’t surprising, since an eDSL is a first-class datatype, but the language is not first-class—its host language is.

Practically, what this means is that you spend very little time actually writing eDSL programs but rather generating eDSL programs. It is natural to layer eDSLs on top of other eDSLs.

This is just how Copilot came about: I was writing various Atom programs and realized that for my purposes, I just needed a restricted set of behaviors provided by Atom that are naturally represented by stream equations (and make some other tasks, like writing an interpreter, easier).

But as soon as Copilot was written, we2 started writing libraries implementing past-time linear temporal logic (LTL) operators, bounded LTL operators, fault-tolerant voting algorithms, regular expressions, and so on.
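To give a flavor of such a library, here are list-model sketches of two past-time LTL operators over Boolean streams. The names and the plain-list setting are hypothetical; the actual Copilot libraries operate on Copilot streams, not Haskell lists:

```haskell
-- "historically": the input has held at every step so far.
alwaysBeen :: [Bool] -> [Bool]
alwaysBeen = scanl1 (&&)

-- "once": the input has held at some step so far.
once :: [Bool] -> [Bool]
once = scanl1 (||)
```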

You wouldn’t think about combining the intermediate languages of a compiler into the same program. The idea of a language is more fluid, more organic in the context of eDSLs, where now libraries can be quickly written and different levels can be easily combined.

1Tom Hawkins wrote Atom.
2Credit for Copilot also goes to Sebastian Niller, Robin Morisset, Alwyn Goodloe.

Haskell and Hardware for the Holidays

December 18, 2010

Looking to make a statement this holiday season?  You could try to win the office “ugly holiday sweater” contest.  Or, you could play “Jingle Bells” on your Arduino microcontroller, using Haskell.  This post is about the latter.

We’re going to write this small program using the Copilot embedded domain-specific language (on Hackage and the source on Github).  Copilot is a stream language that allows you to generate embedded C code from programs written essentially as Haskell lists (using Atom as a backend for the C code generation).  This post is about how to use Copilot/Haskell (v. 1.0) to make your embedded C programming easier and more likely to be correct.  Here’s what we’re going to do—please don’t look too closely at my soldering, and turn the volume up, since a piezo speaker isn’t loud:

(For the impatient, the Haskell file is here, and the generated .c and .h files are here and here, respectively.)

We’re going to essentially recreate this C/Wiring program, plus flash some LEDs, but hopefully in an easier, safer way.  We need to manage three tasks:

  1. Determine the note and number of beats to play.
  2. Play the piezo speaker.
  3. Flash the LEDs.

We’ll start by defining which pins control what function:

-- pin numbers
speaker, red, green :: Spec Int32
speaker = 13
red     = 12
green   = 11

The type Spec Int32 takes an Int32 and lifts it into a Copilot expression.

We’ll call the program cycleSong. The type of a Copilot program is Streams, which is a collection of Spec a streams and resides within the Writer monad. First, we’ll declare some variables.

cycleSong :: Streams
cycleSong = do
  -- Copilot vars
  let idx       = varI32 "idx"
      notes     = varI32 "notes"
      duration  = varI32 "duration"
      odd       = varI32 "odd"
      even      = varI32 "even"
      playNote  = varB   "playNote"
  -- external vars
      note = extArrI32 "notes" idx
      beat = extArrI32 "beats" idx

There are two classes of variables: Copilot variables that will refer to streams (infinite lists), and external variables, which can refer to data from C (including the return values of functions, global variables, and arrays). The constructors are mnemonics for the type of the variables; for example, varI32 is a variable that will refer to a stream of Int32s. Similarly, extArrI32 is an external variable referring to a C array of Int32s (i.e., int32_t). Notice the idx argument—it is the stream of values from which the index into the array is drawn (constants can also be used for indexes).

Now for the actual program:

 idx      .= [0] ++ (idx + 1) `mod` (fromIntegral $ length notesLst)
 notes    .= note
 duration .= beat * 300
 odd      .= mux (idx `mod` 2 == 1) green red
 even     .= mux (idx `mod` 2 == 1) red green
 playNote .= true
 -- triggers
 trigger playNote "digitalWrite" (odd <> true)
 trigger playNote "digitalWrite" (even <> false)
 trigger playNote "playtone" (speaker <> notes <> duration)

And that’s basically it.  There are two parts to the program, the definition of Copilot streams, which manage data-flow and control, and triggers, which call an external C function when some property is true.  Copilot streams look pretty much like Haskell lists, except that functions are automatically lifted to the stream level for convenience.  Thus, instead of writing,

 x = [0] ++ map (+1) x

in Copilot, you simply write

 x .= [0] ++ x + 1
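This automatic lifting can be modeled in plain Haskell with a toy Num instance for lists; a sketch only, since Copilot lifts over its own expression type rather than over Haskell lists:

```haskell
-- Toy Num instance modeling Copilot's pointwise lifting: arithmetic
-- acts elementwise, and numeric literals become constant streams.
instance Num a => Num [a] where
  (+)         = zipWith (+)
  (-)         = zipWith (-)
  (*)         = zipWith (*)
  abs         = map abs
  signum      = map signum
  fromInteger = repeat . fromInteger

-- With that instance, the Copilot-style definition is legal list Haskell:
x :: [Int]
x = [0] ++ x + 1
```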

Similarly for constants, so the Copilot stream definition

playNote .= true

lifts the constant true to an infinite stream of true values. The function mux is Copilot’s if then else (the name refers to a 2-to-1 multiplexer). So that means that the stream odd takes the value of green when idx is odd, and red otherwise, where green and red refer to the pins controlling the respective LEDs.
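Over plain lists, mux can be modeled as a pointwise if-then-else; this is a list-model sketch, not Copilot’s actual type:

```haskell
-- mux as a pointwise 2-to-1 multiplexer over streams-as-lists:
-- pick from the second argument where the condition is True,
-- and from the third where it is False.
mux :: [Bool] -> [a] -> [a] -> [a]
mux = zipWith3 (\b t f -> if b then t else f)
```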

Just to round out the description of the other defined streams, idx is the index into the C arrays containing the notes and beats, respectively—that’s why we perform modular arithmetic. The stream duration tells us how long to hold a note; 300 is a magic “tempo” constant.
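Under the same list model, the idx and duration streams can be sketched in plain Haskell; 47 is the length of the notes string and 300 is the magic tempo constant from the text:

```haskell
-- idx counts 0,1,...,46 and wraps, indexing into the C arrays.
idxS :: [Int]
idxS = [0] ++ map (\i -> (i + 1) `mod` 47) idxS

-- duration is the beats stream scaled by the tempo constant.
durationS :: [Int] -> [Int]   -- argument models the external beats array
durationS beats = map (* 300) (cycle beats)
```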

Now for the triggers. Each of our triggers “fires” whenever the stream playNote is true; in our case, because it is a constant stream of trues, this happens on each iteration. So on each iteration, the C functions digitalWrite and playtone are called with the respective function arguments (‘<>‘ separates arguments). The function digitalWrite is part of the Wiring language, which is basically C with some standard libraries. We’ll write playtone ourselves in a second.

The C Code

We need a little C code now.  We could write this directly, but we’ll just do this in Haskell, since there’s so little we need to write—the Copilot program handles most of the work.  But a caveat: it’s a little ugly, since we’re just constructing Haskell strings. Here are a few functions (included with Copilot) to make this easier, and here are some more. (If someone properly writes a package to write ad-hoc C code from Haskell, please leave a comment!)

First, we need more magic constants to give the frequency associated with notes (a space is a rest).

freq :: Char -> Int32
freq note  =
  case note of
    'c' -> 1915
    'd' -> 1700
    'e' -> 1519
    'f' -> 1432  -- 'f' and 'g' restored from the standard Arduino melody example
    'g' -> 1275
    _   -> 0     -- a space is a rest: no tone

and here are the notes of the song and the beats per note:

jingleBellsNotes = "eeeeeeegcdefffffeeeddedgeeeeeeegcdefffffeeggfdc"
jingleBellsBeats =
  [ 1,1,2  , 1,1,2, 1,1,1,1, 4
  , 1,1,1,1, 1,1,2, 1,1,1,1, 2,2
  , 1,1,2  , 1,1,2, 1,1,1,1, 4
  , 1,1,1,1, 1,1,2, 1,1,1,1, 4
  ]
The other main piece of C code we need to write is the function playtone. The piezo speaker is controlled by pulse width modulation, basically meaning we’ll turn it on and off really fast to simulate an analogue signal. Here is its definition (using a little helper Haskell function to construct C functions):

    [ function "void" "playtone" ["int32_t speaker", "int32_t tone", "int32_t duration"] P.++ "{"
    , "#ifdef CBMC"
    , "  for (int32_t i = 0; i < 1; i ++) {"
    , "#else"
    , "  for (int32_t i = 0; i < duration * 1000; i += tone * 2) {"
    , "#endif"
    , "    digitalWrite(speaker, HIGH);"
    , "    delayMicroseconds(tone);"
    , "    digitalWrite(speaker, LOW);"
    , "    delayMicroseconds(tone);"
    , "  }"
    , "}"
    ]

HIGH, LOW, digitalWrite, and delayMicroseconds are all part of the Wiring standard library.  That ifdef is for verification purposes, which we’ll describe in just a bit.
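As a back-of-the-envelope check on the loop above: it steps by tone * 2 microseconds up to duration * 1000, so each note gets roughly the following number of HIGH/LOW pulses. This is a hypothetical helper for reasoning, not part of the generated code:

```haskell
-- Number of full pulses playtone emits for a note: total time in
-- microseconds divided by one full HIGH+LOW period (tone * 2).
pulses :: Int -> Int -> Int
pulses duration tone = (duration * 1000) `div` (tone * 2)
```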

Besides a little more cruft, that’s it!

Test, Build, Verify

“Jersey Shore” may have introduced you to the concept of gym, tan, laundry, but here we’ll stick to test, build, verify.  That is, first we’ll test our program using the Copilot interpreter, then we’ll build it, then we’ll prove the memory safety of the generated C program.

  • Interpret. We define a function that calls the Copilot interpreter:
    interpreter =
      interpret cycleSong 20
        $ setE (emptySM {i32Map = fromList [ ("notes", notesLst)
                                           , ("beats", beatsLst)]})

    This calls the Copilot interpreter, saying to unroll cycleSong 20 times. Because the Copilot program samples some external C values, we need to provide that data to the interpreter. Fortunately, we have those arrays already defined as Haskell lists. Executing this, we get the following:

    period: 0   duration: 300   even: 11   idx: 0   notes: 1519   odd: 12   playNote: 1
    period: 1   duration: 300   even: 12   idx: 1   notes: 1519   odd: 11   playNote: 1
    period: 2   duration: 600   even: 11   idx: 2   notes: 1519   odd: 12   playNote: 1
    period: 3   duration: 300   even: 12   idx: 3   notes: 1519   odd: 11   playNote: 1
    period: 4   duration: 300   even: 11   idx: 4   notes: 1519   odd: 12   playNote: 1
    period: 5   duration: 600   even: 12   idx: 5   notes: 1519   odd: 11   playNote: 1
    period: 6   duration: 300   even: 11   idx: 6   notes: 1519   odd: 12   playNote: 1
                                                   . . .

    Good, it looks right. (period isn’t a Copilot variable but just keeps track of what round we’re on.)

  • Build. To build, we generate the C code from the Copilot program, then we’ll use a cross-compiler targeting the ATmega328. The easiest way (I’ve found) is via Homin Lee’s Arscons. Arscons is based on Scons, a Python-based build system. Arscons makes three assumptions: (1) the program is written as a Wiring program (the main difference being a loop() function instead of a main() function), (2) the extension of the Wiring program is .pde, and (3) the directory containing XXX.pde is named XXX. For us, all that really means is that we have to change the extension of the generated program from .c to .pde. So we define
    main :: IO ()
    main = do
      compile cycleSong name
        $ setPP cCode  -- C code for above/below the Copilot program
        $ setV Verbose -- Verbose compilation
      copyFile (name P.++ ".c") (name P.++ ".pde") -- SConstruct expects .pde

    and then execute

    > runhaskell CopilotSong.hs

    to do this.

    To build the executable, we issue

    > scons

    and

    > scons upload

    when we’re ready to flash the microcontroller.

  • Verify. Is the generated C program memory safe?  Wait… What do I mean by ‘memory safe’?  I’ll consider the program to be memory safe if the following hold:
    • No arithmetic underflows or overflows.
    • No floating-point not-a-numbers (NaNs).
    • No division by zero.
    • No array bounds underflows or overflows.
    • No null pointer dereferences.

    Of course this only approximates memory safety, but it’s a pretty good start. If the compiler is built correctly, we should be pretty close to a memory-safe program. But we want to check the compiler, even though Haskell’s type system gives us some evidence of guarantees already. Furthermore, the compiler knows nothing about arbitrary C functions, and it doesn’t know how large external C arrays are.

    We can prove that the program is memory safe. We call out to CBMC, a C model-checker developed primarily by Daniel Kroening. This is whole-program analysis, so we have to provide the location of the libraries. We define

    verifying :: IO ()
    verifying =
      verify (name P.++ ".c") (length notesLst * 4 + 3)
        (     "-DCBMC -I/Applications/ "
         P.++ "-I/Applications/ "
         P.++ "--function cbmc")

    which calls cbmc on our generated C program. Let me briefly explain the arguments. First we give the name of the C program.

    Then we say how many times to unroll the loops. This requires a little thinking. We want to unroll the loops enough times to potentially get into a state where we might have an out-of-bounds array access (recall that the Copilot stream idx generates indexes into the arrays). The length of the C arrays is given by length notesLst. When compiling the Copilot program (by calling the module’s main function), a periodic schedule is generated for the program. From the schedule, we can see that idx is updated every fourth pass through the loop. So we unroll enough loop passes for the counter to have the opportunity to walk off the end of the array, plus a few extra passes for setup. This is a minimum bound; you could of course over-approximate and unroll the loop, say, 1000 times.

    Regarding loop unrolling, remember that #ifdef from the definition of playtone()? We include that to reduce the difficulty of loop unrolling. playtone() gets called on every fourth pass through the main loop, and unrolling both loops is just too much for symbolic model-checking (at least on my laptop). So we give ourselves an informal argument that the loop in playtone() won’t introduce any memory safety violations, and the #ifdef gives us one iteration through the loop if we’re verifying the system. A lot of times with embedded code, this is a non-issue, since loops can just be completely unrolled.

    The -D flag defines a preprocessor macro, and -I adds an include path. Finally, the --function flag gives the entry point into the program. Because we generated a Wiring program, which creates a while(1) loop for us through macro magic, we have to create an explicit loop ourselves for verification purposes.

    If you’re interested in seeing what things look like when they fail, change the idx stream to be

      idx .= [0] ++ (idx + 1)

    and cbmc will complain

    Violated property:
      file CopilotSing.c line 180 function __r11
      array `beats' upper bound
      (long int)__1 < 47

So that’s it. Happy holidays!
