I gave a talk (video, slides, and paper) at ICFP last month, arguing that it can be easy to build a high-assurance compiler. I gave a similar talk as a keynote a couple weeks later at the very enjoyable Midwest Verification Day, hosted by Kansas University this year (thanks Andy Gill and Perry Alexander for inviting me!). This paper wraps up the Copilot project. I had a great time (I mean, how often do formal methods engineers get to be around NASA subscale jet aircraft?!?).
Archive for the ‘Software’ Category
Lowering the Bar
October 2, 2012SmartCheck
July 26, 2012I’ve been working on a Haskell library for testing Haskell programs I call SmartCheck. SmartCheck is focused on testing algebraic data and generalizing counterexamples found. Below is the README for SmartCheck, which I have located on GitHub (I haven’t put it on Hackage yet). The following is a high-level explanation that doesn’t go into details about the algorithms or implementation (that’s another post!).
I’d be interested in feedback on
- Real-world examples to try SmartCheck on.
- Whether there are other interesting ways to generalize counterexamples.
- If there’s similar work out there I should know about (in addition to QuickCheck and SmallCheck.
- Your experiences, if you try the library out.
Thanks!
Synopsis
SmartCheck is a smarter QuickCheck, a powerful testing library for Haskell. The purpose of SmartCheck is to help you more quickly get to the heart of a bug and to quickly discover each possible way that a property may fail.
SmartCheck is useful for debugging programs operating on algebraic datatypes. When a property is true, SmartCheck is just like QuickCheck (SmartCheck uses QuickCheck as a backend). When a property fails, SmartCheck kicks into gear. First, it attempts to find a minimal counterexample to the property is a robust, systematic way. (You do not need to define any custom shrink instances, like with QuickCheck, but if you do, those are used. SmartCheck usually can do much better than even custom shrink instances.) Second, once a minimal counterexample is found, SmartCheck then attempts to generalize the failed value d by replacing d‘s substructures with new values to make d', and QuickChecking each new d'. If for each new d' generated, the property also fails, we claim the property fails for any substructure replaced here (of course, this is true modulo the coverage of the tests).
SmartCheck executes in a real-eval-print loop. In each iteration, all values that have the “same shape” as the generalized value are removed from possible created tests. The loop can be iterated until a fixed-point is reached, and SmartCheck is not able to create any new values that fail the property.
A typical example
In the package there is an examples directory containing a number of examples. Let’s look at the simplest, Div0.hs.
> cd SmartCheck/examples
> ghci -Wall Div0.hs
Div0 defines a toy language containing constants (C), addition (A), and division (D):
data M = C Int
| A M M
| D M M
deriving (Read, Show, Typeable, Generic)
Because SmartCheck performs data-generic operations using GHC.Generics we have to derive Typeable and Generic. To use GHC.Generics, you also need the following pragmas: and the single automatically-derived instance:
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE DeriveGeneric #-}
instance SubTypes M
Let’s say we have a little interpreter for the language that takes care to return Nothing if there is a division by 0:
eval :: M -> Maybe Int
eval (C i) = Just i
eval (A a b) = do
i <- eval a
j <- eval b
return $ i + j
eval (D a b) =
if eval b == Just 0 then Nothing
else do i <- eval a
j <- eval b
return $ i `div` j
Now suppose we define a set of values of M such that they won’t result in division by 0. We might try the following:
divSubTerms :: M -> Bool
divSubTerms (C _) = True
divSubTerms (D _ (C 0)) = False
divSubTerms (A m0 m1) = divSubTerms m0 && divSubTerms m1
divSubTerms (D m0 m1) = divSubTerms m0 && divSubTerms m1
So our property (tries) to state that so long as a value satisfies divSubTerms, then we won’t have division by 0 (can you spot the problem in divSubTerms?):
div_prop :: M -> Property
div_prop m = divSubTerms m ==> eval m /= Nothing
Assuming we’ve defined an Arbitrary instance for M (just like in QuickCheck—however, we just have to implement the arbitrary method; the shrink method is superfluous), we are ready to run SmartCheck.
divTest :: IO ()
divTest = smartCheck args div_prop
where
args = scStdArgs { qcArgs = stdArgs
, treeShow = PrintString }
In this example, we won’t redefine any of QuickCheck’s standard arguments, but it’s certainly possible. the treeShow field tells SmartCheck whether you want generalized counterexamples shown in a tree format or printed as a long string (the default is the tree format).
Ok, let’s try it. First, SmartCheck just runs QuickCheck:
*Div0> divTest
*** Failed! Falsifiable (after 7 tests):
D (D (D (A (C (-20)) (D (D (C 2) (C (-19))) (C (-8)))) (D (D (C (-23)) (C 32)) (C (-7)))) (A (A (C 2) (C 10)) (A (C (-2)) (C 13)))) (D (A (C 12) (C (-7))) (D (A (C (-29)) (C 19)) (C 30)))
Oh, that’s confusing, and for such a simple property and small datatype! SmartCheck takes the output from QuickCheck and tries systematic shrinking for the one failed test-case, kind of like SmallCheck might. We get the following reduced counterexample:
*** Smart Shrinking ...
*** Smart-shrunk value:
D (C 0) (D (C 0) (C (-1)))
Ok, that’s some progress! Now SmartCheck attempts to generalize this (local) minimal counterexample. SmartCheck has two generalization steps that we’ll explain separately although SmartCheck combines their results in practice (you can turn off each kind of generalization in the flags). First, SmartCheck tries to generalize values in the shrunk counterexample. SmartCheck returns
*** Extrapolating values ...
*** Extrapolated value:
forall x0:
D x0 (D (C 0) (C (-1)))
Ahah! We see that for any possible subvalues x0, the above value fails. Our precondition divSubTerms did not account for the possibility of a non-terminal divisor evaluating to 0; we only pattern-matched on constants.
In addition, SmartCheck tries to do something I call constructor generalization. For a datatype with a finite number of constructors, the idea is to see if for each subvalue in the counterexample, there is are subvalues that also fail the property, using every possible constructor in the datatype. So for example, for our counterexample above
*** Extrapolating constructors ...
*** Extrapolated value:
forall C0:
there exist arguments s.t.
D (C 0) (D C0 (C (-1)))
So in the hole C0, SmartCheck was able to build a value using each of the constructors C, A, and D (well, it already knew there was a value using C—C 0.
SmartCheck asks us if we want to continue:
Attempt to find a new counterexample? ('Enter' to continue; any character
then 'Enter' to quit.)
SmartCheck will omit any term that has the “same shape” as D (C 0) (D (C 0) (C (-1))) and try to find a new counterexample.
*** Failed! Falsifiable (after 9 tests):
A (A (D (C (-20)) (A (C (-5)) (C (-32)))) (D (A (C 6) (C 19)) (A (C (-3)) (A (C (-16)) (C (-13)))))) (D (C 29) (D (C (-11)) (D (C 11) (C 23))))
*** Smart Shrinking ...
*** Smart-shrunk value:
A (C (-1)) (D (A (C 1) (C 1)) (D (C 1) (C 2)))
*** Extrapolating values ...
*** Extrapolating Constructors ...
*** Extrapolated value:
forall values x0 x1:
A x1 (D x0 (D (C 1) (C 2)))
We find another counterexample; this time, the main constructor is addition.
We might ask SmartCheck to find another counterexample:
...
*** Extrapolating ...
*** Could not extrapolate a new value; done.
At this point, SmartCheck can’t find a newly-shaped counterexample. (This doesn’t mean there aren’t more—you can try to increase the standard arguments to QuickCheck to allow more failed test before giving up (maxDiscard) or increasing the size of tests (maxSize). Or you could simply just keep running the real-eval-print loop.)
Who’s Afraid of Software?
January 19, 2012Who’s afraid of software? I mean viscerally, stomach-knotting afraid. Afraid like you might be when you come across a snake or a bear, or when you are mugged. Do you obsess about a phishing attack each time you open your email? Do you worry there’s an eavesdropper when you join the open wifi access point in a coffee shop? Do you worry your software will fail in your modern automobile or aircraft?
I listened to a Freakonomics podcast about risk, uncertainty, and beliefs. One point made during the show was that our fears are shaped by evolution—to our ancestors, it made sense to be afraid of threatening animals. In modern life, however, our fears don’t match risks—we’d be much better off being afraid of cheeseburgers, as pointed out in the show. Some people are afraid of modern risks. I know people afraid of cancer, for example.
That got me thinking about fearing software. Software is certainly among the most complex artifacts created by humans. Modern cars contain 100+ million lines of code. Nearly every day there is a story about a large corporation being hacked and of cyber-warfare between nations.
My question is serious—I really do wonder if people are genuinely afraid of software. I work in the area of software assurance, and while I take precautions against viruses, phishing attacks, etc., I don’t particularly worry about software failures, even when my life might depend on it. This is despite issues just last year like this and this in automotive software. I get to see a somewhat how the sausage is made, and in general, we only exercise a small fraction of the state-space of deployed software in validation and in actual usage. There are legitimate risks, but there seems to be very little fear.
Perhaps like a medical doctors stereotypically neglecting their own health, I don’t worry day-to-day about software assurance despite working in the field. But it seems nobody else really fears software, either.
In the podcast, the topic of polarizing claims, like global warming, is discussed. Outside of academic circles, one’s view on the risks of software are not so polarizing—your views on the topic won’t cause your friends or colleagues to disparage you (indeed, if anything, the main risk is likely boring others in discussing the topic!). I wonder just what the “global warming” of software might be in the future.
Stable Names in Haskell
November 26, 2011Stable names in GHC “are a way of performing fast (O(1)), not-quite-exact comparison between objects.” Andy Gill showed how to use them to extract the explicit graph from writing recursive functions in his Data.Reify package (and associated paper). It’s a great idea and very practical for embedded domain-specific languages—we’ve used the idea in Copilot to recover sharing.
However, consider this example, with three tests executed in GHCI.
For a function with type constraints, stable names fails to “realize” that we are pointing to the same object. As a couple of my colleagues pointed out, the cause is the dictionary being passed around causing new closures to be created. Simon Marlow noted that if you compile with -O, the explicit dictionaries get optimized away.
Here are the solutions I have to “fixing” the problem, in the context of a DSL:
- Tell your users that recursive expressions must be monomorphic—only “pure functions” over the expressions of your DSL can be polymorphic.
- Implement a check in your reifier to see how many stable names have been created—if some upper-bound is violated, then the user has created an infinite expression, the expression is extremely large (in which case the user should try to use some sharing mechanism, such as let-expressions inside the language), or we’ve hit a stable-names issue.
- Ensure your DSL programs are always compiled.
- Of course, you can always take another approach, like Template Haskell or not using recursion at the Haskell level; also check out Andy Gill’s paper for other solutions to the observable sharing problem.
I don’t see how to use (deep)seq to fix the problem, at least as it’s presented in the example above, but I’d be keen to know if there are other solutions.
Making your Ubuntu life better
June 18, 2011I’ve had a lot of trouble with Ubuntu 11.04 (Natty Narwhal) on a laptop (ThinkPad 420), and I’ve had problems including:
- Not being able to use dual monitors,
- Random logoffs.
I tentatively think I was able to solve them with two easy fixes:
- Turn off Unity. You can do this in the login screen.
- In the Update Manager, under “Settings”, check “Proposed updates”.
With the proposed updates, Unity may be working; try that at your own risk. With that, I have a reasonably stable system.
Meta-Programming and eDSLs
January 30, 2011I’ve been working on a domain-specific language that is embedded in Haskell (an “eDSL”) that essentially takes a set of Haskell stream (infinite list) equations and turns them into a real-time C program implementing the state-machine defined by the streams. It’s called Copilot, and in fact, it’s built on top of another Haskell eDSL called Atom,1 which actually does the heavy lifting in generating the C code.
For example, here’s the Fibonacci sequence in Copilot:
fib = do let f = varW64 "f" f .= [0,1] ++ f + (drop 1 f)
I’ve been writing Copilot libraries recently, and I’ve had the following realization about eDSLs and meta-programming (let me know if someone has had this idea already!). Many languages treat meta-programming as a second-class feature—I’m thinking of macros used by the C preprocessor, for example (it’s probably generous even to call the C preprocessor ‘meta-programming’). One reason why Lisp-like languages were exciting is that the language is a first-class datatype, so meta-programming is on par with programming. In my experience, you don’t think twice about meta-programming in Lisp. (Haskell is more like C in this regard—you do think twice before using Template Haskell.)
So languages generally treat meta-programming as either second-class or first-class. What’s interesting about eDSLs, I think, is that they treat meta-programming as first-class, but programming as second-class! This isn’t surprising, since an eDSL is a first-class datatype, but the language is not first-class—its host language is.
Practically, what this means is that you spend very little time actually writing eDSL programs but rather generating eDSL programs. It is natural to layer eDSLs on top of other eDSLs.
This is just how Copilot came about: I was writing various Atom programs and realized that for my purposes, I just needed a restricted set of behaviors provided by Atom that are naturally represented by stream equations (and make some other tasks, like writing an interpreter, easier).
But as soon as Copilot was written, we2 started writing libraries implementing past-time linear temporal logic (LTL) operators, bounded LTL operators, fault-tolerant voting algorithms, regular expressions, and so on.
You wouldn’t think about combining the intermediate languages of a compiler into the same program. The idea of a language is more fluid, more organic in the context of eDSLs, where now libraries can be quickly written and different levels can be easily combined.
1Tom Hawkins wrote Atom.
2Credit for Copilot also goes to Sebastian Niller, Robin Morisset, Alwyn Goodloe.
New Group: Functional Programming for Embedded Systems
May 30, 2010Are you interested in how functional programming can be leveraged to make embedded-systems programming easier and more reliable? You are not alone. For example, check out what’s been happening in just the past couple of years.
- Programming the Arduino micro-controller with Atom/Haskell.
- The Atom eDSL language itself.
- The Feldspar language for DSP (embedded in Haskell).
- Writing a DO178B-compliant compiler in Ocaml (presented at ICFP’09).
- … And others.
Now Tom Hawkins (designer of Atom) has started a Google group, fp-embedded, to discuss these issues. Please join and post your projects & questions!
An Apologia for Formal Methods
March 14, 2010In the January 2010 copy of IEEE Computer, David Parnas published an article, “Really Rethinking ‘Formal Methods’” (sorry, you’ll need an IEEE subscription or purchase the article to access it), with the following abstract:
We must question the assumptions underlying the well-known current formal software development methods to see why they have not been widely adopted and what should be changed.
I found some of the opinions therein to be antiquated, so I wrote a letter to the editor (free content!), which appears in the March 2010 edition. IEEE also published a response from David Parnas, which you can also access at the letter link above.
I’ll refrain from visiting this debate here, but please have a look at the letters, enjoy the controversy, and do not hesitate to leave a comment!
Writer’s Unblock
September 30, 2009I’ve recently got a few technical papers out the door involving Haskell, physical-layer protocols, SMT, security modeling, and run-time verification of embedded systems (phew!). One of the benefits of industrial research is getting your hands involved in a lot of different research projects.
- This paper is about using Haskell to model physical-layer protocols and using QuickCheck to test them. Physical-layer protocols are used to transmit bits from one clock-domain to another and are used in ethernet, credit card swipers, CD players, and so on. The gist of the paper is that even though Haskell is pure & lazy, it works great for modeling and testing real-time protocols and even for computing reliability statistics. I presented it at the Haskell Symposium in September ’09, which was associated with ICFP. (The talk video is online!) The paper is a short experience report—indeed, it is the only experience report that was accepted at the symposium. The Haskell Symposium was an entertaining and friendly environment for presenting.
- This paper actually precedes the Haskell paper, but it extends the results by describing how to formally verify physical-layer protocols using SMT solvers and k-induction (we use SRI’s SAL tool in this work). The paper is a journal article accepted at Formal Aspects of Computing. You’ll find at least two things interesting about this article: (1) For all the excitement about SMT, there don’t seem to be a lot of great examples demonstrating its efficacy—the problems described in this paper were (laboriously!) verified using theorem-provers by others previously, and our approach using SMT is much more automated. (2) We provide a nice general model of cross clock-domain circuits and particularly metastability.
So if you can verify physical-layer protocols, why model them in Haskell and QuickCheck them (as we did above)? There are at least two reasons. First, if you’re using SMT, then your timing constraints need to be linear inequalities to be decidable. For systems that with nonlinear constraints, QuickCheck might be your only recourse. Second, QuickCheck gives you concrete counterexamples and test-cases that you can use to test implementations (SMT solvers often return symbolic counterexamples).
- This paper describes a simple model for analyzing information flow in a system (where a “system” could be a program, a network, an OS, etc.). The main portion of the paper describes heuristics based on graph algorithms for deciding what sort of information flow policies you might want to enforce in your system. In general, there’s been a lot of work on analyzing access control policies but not so much work in figuring out what kind of policy you should have in the first place (if you know of such work, please tell me!). The paper isn’t deep, and it’s also preliminary insofar as I don’t describe building a complex system using the techniques. Still, there’s a small (Haskell) script available that implements the algorithms described; I’d love to see these analyses find their way into a tool to help system designers build secure systems.
- Finally, this report describes the field of run-time monitoring (or run-time verification) as it applies to safety-critical real-time embedded software. Run-time monitoring compliments formal verification since when a system is too complicated to verify a priori, it can be monitored at run-time to ensure it conforms to its specification. Not a lot of work has been done on monitoring software that’s hard real-time, distributed, or fault-tolerant—which ironically could benefit the most from run-time monitoring. The report should serve as a nice, gentle introduction. The report should be published soon as a NASA Contractor Report—the work was done under a NASA-sponsored project for which I’m the PI.
Don’t hesitate to give me feedback on any of these papers. Ok, time to fill up the queue again…
“Schrodinger’s Probability” for Error-Checking Codes
May 15, 2009In a previous post, I discussed the notion of Schrödinger CRCs, first described by Kevin Driscoll et al. in their paper Byzantine Fault Tolerance, from Theory to Reality. The basic idea is that error-detecting codes do not necessarily prevent two receivers from obtaining messages that are semantically different (i.e., different data) but syntactically valid (i.e., the CRC matches the respective data words received). The upshot is that even with CRCs, you can suffer Byzantine faults, with some probability.
… So what is that probability of a Schrödinger’s CRC? That’s the topic of this post—which cleans up a few of the ideas I presented earlier. I published a short paper on the topic, which I presented at Dependable Sensors and Networks, 2010, while Kevin Driscoll was in the audience! If you’d prefer to read the PDF or get the slides, they’re here. The simulation code (Haskell) is here.
N-Version Programming… For the nth Time
April 27, 2009Software contains faults. The question is how to cost-effectively reduce the number of faults. One approach that gained traction and then fell out of favor was N-version programming. The basic idea is simple: have developer teams implement a specification independent from one another. Then we can execute the programs concurrently and compare their results. If we have, say, three separate programs, we vote their results, and if one result disagrees with the others, we presume that program contained a software bug.
N-version programming rests on the assumption that software bugs in independently-implemented programs are random, statistically-uncorrelated events. Otherwise, multiple versions are not effective at detecting errors if the different versions are likely to suffer the same errors.
John Knight and Nancy Leveson famously debunked this assumption on which N-version programming rested in the “Knight-Leveson experiment” they published in 1986. In 1990, Knight and Leveson published a brief summary of the original experiment, as well as responses to subsequent criticisms made about it, in their paper, A Reply to the Criticisms of the Knight & Leveson Experiment.
The problem with N-version programming is subtle: it’s not that it provides zero improvement in reliability but that it provides significantly less improvement than is needed to make it cost-effective compared to other kinds of fault-tolerance (like architecture-level fault-tolerance). The problem is that even small probabilities of correlated faults lead to significant reductions in potential reliability improvements.
Lui Sha has a more recent (2001) IEEE Software article discussing N-version programming, taking into account that the software development cycle is finite: is it better to spend all your time and money on one reliable implementation or on three implementations that’ll be voted at runtime? His answer is almost always the former (even if we assume uncorrelated faults!).
But rather than N-versions of the same program, what about different programs compared at runtime? That’s the basic idea of runtime monitoring. In runtime monitoring, one program is the implementation and another is the specification; the implementation is checked against the specification at runtime. This is easier than checking before runtime (in which case you’d have to mathematically prove every possible execution satisfies the specification). As Sha points out in his article, the specification can be slow and simple. He gives the example of using the very simple Bubblesort as the runtime specification of the more complex Quicksort: if the Quicksort does its job correctly (in O(n log n), assuming a good pivot element), then checking its output (i.e., a hopefully properly sorted list) with Bubblesort will only take linear time (despite Bubble sort taking O(n2) in general).
The simple idea of simple monitors fascinates me. Of course, Bubblesort is not a full specification, though. Although Sha doesn’t suggest it, we’d probably like our monitor to compare the lengths of the input and output lists to ensure that the Quicksort implementation didn’t remove elements. And there’s still the possibility that the Quicksort implementation modifies elements, which is also unchecked by a Bubblesort monitor.
But instead of just checking the output, we could sort the same input with both Quickcheck and Bubblesort and compare the results. This is a “stronger” check insofar as different sorts would have to have exactly the same faults (e.g., not sorting, removing elements, changing elements) for an error not to be caught. The principal drawback is the latency of the slower Bubblesort check as compared to Quicksort. But sometimes, it may be ok to signal an error (shortly) after a result is provided.
Just like for N-version programming, we would like the faults in our monitor to be statistically uncorrelated with those in the monitored software. I am left wondering about the following questions:
- Is there research comparing the kinds of programming errors made in radically different paradigms, such as a Haskell and C? Are there any faults we can claim are statistically uncorrelated?
- Runtime monitoring itself is predicated on the belief that the implementations of different programs will fail in statistically independent ways, just like N-version programming is. While more plausible, does this assumption hold?
Programming Languages for Unpiloted Air Vehicles
April 20, 2009I recently presented a position paper at a workshop addressing software challenges for unpiloted air vehicles (UAVs). The paper is coauthored with Don Stewart and John Van Enk. From a software perspective, UAVs (and aircraft, in general) are fascinating. Modern aircraft are essentially computers that fly, with a pilot for backup. UAVs are computers that fly, without the backup.
Some day in the not-so-distant future, we may have UPS and FedEx cargo planes that are completely autonomous (it’ll be a while before people are comfortable with a pilot-less airplane). These planes will be in the commercial airspace and landing at commercial airports. Ultimately, UAVs must be transparent: an observer should not be able to discern whether an airplane is human or computer controlled by its behavior.
You won’t be surprised to know a lot of software is required to make this happen. To put things in perspective, Boeing’s 777 is said to contain over 2 million lines of newly-developed code; the Joint Strike Fighter aircraft is said to have over 5 million lines. Next-generation UAVs, with pilot AI, UAV-to-UAV and UAV-to-ground communications, and arbitrary other functionality, will have millions more lines of code. And the software must be correct.
In our paper, we argue that the only way to get a hold on the complexity of UAV software is through the use of domain-specific languages (DSLs). A good DSL relieves the application programmer from carrying out boilerplate activities and providers her with specific tools for her domain. We go on and advocate the need for lightweight DSLs (LwDSLs), also known as embedded DSLs. A LwDSL is one that is hosted in a mature, high-level language (like Haskell); it can be thought of as domain-specific libraries and domain-specific syntax. The big benefit of a LwDSL is that a new compiler and tools don’t need to be reinvented. Indeed, as we report in the paper, companies realizing the power of LwDSLs are quietly gaining a competitive advantage.
Safety-critical systems, like those on UAVs, combine multiple software subsystems. If each subsystem is programmed in its own LwDSL hosted in the same language, then it is easy to compose testing and validation across subsystem boundaries. (Only testing within each design-domain just won’t fly, pun intended.)
The “LwDSL approach” won’t magically make the problems of verifying life-critical software, but “raising the abstraction level” must be our mantra moving forward.


