SmartCheck

I’ve been working on a Haskell library for testing Haskell programs I call SmartCheck. SmartCheck is focused on testing algebraic data and generalizing counterexamples found. Below is the README for SmartCheck, which I have located on GitHub (I haven’t put it on Hackage yet). The following is a high-level explanation that doesn’t go into details about the algorithms or implementation (that’s another post!).

I’d be interested in feedback on

Real-world examples to try SmartCheck on.
Whether there are other interesting ways to generalize counterexamples.
If there’s similar work out there I should know about (in addition to QuickCheck and SmallCheck.
Your experiences, if you try the library out.

Thanks!

Synopsis

SmartCheck is a smarter QuickCheck, a powerful testing library for Haskell. The purpose of SmartCheck is to help you more quickly get to the heart of a bug and to quickly discover each possible way that a property may fail.

SmartCheck is useful for debugging programs operating on algebraic datatypes. When a property is true, SmartCheck is just like QuickCheck (SmartCheck uses QuickCheck as a backend). When a property fails, SmartCheck kicks into gear. First, it attempts to find a minimal counterexample to the property is a robust, systematic way. (You do not need to define any custom shrink instances, like with QuickCheck, but if you do, those are used. SmartCheck usually can do much better than even custom shrink instances.) Second, once a minimal counterexample is found, SmartCheck then attempts to generalize the failed value d by replacing d‘s substructures with new values to make d', and QuickChecking each new d'. If for each new d' generated, the property also fails, we claim the property fails for any substructure replaced here (of course, this is true modulo the coverage of the tests).

SmartCheck executes in a real-eval-print loop. In each iteration, all values that have the “same shape” as the generalized value are removed from possible created tests. The loop can be iterated until a fixed-point is reached, and SmartCheck is not able to create any new values that fail the property.

A typical example

In the package there is an examples directory containing a number of examples. Let’s look at the simplest, Div0.hs.

> cd SmartCheck/examples
> ghci -Wall Div0.hs

Div0 defines a toy language containing constants (C), addition (A), and division (D):

data M = C Int
       | A M M
       | D M M
  deriving (Read, Show, Typeable, Generic)

Because SmartCheck performs data-generic operations using GHC.Generics we have to derive Typeable and Generic. To use GHC.Generics, you also need the following pragmas: and the single automatically-derived instance:

{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE DeriveGeneric #-}

instance SubTypes M

Let’s say we have a little interpreter for the language that takes care to return Nothing if there is a division by 0:

eval :: M -> Maybe Int
eval (C i) = Just i
eval (A a b) = do
  i <- eval a 
  j <- eval b
  return $ i + j
eval (D a b) = 
  if eval b == Just 0 then Nothing 
    else do i <- eval a 
            j <- eval b
            return $ i `div` j

Now suppose we define a set of values of M such that they won’t result in division by 0. We might try the following:

divSubTerms :: M -> Bool
divSubTerms (C _)       = True
divSubTerms (D _ (C 0)) = False
divSubTerms (A m0 m1)   = divSubTerms m0 && divSubTerms m1
divSubTerms (D m0 m1)   = divSubTerms m0 && divSubTerms m1

So our property (tries) to state that so long as a value satisfies divSubTerms, then we won’t have division by 0 (can you spot the problem in divSubTerms?):

div_prop :: M -> Property
div_prop m = divSubTerms m ==> eval m /= Nothing

Assuming we’ve defined an Arbitrary instance for M (just like in QuickCheck—however, we just have to implement the arbitrary method; the shrink method is superfluous), we are ready to run SmartCheck.

divTest :: IO ()
divTest = smartCheck args div_prop
  where 
  args = scStdArgs { qcArgs   = stdArgs 
                   , treeShow = PrintString }

In this example, we won’t redefine any of QuickCheck’s standard arguments, but it’s certainly possible. the treeShow field tells SmartCheck whether you want generalized counterexamples shown in a tree format or printed as a long string (the default is the tree format).

Ok, let’s try it. First, SmartCheck just runs QuickCheck:

*Div0> divTest 
*** Failed! Falsifiable (after 7 tests):
D (D (D (A (C (-20)) (D (D (C 2) (C (-19))) (C (-8)))) (D (D (C (-23)) (C 32)) (C (-7)))) (A (A (C 2) (C 10)) (A (C (-2)) (C 13)))) (D (A (C 12) (C (-7))) (D (A (C (-29)) (C 19)) (C 30)))

Oh, that’s confusing, and for such a simple property and small datatype! SmartCheck takes the output from QuickCheck and tries systematic shrinking for the one failed test-case, kind of like SmallCheck might. We get the following reduced counterexample:

*** Smart Shrinking ... 
*** Smart-shrunk value:
D (C 0) (D (C 0) (C (-1)))

Ok, that’s some progress! Now SmartCheck attempts to generalize this (local) minimal counterexample. SmartCheck has two generalization steps that we’ll explain separately although SmartCheck combines their results in practice (you can turn off each kind of generalization in the flags). First, SmartCheck tries to generalize values in the shrunk counterexample. SmartCheck returns

*** Extrapolating values ...
*** Extrapolated value:
forall x0:

D x0 (D (C 0) (C (-1)))

Ahah! We see that for any possible subvalues x0, the above value fails. Our precondition divSubTerms did not account for the possibility of a non-terminal divisor evaluating to 0; we only pattern-matched on constants.

In addition, SmartCheck tries to do something I call constructor generalization. For a datatype with a finite number of constructors, the idea is to see if for each subvalue in the counterexample, there is are subvalues that also fail the property, using every possible constructor in the datatype. So for example, for our counterexample above

*** Extrapolating constructors ...
*** Extrapolated value:
forall C0:
  there exist arguments s.t.

D (C 0) (D C0 (C (-1)))

So in the hole C0, SmartCheck was able to build a value using each of the constructors C, A, and D (well, it already knew there was a value using C—C 0.

SmartCheck asks us if we want to continue:

Attempt to find a new counterexample? ('Enter' to continue; any character
then 'Enter' to quit.)

SmartCheck will omit any term that has the “same shape” as D (C 0) (D (C 0) (C (-1))) and try to find a new counterexample.

*** Failed! Falsifiable (after 9 tests):  
A (A (D (C (-20)) (A (C (-5)) (C (-32)))) (D (A (C 6) (C 19)) (A (C (-3)) (A (C (-16)) (C (-13)))))) (D (C 29) (D (C (-11)) (D (C 11) (C 23))))

*** Smart Shrinking ... 
*** Smart-shrunk value:
A (C (-1)) (D (A (C 1) (C 1)) (D (C 1) (C 2)))

*** Extrapolating values ...

*** Extrapolating Constructors ...

*** Extrapolated value:
forall values x0 x1:

A x1 (D x0 (D (C 1) (C 2)))

We find another counterexample; this time, the main constructor is addition.

We might ask SmartCheck to find another counterexample:

...

*** Extrapolating ...
*** Could not extrapolate a new value; done.

At this point, SmartCheck can’t find a newly-shaped counterexample. (This doesn’t mean there aren’t more—you can try to increase the standard arguments to QuickCheck to allow more failed test before giving up (maxDiscard) or increasing the size of tests (maxSize). Or you could simply just keep running the real-eval-print loop.)

This entry was posted on July 26, 2012 at 11:54 am and is filed under Haskell, Software, Verification. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

11 Responses to “SmartCheck”

John Regehr Says:
July 26, 2012 at 12:01 pm | Reply
This looks really cool.

When I saw the title I figured that SmartCheck used a solver to create inputs maximizing some coverage metric, like DART, Klee, and SAGE. Hasn’t anyone done that yet? If not, do you know why?
- Lee Pike Says:
  July 26, 2012 at 5:53 pm | Reply
  Yes, the name ‘SmartCheck’ may have been a little over zealous. :)
  
  I don’t think there’s been much work in concolic-like testing for functional languages. There is some recent work in this direction out of NEU. One challenge is the liberal use of higher-order functions in languages like Haskell or Racket. For example, consider the simple (but contrived) Haskell function:
  
  h :: Eq a => (a -> a -> a) -> [a] -> Int h _ [] = 1 h f ls = if head ls == foldl1 f ls then 0 else 1
  
  To test each branch requires symbolically reasoning about the input f, which is a function. (Note though that QuickCheck can generate random monomorphic functions for testing.)
  
  But I take it that your point is that by limiting ourselves to inputs like algebraic functions, a symbolic solver might be possible. That’s an interesting idea, but I don’t know of work in this direction.
  - John Regehr Says:
    August 10, 2012 at 1:27 pm
    Thanks for the links Lee and Ranjit.
    
    I guess there’s an interesting tension where functional languages are both easier and harder targets for concolic testing than are imperative languages.
    
    Wouldn’t some of these issues become easier if testing were done at the whole-program level, rather than at the unit level? In that case inputs are just bits…
- Ranjit Jhala Says:
  July 27, 2012 at 10:19 am | Reply
  Hi John, I believe Suresh Jagannathan is/was also working on this… Ranjit.
Derek Says:
July 26, 2012 at 6:50 pm | Reply
Is this likely to do a better job avoiding exponential runtime explosions when generating recursive data structures? A major problem I have with QuickCheck is using sized and co to prevent it from running forever when generating things like trees.
- Lee Pike Says:
  July 26, 2012 at 8:02 pm | Reply
  Derek,
  
  Sorry, this won’t help there. SmartCheck uses QuickCheck as a back-end to generate random values, so it still depends on you using sized to bound the size of the values generated.
  
  SmartCheck can help though to see the problem with a failed test, so you don’t need to rerun QuickCheck to generate multiple failing values, so it can cut down the test/debug/rewrite/test loop.
- Brent Says:
  July 29, 2012 at 10:51 am | Reply
  Take a look at gencheck: http://hackage.haskell.org/package/gencheck
mcandre Says:
July 27, 2012 at 12:36 pm | Reply
Interesting! I’m still trying to port QuickCheck to languages without lambda and introspection, such as Pascal. The hardest part is often figuring out how to do (apply function-to-test list-of-arguments) without triggering compiler errors, due to the dynamic nature of the argument types.

I managed a hack for C, not pretty, but it works.

http://www.yellosoft.us/quickcheck
Daniel Velkov Says:
July 31, 2012 at 4:27 pm | Reply
I tried using the derive package to get the Arbitrary instance for your example. When I run the code I get:
*** Failed! Falsifiable (after 14 tests):
D (C 22) (D (D (D (D (A (C (-28)) (C (-26))) (C (-30))) (C (-3))) (C (-8))) (D (A (C 2) (A (C (-23)) (A (C 26) (C (-20))))) (C (-8))))

*** Smart Shrinking …
*** Smart-shrunk value:
D (C 22) (D (C 0) (C (-1)))

*** Extrapolating values …
: memory allocation failed (requested 1048576 bytes)

The code I use is the same except for removing the Arbitrary instance and adding:
{-# LANGUAGE TemplateHaskell #-}

import Data.DeriveTH

$( derive makeArbitrary ”M )
- Lee Pike Says:
  July 31, 2012 at 6:45 pm | Reply
  Thanks, Daniel. I tried it myself, and I get the same result. Do you know what the definition of arbitrary that is generated by derive is?
  - Daniel Velkov Says:
    August 1, 2012 at 12:27 pm
    I use ‘ghc -fforce-recomp -ddump-splices ../examples/Div0.hs’ and it gives:
    
    instance Arbitrary M where
    arbitrary
    = do { x do { x1 do { x1 <- arbitrary;
    x2 do { x1 <- arbitrary;
    x2 error “FATAL ERROR: Arbitrary instance, logic bug” } }
    
    Yup, there’s no size restriction. I’ll look at the code although it’s TH.