CryptoMiniSat

New version: 1.2.11 (30/10/09)

CryptoMiniSat is a program developed from MiniSat, primarily to use in cryptography, though its advantages are probably useful for other fields as well. It has the following advantages compared to standard MiniSat:

These advancements are partially described in the poster for Eurocrypt 2009 available here, and a research paper on how these advancements were used in the field of cryptography will be presented at the SAT 2009 conference in Swansea, Wales. The authors of the paper are: Mate Soos, Karsten Nohl, and Claude Castelluccia. Questions specifically about CryptoMiniSat should be addressed to Mate Soos.

The program is available as a source (ver 1.2.11) or as a windows executable (ver 1.2.11). If you download the source, please read the INSTALL and the README file. An example DIMACS file that uses XOR-clauses, clause groups and variable names , and describes the Crypto-1 stream cipher is available here (it is also in the .tar.gz archive under "cryptominisat-1.2.11/build/"). A detailed description of the enhancements' usage is detailed below.

During development, MiniSat's version solely written in C has also been used, and the native XORs have been back-ported to it. No other advancement has been back-ported. It is available here.

Usage

The overview of the usage is printed by:
USAGE: ./cryptominisat [options] <input-file> <result-output-file>

  where input may be either in plain or gzipped DIMACS.

OPTIONS:

  -polarity-mode = {true,false,rnd}
  -decay         = <num> [ 0 - 1 ]
  -rnd-freq      = <num> [ 0 - 1 ]
  -verbosity     = {0,1,2}
  -proof-log     = Logs the proof into files 'proofN.dot', where N is the
                   restart number. The log can then be visualized using
                   the 'dot' program from the graphviz package
  -grouping      = Lets you group clauses, and customize the groups' names.
                   This helps when printing statistics
  -stats         = Computes and prints statistics during the search
  -randomize     = <seed> [0 - 2^32-1] Randomly permutates the clauses. The
                   seed is later also used for picking decision variables
  -restrict      = <num> [1 - varnum] when picking random variables to branch
                   on, pick one that in the 'num' most active vars useful
                   for cryptographic problems, where the question is the key,
                   which is usually small (e.g. 80 bits)
  -restarts       = <num> [1 - 2^32-1] No more than the given number of
                   restarts will be performed during search

XOR functions

One of the most challenging problems when trying to describe cryptographic primitives in SAT solvers is the XOR function that is used very often in cryptography. MiniSat (and most DPLL-based SAT solvers) only understands the Conjunctive Normal Form or CNF for short. However, to describe a XOR function in CNF you need to have 2^{n-1} clauses, where n is the size of your XOR function. To overcome this exponential explosion, XOR functions are cut up into manageable sizes (6-7) and then implemented in 2^6 clauses each. This is not only tedious, uses a lot of memory and slow, but also makes it hard to know what is happening while solving, as the statistics will be very strange.

In CryptoMiniSat, the DIMACS standard format augmented with the following extension: having an "x" in front of a line makes that line a XOR function. For instance, the line

x10 2 7 0

means that

v10 + v2 + v7 = true    (where "+" is the binary XOR)

If we wish to invert the xor-clause, it is sufficient to invert one of the variables:

x10 2 -7 0

has now the XOR is inverted (i.e. it is equated to false)

v10 + v2 + v7 = false     (where "+" is the binary XOR)

Naturally, if two variables are inverted, their inversions cancel each other out, and the XOR is equated to true. There is no need to cut up XOR clauses to smaller sizes, they can be of any length. Using this representation instead of the cut-and-convert-CNF will give you approximately 2x the speed, lots of memory savings and cleaner statistics.

Clause grouping and naming

Clause groups are useful because most of the time a single clause does not describe the function that we need to describe, e.g. a filter function. So, all clauses that describe that one function can now be put into a group and named, so that they will show up in the statistics and the search graph as one entity. Clause grouping is turned on by the "-grouping" switch to CryptoMiniSat. In the DIMACS file, clauses can be grouped and named by adding a line after the clause:

20 3 5 0
c g 2 interesting function

This will put the clause "20 3 5" into group 2 and name it "interesting function". You can put multiple clauses into the same group:

20 3 5 0
c g 2 interesting function
-8 9 12 0
c g 2 interesting function

Now both clauses "20 3 5" and "-8 9 12" are in the same clause group 2, which has the name "interesting function"

Clause groups have no effect on the solving, they have an effect only on the search statistics and the generated search graph.

Variable naming

Variables are good to name so that when they show up in the statistics or in the search graph, they can covey a meaning: it's much more useful to see that "state[148]" caused 800 propagations, than to see that variable 875 caused 800 propagations. Variables can be named by simply putting a line at the very end of you DIMACS file:

c v 875 state[148]

which will name variable 875 as "state[148]". Note that variable 875 should exist in your DIMACS file, though it's not required.

Statistics generation

Giving the "-stats" switch to CryptoMiniSat will make it generate statistics during the solving. It is best to save these statistics to a file like "./cryptominisat -grouping -stats satfile > mystats". The generated statistics will then be in the file "mystats". Statistics are generated for each restart of MiniSat. It is best to look at the statistics generated after at least 4-5 restarts, as MiniSat needs time to learn the best variable ordering, and until then the statistics are quite unuseful.

The statistics generated are the following:

The most useful statistics will be generated if the clauses are grouped and named, and variables also have names.

Search randomization

Giving the "-randomize=X" option to CryptoMiniSat, you can randomize the clause order and the internal pseudo-random seed of MiniSat. This is useful if you want to see how much time a specific instance takes on average for CryptoMiniSat to solve. Since other than the order of clauses and the pseudo-random seed, there is no randomization whatsoever of the working of CryptoMiniSat, using this option will randomize the running to its full extent. It is a common misconception that MiniSat generates a seed from the order of the clauses. This is not the case. MiniSat simply inserts the clauses into memory according to their order, and will propagate on the first one found - which will be different for different clause orderings.

Notes: (1) Clause randomization uses the idea from Daniel J. Bernstein, and has a proof by induction. (2) If the "-randomize=X" option is not given, the clause order is not randomized, and the pseudo-random seed is 0. (3) MiniSat's randomization has been replaced with Mersenne twister

Detailed solving process visualization

Generates search graphs into the folder "proofs/" (this folder must exist), if you give the "-proof-log" option to CryptoMiniSat. Always use "-grouping" and name your clause groups and variables to get visual clues in the graph. The generated graph can be analyzed with the boost library, or visualized with graphviz's "dot" program. The proof files generated are of the from "X-proofY.dot", where X is a random number, and Y runs from 0 to the number of restarts made by CryptoMiniSat. Number 0 is special, it shows the running of MiniSat's simplify() method (MiniSat's pre-parser). Since the search tree is larger for each run, I suggest you only visualize the 2nd or the 3rd restart - the rest would take too much time to visualize by the "dot" program in graphviz. To generate the graph, just execute "dot -Tsvg X-proofY.dot > mygraph.svg", and then open the "mygraph.svg" with a good svg editor - I suggest inkscape. You can also generate PDF files by giving the "-Tpdf" option to dot. An example search graph is is presented below:

CryptoMiniSat search graph with named variables and named clause groups

The graph starts with a circle on the top, marked "BEGIN". Then it usually makes some unit propagations, moving straight down. This is the simplify() method of MiniSat. Then the solver makes some guesses moving to the left. Each of the boxes will contain the propagated/guess variable with the sign it was guessed/propagated to. After moving to the left, it goes down the search tree, usually finding a conflict marked by a regular pentagon with the generated conflict clause inside . The pentagon contains the literals in the generated conflict clause. From the pentagon it goes back up to one of the guesses, reverses the guess and then the search starts again. The search tree must be read from top to bottom, and from left to right. If the last restart (the last "X-proofY.dot" file) was used, at the bottom right of the tree there is a circle with either a "MODEL" or a "UNSAT" written in it. If not the last "X-proofY.dot" file was used, then there was a restart, and no conclusion was reached. If a model was found, by simply reading the decisions leading to this point you can see the satisfying variable bounding.

Squares correspond to guesses/propagations, e.g. "-sr[148]" written in a square means that the variable named sr[148] was bounded to false, and "sr[148]" means variable named sr[148] was bounded to true. All decisions have exactly one arrow pointing to them, indicating why that decision was taken: due to propagation (marked with the name of the clause group causing it), assumption, or a guess (marked with "guess" and drawn as a dotted line). Conflict clauses have numbers starting from 1, and their numbers are always preceded with a "**" (double star). Conflicts (pentagons) also have an arrow pointing to them: on these arrows are written the clause causing them.

Debug mode

CryptoMiniSat contains code to have debug output to the console while running. To activate it, you simply need to uncomment the line "//#define VERBOSE_DEBUG" in the "Solver.C" file and recompile the program. CryptoMiniSat will then run much slower, but will generate large quantities of internal debug output that can be used to either debug the program, or to see how the solving evolves over time.

Code cleanup

MiniSat's code is quite difficult to understand sometimes. To alleviate this problem, CryptoMiniSat's code is more clean, and tries to use more C++ to hide the complexity from the programmer.